The magazine of the Melbourne PC User Group

On Hyphenation - Anarchy or Pedantry?
Major Keary

Some newspapers are notorious for hyphenation such as "It was no coin-cidence that ear-lier he of-fered to show her his dra-wings" and "the-rapists who pre-ached on wee-knights may become the subject of leg-ends".

It is said that George Bernard Shaw would examine galley proofs of his work and recast sentences, or even whole pages, in order to avoid unsightly word breaks, excessive white space caused by justification, and other typographical difficulties. Of course, he was published at a time when typesetters were sent to the block for committing the abominations above.

Stephen Murray-Smith, in his Right Words, tells us that "hyphens are used for many purposes, apart from the very useful one of making people's names sound grander". One function is to join two or more words together in order to make a compound word, such as mother-in-law and owner-builder. Another function is to avoid potentially misleading combinations of letters in a word, such as co-operate. Some words with identical spelling take a hyphen in order to identify meaning, such as re-form (form again) and reform (something promised by politicians).

Ambiguity can be resolved by joining words, such as more-experienced as against more experienced. The Australian Government Publishing Service (AGPS) Style Manual provides an illustration: "We need more-experienced staff (= staff who are more experienced, not more staff who are experienced)". The need for such constructions should, of course, be avoided by recasting the sentence where practical.

Hyphens are also used to break words at the end of a line. Unhyphenated justified text can result in unsightly gaps between words, or even stretched spacing between characters within words, while ragged right text without hyphenation can produce excessive differences in line length. The sensible use of word division usually produces a more pleasing appearance with minimum interruption to the reader. Sensible is the operative word.

Any respectable word processing package includes a hyphenation facility. Those based on an algorithm, also called logic systems, often break words incorrectly. Part of the problem is that two sets of rules exist: American and British. Within each of those systems there are so many exceptions as to make an error-free algorithm virtually impossible.

Some Testing Words

To test a word processing program's hyphenation feature, try these two words: photograph and photographer. The correct divisions are photo/graph and pho/togra/pher. Reduce the program's hyphenation setting to its minimum (default settings for various programs are discussed below). Type in each word and push it to the right margin to force hyphenation. WordStar 5.5 does recognise the difference, but divides photographer as pho/tog/ra/pher.

Another test is coincidence, which is divided co/inci/dence. One of the prime rules of hyphenation is that the first part of the word should be recognisable before the reader's eye moves to the second part on the next line. Breaking coincidence as coin/cidence is wrong because the first part, coin, is misleading. WordStar 5.5 failed that test, its algorithm blindly following the break-between-consonants rule. Coinage is another trap; some programs break it as co/inage.
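A blind logic system of this kind takes only a few lines to sketch. The Python below is a minimal illustration, not WordStar's actual algorithm (which the article does not describe): it applies just two letter-pattern rules, divide between two consonants that sit between vowels (VC/CV) and otherwise start the next part with a lone consonant (V/CV), and keeps the two-letters-left, three-carried-over convention. The names `naive_breaks` and `show` are hypothetical.

```python
VOWELS = set("aeiou")  # crude: 'y' is treated as a consonant
LEFT_MIN, RIGHT_MIN = 2, 3  # letters left on the line / carried over


def is_vowel(ch):
    return ch in VOWELS


def naive_breaks(word):
    """Break positions (letters before each break) found by two blind rules:
    VC/CV - divide between two consonants that sit between vowels;
    V/CV  - start the next part with a lone consonant between vowels."""
    w = word.lower()
    breaks = []
    for i in range(LEFT_MIN, len(w) - RIGHT_MIN + 1):
        # Gap i lies between w[i - 1] and w[i].
        between_consonants = (is_vowel(w[i - 2]) and not is_vowel(w[i - 1])
                              and not is_vowel(w[i]) and is_vowel(w[i + 1]))
        lone_consonant = (is_vowel(w[i - 1]) and not is_vowel(w[i])
                          and is_vowel(w[i + 1]))
        if between_consonants or lone_consonant:
            breaks.append(i)
    return breaks


def show(word, breaks):
    """Mark the breaks with '/', e.g. show('legends', [2]) gives 'le/gends'."""
    pieces, start = [], 0
    for b in breaks:
        pieces.append(word[start:b])
        start = b
    pieces.append(word[start:])
    return "/".join(pieces)


for word in ["coincidence", "therapists", "legends"]:
    print(show(word, naive_breaks(word)))
```

These two rules reproduce coin/ci/dence and the/ra/pists from the newspaper sample, yet the very same rules happen to get le/gends right, which is why errors from a logic system are so hard to anticipate.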

The AGPS Style Manual sets out basic rules for word division and refers one to Hart's Rules and Collins Gem Dictionary of Spelling and Word Division. The Collins dictionary is pocket-sized and costs about $6.00; it contains some 60,000 words with hyphenation points marked. Oxford now publishes a dictionary similar in size, content, format, and price.

There is also an Australian spelling guide, with hyphenation, published by Macquarie: The Macquarie Spelling Guide has 127,00 entries and costs about $5.00.

American and British Usage

American English relies on the sound of a word to determine where it should be broken. British usage turns first to etymology, and then to sound. That may seem a very minor difference, but marked variations in pronunciation can produce quite different points of division. Progress is an example. In American English the first syllable rhymes with frog and is the accented part of the word. In the U.K. and Australia the noun progress is pronounced with equal emphasis on both syllables, and the first rhymes with throw. The result is that prog/ress is the correct U.S. point of division, but pro/gress is correct British usage. One American style manual quotes pro-gress as an example of wrong and confusing hyphenation. Variations in spelling can also affect division.

American rules are pretty simple: refer to Webster's Dictionary, in which each listed word has its hyphenation points marked. Some of its divisions are hard to follow, such as o/rigi/nal/ly, which lets the word's first letter hang at the end of a line. There are other peculiarities, such as ap/pearance, where a large chunk of the word must remain intact. That may have changed since the edition I am using.

An Anarchist Plot?

Hyphenation does not lend itself to any set of unequivocal rules. Indeed, the many exceptions and disagreements suggest it is all something dreamed up at an anarchists' convention.

For example, according to British rules atmosphere should be broken atmo/sphere. But it is not as simple as that. Collins Gem Dictionary of Spelling & Word Division has, since it was first published in 1968, been used as a guide to correct hyphenation and is cited as a principal reference by the AGPS Style Manual. Collins gives atmos/phere, which is contrary to the UK authority, Hart's Rules. The Oxford Minidictionary of Spelling and Word-division agrees with Hart's, but The Macquarie Spelling Guide follows Collins.

They cannot even agree on the word hyphenation! It is hy/phen/ation in Collins and Macquarie, but hyphena/tion in Oxford.

There is no consistency in the way those authorities line up with each other. Hart's gives some divisions which, regardless of any rules, are obligatory by convention. One is cele/brate; Macquarie agrees, but Oxford gives celeb/rate, while Collins shows cel/ebrate. Oxford, incidentally, makes no distinction between photograph and photographer, which may be a typographical error.

Another of the special cases is corre/spon/dence (Hart's), but cor/res/pond/ence (Oxford and Macquarie), and co/re/spone/dence (Collins).
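The disagreements can be tallied mechanically by turning each division into the set of letter positions at which breaks fall. This sketch uses only the divisions of atmosphere and celebrate quoted above and treats each quoted form as the complete set of breaks for that word, which is a simplification; the data layout and function names are hypothetical.

```python
# Divisions as quoted in the text; "/" marks a permitted break.
DIVISIONS = {
    "atmosphere": {
        "Hart's": "atmo/sphere",
        "Oxford": "atmo/sphere",
        "Collins": "atmos/phere",
        "Macquarie": "atmos/phere",
    },
    "celebrate": {
        "Hart's": "cele/brate",
        "Oxford": "celeb/rate",
        "Collins": "cel/ebrate",
        "Macquarie": "cele/brate",
    },
}


def break_points(division):
    """Turn 'cele/brate' into the set of letter offsets where breaks fall: {4}."""
    points, offset = set(), 0
    for part in division.split("/")[:-1]:
        offset += len(part)
        points.add(offset)
    return points


def disagreements(reference="Hart's"):
    """Count, for each authority, the words it divides differently from the reference."""
    counts = {}
    for by_source in DIVISIONS.values():
        expected = break_points(by_source[reference])
        for source, division in by_source.items():
            if source != reference:
                differs = break_points(division) != expected
                counts[source] = counts.get(source, 0) + int(differs)
    return counts


print(disagreements())
```

On these two words alone, Collins departs from Hart's on both, while Oxford and Macquarie each depart on one.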

Problems of Automation

Early word processing software relied on algorithms for word division, which resulted in hyphenation such as coin-cidence or co-inage. In both those cases the reader is misled by the position of the break. They are not hypothetical examples, but were thrown up by recent versions of leading WP programs. Current packages behave better, but require a lot more house training.

WordStar still relies solely on an algorithm, which produces such oddities as ex-acting (a former acting minister?), coin-cidence, read-just, and leg-ends. It also threw up reap-ear, which suggests the programmer might have been thinking of Mimizuka in Kyoto, a mound of some 40,000 pickled ears and noses taken as war trophies during the Japanese invasions of Korea in the 1590s.

The problem is one of balance between serving the perceived practical requirements of users and achieving the perfect 10. How, one might wonder, is any automated system to know when record should be divided as re/cord (when it is a verb) and rec/ord (when it is a noun)? Collins and Oxford make the distinction, but Macquarie does not (perhaps a typographical error).

There are two ways of handling hyphenation in a computer program: by algorithm and by dictionary. The dictionary system works like a spelling checker; a coded list of words is held and compared with text as it is entered. The logic system takes less space, but there are so many exceptions to the rules that some wrong hyphenation is inevitable.

Some software employs both methods, first looking up a dictionary and reverting to an algorithm when the word is not found. WordPerfect uses the Macquarie Dictionary for spelling checks (and MS Word, it is understood, will soon follow); it would be logical to employ Macquarie's Spelling Guide as their hyphenation dictionary. It seems sensible to have dual-purpose spelling dictionaries coded for word division. There may be such systems either in use or being developed, but most software publishers include very little about hyphenation criteria in their documentation.
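A dual-purpose list of the kind suggested here, with a fallback algorithm for words it does not contain, might be sketched as follows. This is a hypothetical design, not how WordPerfect or any other product is known to work: each entry is stored with its division points marked, the spelling check simply ignores the marks, and unknown words are handed to whatever algorithm is supplied (a do-nothing placeholder in the example).

```python
class DualPurposeWordList:
    """Hypothetical word list whose entries carry their division points,
    e.g. 'co/inci/dence', so one list serves spelling and hyphenation."""

    def __init__(self, entries, fallback):
        # Key each entry by its plain spelling; keep the marked form as value.
        self.entries = {entry.replace("/", ""): entry for entry in entries}
        self.fallback = fallback  # algorithm used for words not in the list

    def is_known(self, word):
        """Spelling check: is the word in the list?"""
        return word.lower() in self.entries

    def breaks(self, word):
        """Break positions (letters before each break) for word."""
        entry = self.entries.get(word.lower())
        if entry is None:
            return self.fallback(word)  # not found: revert to the algorithm
        positions, offset = [], 0
        for part in entry.split("/")[:-1]:
            offset += len(part)
            positions.append(offset)
        return positions


words = DualPurposeWordList(
    ["co/inci/dence", "pho/togra/pher", "photo/graph"],
    fallback=lambda word: [],  # placeholder algorithm: never break
)
print(words.breaks("coincidence"), words.breaks("Photographer"))
```

A list keyed on spelling alone like this cannot hold both re/cord and rec/ord; that would also need the part of speech, which is the difficulty raised above.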

Program Defaults

In most cases hyphenation defaults can be set by the user, but the degree and nature of user control vary. WordStar decides whether or not to divide a word according to the number of letters in it; the installed default is six, which accords with the AGPS Style Manual guidelines.

WordPerfect differs, offering two values: a percentage of any given word which can be left on the first line, and a percentage which can be carried over to the next line.

Microsoft Word has yet another approach: a hot zone can be set, with a default of 0.25 inch. Presumably the user has to calculate how much space a given number of characters occupies in a particular font. The hot zone applies only to that part of a word which will be left on the first line.
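The calculation expected of Word users is simple arithmetic: convert the hot zone to points (72 to the inch in desktop publishing) and divide by the width of one character. The sketch assumes a monospaced font whose characters are 0.6 em wide, which holds for Courier; for a proportional font the `em_fraction` parameter is only an assumed average. The function name is hypothetical.

```python
POINTS_PER_INCH = 72  # desktop-publishing point


def hot_zone_in_characters(hot_zone_inches, point_size, em_fraction=0.6):
    """Approximate number of characters that fit in the hot zone.

    em_fraction is the character width as a fraction of the point size:
    exact for Courier (0.6), only an assumed average for proportional fonts.
    """
    char_width_points = point_size * em_fraction
    return hot_zone_inches * POINTS_PER_INCH / char_width_points


for size in (10, 12):
    print(size, round(hot_zone_in_characters(0.25, size), 2))
```

At 12-point Courier the 0.25 inch default works out at about two and a half characters; at 10 point it is three.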

Ventura is, as one would expect, more sophisticated. It enables control over the minimum number of letters left on the first line and the minimum number which can be carried over. The default is two and three respectively, which is a generally accepted standard. 

Ventura users can also decide if hyphenation should occur on two or more successive lines, a situation normally undesirable but which may be necessary where there is narrow measure. Ventura's hyphenation dictionary requires a megabyte of hard disk and can be set for U.K. or U.S. usage as well as for other European languages. A supplementary dictionary of word divisions, which takes precedence over entries in the main dictionary, can be created.
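A supplementary dictionary that takes precedence over the main one is a layered lookup. In Python, `collections.ChainMap` behaves this way: it searches its mappings in order, and assignments go to the first mapping. The dictionaries below are hypothetical, using Oxford's celeb/rate as the main entry and Hart's cele/brate as the user's override.

```python
from collections import ChainMap

# Hypothetical main dictionary (Oxford's division of "celebrate").
main_dictionary = {
    "celebrate": "celeb/rate",
    "atmosphere": "atmo/sphere",
}

# The user's supplementary dictionary, recording Hart's division instead.
supplementary = {"celebrate": "cele/brate"}

# ChainMap searches supplementary first, so its entries take precedence.
divisions = ChainMap(supplementary, main_dictionary)


def division_of(word):
    """Return the marked division for word, or None if neither list has it."""
    return divisions.get(word.lower())


# Writes through a ChainMap go to its first mapping (the supplementary
# dictionary); the main dictionary is left untouched.
divisions["coincidence"] = "co/inci/dence"

print(division_of("celebrate"), division_of("Atmosphere"))
```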

A Pedants' Picnic?

The WordPerfect percentage system does not really help in observing the convention that two letters can be left on the first line and not less than three carried over to the next. For example, co-incidence leaves two letters on the first line, and coin-age carries three over. However, how does one arrive at an appropriate percentage? In the first example a setting of 18% is required to leave two letters on the first line. In the second example about 43% is required to carry three letters over. Now, there are three possible points of division in coincidence: co/in/cid/ence. The 18% and 43% settings would deny the last break, -ence, because 43% of the eleven letters in coincidence is nearly five letters, more than the four in -ence.

"Pedantic nit-picking", you say? Well, give some thought to what happens when words such as dihydroxyphenylalanine occur. To accommodate the first (di-) and last (-nine) points of division, the settings would have to be 9% and 18% in WordPerfect. But how is that going to affect other words in the same text file?

The point is that if there is a need to employ such a word, the user is likely to want it divided correctly. The solution to such problems is to look up a hyphenation dictionary and manually insert soft hyphens.
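The WordPerfect figures discussed above can be recomputed to show why no single pair of settings suits every word. The sketch assumes, as the text describes, that each setting is a percentage of the word's letters; `break_shares` is a hypothetical helper, and for dihydroxyphenylalanine only the first and last breaks mentioned in the text (di- and -nine) are marked.

```python
def break_shares(division):
    """For each '/' in a division such as 'co/in/cid/ence', return the
    percentages of the word left on the first line and carried over."""
    parts = division.split("/")
    length = sum(len(part) for part in parts)
    shares, left = [], 0
    for part in parts[:-1]:
        left += len(part)
        shares.append((round(100 * left / length, 1),
                       round(100 * (length - left) / length, 1)))
    return shares


for division in ["co/in/cid/ence", "coin/age", "di/hydroxyphenylala/nine"]:
    print(division, break_shares(division))
```

The figures round to the 18%, 9% and 18% quoted in the text, and carrying three letters of coinage needs roughly 43%: a fragment of two to four letters can be anything from about 9% to about 43% of a word, depending on the word's length.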

There is no agreement between authorities on the minimum number of letters which can be left on the first line or carried over to the second line. Hart's makes no suggestion; Macquarie advocates a minimum of three letters left on the first line or carried over to the next (but it lists numerous words divided so as to carry two letters over); Word opts for a minimum of two letters left on the first line and three carried over to the next, which is the Ventura default.

The Commandments

The commandments of word division, like another familiar set, are largely negative:

Thou shalt not divide words of one syllable. 
Thou shalt not divide words of fewer than six letters.
Thou shalt not break words of two syllables unless required by narrow measure. 
Thou shalt not separate vowels which are part of one syllable (e.g., trea/sure, not tre/asure). 
Thou shalt not leave a hyphen on the last line of a right-hand page. 
Thou shalt not have hyphens as endings on two consecutive lines. 
Thou shalt not take less than three letters of a word to the next line. 
Try to have a consonant at the beginning of the second part of the word, but not if it would mislead (e.g., mark/ing, not mar/king; le/gends, not leg/ends). 
Divide between consonants (but not in cases such as pass/ing).
Divide words with three adjoining vowels according to sound (e.g. cre/ator and crea/ture).
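Only the mechanical commandments lend themselves to checking by program. The sketch below (hypothetical helper names) tests a proposed break against the six-letter rule, the three-letters-carried-over rule and the consecutive-hyphens rule. The two-letters-left minimum is an assumption borrowed from the Ventura default mentioned earlier, since the commandments do not state one. Rules of sense and sound, such as le/gends versus leg/ends, are beyond a check like this.

```python
def break_problems(word, position, min_word=6, min_left=2, min_carry=3):
    """List the mechanical rules broken by dividing word after `position` letters."""
    problems = []
    if len(word) < min_word:
        problems.append(f"word has fewer than {min_word} letters")
    if position < min_left:
        problems.append(f"fewer than {min_left} letters left on the line")
    if len(word) - position < min_carry:
        problems.append(f"fewer than {min_carry} letters carried over")
    return problems


def consecutive_hyphen_lines(lines):
    """Indices of lines that end with a hyphen directly after another such line."""
    return [i for i in range(1, len(lines))
            if lines[i].endswith("-") and lines[i - 1].endswith("-")]


column = ["It was no coin-", "cidence that ear-", "lier he of-", "fered to show"]
print(break_problems("ended", 4), consecutive_hyphen_lines(column))
```

Applied to a narrow column set like the newspaper sample at the start of this article, the second check flags every line after the first hyphenated one.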

There are also commonsense rules, such as:

If a word already includes a hyphen, then divide it at the hyphen (e.g., attorney-general, and words such as re-cover, meaning to provide a new cover). 
Divide compound words between the component parts (e.g., bare/foot). 
Keep together letters which form one sound (e.g. beauti/ful). 
Divide after a prefix (e.g., co/belligerent and proto/type).

The Correct Way

There is no correct way per se. There are acceptable ways of dividing words depending on the purpose of the writing. Formal or academic work calls for closer attention to the division of words than, say, casual writing.

Where long or complex words are being used, it is wise to ensure hyphenation is consistent and avoids possible confusion to the reader. Consistency is important, whether to do with punctuation, hyphenation, or whatever.

Not many users are likely to be confronted with the need to use a term like pneumonoultramicroscopicsilicovolcanoconiosis, the longest word listed in non-specialist English dictionaries. It appears in the latest edition of the Oxford English Dictionary and means a respiratory condition caused by very fine mineral particles, such as might be generated by an erupting volcano.

The Liang algorithm used in the TeX system works from a compact set of hyphenation patterns derived from a large word list, supplemented by a list of exceptions. Donald Knuth, in his excellent The TeXbook, uses that word as an example of how the Liang algorithm copes. It produced this: pneu/monoul/tra/mi/cro/scop/ic/sil/i/co/vol/canoco/nio/sis.

One may be excused for having difficulty seeing the word for the letters, but it illustrates the effect of automatic application of two rules: divide between consonants, and have a consonant as the first letter of the second part of the divided word. Ordinary users should have no trouble handling such big words by applying the commonsense rule of dividing between component parts and the does-it-sound-right test:
pneumon(o)/ultra/micro/scopic/silico/volcano/coniosis. There is room for further points of division, but the sensible way to break such a monster is between recognisable component parts, which is much easier on the reader.
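Liang's method stores patterns of letters with digits between them. Every pattern that matches part of the word (with "." marking the word's edges) contributes its digits to the gaps it covers, the highest digit in each gap wins, and an odd result permits a break while an even one forbids it. The Python below is a minimal sketch of that scoring. Its patterns are illustrative only, chosen so that hyphenation comes out correctly; they are not TeX's actual pattern set, and a real implementation stores thousands of patterns in a trie. The minimum of two letters left and three carried over mirrors plain TeX's \lefthyphenmin and \righthyphenmin defaults.

```python
import re

# Illustrative patterns only (not TeX's real set).
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split e.g. 'hen5at' into ('henat', [0, 0, 0, 5, 0, 0])."""
    letters = re.sub(r"\d", "", pattern)
    scores = [0] * (len(letters) + 1)  # scores[k]: gap before letters[k]
    position = 0
    for ch in pattern:
        if ch.isdigit():
            scores[position] = int(ch)
        else:
            position += 1
    return letters, scores


def liang_breaks(word, patterns, exceptions=None, left_min=2, right_min=3):
    """Break positions (letters before each break) for word."""
    word = word.lower()
    if exceptions and word in exceptions:
        return exceptions[word]           # the exception list overrides patterns
    padded = "." + word + "."             # '.' marks the word's edges
    scores = [0] * (len(padded) + 1)      # scores[k]: gap before padded[k]
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:                # apply every (overlapping) match
            for k, value in enumerate(values):
                scores[start + k] = max(scores[start + k], value)
            start = padded.find(letters, start + 1)
    # The gap before padded[k] falls after k - 1 letters of the word.
    return [k - 1 for k in range(2, len(padded) - 1)
            if scores[k] % 2 == 1 and left_min <= k - 1 <= len(word) - right_min]


def show(word, breaks, mark="-"):
    """Join the pieces of word at the given break positions."""
    pieces, start = [], 0
    for b in breaks:
        pieces.append(word[start:b])
        start = b
    pieces.append(word[start:])
    return mark.join(pieces)


print(show("hyphenation", liang_breaks("hyphenation", PATTERNS)))
```

With these patterns the word comes out hy-phen-ation, the Collins and Macquarie division; passing an exception entry overrides the patterns, much as TeX's \hyphenation list does.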

If you want to send your WP or DTP program into a fit, try pneumonoultra .... on its hyphenation system.

Simply writing without any hyphenation may be a workable alternative much of the time in ragged right text, a format becoming more widely used, particularly in user manuals. However, if an unusually long word is encountered there may be no alternative but to break it.

A Commonsense Approach

The essential rule of hyphenation is to avoid interfering with the flow. A reader should be led from the first to the second part of a divided word without any disturbance of concentration caused by an awkward break. In other words, the initial part of a divided word should be recognisable before the reader's eye moves to the next line.

Thus, going back to the example of hyphenation, the technically correct way of dividing the word is to carry over -tion. But it is more sensible to make the break after hyphen (hyphen/ation), which provides a whole word at the end of one line and prepares the reader for what is carried over to the next.

The AGPS Style Manual sums it up thus: "The division of words should follow certain rules of sound and sense. In general, if the proposed division sounds right when spoken aloud, it is probably acceptable. As far as possible, the part of the word before the hyphen at the end of the line should suggest the remainder of the word so that the reader's thought is carried on logically."

In other words, it is important not to break a word in such a way as to mislead the reader. Leg-ends instead of le-gends, and the-rapist instead of thera-pist, are examples. Macquarie has therap-ist, which might cause the reader to stumble, or reach for the bottle and get sloshed with thera.

If work is being prepared for an audience which uses American English, then hyphenate accordingly. Use Webster as the authority, but beware of the occasional hanging single letter, as in o/rigi/nal/ly.

Choose a word division dictionary and stick to it for any doubtful cases. Do not rely on hyphenation systems which form part of word processing packages. That is not to say they shouldn't be used at all; just keep an eye out for errors, particularly where narrow measure is necessary. Ventura provides for avoiding hyphens at the end of successive lines, but WP packages (at least, the ones I have seen) lack that degree of sophistication. According to most style authorities there should not be more than two successive lines ending with hyphens unless narrow measure is necessary.

The reason for automation's problems is that the rules were laid down for human typesetters, who understood when to bend, ignore, or otherwise modify the commandments. They kept in their minds a list of traditional divisions which do not conform to the rules. To them the reader's interests were paramount.

The Hart Conventions

According to Hart's Rules the word divisions in the table at the left "are obligatory".

The table shows where Ventura, Macquarie, Oxford, and Collins agree or disagree with Hart. Ventura has two disagreements, Macquarie and Oxford have four each, and Collins has three. The comparison is made to illustrate the wide variation between authorities. American usage, as given by Webster, is also shown when it varies from Hart.

Esoteric Nonsense?

Hyphenation is neither anarchy nor the sole province of pedants and pedagogues. Used in moderation it can make a printed page more visually pleasing. If used indiscriminately it can have the opposite effect, either putting the reader off or causing unnecessary distraction. 

If the intended audience is bound to read the work (a user manual, for example) poor hyphenation practice may not matter. If the author wants to attract and hold an audience, then hyphenation needs just as careful attention as any other aspect of presentation. 

The best examples of how not to hyphenate can be seen in the tabloid press on any day, or in some computer magazines when they work in unnecessarily narrow columns. 

Remember that authorities can't agree, so your opinion is as good as any, but there is no excuse for leg-ends, ex-acting, and the like. If someone else is typesetting your work and you care, then proofread for poor hyphenation as well as typos, dropped lines, and so on. Use soft, or provisional, hyphens in any unusual or exceptionally long words in order to maintain control. That, of course, does not always help if the typesetting or DTP system does not understand the soft hyphen codes generated by the originating software.
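Today the portable form of a soft hyphen is the Unicode character U+00AD SOFT HYPHEN, which is meant to be displayed only where a line actually breaks. A minimal sketch of inserting soft hyphens in advance follows, assuming the divisions come from a hyphenation dictionary as slash-marked strings; the helper name `insert_soft_hyphens` is hypothetical, and whether a given typesetting system honours U+00AD still has to be checked.

```python
import re

SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN: visible only at a line break


def insert_soft_hyphens(text, divisions):
    """Insert soft hyphens into every word of text that has a marked
    division in `divisions`, e.g. {"atmosphere": "atmo/sphere"}."""
    def mark(match):
        word = match.group(0)
        entry = divisions.get(word.lower())
        if entry is None or entry.replace("/", "") != word.lower():
            return word  # unknown word or mismatched entry: leave it alone
        pieces, start = [], 0
        for part in entry.split("/"):
            pieces.append(word[start:start + len(part)])  # keep original case
            start += len(part)
        return SOFT_HYPHEN.join(pieces)

    return re.sub(r"[A-Za-z]+", mark, text)


print(repr(insert_soft_hyphens("The atmosphere, Atmosphere.",
                               {"atmosphere": "atmo/sphere"})))
```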

Reprinted from the December 1991 issue of PC Update, the magazine of Melbourne PC User Group, Australia
