The magazine of the Melbourne PC User Group

Will the Real ASCII Please Stand Up?
Major Keary

In English-speaking countries it is often assumed that all computer users are Anglophone or, at the very least, employ a Latin alphabet of twenty-six characters, each with upper and lower case forms, and which conforms to something called 'ASCII'. Latin, incidentally, is used in preference to Roman so as to avoid confusion with the typographical term, roman.

For better or worse ASCII has entered the lexicon of computer users and writers to mean 'a table for any 256-character set'. Manuals for software, hardware, and peripherals frequently show the IBM PC set (also described as the 'IBM extended character set') with so-called ASCII codes.

In particular, positions 1 - 31 often display graphic characters, whereas those positions are allocated to various non-character codes in true ASCII. The use of the term in its broadest sense is a convenient adaptation, but one should be aware that ASCII has a specific and quite narrow meaning. Indeed, original ASCII is not used by manufacturers of computers and peripherals.

ASCII, either in the specific or loose sense, is not engraved on a silicon tablet carried down from Mount Rom by a latter day Moses. Some time after ASCII was introduced it was adopted by the American National Standards Institute (ANSI) as ANSI X3.4, which was in turn adopted as the basis for ISO 646 [1], which provides for 128 code positions.

Like ANSI X3.4, ISO 646 is a seven-bit code set, but it varies from ANSI X3.4 in that the tilde (~) is replaced by a macron (¯) and the hash mark (#) by the pound sign (£). Furthermore, there is more than one version of ISO 646.

Numerous national standards exist, including AS 1776 [2], many of which are variations of ISO 646. Indeed, ISO 646 is anything but standard, having provision for twelve characters to be substituted by others according to particular national needs. The standard contains the remarkable observation that "Code sets obtained by modifying the standard... or by other replacements are non-standard", something which AS 1776 also goes to the trouble of telling us.

The twelve variable characters, in order of code position, are: #, $, @, [, \, ], ^, `, {, |, }, and ~

Now, that is all very well, but there are at least thirty-two different standards. Australia, New Zealand, and a number of other countries substitute a macron (¯) for the tilde (~) at position 126 [3]. Switzerland and the UK each have no fewer than three versions, and Yugoslavia has four.

The twelve variable symbols would arrive in Canada, for example, as: #, $, à, â, ç, ê, î, ô, é, ù, è, and û unless the International Reference Version (IRV) of ISO 646 is used. In that case '$' comes out as a currency sign, '¤'.
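The substitution can be sketched in a few lines of Python. This is only an illustration: it assumes the twelve variable characters sit at decimal positions 35, 36, 64, 91 - 94, 96, and 123 - 126, and pairs them, in ascending order, with the Canadian list given above. The names VARIABLE_POSITIONS, CANADIAN, and as_seen_in_canada are invented for the example.

```python
# Decimal code positions of the twelve variable characters, ascending
# (assumed layout: # $ @ [ \ ] ^ ` { | } ~).
VARIABLE_POSITIONS = [35, 36, 64, 91, 92, 93, 94, 96, 123, 124, 125, 126]

# The Canadian replacements, in the order listed in the article.
CANADIAN = ["#", "$", "à", "â", "ç", "ê", "î", "ô", "é", "ù", "è", "û"]

# Translation table: code position -> character shown in Canada.
CANADIAN_TABLE = dict(zip(VARIABLE_POSITIONS, CANADIAN))


def as_seen_in_canada(text: str) -> str:
    """Show how seven-bit text would appear under the Canadian variant."""
    return text.translate(CANADIAN_TABLE)


print(as_seen_in_canada("a[0] = {1};"))  # brackets and braces become accented letters
```

A line of program code that looks harmless in one country thus arrives peppered with accented letters in another, although not one transmitted code has changed.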

Most printers now provide alternate international character sets which contain variants used by some countries. Epson, for example, provides twelve selectable sets and their manuals include a table showing the variable characters.

However, printer and computer manufacturers do not always conform to national standards. For example, the French standard (NF Z 62-010) provides for '£' at position 35, but printer manufacturers seem to use '#' instead.

The Birth of ASCII

The primary reason for ASCII was teletype transmission, but between conception and birth events had already overtaken it. ASCII, an acronym for American Standard Code for Information Interchange, was conceived as a replacement for Baudot code in teletype applications.

Baud, a term familiar to - and generally misused by - computer users, comes from the name of Jean-Émile-Maurice Baudot, whose Baudot code was the standard for teletype printers until computer technology created a quantum leap in the volume of data generated by business machines.

The first internationally accepted standard for transmitting, or transferring, data electronically was the Morse Code. Morse's invention of 1838, designed for hard-wire communications, was adapted to wireless telegraphy as well as visual signalling and is still in use.

The growth of telegraphic traffic required something more efficient and teletype machines were developed. Baudot code was introduced in the 20s and remained the standard until replaced by ASCII in 1966.

In its original implementation ASCII had seven levels as against Baudot's five (the term 'level', for all practical purposes, equates with 'bit'). Baudot had 32 (2^5) available codes as against 128 (2^7) for ASCII, which is why old telegrams, cables, and teleprinter messages used only upper case letters. Without diverting into an explanation of its mechanics, the Baudot code was designed for use with mechanical equipment able to switch (like the effect of a shift key) between two sets. The result was an available sixty-four codes, but still not enough for upper and lower case characters, numerals, and punctuation marks.

ASCII requires seven bits for definition, but the standard includes an eighth bit which was used to provide an even parity check. Thus, the number of bits set to one could always be made even by either leaving the first bit as zero or setting it to one.

Ordinary folk may well ponder why the ANSI people opted for a seven-bit code when the use of eight bits would have doubled the number of available characters from 128 to 256. According to the records of ASA [4] Subcommittee X3.2, American Standard Code for Information Interchange, it was decided in 1963 that, "This coded character set is to facilitate the general interchange of information among information processing systems, communication systems, and associated equipment .... An 8-bit set was considered but the need for more than 128 codes in general applications was not yet evident."

A common code was essential if computer generated data was to be portable between different operating systems, and ANSI persuaded manufacturers other than IBM to co-operate in the development of ASCII as a standard for computers and peripherals. It is not clear if IBM was excluded or simply declined. If they did decline one can understand why: it is commercially impossible to wait through a standard's gestation period, which might be several years, for something which is needed now. IBM was using an eight-bit character set well before seven-bit ASCII was established.

Indeed, eight-bit, extended ASCII, code sets were already in use before ISO 646 was published.

Teleprinter Tape

Just in case you take to peering at teleprinter tape, the holes represent seven-bit binary equivalents plus a parity hole, so the pattern can vary by one digit from the eight-bit numbers used in computer applications. For example, in the ASCII table familiar to computer users 00100110 (binary) = 26 (hex) = 38 (decimal) = '&'. The teletype punch pattern for '&' (where 0 = no hole and 1 = hole punched) is 10100110. The first hole is used for the even parity check, being punched if the holes otherwise add up to an odd number. In this example the leading '1' in the teletype pattern is in fact the hole punched to make the number of holes even, and it occupies the position of the most significant bit (MSB).
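For those who would rather calculate than peer at tape, the even parity rule can be sketched as follows. The function name with_even_parity is invented for the illustration, and the parity hole is assumed to occupy the top (eighth) bit position, as in the example above.

```python
def with_even_parity(code: int) -> int:
    """Return the eight-bit pattern for a seven-bit code, using the
    top bit to make the total number of one bits even."""
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a seven-bit code (0-127)")
    ones = bin(code).count("1")   # holes punched for the character itself
    parity_bit = ones % 2         # 1 only when that count is odd
    return (parity_bit << 7) | code


ampersand = ord("&")              # 38 decimal, 26 hex
punched = with_even_parity(ampersand)
print(format(ampersand, "08b"), format(punched, "08b"))  # 00100110 10100110
```

The ampersand has three one bits, so the parity hole is punched; a character such as 'A' (1000001) already has an even count and is left alone.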

In computer usage the eight-bit binary numbers representing the first 128 codes all have '0' (zero) as the MSB or first digit, starting at 00000000 (binary) and finishing at 01111111 (binary). In order to cater for older hardware which can transmit characters using only seven bits, many printers provide codes which enable the MSB to be set to 1 or 0.
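In software the same effect is obtained with a bit mask. The sketch below is an illustration only (the function names are invented, and no particular printer's control codes are implied): clearing the MSB recovers the seven-bit code, while setting it reaches the upper 128 positions.

```python
def clear_msb(byte: int) -> int:
    """Force the most significant bit to 0, leaving a seven-bit code."""
    return byte & 0x7F


def set_msb(byte: int) -> int:
    """Force the most significant bit to 1, selecting codes 128-255."""
    return byte | 0x80


print(chr(clear_msb(0b10100110)))  # the teletype pattern for '&', parity removed
```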

The Mysterious 0 - 31 

ASCII codes 0 - 31 retain mysterious and, to computer users, apparently useless codes like SOH and ETX. They represent mechanical movements and information necessary to telegraphic transmissions. A number have been retained for their original purpose (CR, LF, BS, etc.) and are sometimes described as Format Effecters.

True seven-bit ASCII, as defined in various national standards, classifies codes 0 - 31 as Format Effecters (FE), Information Separators (IS), Device Controls (DC), and Transmission Control Characters (TC). ACK, DLE, ENQ, EOT, ETB, ETX, SOH, STX, and SYN are all transmission control characters.

Format Effecters are intended for control of layout and positioning of information, either on a printer or VDU, and are the principal standard control codes used in computer applications.
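The four classes can be laid out as a small lookup table. The sketch below uses the conventional mnemonics and decimal positions of seven-bit ASCII; NAK is included as a tenth transmission control although it is not named in the list above, and codes falling outside the four classes (NUL and BEL, for instance) are simply reported as unclassified. The names CONTROL_CLASSES and classify are invented for the illustration.

```python
# Conventional seven-bit ASCII mnemonics, grouped by class.
CONTROL_CLASSES = {
    "TC": {"SOH": 1, "STX": 2, "ETX": 3, "EOT": 4, "ENQ": 5,
           "ACK": 6, "DLE": 16, "NAK": 21, "SYN": 22, "ETB": 23},
    "FE": {"BS": 8, "HT": 9, "LF": 10, "VT": 11, "FF": 12, "CR": 13},
    "DC": {"DC1": 17, "DC2": 18, "DC3": 19, "DC4": 20},
    "IS": {"FS": 28, "GS": 29, "RS": 30, "US": 31},
}


def classify(code: int):
    """Return (class, mnemonic) for a control code, or None when the
    code does not belong to any of the four classes."""
    for class_name, members in CONTROL_CLASSES.items():
        for mnemonic, position in members.items():
            if position == code:
                return class_name, mnemonic
    return None


print(classify(13))  # ('FE', 'CR')
print(classify(3))   # ('TC', 'ETX')
```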

ASCII-8

While the parity error detection system was quite adequate for filtered lines and slow transmission speeds, use of telephone lines for data transmission posed problems. Noise could contaminate data sufficiently to make parity error detection pointless. Data generated by computers is now generally transmitted in blocks with built-in error detection and rectification. A certain error level is tolerated until the find-and-repair process decides the rate is too high and aborts.

The eighth bit thus became spare [5], enabling extension to 256 (2^8) codes. IBM embraced ASCII-8 (as the eight-bit set is sometimes called) when its first PCs were introduced. The IBM extended character set became a de facto standard now supported by most printer manufacturers. However, usage covering 128 onwards is still chaotic.

ASCII Chaos

For example, code 161 produces À in Hewlett-Packard Roman 8, í in Ventura and IBM PC, and ¡ in ISO 8859 and IBM Code Page 1004. Code 197 produces a in HP, an em dash in Ventura, a graphic symbol in IBM PC, and Å in ISO 8859 and IBM Code Page 1004. Ã is available in Ventura at 183, at 225 in HP, at 195 in ISO 8859 and IBM 1004, and not at all in IBM PC or IBM Code Page 850 (Multilingual). To make things even more complicated, Lotus has its own character set.
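The chaos is easy to reproduce. The sketch below decodes one and the same byte under several of the tables mentioned, using codec names from Python's standard library (cp437 for the IBM PC set, cp850, latin_1 for ISO 8859-1, and hp_roman8). Ventura, Lotus, and Code Page 1004 have no standard codec and are left out; the function name render is invented for the illustration.

```python
def render(code: int, codecs=("cp437", "cp850", "latin_1", "hp_roman8")) -> dict:
    """Decode a single byte value under each named character set."""
    raw = bytes([code])
    # errors="replace" guards against positions a table leaves undefined.
    return {name: raw.decode(name, errors="replace") for name in codecs}


for name, char in render(161).items():
    print(f"{name:10} {char}")
```

For code 161 this should show í for the two IBM sets, ¡ for ISO 8859-1, and À for HP Roman 8, in line with the examples above.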

It is a source of great frustration for anyone who has to use the Latin alphabet for transliterating languages written in some other form; Sanskrit, for example, has an accepted system of transliteration which observes particular usages of diacritics. A similar situation occurs where a language has no native written form and requires special character-diacritic combinations to reflect pronunciation. That introduces a serious impediment to portability of files, especially when the author is not aware of the software or hardware which may be used for output. Not knowing what software was used to prepare an on-disk manuscript creates a problem for typesetters and printers. Just getting the name of our friend, Jean-Émile-Maurice Baudot, to come out right requires some thought.

So, when you hear someone spouting 'ASCII', or come across a so-called ASCII code set in some manual or reference book, keep in mind that there is a real ASCII, even though it may be of only academic interest to computer users.


1 ISO stands for International Organization for Standardization; how the initials acquired their present order is not known. The organisation's standards traditionally carry the prefix, 'ISO'.

2 AS stands for Australian Standard. Australian standards are published by the Standards Association of Australia (which calls itself Standards Australia), which used to be known as the Australian Standards Association; that seems to explain the use of 'AS' and sometimes 'ASA' as a prefix.

3 Code positions can be identified in various ways; decimal numbers are used to define them here. 

4 In this case, American Standards Association.


5 An article, 'Unicode: Beyond ASCII', in the July 1981 issue of Byte says, "The ISO later added a 1-bit extension to ISO 646 ... (to create a code) known as Latin 1 ... ". Eight-bit character sets were well and truly in use even before ISO 646 was first published as a standard. The ISO established more than one eight-bit standard in response to what was already happening. Latin 1 is one of five Latin sets included in ISO 8859.

Reprinted from the November 1992 issue of PC Update, the magazine of Melbourne PC User Group, Australia
