The Great Curly Brace Trace Chase

Computer History Vignettes

By Bob Bemer

On my site is a pointer to an interesting page by Jukka Korpela of Finland [6], called "Character histories: Notes on some ASCII code positions". It tells how various interesting characters (then new to many computer character sets of the time) came into membership in the ASCII set.

Such research is difficult now, in a world of 8-bit bytes, ASCII code, and soft-copy screens, where typewriters, codes of different lengths, and varying computer word lengths have been pretty much forgotten. And their documentation is mostly in hardcopy libraries, not on the Web.

This vignette came about because I started to wonder:

1. If the curly braces exist in ASCII because of my efforts and examples, and/or
2. If I had been first to put curly braces, via IBM's Stretch, into the internal character set of any computer.

Here the only words subject to misinterpretation are "internal character set".
Webopedia helps us out by giving this definition:

"character set" -- "a defined list of characters recognized by the computer hardware and software".

That is not "hardware and/or software", but "hardware and software". Both. And the character set must have the same content, and size in bits, for both. For (1) above, remember that ASCII is (ASCI)Interchange.

These definitions controlled my search. The character set or repertoire of computer hardware is not at all necessarily the same as that of input-output equipment that can be used. Conversations on the Web are bloated with instances of combinations of standard characters that are used to indicate a character, of the character set of the computer used, that is not enterable by keyboard. Typical are those of the C language, where "\e" means the single escape character. Granted, that character itself is recognized by the hardware, but hordes of programmers have interrupted the scan code it generates on the keyboard to provide functions not those of the character itself.

Ah. Scan codes. Remember that keys on today's keyboard do not emit ASCII codes; they emit "scan codes", which are converted to ASCII according to the character layout of your keyboard. That is how the French maintain the "azerty", whereas the US has "qwerty" left to right, upper left, on its keyboards. Ditto for Cyrillic keyboards and such.
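
A minimal sketch of that conversion, in Python. The scan-code values follow the common PC "set 1" convention for the first four letter keys of the top row, but treat them, the table names, and the `decode` helper as illustrative assumptions rather than a description of any particular keyboard driver.

```python
# Illustrative only: a scan code names a key position, not a character.
# The values 0x10-0x13 are assumed here for the first four top-row letter keys.
US_QWERTY = {0x10: "q", 0x11: "w", 0x12: "e", 0x13: "r"}
FR_AZERTY = {0x10: "a", 0x11: "z", 0x12: "e", 0x13: "r"}

def decode(scan_codes, layout):
    """Translate key-position codes into characters using a layout table."""
    return "".join(layout[code] for code in scan_codes)

keys = [0x10, 0x11, 0x12, 0x13]
us_text = decode(keys, US_QWERTY)  # the same four keys read through the US table
fr_text = decode(keys, FR_AZERTY)  # the same four keys read through the French table
```

The same key presses yield different ASCII characters only because the layout table differs; the keyboard itself sends identical codes in both cases.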

And seldom do adjacent tracks on magnetic tape match the adjacency of the bits of a character as stored internally as the code of the CPU itself. But there must be a universal 1-to-1 mapping.

Among other aberrations to consider is the possibility of encoding typewriter-like devices so that, although both upper and lower case alphabetic letters would print differently on paper, they would be encoded identically, and thus pass into the computer CPU as the same character.

Stretch

IBM's 7030 (Stretch) was thought to be the first production computer to have a byte size greater than 6 (in fact 8 bits), allowing lower case alphabet and other useful characters to be accommodated. The timing was:

1956 Sep 16-22 -- Large group of IBM people to Los Alamos (including Bemer from NYC Headquarters) (Note 1).
1956 Dec -- Dunwell paper "Design Objectives for the IBM Stretch Computer", EJCC paper
1959 Jan -- First two 7030s began assembly in Poughkeepsie (Note 2).
1960 Jan -- First publication of character set to outsiders. [2]
1961 Apr 16 -- First Stretch delivered (to Los Alamos)

Note 1. The travelog on my website, on the Lockheed-IBM page, shows that as my only visit to Los Alamos during that period, annotated "Stretch Planning". Be assured that the only bit of Stretch planning that I did was the character set. And perhaps advise a bit on software. The 8-bit byte decision had already been made when I got there, and I did say a strong Amen to that!

Note 2. For this to happen, especially for the most powerful machine built to that time, and certainly provided with FORTRAN, the character set would have to have been completely fixed at least 8 months prior to starting to build the CPU. I'm willing to estimate that increment to be cut to 4 months for any other computer, small, with an internal code containing any of [ ] { } \ . (Stretch had no escape character; I didn't think of it until 1959 October).

Eric Fischer concluded from somewhere that the Stretch set dated from 1959 November, while my formula (above) gives 1958 April. We'll see if any competitors approach this date. The official character set publication [2] has as its initial reference [1], of 1959 September.

My investigation seemed like a good story, so follow along. First let's look at the ASCII characters that are known unequivocally to be due to me:

ESCAPE

See [3], published in 1960 Feb, but made known to workers in the coding field by 1959 Oct.

Four Information Separators

US (Unit Separator), RS (Record Separator), GS (Group Separator), and FS (File Separator) are the only 4 remaining of the 8 information separators (ISi) (or data delimiters) that I put in in 1961 September. As I said in my Interface Age articles on ASCII, I got the idea from the Word Mark in the character-based IBM 1401.

Backslash

We may take it as absolute proof of genesis, from the early limited character sets, that IBM's STRETCH was the first computer to use the backslash character, for Reference [4] was actually published a couple of years after the design of that machine. And that uniqueness continued. [4] shows no 6-bit set with the backslash as a working character.

Here is an excerpt from both [15] and [5] (the latter is the source for Korpela's paper):

"I had called a joint meeting of IBM, SHARE, and GUIDE, to regularize the IBM 6-bit set to become the standard BCD Interchange Code ... Frequency studies of symbol occurrence had been prepared, particularly from ALGOL programs. The meeting of 1961 July 6 produced general agreement on a basic 60-64-character set, which included the two square brackets and the reverse slant, which was chosen in conjunction with "/" to yield 2-character representations for the AND and OR of early ALGOL. This is reflected in the set I proposed to ANSI X3.2 on 1961 September 18."
(Note: I had put the backslash in position 5/15. It enabled the ALGOL "and" to be "/\" and the "or" to be "\/".)

SHARE and GUIDE representatives at the meeting were a little stubborn about accepting my proposed backslash, so I asked for a character more important to have. After much discussion they could not agree on a better candidate.

"At the 61 November 8-10 meeting, X3.2 constructed the first formal proposal, X3.2\1 ..." (which, much modified, was to become ASCII)
(Note: In this proposal the backslash was moved to position 5/12, and there it has remained ever since.)
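
The "column/row" positions quoted above map to code values in a simple way: in the 7-bit code table the column number supplies the high three bits and the row number the low four. A short Python sketch (the function name `position_to_code` is mine, for illustration only):

```python
def position_to_code(column, row):
    """Convert an ASCII table position written column/row to its code value."""
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError("column must be 0-7 and row must be 0-15")
    return column * 16 + row

backslash_code = position_to_code(5, 12)      # decimal 92, hex 5C
backslash_char = chr(backslash_code)          # the backslash in present-day ASCII
earlier_slot = chr(position_to_code(5, 15))   # what present-day ASCII holds at 5/15
```

In present-day ASCII, position 5/15 holds the underscore, while 5/12 still holds the backslash.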

Square Brackets

Having documented the introduction of square brackets to ASCII, I must do careful research to check for their previous existence in any computer set. Here the backslash played a very discriminant role.

[7], [8] and [9] have very good information, but all have a basic flaw. The specific computers for which the coded sets are given are not identified to more than the manufacturer (in some cases not even that), nor are the years of their introduction.

Their author, Dik Winter, admits this and says he will try harder, but much documentation is lost, and it was characteristic of the times that nobody seemed to think character sets a very important feature of computers.

In [7], CD display, General Electric Internal, NCR, and Bull Scientific Internal all show both the brackets and the backslash, which would be a remarkable coincidence if they existed prior to ASCII. CDC Display Scientific shows brackets only, but Winter now says it could only be the 3600, introduced 1963 (too late). BCL Internal shows brackets only, but I am unable to find such a manufacturer in my list of nearly 3000 different computers to date.

In [8], [ and ] were shown for the RPC 4000 (too late at 1960 Nov), and a "MC" Flexowriter (I have to assume that Winter, of the Netherlands, speaking of ALGOL 60, means the computers built by the Mathematical Centre (1962 and later), or the Electrologica X-1 of 1960).

[9] shows two mag tape codes with [\], but they're copycats. One is EBCDIC.

The LGP-30 and Square Brackets

Both [4] and [9] showed the LGP-30 with square brackets, as in a 7-bit set, so it had to be checked. The Web gave very few hits for this 1956 machine. Fortunately one was the museum of Dr. Tim Bergin's home base. He kindly faxed me a few pages from a training manual they had on exhibit. Ref. [14] specified a 32-bit word length, fit only for 8-bit bytes. If so, way ahead of its time.

6-bit encodings, each assigned to two different characters, match what I had in [4], and include key or paper tape shifts to upper case and lower case. Perhaps each shift condition added a 7th bit, selecting between the 6-bit combination assigned to the pairs of characters. The pairs themselves look like the typewriter key pairs. Perhaps the eighth bit was immaterial, or perhaps a parity bit.

But how would 7-bit encodings, formed from 6 input bits augmented by a shift bit (did the CAPS LOCK submit it anew for successive upper case?) drive the Flexowriter on output?

For now I am prepared to cede that the LGP-30 might have been the first and only other computer to have the square brackets in its character set. Until the Stretch computer came along, that is.

Others

Relative to ASCII, I had little to do with introducing the tilde, accent acute, accent grave, less than, greater than, or the standard typewriter special characters of that era, even though I used all except the accents for Stretch. Plus the vertical bar, which went into ASCII at the same time as the curly braces (but many computers had them previously).

So no claims are made.

Placement

I did get X3 to agree to move the alphabets down one position, reserving the three positions after z and Z for international usage in ASCII-alternate sets. I learned that by examining the Copenhagen telephone book while there in 1963 Sep, finding that the names starting with accented vowels followed the regular alphabet of 26.
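
In present-day ASCII those reserved positions after Z and z hold the brackets and braces discussed in this vignette. A quick check in Python (the helper name `three_after` is illustrative):

```python
def three_after(letter):
    """Return the three ASCII characters that follow the given letter."""
    start = ord(letter) + 1
    return "".join(chr(code) for code in range(start, start + 3))

after_upper_z = three_after("Z")   # left bracket, backslash, right bracket
after_lower_z = three_after("z")   # left brace, vertical bar, right brace
```

National variants of the 7-bit code were free to substitute accented letters in these positions, which is why they sit directly after the alphabet.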

Curly Braces

Now we get to the mystery. My search started with:

Date: Mon, 3 Dec 2001 14:22:49 -0600 (CST)
From: Eric Fischer
To: bob@bobbemer.com
> Bemer:
> I'm adding a question. Do you have source knowledge about the curly
> braces? I put them in the Stretch set, and now I cannot find any
> previous computer character repertoire that had them. I scarcely
> believe that I was the first to use them, being so common in literature,
> but incorporation in computer sets is a different thing.
Eric:
I remember that you also included them in the 256-character card code that you published in CACM, because I found it interesting that you proposed that they could be used in place of the Algol 'begin' and 'end' keywords, which is exactly how they were later used in the C language.

The next day he added:

Date: Tue, 4 Dec 2001
From: Eric Fischer
To: bob@bobbemer.com
It looks like the Lincoln Writer for the TX-0 at MIT Lincoln Lab probably had the curly braces before you did. Unfortunately the earliest listing of its characters I've seen is from January, 1960, but the letters from its designers that are among the papers you donated to the Smithsonian give the impression that the machine had been designed and the first one was already being built in August, 1958. (I know there was a letter or article in CACM about it at some point too, but I don't have a copy and don't know the date, and I don't think it included the character list.) (Note: It did, but it was for the TX-2, which became operative in 1959.)
That's the only computer code I know of that seems to have included the curly braces before your Proposal.

The only curly braces found in Ref. [4] were indeed for Stretch and the Lincoln Writer, and they appear nowhere else in character sets prior to 1962, at least. But a recently-obtained copy of [16] indicates that linking this device with the TX-0 was erroneous. Vanderburgh states succinctly that "The Lincoln Keyboard was designed for use on TX-2 ... both for preparation of programs on punched paper tape and for direct console communication in program language".

But here the matter gets obscure. I know that Stretch (in [4]) had two lines of characters to accommodate encoding for more than 64 characters that actually were manipulated by the hardware. Thus the "Stretch character set".

And the Lincoln Writer had two lines of characters, without lower case alphabet shown. Except that shift characters for upper case and lower case were included. So the conclusion is that "it is doubtful that this is an internal character set for TX-2, even if it is an input device". It seems that I was too much in a hurry when I wrote [4], and did not make this distinction well enough, for other 2-line groups were for Flexowriters and other typewriters.

The TX-2 Set -- I/O Devices and the CPU

Vanderburgh's CACM paper [12] shows the set membership as:

26 Block English Letters
10 Standard Arabic Numerals
6 Greek Letters
12 Lower-case English letters ???
8 Punctuation Symbols -- , * ? ' ( ) { } (here they are)
11 Formula Symbols
7 Symbols for Symbolic Logic
8 Special Symbols

We can admit now that the incomplete character set, missing 14 lower case letters, was a little strange, but they claimed that the composition was studied carefully at the time, and that was their decision, strongly influenced by text processing needs.

Clues From the Web

"While TX-0 was still in possession of the big memory we wrote a program which allowed us to simulate a typewriter with 200 characters" (Jack Gilmore in [11]).
"A 36 bit operand word can be divided into one 36, one 27 and one 9, two 18, or four 9 bit subwords formed from the 9 bit quarters. The 9 bit quarters can be permutated among themselves. Any or all of the subwords can be used simultaneously" (TX-2 from [13]).
"Channels or tracks on the tape -- 10 Tracks/tape" [13]
"Lincoln Writer input -- 10 6 bit chars/sec" [13]
"Paper Tape Soroban punch -- 180 7 bit lines/sec" [13]
"Xerox printer -- 20 lines/sec, 1300 char/sec -- 88 characters can be printed in 2 sizes. 6 bit vert. & 9 bit horiz. axes resolution." [13]
"Lincoln Writer output -- 10 6 bit chars/sec" [13]
"One-of-a-kind research computer" [13]
"the keyboard's circuitry logic will sense what the case of the typewriter is and if it is not in the case of the selected character then the necessary case code will be generated first and then the selected character's code will be generated." (Jack Gilmore in [11]).
"The overbar and underbar will not cause the platen to be shifted to the right (I think he means left) one space position. ... The same is true of the box and circle symbols. ... These four symbols allow us to modify characters and thereby change their meaning." (Jack Gilmore in [11]).

I admit that all the clues were there, but the mechanics were missing. I didn't know how to link them together until a phone call from Jack Gilmore on 2002 Jun 17.

How It Worked

Characters from the Lincoln Writer entered in 6-bit form. Those that were to be themselves were processed into 9-bit form by prefixing (or appending, which might be better for sorting) a specific 3-bit combination to form their internal 9-bit representation. All other characters entered with their own 6-bit form adjoined to a 3-bit representation derived from:

the shift keys (unlocked or locked)
a superscript key (causing half-size)
a subscript key (causing half-size)
the code of a nonspacing box, circle, underscore, or overscore

These pairs were processed into a 9-bit form made by adding the 3 bits to the regular 6-bit codes to indicate each of these modifiers.
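
As a rough illustration of that composition, here is a Python sketch. The 3-bit modifier values and the choice to place them in the high-order bits are invented for the example; the sources cited here do not give the actual internal 9-bit codes. Only the 6-bit value octal 52, which [16] assigns to ( and { in the Lincoln Writer code, comes from the record.

```python
# Hypothetical 3-bit modifier values; the real TX-2 assignments are not given in the sources.
MODIFIERS = {
    "plain": 0b000,
    "shift": 0b001,
    "superscript": 0b010,
    "subscript": 0b011,
}

def to_nine_bits(six_bit_code, modifier="plain"):
    """Prefix a 3-bit modifier to a 6-bit Lincoln Writer code (assumed bit layout)."""
    if not 0 <= six_bit_code < 64:
        raise ValueError("expected a 6-bit code")
    return (MODIFIERS[modifier] << 6) | six_bit_code

def from_nine_bits(nine_bit_code):
    """Split a 9-bit value back into (modifier bits, 6-bit code)."""
    return nine_bit_code >> 6, nine_bit_code & 0o77

plain_paren = to_nine_bits(0o52)        # modifier bits 000, so still octal 052
shifted = to_nine_bits(0o52, "shift")   # modifier bits 001, giving octal 152
```

Whatever the real bit assignments were, the point stands that the 9-bit value was assembled by software from a 6-bit key code plus state, much as a layout table is applied to a scan code today.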

Reasons Why TX-2 Had No Priority on Curly Braces

TX-2 was a "one-of". No interchange was considered. It was not a commercial product offering, and its users were captive.
MIT was never associated with the X3.2 work, and had no connections with the committees.
As Jack Gilmore told me on 2002 Jun 18, most of the work on Whirlwind, TX-0, and TX-2 was pretty secret. Thus there was little likelihood that their character repertoires would have been familiar to the ASCII committees.
Its curly braces could have been replaced with brackets, with new slugs, and nobody would have noticed the difference, least of all the CPU and the software.
In actuality, slugs for curly braces had existed for a long time for the basket of the IBM Model B typewriter.
In Figure 7 of [16] (Lincoln Writer Code), ( and { are assigned as octal 52, ) and } as octal 53. In Figure 8 (The Lincoln Type on the Flexowriter) these same characters are assigned as octal 31 and 21 respectively. Nowhere is it given what 9-bit codes these would have internally, even as their simple graphics, let alone as compound characters as overstruck with circle, square, underline, overline, or as the same characters in reduced size form.
Thus there was no physical internal character set per se for the TX-2. It was ephemeral, put in force by the software used.
And that software used was the equivalent of today's scan codes.

Amelioration

Having established my own priority for point 2 by the fact that there was no internal character set per se for the Lincoln Writer, I must admit that it was a very clever device, and I do not wish this paper to diminish the accomplishment. Early text editing was a major goal and usage, especially for printing reports. So it must have been exceptionally useful for that closed community.

Placing Curly Braces in ASCII

Although X3 had conceded on 1961 Jun 7-9 that ASCII would go from a 6-bit code to 7-bit, almost no characters were added except the lower case alphabet. The curly braces were not added until 1963 Dec 17-18. Here Eric Fischer was able to help again with the X3.2.4 minutes (he must have made copies of every document I deposited with the Smithsonian Institution, for I recognized my own handprinting on the copies).

Document 12 was "Clamons Code Proposal dated 17 Dec 1963". It led to Item 7, which reads:

An ad hoc committee, comprised of Messrs. Clamons, Arne, Davis, Long, and Turner, was formed to consider the three positions following lower case z in the ISO draft proposal. X3.2.4 voted to recommend: ...
b. It is suggested that the three positions following
lower case z be: left brace, vertical line,
right brace. ...

To understand this, one needs to know that Eric Clamons had become a good friend when the ASCII development began. The friendship continued when I went to UNIVAC as Director of Systems Programming, where he was Manager of Product Planning for the UNIVAC 1050, which we tried to make the first ASCII-based computer ever. Together we handled all standards-making representation for UNIVAC. The proposal he carried to that meeting was mine.

We continued in the same warm, mind-interchangeable relationship after I got him hired into GE when I went there, and it lasted until he died in 1999. So this is the proof for point 1, in addition to [1] and [2].

Conclusion

The above proves that I was the source responsible for placing 11 different characters into ASCII (point 1), and for at least 8 of these (with the possible exception of the vertical bar, and the square brackets of the LGP-30) it was the first placement in the internal character set of any computer (point 2).

Which is a partial explanation of why a parcel marked "Father of ASCII" (sometime about 1968), at General Electric in Phoenix, was forwarded to me, having inside a letter starting "Dear Bob".

REFERENCES

R.W.Bemer, "A proposal for a generalized card code of 256 characters",
Commun. ACM 2, No. 9, 19-23, 1959 Sep
R.W.Bemer, W.Buchholz, "An extended character set standard", IBM Tech.
Pub. TR00.18000.705, 1960 Jan, rev. TR00.721, 1960 Jun
R.W.Bemer, "A proposal for character code compatibility", Commun. ACM 3,
No. 2, 71-7, 1960 Feb
R.W.Bemer, "Survey of coded character representation", Commun. ACM 3,
No. 12, 639-641, 1960 Dec
R.W.Bemer, "A view of the history of the ISO character code", Honeywell
Computer J. 6, No. 4, 274-286, 1972
Jukka Korpela, Finland.
See http://www.cs.tut.fi/~jkorpela/latin1/ascii-hist.html
www.cwi.nl/~dik/english/codes/7tape.html Dik Winter
www.cwi.nl/~dik/english/codes/internal.html Dik Winter
www.cwi.nl/~dik/english/codes/magtap.html Dik Winter
www.bobbemer.com/BACSLASH.HTM
Letter from Alexander Vanderburgh, Jr., to R. W. Bemer, 1958 August 15
(in National Museum of American History, 315 Box 4), as transmittal cover for
Letter from Jack Gilmore to Vanderburgh, re "The Lincoln Writer".
1958 August 14 (also in NMAH, 315 Box 4).
A. Vanderburgh, "The Lincoln Keyboard & a Typewriter Keyboard Designed for Computer Input Flexibility", CACM 1, No. 7, 1958 Jul, p. 4.
ed-thelen.org/comp-hist/BRL61-1.html
http://www.computer-archiv.de
J.T.Gilmore, Jr., R.E.Savell, "The Lincoln Writer", M.I.T. Lincoln Lab.,
Group Report 51-8, 1959 October 06.
