
The Great Curly Brace Trace Chase

Computer History Vignettes

By Bob Bemer

On my site is a pointer to an interesting page by Jukka Korpela of Finland [6], called "Character histories: Notes on some ASCII code positions". It tells how various interesting characters (then new to many computer character sets of the time) came into membership in the ASCII set.

Such research is difficult now, in a world of 8-bit bytes, ASCII code, and soft-copy screens, where typewriters, codes of different lengths, and varied computer word lengths have been pretty much forgotten. Their documentation is mostly in hardcopy libraries, not on the Web.

 
This vignette came about because I started to wonder:

  1. If the curly braces exist in ASCII because of my efforts and examples, and/or
  2. If I had been first to put curly braces, via IBM's Stretch, into the internal character set of any computer.

Here the only words subject to misinterpretation are "internal character set".
Webopedia helps us out by giving this definition:

"character set" -- "a defined list of characters recognized by the computer hardware and software".

That is not "hardware and/or software", but "hardware and software". Both. And the character set must have the same content, and size in bits, for both. For (1) above, remember that the final "I" of ASCII stands for Interchange.

These definitions controlled my search. The character set or repertoire of computer hardware is not necessarily the same as that of the input-output equipment that can be used with it. Conversations on the Web are bloated with instances where a combination of standard characters stands for a character that belongs to the computer's character set but cannot be entered from the keyboard. Typical are those of the C language, where "\e" (accepted by some compilers, though not part of standard C) means the single escape character. Granted, that character itself is recognized by the hardware, but hordes of programmers have intercepted the scan code it generates on the keyboard to provide functions other than those of the character itself.

Ah. Scan codes. Remember that keys on today's keyboard do not emit ASCII codes; they emit "scan codes", which are converted to ASCII according to the character layout of your keyboard. That is how the French maintain the "azerty", whereas the US has "qwerty" left to right, upper left, on its keyboards. Ditto for Cyrillic keyboards and such.
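
The conversion step can be pictured as a table lookup. What follows is only a minimal sketch, not any real keyboard driver: the four scan-code values are the PC "set 1" make codes for those key positions, while the two partial layout tables and the function name `translate` are assumptions made for illustration.

```python
# Same physical key positions, two different layouts.
# Keys are PC "set 1" make codes; values are the characters each layout assigns.
QWERTY = {0x10: "q", 0x11: "w", 0x1E: "a", 0x2C: "z"}
AZERTY = {0x10: "a", 0x11: "z", 0x1E: "q", 0x2C: "w"}

def translate(scan_codes, layout):
    """Convert a sequence of scan codes to ASCII code values via a layout table."""
    return [ord(layout[code]) for code in scan_codes]

keys = [0x10, 0x11]  # the two leftmost keys of the top letter row
print(translate(keys, QWERTY))  # ASCII codes for "q" and "w"
print(translate(keys, AZERTY))  # ASCII codes for "a" and "z"
```

The keys emit the same scan codes in both cases; only the table decides which ASCII code reaches the program.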

And seldom do adjacent tracks on magnetic tape match the adjacency of the bits of a character as stored internally as the code of the CPU itself. But there must be a universal 1-to-1 mapping.

Among other aberrations to consider is the possibility of encoding typewriter-like devices so that, although both upper and lower case alphabetic letters would print differently on paper, they would be encoded identically, and thus pass into the computer CPU as the same character.

Stretch

IBM's 7030 (Stretch) was thought to be the first production computer to have a byte size greater than 6 (in fact 8 bits), allowing lower case alphabet and other useful characters to be accommodated. The timing was:

Note 1. The travelog on my website, on the Lockheed-IBM page, shows that as my only visit to Los Alamos during that period, annotated "Stretch Planning". Be assured that the only bit of Stretch planning that I did was the character set. And perhaps advise a bit on software. The 8-bit byte decision had already been made when I got there, and I did say a strong Amen to that!

Note 2. For this to happen, especially for the most powerful machine built to that time, and certainly provided with FORTRAN, the character set would have to have been completely fixed at least 8 months prior to starting to build the CPU. I'm willing to estimate that increment to be cut to 4 months for any other computer, small, with an internal code containing any of [ ] { } \ . (Stretch had no escape character -- I didn't think of it until 1959 October).

Eric Fischer concluded from somewhere that the Stretch set dated from 1959 November, while my formula (above) gives 1958 April. We'll see if any competitors approach this date. The official character set publication [2] has as its initial reference [1], of 1959 September.

My investigation seemed like a good story, so follow along. First let's look at the ASCII characters that are known unequivocally to be due to me:

 
ESCAPE

See [3], published in 1960 Feb, but made known to workers in the coding field by 1959 Oct.

Four Information Separators

US (Unit Separator), RS (Record Separator), GS (Group Separator), and FS (File Separator) are the only 4 remaining of the 8 information separators (ISi) (or data delimiters) that I introduced in 1961 September. As I said in my Interface Age articles on ASCII, I got the idea from the Word Mark in the character-based IBM 1401.

Backslash

We may take it as absolute proof of genesis, from the early limited character sets, that IBM's STRETCH was the first computer to use the backslash character, for Reference [4] was actually published a couple of years after the design of that machine. And that uniqueness continued. [4] shows no 6-bit set with the backslash as a working character.

Here is an excerpt from both [15] and [5] (the latter is the source for Korpela's paper):

"I had called a joint meeting of IBM, SHARE, and GUIDE, to regularize the IBM 6-bit set to become the standard BCD Interchange Code ... Frequency studies of symbol occurrence had been prepared, particularly from ALGOL programs. The meeting of 1961 July 6 produced general agreement on a basic 60-64-character set, which included the two square brackets and the reverse slant, which was chosen in conjunction with "/" to yield 2-character representations for the AND and OR of early ALGOL. This is reflected in the set I proposed to ANSI X3.2 on 1961 September 18."

     (Note: I had put the backslash in position 5/15. It enabled the ALGOL "and" to be "/\" and the "or" to be "\/".)

SHARE and GUIDE representatives at the meeting were a little stubborn about accepting my proposed backslash, so I asked for a character more important to have. After much discussion they could not agree on a better candidate.

"At the 61 November 8-10 meeting, X3.2 constructed the first formal proposal, X3.2\1 ..."     (which, much modified, was to become ASCII)

      (Note: In this proposal the backslash was moved to position 5/12, and there it has remained ever since.)
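
For readers unused to the column/row notation, here is a minimal sketch of how a position such as 5/12 maps to a code value. It assumes the usual 7-bit table with 16 rows per column; the function name `position` is illustrative, and Python's `chr` reflects present-day ASCII, not the 1961 drafts.

```python
def position(column, row):
    """Code value of a column/row position in a 7-bit table with 16 rows per column."""
    return column * 16 + row

backslash = chr(position(5, 12))  # the backslash's final home
AND = "/" + backslash             # the two-character ALGOL "and"
OR = backslash + "/"              # the two-character ALGOL "or"

print(position(5, 12), hex(position(5, 12)), backslash)
print(AND, OR)
print(position(5, 15), chr(position(5, 15)))  # the earlier position; ASCII now has the underscore there
```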

Square Brackets

Having documented the introduction of square brackets to ASCII, I must now do careful research to check for their previous existence in any computer set. Here the backslash played a very discriminant role.

[7], [8] and [9] have very good information, but all have a basic flaw. The specific computers for which the coded sets are given are not identified to more than the manufacturer (in some cases not even that), nor are the years of their introduction.

Their author, Dik Winter, admits this and says he will try harder, but much documentation is lost, and it was characteristic of the times that nobody seemed to think character sets a very important feature of computers.

In [7], CD display, General Electric Internal, NCR, and Bull Scientific Internal all show both the brackets and the backslash, which would be a remarkable coincidence if they existed prior to ASCII. CDC Display Scientific shows brackets only, but Winter now says it could only be the 3600, introduced 1963 (too late). BCL Internal shows brackets only, but I am unable to find such a manufacturer in my list of nearly 3000 different computers to date.

In [8], [ and ] were shown for the RPC 4000 (too late at 1960 Nov), and a "MC" Flexowriter (I have to assume that Winter, of the Netherlands, speaking of ALGOL 60, means the computers built by the Mathematical Centre (1962 and later), or the Electrologica X-1 of 1960).

[9] shows two mag tape codes with [\], but they're copycats. One is EBCDIC.

The LGP-30 and Square Brackets

Both [4] and [9] showed the LGP-30 with square brackets, as in a 7-bit set, so it had to be checked. The Web gave very few hits for this 1956 machine. Fortunately one was the museum of Dr. Tim Bergin's home base. He kindly faxed me a few pages from a training manual they had on exhibit. Ref. [14] specified a 32-bit word length, fit only for 8-bit bytes. If so, way ahead of its time.

6-bit encodings, each assigned to two different characters, match what I had in [4], and include key or paper tape shifts to upper case and lower case. Perhaps each shift condition added a 7th bit, selecting between the 6-bit combination assigned to the pairs of characters. The pairs themselves look like the typewriter key pairs. Perhaps the eighth bit was immaterial, or perhaps a parity bit.

But how would 7-bit encodings, formed from 6 input bits augmented by a shift bit (did the CAPS LOCK submit it anew for successive upper case?) drive the Flexowriter on output?
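
One way to picture the speculation above is a shift state that is latched by a shift code and then supplies the 7th bit for every following character. This is a sketch under that assumption only; the code values, the two character pairs, and the names `decode` and `graphic` are hypothetical and are not the real LGP-30 or Flexowriter assignments.

```python
# Hypothetical 6-bit codes; NOT the real LGP-30 or Flexowriter assignments.
SHIFT_UP, SHIFT_DOWN = 0o10, 0o20
PAIRS = {0o01: ("a", "A"), 0o02: ("(", "[")}  # (lower-shift, upper-shift) graphics

def decode(tape):
    """Turn 6-bit tape codes into 7-bit values: latched shift bit plus the 6-bit code."""
    shift = 0
    out = []
    for code in tape:
        if code == SHIFT_UP:
            shift = 1          # stays set until a SHIFT_DOWN arrives
        elif code == SHIFT_DOWN:
            shift = 0
        else:
            out.append((shift << 6) | code)
    return out

def graphic(value):
    """Pick the printed character from the 7-bit value (for driving output)."""
    return PAIRS[value & 0o77][value >> 6]

tape = [0o01, SHIFT_UP, 0o01, 0o02, SHIFT_DOWN, 0o02]
print("".join(graphic(v) for v in decode(tape)))
```

Under this model the shift code is sent once and holds for successive upper-case characters, and output would need the reverse step: emit a shift code whenever the 7th bit changes.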

For now I am prepared to cede that the LGP-30 might have been the first and only other computer to have the square brackets in its character set. Until the Stretch computer came along, that is.

Others

Relative to ASCII, I had little to do with introducing the tilde, accent acute, accent grave, less than, greater than, or the standard typewriter special characters of that era, even though I used all except the accents for Stretch. Plus the vertical bar, which went into ASCII at the same time as the curly braces (but many computers had them previously).

So no claims are made.

Placement

I did get X3 to agree to move the alphabets down one position, reserving the three positions after z and Z for international usage in ASCII-alternate sets. I learned that by examining the Copenhagen telephone book while there in 1963 Sep, finding that the names starting with accented vowels followed the regular alphabet of 26.
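
The three reserved positions after each alphabet are easy to inspect in ASCII as it stands today, where they hold [ \ ] after Z and { | } after z. A minimal sketch (the helper name `after` is illustrative):

```python
def after(letter, count=3):
    """Return the characters in the `count` code positions following `letter`."""
    return [chr(ord(letter) + k) for k in range(1, count + 1)]

for last in "Zz":
    print(last, "".join(after(last)))
```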

Curly Braces

Now we get to the mystery. My search started with:

Date: Mon, 3 Dec 2001 14:22:49 -0600 (CST)
From: Eric Fischer
To: bob@bobbemer.com

> Bemer:

> I'm adding a question. Do you have source knowledge about the curly
> braces? I put them in the Stretch set, and now I cannot find any
> previous computer character repertoire that had them. I scarcely
> believe that I was the first to use them, being so common in literature,
> but incorporation in computer sets is a different thing.

Eric:

I remember that you also included them in the 256-character card code that you published in CACM, because I found it interesting that you proposed that they could be used in place of the Algol 'begin' and 'end' keywords, which is exactly how they were later used in the C language.

The next day he added:

Date: Tue, 4 Dec 2001
From: Eric Fischer
To: bob@bobbemer.com

It looks like the Lincoln Writer for the TX-0 at MIT Lincoln Lab probably had the curly braces before you did. Unfortunately the earliest listing of its characters I've seen is from January, 1960, but the letters from its designers that are among the papers you donated to the Smithsonian give the impression that the machine had been designed and the first one was already being built in August, 1958. (I know there was a letter or article in CACM about it at some point too, but I don't have a copy and don't know the date, and I don't think it included the character list.) (Note: It did, but it was for the TX-2, which became operative in 1959.)

That's the only computer code I know of that seems to have included the curly braces before your Proposal.

The only curly braces found in Ref. [4] were indeed for Stretch and the Lincoln Writer, and they appear nowhere else in character sets prior to 1962, at least. But a recently-obtained copy of [16] indicates that linking this device with the TX-0 was erroneous. Vanderburgh states succinctly that "The Lincoln Keyboard was designed for use on TX-2 ... both for preparation of programs on punched paper tape and for direct console communication in program language".

But here the matter gets obscure. I know that Stretch (in [4]) had two lines of characters to accommodate encoding for more than 64 characters that actually were manipulated by the hardware. Thus the "Stretch character set".

And the Lincoln Writer had two lines of characters, without lower case alphabet shown. Except that shift characters for upper case and lower case were included. So the conclusion is that "it is doubtful that this is an internal character set for TX-2, even if it is an input device". It seems that I was too much in a hurry when I wrote [4], and did not make this distinction well enough, for other 2-line groups were for Flexowriters and other typewriters.

The TX-2 Set -- I/O Devices and the CPU

Vanderburgh's CACM paper [13] shows the set membership as:

  1. 26 Block English Letters
  2. 10 Standard Arabic Numerals
  3. 6 Greek Letters
  4. 12 Lower-case English letters ???
  5. 8 Punctuation Symbols -- , * ? ' ( ) { } (here they are)
  6. 11 Formula Symbols
  7. 7 Symbols for Symbolic Logic
  8. 8 Special Symbols

We can admit now that the incomplete character set, missing 14 lower case letters, was a little strange, but they claimed that the composition was studied carefully at the time, and that was their decision, strongly influenced by text processing needs.

 
Clues From the Web

I admit that all the clues were there, but the mechanics were missing. I didn't know how to link them together until a phone call from Jack Gilmore on 2002 Jun 17.

How It Worked

Characters from the Lincoln Writer entered in 6-bit form. Those that were to be themselves were processed into 9-bit form by prefixing (or appending, which might be better for sorting) a specific 3-bit combination to form their internal 9-bit representation. All other characters entered with their own 6-bit form adjoined to a 3-bit representation derived from:

These pairs were processed into a 9-bit form made by adding the 3 bits to the regular 6-bit codes to indicate each of these modifiers.
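
The arithmetic of that step can be sketched as follows. This is an illustration only: the 3-bit values are invented, the modifier names are borrowed from the overstrike forms mentioned elsewhere in this article (circle, square, underline, overline), and the 3 bits are placed in front of the 6-bit code, which is just one of the two arrangements considered above. Only octal 52, the Lincoln Writer code for ( and {, comes from [16].

```python
PLAIN = 0b000  # hypothetical tag for "the character as itself"
MODIFIERS = {  # hypothetical 3-bit tags, for illustration only
    "circle": 0b001,
    "square": 0b010,
    "underline": 0b011,
    "overline": 0b100,
}

def internal(code6, modifier=None):
    """Form a 9-bit value by prefixing a 3-bit tag to a 6-bit input code."""
    tag = PLAIN if modifier is None else MODIFIERS[modifier]
    return (tag << 6) | (code6 & 0o77)

print(oct(internal(0o52)))               # the key as itself
print(oct(internal(0o52, "underline")))  # the same key as a compound character
```

Whatever the real tag values were, the point stands that the 9-bit form exists only after software has combined the two pieces.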

Reasons Why TX-2 Had No Priority on Curly Braces

  1. TX-2 was a "one-of". No interchange was considered. It was not a commercial product offering, and its users were captive.
  2. MIT was never associated with the X3.2 work, and had no connections with the committees.
  3. As Jack Gilmore told me on 2002 Jun 18, most of the work on Whirlwind, TX-0, and TX-2 was pretty secret. Thus there was little likelihood that their character repertoires would have been familiar to the ASCII committees.
  4. Its curly braces could have been replaced with brackets, with new slugs, and nobody would have noticed the difference, least of all the CPU and the software.
  5. In actuality, slugs for curly braces had existed for a long time for the basket of the IBM Model B typewriter.
  6. In Figure 7 of [16] (Lincoln Writer Code), ( and { are assigned as octal 52, ) and } as octal 53. In Figure 8 (The Lincoln Type on the Flexowriter) these same characters are assigned as octal 31 and 21 respectively. Nowhere is it given what 9-bit codes these would have internally, even as their simple graphics, let alone as compound characters as overstruck with circle, square, underline, overline, or as the same characters in reduced size form.
  7. Thus there was no physical internal character set per se for the TX-2. It was ephemeral, put in force by the software used.
  8. And that software used was the equivalent of today's scan codes.

Amelioration

Having established my own priority for point 2 by the fact that there was no internal character set per se for the Lincoln Writer, I must admit that it was a very clever device, and I do not wish this paper to diminish the accomplishment. Early text editing was a major goal and usage, especially for printing reports. So it must have been exceptionally useful for that closed community.

Placing Curly Braces in ASCII

Although X3 had conceded on 1961 Jun 7-9 that ASCII would go from a 6-bit code to 7-bit, almost no characters were added except the lower case alphabet. The curly braces were not added until 1963 Dec 17-18. Here Eric Fischer was able to help again with the X3.2.4 minutes (he must have made copies of every document I deposited with the Smithsonian Institution, for I recognized my own handprinting on the copies).

Document 12 was "Clamons Code Proposal dated 17 Dec 1963". It led to Item 7, which reads:

An ad hoc committee, comprised of Messrs. Clamons, Arne, Davis, Long, and Turner, was formed to consider the three positions following lower case z in the ISO draft proposal. X3.2.4 voted to recommend: ...

b. It is suggested that the three positions following
    lower case z be: left brace, vertical line,
    right brace. ...

To understand this, one needs to know that Eric Clamons had become a good friend when the ASCII development began. The friendship continued when I went to UNIVAC as Director of Systems Programming, where he was Manager of Product Planning for the UNIVAC 1050, which we tried to make the first ASCII-based computer ever. Together we handled all standards-making representation for UNIVAC. The proposal he carried to that meeting was mine.

We continued in the same warm mind-interchangeable relationship when I got him hired into GE when I went there and it continued until he died in 1999. So this is the proof for point 1, in addition to [1] and [2].

Conclusion

The above proves that I was the source responsible for placing 11 different characters into ASCII (point 1), and for at least 8 of these (with the possible exception of the vertical bar, and the square brackets of the LGP-30) it was the first placement in the internal character set of any computer (point 2).

Which is a partial explanation of why a parcel marked "Father of ASCII" (sometime about 1968), at General Electric in Phoenix, was forwarded to me, having inside a letter starting "Dear Bob".

REFERENCES

  1. R.W.Bemer, "A proposal for a generalized card code of 256 characters",
    Commun. ACM 2, No. 9, 19-23, 1959 Sep
  2. R.W.Bemer, W.Buchholz, "An extended character set standard", IBM Tech.
    Pub. TR00.18000.705, 1960 Jan, rev. TR00.721, 1960 Jun
  3. R.W.Bemer, "A proposal for character code compatibility", Commun. ACM 3,
    No. 2, 71-7, 1960 Feb
  4. R.W.Bemer, "Survey of coded character representation", Commun. ACM 3,
    No. 12, 639-641, 1960 Dec
  5. R.W.Bemer, "A view of the history of the ISO character code", Honeywell
    Computer J. 6, No. 4, 274-286, 1972
  6. Jukka Korpela, Finland.
     See http://www.cs.tut.fi/~jkorpela/latin1/ascii-hist.html
  7. www.cwi.nl/~dik/english/codes/7tape.html Dik Winter
  8. www.cwi.nl/~dik/english/codes/internal.html Dik Winter
  9. www.cwi.nl/~dik/english/codes/magtap.html Dik Winter
  10. www.bobbemer.com/BACSLASH.HTM
  11. Letter from Alexander Vanderburgh, Jr., to R. W. Bemer, 1958 August 15
    (in National Museum of American History, 315 Box 4), as transmittal cover for
  12. Letter from Jack Gilmore to Vanderburgh, re "The Lincoln Writer".
    1958 August 14 (also in NMAH, 315 Box 4).
  13. A. Vanderburgh, "The Lincoln Keyboard & a Typewriter Keyboard Designed for Computer Input Flexibility", CACM 1, No. 7, 1958 Jul, p. 4.
  14. ed-thelen.org/comp-hist/BRL61-1.html
  15. http://www.computer-archiv.de
  16. J.T.Gilmore, Jr., R.E.Savell, "The Lincoln Writer", M.I.T. Lincoln Lab.,
    Group Report 51-8, 1959 October 06.

 
