On my site is a pointer to an interesting page by Jukka Korpela of
Finland [6], called "Character histories: Notes on some ASCII code
positions". It tells how various interesting characters (new to many
computer character sets of the time) came into membership in the ASCII
set.
Such research is difficult now, in a world of 8-bit bytes, ASCII code, and
soft-copy screens, where typewriters, codes of different length, and
varied computer word lengths have been pretty much forgotten. And their
documentation is mostly in hardcopy libraries, not on the Web.
This vignette came about because I started to wonder:
1. Whether the curly braces exist in ASCII because of my efforts and
examples, and/or
2. Whether I had been the first to put curly braces, via IBM's Stretch,
into the internal character set of any computer.
Here the only words subject to misinterpretation are "internal character
set".
Webopedia helps us out by giving this definition:
"character set" -- "a defined list of characters
recognized by the computer hardware and software".
That is not "hardware and/or software", but "hardware and
software". Both. And the character set must have the same content, and
size in bits, for both. For the first question above, remember that the
final "I" of ASCII stands for "Interchange".
These definitions controlled my search. The character set or repertoire
of computer hardware is not at all necessarily the same as that of the
input-output equipment that can be used with it. Conversations on the Web
are bloated with combinations of standard characters used to indicate a
character that is in the character set of the computer used but cannot
be entered from the keyboard. Typical are those of the C language, where
"\e" (as accepted by many compilers) means the single escape character.
Granted, that character itself is recognized by the hardware, but hordes
of programmers have intercepted the scan code it generates on the
keyboard to provide functions other than those of the character itself.
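As a small illustration of such a combination, here is a sketch in Python rather than C. The hexadecimal notation \x1b is chosen for the example and is not taken from the discussion above. Four characters that can be typed stand for one character that cannot.

```python
# Four keyboard-enterable characters in source text stand for one
# character of the computer's set: ESCAPE, ASCII code 27 (hex 1B).
written = r"\x1b"   # raw string: the notation exactly as typed
denoted = "\x1b"    # the single character that the notation indicates


def describe(text):
    """Return (number of characters, list of their code values)."""
    return len(text), [ord(ch) for ch in text]


print(describe(written))  # four characters: backslash, x, 1, b
print(describe(denoted))  # one character, code 27
```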
Ah. Scan codes. Remember that keys on today's keyboard do not emit ASCII
codes; they emit "scan codes", which are converted to ASCII according
to the character layout of your keyboard. That is how the French
maintain the "azerty", whereas the US has "qwerty" left to right, upper
left, on its keyboards. Ditto for Cyrillic keyboards and such.
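The conversion from scan code to character can be sketched as a table lookup. The four codes below are make codes from the IBM PC "set 1" scan code table, and the two layout tables are deliberately tiny; both are assumptions chosen for illustration, not a description of any particular driver.

```python
# Illustrative subset only: four key positions, two layouts.
# Keys are scan codes (physical key positions); values are characters.
US_QWERTY = {0x10: "q", 0x11: "w", 0x1E: "a", 0x2C: "z"}
FR_AZERTY = {0x10: "a", 0x11: "z", 0x1E: "q", 0x2C: "w"}


def translate(scan_codes, layout):
    """Convert key scan codes to characters using the given layout table."""
    return "".join(layout[code] for code in scan_codes)


# The first two letter keys at the upper left of the keyboard.
first_two_keys = [0x10, 0x11]
print(translate(first_two_keys, US_QWERTY))  # qw
print(translate(first_two_keys, FR_AZERTY))  # az
```

The same physical keys yield different characters because only the table changes, never the scan codes.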
And seldom do adjacent tracks on magnetic tape match the adjacency of
the bits of a character as stored internally as the code of the CPU
itself. But there must be a universal 1-to-1 mapping.
Among other aberrations to consider is the possibility of encoding
typewriter-like devices so that, although both upper and lower case
alphabetic letters would print differently on paper, they would be
encoded identically, and thus pass into the computer CPU as the same
character.
Stretch
IBM's 7030 (Stretch) was thought to be the first production computer to
have a byte size greater than 6 (in fact 8 bits), allowing lower case
alphabet and other useful characters to be accommodated. The timing
was:
- 1956 Sep 16-22 -- Large group of IBM people to Los Alamos (including
Bemer from NYC Headquarters) (Note 1).
- 1956 Dec -- Dunwell paper "Design Objectives for the IBM
Stretch Computer" (EJCC)
- 1959 Jan -- First two 7030s began assembly in Poughkeepsie (Note 2).
- 1960 Jan -- First publication of character set to outsiders. [2]
- 1961 Apr 16 -- First Stretch delivered (to Los Alamos)
Note 1. The travelog on my website, on the Lockheed-IBM page, shows
that as my only visit to Los Alamos during that period, annotated
"Stretch Planning". Be assured that the only bit of Stretch planning
that I did was the character set. And perhaps advise a bit on software.
The 8-bit byte decision had already been made when I got there, and I
did say a strong Amen to that!
Note 2. For this to happen, especially for the most powerful machine
built to that time, and certainly provided with FORTRAN, the character
set would have to have been completely fixed at least 8 months prior to
starting to build the CPU. I'm willing to cut that estimated lead time
to 4 months for any other, smaller computer with an internal code
containing any of [ ] { } \ . (Stretch had no escape character --
I didn't think of it until 1959 October.)
Eric Fischer concluded from somewhere that the Stretch set dated from
1959 November, while my formula (above) gives 1958 April. We'll see if
any competitors approach this date. The official character set
publication [2] has as its initial reference [1], of 1959 September.
My investigation seemed like a good story, so follow along. First let's
look at the ASCII characters that are known unequivocally to be due to me:
ESCAPE
See [3], published in 1960 Feb, but made known to workers in the coding
field by 1959 Oct.
Four Information Separators
US (Unit Separator), RS (Record Separator), GS (Group Separator), and FS
(File Separator) are the only 4 remaining of the 8 information separators
(ISi) (or data delimiters) that I put in during 1961 September. As I said
in my Interface Age articles on ASCII, I got the idea from the Word Mark
in the character-based IBM 1401.
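A minimal sketch of how these separators can delimit data, using their present-day ASCII code values (FS 28, GS 29, RS 30, US 31). The pack and unpack helpers and the sample fields are hypothetical, made up for this example.

```python
# Code values of the four surviving information separators in ASCII.
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"


def pack(records):
    """Join units with US inside each record, and records with RS."""
    return RS.join(US.join(units) for units in records)


def unpack(data):
    """Split a packed string back into records of units."""
    return [record.split(US) for record in data.split(RS)]


# Made-up sample data: two records of two units each.
records = [["alpha", "1"], ["beta", "2"]]
packed = pack(records)
print([ord(c) for c in (FS, GS, RS, US)])  # [28, 29, 30, 31]
print(unpack(packed) == records)           # True
```

Because the separators are not printable graphics, the units themselves may contain commas, tabs, or spaces without any quoting.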
Backslash
We may take it as absolute proof of genesis, from the early limited
character sets, that IBM's STRETCH was the first computer to use the
backslash character, for Reference [4] was actually published a couple
of years after the design of that machine. And that uniqueness
continued. [4] shows no 6-bit set with the backslash as a working
character.
Here is an excerpt from both [15] and [5] (the latter is the source for
Korpela's paper):
"I had called a joint meeting of IBM, SHARE, and GUIDE, to regularize
the IBM 6-bit set to become the standard BCD Interchange Code ...
Frequency studies of symbol occurrence had been prepared, particularly
from ALGOL programs. The meeting of 1961 July 6 produced general
agreement on a basic 60-64-character set, which included the two square
brackets and the reverse slant, which was chosen in conjunction with "/"
to yield 2-character representations for the AND and OR of early ALGOL.
This is reflected in the set I proposed to ANSI X3.2 on 1961 September
18."
(Note: I had put the backslash in position 5/15. It enabled the ALGOL
"and" to be "/\" and the "or" to be "\/".)
SHARE and GUIDE representatives at the meeting were a little stubborn
about accepting my proposed backslash, so I asked them to name a character
more important to have. After much discussion they could not agree on a
better candidate.
"At the 61 November 8-10 meeting, X3.2 constructed the first formal
proposal, X3.2\1 ..." (which, much modified, was to
become ASCII)
(Note: In this proposal the backslash was moved to position 5/12,
and there it has remained ever since.)
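Positions such as 5/15 and 5/12 are column/row references into the code table, with 16 rows per column. A short sketch of the arithmetic follows; the function names are invented for the example.

```python
def position_to_code(column, row):
    """Convert column/row notation (e.g. 5/12) to a 7-bit code value."""
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError("column must be 0-7 and row 0-15")
    return column * 16 + row


def code_to_position(code):
    """Convert a 7-bit code value back to (column, row)."""
    return divmod(code, 16)


print(chr(position_to_code(5, 12)))  # prints a single backslash
print(code_to_position(ord("\\")))   # (5, 12)
print(position_to_code(5, 15))       # 95
```

In present-day ASCII, code 95 (position 5/15) holds the underscore.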
Square Brackets
Having documented the introduction of square brackets to ASCII, I had to
do careful research to check for their previous existence in any computer
set. Here the backslash played a very discriminating role.
[7], [8] and [9] have very good information, but all have a basic
flaw. The specific computers for which the coded sets are given are not
identified to more than the manufacturer (in some cases not even that),
nor are the years of their introduction.
Their author, Dik Winter, admits this and says he will try harder, but
much documentation is lost, and it was characteristic of the times
that nobody seemed to think character sets a very important feature of
computers.
In [7], CDC Display, General Electric Internal, NCR, and Bull Scientific
Internal all show both the brackets and the backslash, which would be a
remarkable coincidence if they existed prior to ASCII. CDC Display
Scientific shows brackets only, but Winter now says it could only be
the 3600, introduced 1963 (too late). BCL Internal shows brackets only,
but I am unable to find such a manufacturer in my list of nearly 3000
different computers to date.
In [8], [ and ] were shown for the RPC 4000 (too late at 1960 Nov), and
an "MC" Flexowriter (I have to assume that Winter, of the Netherlands,
speaking of ALGOL 60, means the computers built by the Mathematical
Centre (1962 and later), or the Electrologica X-1 of 1960).
[9] shows two mag tape codes with [\], but they're copycats. One is EBCDIC.
The LGP-30 and Square Brackets
Both [4] and [9] showed the LGP-30 with square brackets, as in a 7-bit
set, so it had to be checked. The Web gave very few hits for this
1956 machine. Fortunately one was the museum of Dr. Tim Bergin's home
base. He kindly faxed me a few pages from a training manual
they had on exhibit. Ref. [14] specified a 32-bit word length, fit
only for 8-bit bytes. If so, way ahead of its time.
6-bit encodings, each assigned to two different characters, match what I
had in [4], and include key or paper tape shifts to upper case and lower
case. Perhaps each shift condition added a 7th bit, selecting between
the 6-bit combination assigned to the pairs of characters. The pairs
themselves look like the typewriter key pairs. Perhaps the eighth bit
was immaterial, or perhaps a parity bit.
But how would 7-bit encodings, formed from 6 input bits augmented by a
shift bit (did the CAPS LOCK submit it anew for successive upper case?)
drive the Flexowriter on output?
For now I am prepared to cede that the LGP-30 might have been the first
and only other computer to have the square brackets in its character set.
Until the Stretch computer came along, that is.
Others
Relative to ASCII, I had little to do with introducing the tilde, accent
acute, accent grave, less than, greater than, or the standard typewriter
special characters of that era, even though I used all except the
accents for Stretch. Plus the vertical bar, which went into ASCII at the
same time as the curly braces (but many computers had them previously).
So no claims are made.
Placement
I did get X3 to agree to move the alphabets down one position, reserving
the three positions after z and Z for international usage in
ASCII-alternate sets. I learned that by examining the Copenhagen
telephone book while there in 1963 Sep, finding that the names starting
with accented vowels followed the regular alphabet of 26.
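The outcome of this placement can be checked against present-day ASCII, where each alphabet begins one position into its block and three positions follow Z and z before the block ends. The helper name below is invented for the sketch.

```python
def following(letter, count=3):
    """Return the `count` characters that follow `letter` in code order."""
    start = ord(letter) + 1
    return "".join(chr(code) for code in range(start, start + count))


# Position of A and a within their 32-character blocks.
print(ord("A") % 32, ord("a") % 32)  # 1 1
print(following("Z"))  # [\]
print(following("z"))  # {|}
```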
Curly Braces
Now we get to the mystery. My search started with:
Date: Mon, 3 Dec 2001 14:22:49 -0600 (CST)
From: Eric Fischer
To: bob@bobbemer.com
> Bemer:
> I'm adding a question. Do you have source knowledge about the curly
> braces? I put them in the Stretch set, and now I cannot find any
> previous computer character repertoire that had them. I scarcely
> believe that I was the first to use them, being so common in literature,
> but incorporation in computer sets is a different thing.
Eric:
I remember that you also included them in the 256-character card code
that you published in CACM, because I found it interesting that you
proposed that they could be used in place of the Algol 'begin' and
'end' keywords, which is exactly how they were later used in the C
language.
The next day he added:
Date: Tue, 4 Dec 2001
From: Eric Fischer
To: bob@bobbemer.com
It looks like the Lincoln Writer for the TX-0 at MIT Lincoln Lab
probably had the curly braces before you did. Unfortunately the earliest
listing of its characters I've seen is from January, 1960, but the
letters from its designers that are among the papers you donated to the
Smithsonian give the impression that the machine had been designed and
the first one was already being built in August, 1958. (I know there
was a letter or article in CACM about it at some point too, but I don't
have a copy and don't know the date, and I don't think it included the
character list.)
(Note: It did, but it was for the TX-2, which became
operative in 1959.)
That's the only computer code I know of that seems to have included
the curly braces before your Proposal.
The only curly braces found in Ref. [4] were indeed for Stretch and the
Lincoln Writer, and they appear nowhere else in character sets prior to
1962, at least. But a recently-obtained copy of [16] indicates that
linking this device with the TX-0 was erroneous. Vanderburgh states
succinctly that "The Lincoln Keyboard was designed for use on TX-2 ...
both for preparation of programs on punched paper tape and for direct
console communication in program language".
But here the matter gets obscure. I know that Stretch (in [4]) had two
lines of characters to accommodate encoding for more than 64 characters
that actually were manipulated by the hardware. Hence the term "Stretch
character set".
And the Lincoln Writer had two lines of characters, without lower case
alphabet shown. Except that shift characters for upper case and lower
case were included. So the conclusion is that "it is doubtful that this
is an internal character set for TX-2, even if it is an input device".
It seems that I was too much in a hurry when I wrote [4], and did not
make this distinction well enough, for other 2-line groups were for
Flexowriters and other typewriters.
The TX-2 Set -- I/O Devices and the CPU
Vanderburgh's CACM paper [12] shows the set membership as:
- 26 Block English Letters
- 10 Standard Arabic Numerals
- 6 Greek Letters
- 12 Lower-case English letters ???
- 8 Punctuation Symbols -- , * ? ' ( ) { } (here they are)
- 11 Formula Symbols
- 7 Symbols for Symbolic Logic
- 8 Special Symbols
We can admit now that the incomplete character set, missing 14 lower case
letters, was a little strange, but they claimed that the composition was
studied carefully at the time, and that was their decision, strongly
influenced by text processing needs.
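The arithmetic of that membership list is easy to check. The sketch below simply totals the categories as listed; the dictionary keys are shortened labels for the example.

```python
# Category counts as listed for the set in Vanderburgh's paper.
tx2_set = {
    "block English letters": 26,
    "Arabic numerals": 10,
    "Greek letters": 6,
    "lower-case English letters": 12,
    "punctuation symbols": 8,
    "formula symbols": 11,
    "symbolic logic symbols": 7,
    "special symbols": 8,
}

total = sum(tx2_set.values())
missing_lower_case = (
    tx2_set["block English letters"] - tx2_set["lower-case English letters"]
)
print(total)               # 88
print(missing_lower_case)  # 14
print(total > 2 ** 6)      # True: more than a 6-bit code can hold
```

A total of 88 exceeds the 64 combinations of a 6-bit code, which is why some shifting or modifying mechanism was needed.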
Clues From the Web
- "While TX-0 was still in possession of the big memory we wrote a
program which allowed us to simulate a typewriter with 200 characters"
(Jack Gilmore in [11]).
- "A 36 bit operand word can be divided into one 36, one 27 and one 9,
two 18, or four 9 bit subwords formed from the 9 bit quarters. The
9 bit quarters can be permutated among themselves. Any or all of the
subwords can be used simultaneously" (TX-2 from [13]).
- "Channels or tracks on the tape -- 10 Tracks/tape" [13]
- "Lincoln Writer input -- 10 6 bit chars/sec" [13]
- "Paper Tape Soroban punch -- 180 7 bit lines/sec" [13]
- "Xerox printer -- 20 lines/sec, 1300 char/sec -- 88 characters can be
printed in 2 sizes. 6 bit vert. & 9 bit horiz. axes resolution." [13]
- "Lincoln Writer output -- 10 6 bit chars/sec" [13]
- "One-of-a-kind research computer" [13]
- "the keyboard's circuitry logic will sense what the case of the
typewriter is and if it is not in the case of the selected character
then the necessary case code will be generated first and then the
selected character's code will be generated." (Jack Gilmore in [11]).
- "The overbar and underbar will not cause the platen to be shifted to
the right (I think he means left) one space position. ... The same
is true of the box and circle symbols. ... These four symbols allow
us to modify characters and thereby change their meaning." (Jack
Gilmore in [11]).
I admit that all the clues were there, but the mechanics were missing. I
didn't know how to link them together until a phone call from Jack
Gilmore on 2002 Jun 17.
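One of those clues, the 36-bit word made of four 9-bit quarters, can be sketched as follows. Numbering the quarters from the most significant end is an assumption made for the example, not something stated in [13].

```python
QUARTER_BITS = 9
QUARTER_MASK = (1 << QUARTER_BITS) - 1  # octal 777


def split_quarters(word):
    """Split a 36-bit word into four 9-bit quarters, most significant first."""
    if not 0 <= word < (1 << 36):
        raise ValueError("word must fit in 36 bits")
    return [(word >> shift) & QUARTER_MASK for shift in (27, 18, 9, 0)]


def join_quarters(quarters):
    """Rebuild a 36-bit word from four 9-bit quarters."""
    word = 0
    for quarter in quarters:
        word = (word << QUARTER_BITS) | (quarter & QUARTER_MASK)
    return word


word = 0o001002003004
print([oct(q) for q in split_quarters(word)])  # ['0o1', '0o2', '0o3', '0o4']
print(join_quarters(split_quarters(word)) == word)  # True
```

Each quarter is wide enough to hold one 9-bit character, so a word can carry four of them.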
How It Worked
Characters from the Lincoln Writer entered in 6-bit form. Those that were
to be themselves were processed into 9-bit form by prefixing (or appending,
which might be better for sorting) a specific 3-bit combination to form
their internal 9-bit representation. All other characters entered with
their own 6-bit form adjoined to a 3-bit representation derived from:
- the shift keys (unlocked or locked)
- a superscript key (causing half-size)
- a subscript key (causing half-size)
- the code of a nonspacing box, circle, underscore, or overscore
These pairs were processed into a 9-bit form made by adding the 3 bits
to the regular 6-bit codes to indicate each of these modifiers.
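The scheme just described can be sketched as bit manipulation. The 3-bit modifier values below are HYPOTHETICAL, since the actual assignments are not given here, and prefixing rather than appending is likewise a choice made for the example. The 6-bit value octal 52 is the Lincoln Writer code for the ( and { key cited later from [16].

```python
# HYPOTHETICAL 3-bit modifier values; the real TX-2 assignments are unknown.
PLAIN, SHIFTED, SUPERSCRIPT, SUBSCRIPT = 0b000, 0b001, 0b010, 0b011


def to_internal(code6, modifier=PLAIN):
    """Prefix a 3-bit modifier to a 6-bit code, giving a 9-bit value."""
    if not 0 <= code6 < 64:
        raise ValueError("expected a 6-bit code")
    if not 0 <= modifier < 8:
        raise ValueError("expected a 3-bit modifier")
    return (modifier << 6) | code6


def from_internal(code9):
    """Split a 9-bit value into (modifier, 6-bit code)."""
    return code9 >> 6, code9 & 0o77


paren_or_brace = 0o52  # 6-bit Lincoln Writer code for the ( and { key
print(oct(to_internal(paren_or_brace)))           # 0o52
print(oct(to_internal(paren_or_brace, SHIFTED)))  # 0o152
print(from_internal(0o152))                       # (1, 42)
```

The point of the sketch is that the 9-bit value is manufactured by software from a 6-bit input plus state, rather than arriving from the device as a single code.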
Reasons Why TX-2 Had No Priority on Curly Braces
- TX-2 was a "one-of". No interchange was considered. It was not a
commercial product offering, and its users were captive.
- MIT was never associated with the X3.2 work, and had no connections
with the committees.
- As Jack Gilmore told me on 2002 Jun 18, most of the work on Whirlwind,
TX-0, and TX-2 was pretty secret. Thus there was little likelihood
that their character repertoires would have been familiar to the ASCII
committees.
- Its curly braces could have been replaced with brackets, with
new slugs, and nobody would have noticed the difference, least of
all the CPU and the software.
- In actuality, slugs for curly braces had existed for a long time
for the basket of the IBM Model B typewriter.
- In Figure 7 of [16] (Lincoln Writer Code), ( and { are assigned as
octal 52, ) and } as octal 53. In Figure 8 (The Lincoln Type on
the Flexowriter) these same characters are assigned as octal 31 and
21 respectively. Nowhere is it given what 9-bit codes these would
have internally, even as their simple graphics, let alone as
compound characters as overstruck with circle, square, underline,
overline, or as the same characters in reduced size form.
- Thus there was no physical internal character set per se
for the TX-2. It was ephemeral, put in force by the software used.
- And that software used was the equivalent of today's scan codes.
Amelioration
Having established my own priority for point 2 by the fact that there
was no internal character set per se for the Lincoln Writer, I must
admit that it was a very clever device, and I do not wish this
paper to diminish the accomplishment. Early text editing was a major
goal and usage, especially for printing reports. So it must have been
exceptionally useful for that closed community.
Placing Curly Braces in ASCII
Although X3 had conceded on 1961 Jun 7-9 that ASCII would go from a 6-bit
code to 7-bit, almost no characters were added except the lower case
alphabet. The curly braces were not added until 1963 Dec 17-18. Here
Eric Fischer was able to help again with the X3.2.4 minutes (he must have
made copies of every document I deposited with the Smithsonian Institution,
for I recognized my own handprinting on the copies).
Document 12 was "Clamons Code Proposal dated 17 Dec 1963". It led to Item
7, which reads:
An ad hoc committee, comprised of Messrs. Clamons, Arne, Davis, Long,
and Turner, was formed to consider the three positions following lower
case z in the ISO draft proposal. X3.2.4 voted to recommend: ...
b. It is suggested that the three positions following
lower case z be: left brace, vertical line,
right brace. ...
To understand this, one needs to know that Eric Clamons had become a good
friend when the ASCII development began, and continued when I went to
UNIVAC as Director of Systems Programming where he was Manager of Product
Planning for the UNIVAC 1050, which we tried to make the first ASCII-based
computer ever. Together we handled all standards-making representation for
UNIVAC. The proposal he carried to that meeting was mine.
We continued in the same warm mind-interchangeable relationship after I
got him hired into GE when I went there, and it lasted until he died
in 1999. So this is the proof for point 1, in addition to [1] and [2].
Conclusion
The above proves that I was the source responsible for placing 11
different characters into ASCII (point 1), and for at least 8 of these
(with the possible exception of the vertical bar, and the square
brackets of the LGP-30) it was the first placement in the internal
character set of any computer (point 2).
Which is a partial explanation of why a parcel marked "Father of ASCII",
arriving at General Electric in Phoenix sometime about 1968, was
forwarded to me, with a letter inside starting "Dear Bob".
REFERENCES
- R.W.Bemer, "A proposal for a generalized card code of 256 characters",
Commun. ACM 2, No. 9, 19-23, 1959 Sep
- R.W.Bemer, W.Buchholz, "An extended character set standard",
IBM Tech. Pub. TR00.18000.705, 1960 Jan, rev. TR00.721, 1960 Jun
- R.W.Bemer, "A proposal for character code compatibility",
Commun. ACM 3, No. 2, 71-7, 1960 Feb
- R.W.Bemer, "Survey of coded character representation",
Commun. ACM 3, No. 12, 639-641, 1960 Dec
- R.W.Bemer, "A view of the history of the ISO character code",
Honeywell Computer J. 6, No. 4, 274-286, 1972
- Jukka Korpela, Finland. See
http://www.cs.tut.fi/~jkorpela/latin1/ascii-hist.html
- www.cwi.nl/~dik/english/codes/7tape.html Dik Winter
- www.cwi.nl/~dik/english/codes/internal.html Dik Winter
- www.cwi.nl/~dik/english/codes/magtap.html Dik Winter
- www.bobbemer.com/BACSLASH.HTM
- Letter from Alexander Vanderburgh, Jr., to R. W. Bemer, 1958 August 15
(in National Museum of American History, 315 Box 4), as transmittal
cover for
- Letter from Jack Gilmore to Vanderburgh, re "The Lincoln Writer",
1958 August 14 (also in NMAH, 315 Box 4).
- A. Vanderburgh, "The Lincoln Keyboard -- a Typewriter Keyboard Designed
for Computer Input Flexibility", CACM 1, No. 7, 1958 Jul, p. 4.
- ed-thelen.org/comp-hist/BRL61-1.html
- http://www.computer-archiv.de
- J.T.Gilmore, Jr., R.E.Savell, "The Lincoln Writer", M.I.T. Lincoln Lab.,
Group Report 51-8, 1959 October 06.