Why are arrays zero-based in so many languages?
This article is a summary of a wonderful lightning talk my friend and former colleague Kent Spillner gave a while ago.
Lua is an embeddable scripting language. One of its interesting features/quirks is that its arrays are 1-based -- flouting the convention of so many other programming languages that have 0-based arrays.
The usual explanation for 0-based arrays has three parts. The first (the hard, mathematical part) is well known in computing circles. Arrays in C are really just syntactic sugar atop memory pointers. A 0-based indexing scheme makes the pointer arithmetic simpler:
a[0] == *a // first element
a[1] == *(a + 1) // second element
...
a[n] == *(a + n) // (n+1)th element
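The equivalence listed above can be checked directly in C. The following is a minimal sketch; the helper names `element_at` and `subscript_matches_pointer` are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads element i using explicit pointer
 * arithmetic instead of the subscript operator. */
int element_at(const int *a, size_t i) {
    assert(a != NULL);
    return *(a + i);
}

/* Hypothetical helper: returns 1 if a[i] and *(a + i) agree for
 * every index 0 <= i < n, and 0 otherwise. */
int subscript_matches_pointer(const int *a, size_t n) {
    assert(a != NULL);
    for (size_t i = 0; i < n; i++) {
        if (a[i] != *(a + i)) {
            return 0;
        }
    }
    return 1;
}
```

With `int data[] = {10, 20, 30};`, the call `element_at(data, 0)` reads the same memory as `data[0]`: the index is simply the offset from the start of the array.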
Since many of the languages with 0-based arrays came after C, they simply kept the convention.
However, this reasoning is weak. After all, with only minimal added complexity, the compiler could translate 1-based arrays in the source code into correct pointer arithmetic in the machine code.
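As a rough illustration of how small that translation is, here is a sketch of the access a compiler for a 1-based language might generate, written out by hand in C. The function name `get_one_based` is hypothetical, and real compilers may fold the subtraction into the address computation in other ways.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads element i of an array whose indices
 * run from 1 to n in the source language. The only extra work
 * compared with the 0-based case is the subtraction of 1. */
int get_one_based(const int *a, size_t i) {
    assert(a != NULL);
    assert(i >= 1); /* index 0 does not exist in a 1-based scheme */
    return *(a + (i - 1));
}
```

With `int data[] = {10, 20, 30};`, the call `get_one_based(data, 1)` returns the first element and `get_one_based(data, 3)` returns the last.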
The second part of the reasoning is that compiler authors were influenced by the (admittedly very influential) opinion of Edsger Dijkstra, who argued in a paper that to denote a sequence of numbers such as 2, 3, ..., 12, the following convention for the bounds of the iterating variable i was preferable:
2 ≤ i < 13
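This clearly favors a 0-based indexing scheme for arrays: with a lower bound of 0, the N elements of an array are covered by 0 ≤ i < N, and the upper bound is exactly the length of the array.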
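The appeal of the half-open convention is easy to see in code. The sketch below assumes nothing beyond standard C; the function names `range_length` and `sum_first_n` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the integers i with lo <= i < hi.
 * With a half-open range the count is just the difference of
 * the two bounds. */
size_t range_length(int lo, int hi) {
    assert(lo <= hi);
    return (size_t)(hi - lo);
}

/* Hypothetical helper: sums the n elements of a using the
 * 0 <= i < n form of the loop, where the upper bound is the
 * array length itself. */
long sum_first_n(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

For the sequence 2, 3, ..., 12, `range_length(2, 13)` gives 11, the number of elements, with no "plus one" adjustment.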
However, there is one problem with this neat reasoning: Dijkstra's paper was published in 1982, so it is unlikely that Ritchie, Thompson and Kernighan were influenced by Dijkstra's views when they were designing and refining C between 1972 and 1978.
Here's where we must resist insisting that the decision was based on computer science or even notions of mathematical elegance. Most of the problems in computer science are people problems, and even when you think you have a computer problem, look harder and you'll find a people problem. This gets us to the third part of the reasoning, which is a story.
In the early 1960s, IBM gave MIT a new IBM 7094 computer (price tag: USD 3.5 million in 1960s dollars) at a discount. Part of the deal was that MIT got to use the computer for 8 hours per day, the other universities in the northeastern United States for 8 hours, and IBM for the remaining 8 hours. Part of that IBM time slot was dedicated to computing handicaps for yacht races, as the president of IBM was rather fond of yacht racing in Long Island Sound. "There was a special job deck kept at the MIT Computation Center, and if a request came in to run it, operators were to stop whatever was running on the machine and do the yacht handicapping job immediately." [1]
Among the languages implemented on the 7094 was BCPL, an ancestor of C by way of B. BCPL was unusual in that it was designed so that small and efficient compilers could be written for it: some compilers needed only 16 KB of memory. Because of the stringent computing time allocations, programmers optimized compilers so that run time was minimal. One of these optimizations was to make arrays 0-based. After all, if your program wasn't done and the president of IBM felt like yachting, you were out of luck!