The Portable Document Format (PDF) file format is one of the most ubiquitous standards in the computing world. Even the most naive computer user is likely to know what a PDF is—a way of publishing documents that preserves all of the author’s formatting choices, no matter which system is used to view them. The story behind how PDFs came to be is actually longer and more interesting than you might think. It starts with the Cold War.
Vannevar Bush was born in 1890. An electrical engineer, he was instrumental in developing military applications of radar (RAdio Detection And Ranging). He was a prolific inventor and futurist, and predicted everything from hypertext (as in HTML) to tablet computing (as in iPads).
In 1945, just before the beginning of the Cold War, he published a futurist article titled “As We May Think.” In it, he described a device he named the memex.
Bush envisioned the memex as something like Wikipedia on an iPad—in other words, hypertext documents on a screen that read like paper. This should be starting to sound familiar already.
It’s a funny coincidence that Bush was a radar expert, because the chain of events that led to PDF not only started with the concepts behind Bush’s memex but was also deeply rooted in radar hardware developed at MIT, where Bush had done much of his early work as a professor and later sat on the board of governors.
By 1951 the Cold War was in full swing, and both the United States and the Soviet Union were frantically investing in war technology, with an emphasis on intercontinental ballistic missiles (ICBMs) and strategic bombers armed with nuclear warheads. Because it would be another ten years before ICBMs were practically feasible, the main focus was on the bombers, specifically on identifying and intercepting the enemy’s. The main technology in this arms race was radar. The US Air Force sponsored an MIT research computer project called Whirlwind, which would later develop into SAGE (Semi-Automatic Ground Environment), the computerized air defense network later operated by the North American Air Defense Command (NORAD).
In an era when almost all other computers were programmed by punching holes in paper tapes or cards, and their results were printed out on line printers that were, for all practical purposes, typewriters, a computer with a screen and a light gun that served as a mouse-like pointer was revolutionary. The technology developed for Whirlwind lived on at MIT for many years in the Lincoln TX-0 and TX-2 computers, and Ivan Sutherland used it while working on his PhD at MIT, during which time he created a program called Sketchpad.
The dawn of computer graphics
As a student at MIT, Sutherland was inspired by Bush’s memex and chose “computer drawing” as the subject of his PhD thesis. Using the Lincoln TX-2 computer, a transistorized successor to the computers developed for Project Whirlwind (SAGE), he basically invented, with Sketchpad, the form of human-computer interaction that we take for granted today. Take a look at this film of Sutherland demonstrating it: it is impressive, even by today’s standards!
Sutherland gained a professorship at the University of Utah, and in 1968, together with fellow professor David C. Evans, he founded Evans & Sutherland, a company specializing in computer graphics. Most of the employees were current or former students, among them Jim Clark, who started Silicon Graphics, Inc.; Ed Catmull, co-founder of Pixar; and Scott P. Hunter of Oracle. Most importantly for this story, they also included John Warnock and Alan Kay.
While at Evans & Sutherland, Warnock worked on a three-dimensional graphics database of New York Harbor. He conceived the Design System language as a way to process and display the graphics from that database.
Kay would move on to Xerox’s Palo Alto Research Center (PARC), where he led the Learning Research Group. PARC researchers were busy inventing the laser printer and looking for a way to send text and images from the computer screen to the printer.
Kay brought Warnock onto the Xerox team and, along with other colleagues, they built upon Design System to create the Interpress page description language to drive the Xerox laser printer.
Two guys in a garage
Because they could not convince Xerox management to commercialize Interpress, Warnock and his PARC colleague Charles Geschke left the company and, in December 1982, founded Adobe to develop and sell the PostScript page description language.
Warnock initially set out to build a networked printer that ran PostScript, but Steve Jobs caught wind of this and convinced Warnock to license PostScript instead and let Apple manufacture the printer. Then Paul Brainerd, founder of a company called Aldus, heard about that collaboration, and Aldus created a desktop publishing program called Aldus PageMaker (an electronic typesetting and page layout program that knew how to “speak PostScript”) just in time for the release of the LaserWriter in 1985.
PageMaker and its “desktop publishing” branding assured the success of the new LaserWriter and, in one fell swoop, established PostScript as the de facto standard for electronic typesetting and page layout.
Adobe made a lot of money from licensing PostScript (enough to buy Aldus for almost $450m only nine years later), and used that money to develop more products, notably Illustrator and Photoshop, which are now de facto standards in graphic design.
The Camelot Project
In 1991, Warnock wrote a paper titled “The Camelot Project.” In it, he describes expanding the market for PostScript, which at the time required a fairly powerful processor to interpret and render, usually a dedicated one built into the printer. Warnock wanted to make interpreting and rendering possible on the relatively underpowered personal computers of the time. His vision was a universal graphical page renderer, which was revolutionary at a time when each printer manufacturer used a different method for sending pages from screen to printer, and printers frequently could not render text and graphics at the same time. In the paper, Warnock wrote:
“There are at least two technical approaches to the Camelot project. Both solutions depend on the PostScript technology. One approach is to try to make Display PostScript and PostScript implementations smaller and faster so that they can run on the vast majority of today’s machines. This approach has been tried and is extremely difficult.
“A second approach is to divide the problem into smaller problems. This approach would allow each piece to run independently on the smaller machines while achieving acceptable performance and a solution for the complete problem. This latter approach requires that the problem be divided in a way that is natural for users, and provides a solution for every user. An approach to the Camelot project will now be described that will divide the problem into smaller pieces. This solution depends on a unique property of the PostScript language.”
He then goes on to describe a programming concept he refers to as “rebinding,” by which he means exploiting PostScript’s ability to redefine operators so that they generate intermediate renderings (which he calls Interchange PostScript, or IPS) that do not require a full PostScript interpreter to complete the rendering.
To be more specific, rebinding redefines operators so that when a PostScript file is fed to an interpreter, it does not produce a pixel rendering for a specific device at a specific resolution (a very processor-intensive task). Instead, it produces another PostScript file, the IPS, that uses a much smaller set of operators: higher-level constructs are compiled down to statements built from more basic operators, in the same way that a language like C is compiled to simpler machine instructions.
What this does is split the pixel-rendering process into two steps: first, compiling complex PostScript into simple PostScript (the IPS), and then, in a separate step, rendering the IPS to pixels. The first step runs the complex program through an interpreter with the rebound operators, and only has to be done once per document. Because its output uses only a small set of basic operators, the second step can use a special-purpose interpreter that is “substantially simpler, and smaller than full PostScript interpreters,” which places a much lighter load on the processor and makes on-screen rendering possible on smaller, lower-powered personal computers.
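The two-step split can be sketched in Python. This is a toy, not Adobe’s implementation: the mini-language, the `RebindingInterpreter` class and the `render` function are all hypothetical, and in real PostScript the rebinding would be done inside PostScript itself, by redefining operators such as `lineto`. The first step runs a program that uses a variable, a procedure and a `repeat` loop, with the painting operators rebound to record instructions instead of drawing pixels. The second step consumes the result, which contains only absolute coordinates and no control flow at all.

```python
"""Toy model of "rebinding" (hypothetical mini-language, not real PostScript):
run a program once and, instead of rasterizing, emit flat drawing instructions."""


def tokenize(src):
    # Pad braces so that "{tick}" splits into "{", "tick", "}".
    return src.replace("{", " { ").replace("}", " } ").split()


def parse(tokens):
    """Nest the tokens: every {...} procedure becomes a Python list."""
    out, outer = [], []
    for tok in tokens:
        if tok == "{":
            outer.append(out)
            out = []
        elif tok == "}":
            proc, out = out, outer.pop()
            out.append(proc)
        else:
            out.append(tok)
    return out


class RebindingInterpreter:
    """Hypothetical interpreter supporting only a handful of operators."""

    def __init__(self):
        self.stack = []   # operand stack
        self.names = {}   # definitions made with "def"
        self.ips = []     # flat output, standing in for Warnock's IPS
        self.ops = {
            "add": lambda: self.stack.append(self.stack.pop() + self.stack.pop()),
            "def": self.op_def,
            "repeat": self.op_repeat,
            # The "rebound" painting operators: record, don't rasterize.
            "moveto": lambda: self.emit("moveto", 2),
            "lineto": lambda: self.emit("lineto", 2),
            "stroke": lambda: self.emit("stroke", 0),
        }

    def emit(self, op, nargs):
        # Operands are already evaluated numbers, so the output has no variables.
        args = [self.stack.pop() for _ in range(nargs)][::-1]
        self.ips.append(" ".join([*map(str, args), op]))

    def op_def(self):
        value, name = self.stack.pop(), self.stack.pop()
        self.names[name.lstrip("/")] = value

    def op_repeat(self):
        proc, count = self.stack.pop(), self.stack.pop()
        for _ in range(count):  # the loop is unrolled into the output
            self.run(proc)

    def run(self, program):
        for item in program:
            if isinstance(item, list) or item.startswith("/"):
                self.stack.append(item)       # procedure or literal name: push it
            elif item in self.ops:
                self.ops[item]()
            elif item in self.names:
                value = self.names[item]
                if isinstance(value, list):
                    self.run(value)           # call a user-defined procedure
                else:
                    self.stack.append(value)  # push a variable's current value
            else:
                self.stack.append(int(item))  # integers only, for brevity
        return self.ips


def render(ips):
    """Second step: no names, procedures or loops, just one instruction per line.
    Here it simply collects the stroked line segments."""
    segments, path, pen = [], [], None
    for line in ips:
        *args, op = line.split()
        point = tuple(int(a) for a in args)
        if op == "moveto":
            pen = point
        elif op == "lineto":
            path.append((pen, point))
            pen = point
        elif op == "stroke":
            segments.extend(path)
            path = []
    return segments


if __name__ == "__main__":
    program = "/x 0 def /tick { x 0 moveto x 10 lineto stroke /x x 20 add def } def 3 { tick } repeat"
    ips = RebindingInterpreter().run(parse(tokenize(program)))
    print("\n".join(ips))  # nine flat lines: 0 0 moveto, 0 10 lineto, stroke, ...
    print(render(ips))
```

The point of the design is where the work happens: `render` never needs an operand stack, a dictionary of names, or loop handling, because the first step has already evaluated all of that into straight-line instructions.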
What Warnock calls IPS in the paper was renamed PDF (Portable Document Format) for its official release in June 1993. So, let’s take a look at what the computing world looked like in 1993:
- “Personal” computers were common in business environments but uncommon at home: only 1 out of 5 US households had one.
- Most of those computers were running Microsoft Windows.
- Fewer than half of those computers had Internet access.
- For Windows and Mac users, Internet access meant email, FTP, and newsgroups. There were only 130 websites, and the only web browsers available at that time were for Unix computers.
When Adobe Acrobat (including Acrobat Reader) version 1 was released that year, both the authoring tool and the reader were commercial software, with the reader costing $50, and both were available only for the MS-DOS operating system. Adoption was mostly in larger businesses, where MS-DOS was still to be found in large numbers and license costs were not an obstacle.
It was not until a year later that version 2 became available for Macintosh and Windows, the Reader was made free, and the first web browsers for Mac and Windows appeared. Even a year after that, the Pew Research Center reported in 1995 that “14% of US adults (had) Internet access” and that “42% of US adults had never heard of the Internet.” This meant that most people would not think of going to a website to download a software application for free. Software was something you bought at a store, on a CD-ROM.
Therefore, initial adoption was among large organizations, which valued the integrity PDF provided: documents could not easily be altered once created, making it attractive for official electronic communications. One of those organizations was the US Internal Revenue Service (IRS), which started using PDF internally and to distribute tax forms in 1994.
In 1996 PDF got a further boost in popularity when Adobe added a number of features critical to the printing industry, and between 1996 and 1999 it gained acceptance there as a tool for proofing and producing printed material.
By 1999 the IRS was starting to use PDF to distribute fillable tax forms, which made sense, considering that US household computer ownership was now more than 50% and Internet access was more than 40%. The world was beginning to be the kind of place where PDF could become popular. It had hit the big time.
Most popular programming languages take 10 to 15 years to achieve widespread popularity, and PDF was right on schedule. In 2008, 15 years after its introduction, PDF moved from being a de facto standard to an official one: Adobe had handed the PDF 1.7 specification to the International Organization for Standardization, which published it as the open standard ISO 32000-1:2008.
This resulted in an explosion of PDF usage: any software application could now offer PDF export, and many third-party alternatives to Acrobat Reader became available.
Most people think of PDF as a file format and not a programming language, but it is one. It is not Turing complete, and that is by design: there is no recursion and there are no unbounded loops, because such constructs are resolved (pre-compiled down to bounded sequences of statements) when the PDF is created.
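PDF’s “program” character is easiest to see in a page’s content stream. The Python sketch below builds a one-page PDF by hand; the function name `minimal_pdf` is hypothetical, and this bare-bones file is an illustration, not how Acrobat writes PDFs. The page content is straight-line code in PDF’s postfix operator syntax: `BT` and `ET` bracket a text object, `Tf` selects a font, `Td` positions the text, `Tj` shows a string, and `m`, `l` and `S` move to a point, add a line and stroke it. There are no branches, procedures or loops. The rest of the file is bookkeeping, mainly the cross-reference table of byte offsets that lets a reader jump straight to each object.

```python
def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF whose page content is a flat list of drawing operators."""
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = (
        f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET\n"  # text: font, position, show
        "72 700 m 300 700 l S"                      # path: move, line, stroke
    ).encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the result with `open("hello.pdf", "wb").write(minimal_pdf("Hello, PDF"))` should give a file that common viewers can open. It relies on Helvetica being one of the standard fonts that PDF viewers supply, so no font data is embedded.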
Who knows what the future holds and what creative uses for this powerful programming language are yet to be discovered?
This article is part of Behind the Code, the media for developers, by developers.
Illustration by Victoria Roussel