The digital library’s main building in San Francisco. Phillip Bond / Alamy

Editor’s note: For a 30pin feature on preserving CD-ROM publications, we talked with Jason Scott, an archivist at the Internet Archive. During a lengthy call, he shared his views on preservation issues at large. Given Scott’s expertise, we’ve decided to publish his remarks separately as an op-ed.

The thing we are most concerned about is mechanisms disappearing. The number one thing that’s very hard to get back, going back 100 years, is machines that read media. People always say, “well, there are millions of them, we’ll be fine.” Another ten years, and another ten years, and suddenly, they are disappearing.

I don’t distinguish between formats anymore. I have stacks of floppies and writings and CDs, and I’m imaging video tapes right now. If I can do this right now while the technical process exists, that’s great, because the technical process won’t exist in 20 to 30 years. It would be very, very hard, and it’d have to justify itself.

If you were to rescue something from 40 years ago, you’d have to say, “Well, it’s gonna cost us $200. Do we really need to do this?” As opposed to right now, when it’s not free, but it’s much less than $200 apiece, and you can do it. That’s where I currently stand on it.

• • •

One of the reasons we are working so hard to make everything available as quickly as possible is so that others can find where mistakes were made, so they can be grabbed again. If you make digital copies of things and then store them away for 25 years, you may find later that you made some terrible mistakes.

I have cases, and there’s one right now, where somebody told me that a disc I grabbed four years ago doesn’t work. It works for me, but they’re like: “No, no, no, it’s missing this part and this part.” So, I have to go back to the bins, find it, and use a different method to grab it.

Those kinds of surprises will catch you a lot, which is why I don’t get rid of the originals… In general, turning it into a single digital file that lives on a hard drive works very well, except for where it really does not. And sometimes you don’t get to know until later, which is why I always insist that people keep originals.

Jason Scott is an archivist at the Internet Archive, the director of GET LAMP and BBS: The Documentary, and a co-founder of Archive Team.
Dennis van Zuijlekom / Flickr, CC BY-SA 2.0, cropped 

There’s a huge difference in how well different types of software are archived. Game software gets much better treatment, although it’s not enough, which is what’s scary to me: it’s the best, and it doesn’t get enough. Then you get business software like word processors, then you get industrial software, which was written for a very small number of customers… And then, the worst one: custom software, which companies make and set up for one or two customers. When that company decides to get rid of it, there’s no clearing house, there’s nowhere for it to go.

Each of these has a problem. Industrial software, which provides access to databases, catalogues or retrieval of historical items, is almost forgotten… Occasionally, I’ll get a collection from somebody who ran a business, and they’ll have a stack of word processing software, or art or utility software. I will personally image it as fast as I would’ve imaged any game; in fact, faster.

But I very rarely get my hands on industrial work, because it usually comes with a contract and an agreement not to distribute. And it makes the person very nervous, because they feel they’re not allowed to share it and that it’s the company’s property.

• • •

The dream we have is being able to put a DVD reader in the browser and have it play a DVD. It would have menus, it would have the subtitles, it’d have everything! It would have features! What a fantastic idea. We have tried it, I have tried prototyping, and in every single case, you have to take the full contents of the disc and re-compress them into a proprietary package. Not proprietary like “secret,” but as its own special version, with a lot of it gotten wrong.

And here’s the other thing: DVDs changed their format a lot. You’ll need the power of a VLC [a long-running media player software project which supports many existing formats. — Ed.] or a lot of DVD readers in software to be able to read them because of all of these secret, one-off issues they all had to do their research on. You cannot just write one from scratch anymore: it’s too complicated.

• • •

It’s one thing to know what you are getting. Like, you want to study this famous typewriting program, so you go to it, you find it, you run it, now you’re researching or testing it—you knew that going in. But what do you do when there’s a whole bunch of programs that you don’t know what they are, and you don’t even know where you’re looking?

Part of what we’re trying to do is make sure that the time from thinking that you want to look at something to looking at it is very short. <…> At the Archive, we have a browser that you can click on, and it will show you the contents of a disc as a clickable directory. You can link to a file within a disc image or a ZIP file. Some people use that, some people don’t, but we’re trying.

Sometimes it’s inconsistent, and when you’re dealing with tens of thousands and millions of items, the potential for getting it wrong is huge. I think it’s worth doing. If somebody is coming to an archive like this expecting to be disappointed, they will find plenty to be disappointed by. But if someone comes hoping to get better and faster access to some of this history, we’ve done a pretty good job of making a large amount of that history very accessible.

Interviewed and edited for length and clarity by Yuri Litvinenko.

The Opposite the Editorial section publishes guest columns by authors not affiliated with 30pin’s regular operations. Views expressed in this section are personal and do not have any effect on the 30pin coverage.