
The Internet Archive is a digital library of all things : NPR



BREWSTER KAHLE: Boy, it’s all digital; it’s just completely transient. Whether it’s outdated formats like those floppy disks, just try to run one, or a CD. Find someone with a DVD player. I mean, it’s starting to be that stuff that’s really recent just disappears.

MANOUSH ZOMORODI, HOST:

This is Brewster Kahle.

KAHLE: The average lifetime of a website before it is changed or deleted is one hundred days. That’s it. I think it was a cruel joke to call web pages pages, because you’d think they were long-lasting, you know, like the Gutenberg Bible and all that. And no.

(SOUNDBITE OF MUSIC)

ZOMORODI: Brewster knew this was going to be a problem, with websites disappearing and internet history going missing. He knew this way back in 1996, and that’s why he created the Internet Archive. Caper In The Castro can still be found there, as we heard, and over the years the archival mission has expanded to preserve old books and movies, TV shows and music.

KAHLE: The idea is to try to build a library of everything, the Library of Alexandria for the digital age. We can make every book, piece of music, video, web page, piece of software, anything ever published by humans, available to anyone curious enough to want to use it. That was the Internet’s dream, and the Internet Archive is part of making that dream come true.

ZOMORODI: What a lofty goal and a monumental effort, because how does one even begin to archive the web? Brewster began by building something he called the Wayback Machine.

KAHLE: Yes, probably the most used and important part of the Internet Archive right now is the Wayback Machine, where we’ve collected web pages by going and basically clicking every web link on every web page every two months. So if you go to archive.org and type in a URL, we’ll show you different versions of that URL over time. We collect about a billion URLs every day, and we find that’s really important for journalists trying to find out what really happened. Lawyers love it because they can use it to say, hey, you said this before, but now you don’t. It is often the only record.

ZOMORODI: Yes. And you can dig up information that someone has deleted.

KAHLE: Yes. Take Donald Trump’s tweets. During Donald Trump’s presidency, much of the political influence on our country came through his Twitter feed, and then it was taken down. So it all just kind of disappeared. We have a copy that we made available through the Wayback Machine, so we can see what it was. Or when a company like GeoCities goes under and everybody’s sites just disappear. There are endless sites that disappear, or different business decisions are made, and people say, God, I’m glad I can get this back.

ZOMORODI: That’s right.

KAHLE: But we come across things like locked files and databases that you can’t access. Some of these things are real challenges. We work with various websites to try to make things available. The web also has parts that are obsolete, so you can no longer reproduce old sites. So there are challenges every day.

(SOUNDBITE OF MUSIC)

ZOMORODI: Do you ever worry about things getting lost to the past? I mean, I can imagine it would make you neurotic…

KAHLE: Oh…

ZOMORODI: …Like, oh, we missed something.

KAHLE: Oh, yeah. We missed Napster.

ZOMORODI: Oh, really?

KAHLE: So Napster was probably the best, biggest music library ever built by humans, and it shut down. We didn’t get it. And just take the libraries in Ukraine that are being targeted. Just like when the Nazis targeted the library in Belgrade, that’s a way to wipe out a culture: you go after their libraries. So yes, we worry about this all the time.

ZOMORODI: That’s right. So what are you doing? Trying to go back and fix things you missed? Maybe you have an example?

KAHLE: Well, with Wikipedia we’ve tried to take all the footnotes, all the citations, and turn them blue, into little links. So we went and tried to fix the broken Wikipedia links. We have now fixed over 15 million broken links. We prioritized the books referenced in Wikipedia and acquired those books; we bought them or got them donated, digitized them and then linked them back, so that if there’s a page number you can click and turn right to the right page. We did a big project on Ukrainian Wikipedia, trying to collect all the books that were referenced and make them clickable.

ZOMORODI: So how much harder is it to collect everything that’s on the Internet now compared to, say, a decade or 15 years ago, simply because it’s behind paywalls or you can’t access it without logging in?

KAHLE: Yes. So we’ve got robots going around and collecting a million URLs, and luckily there are over a hundred people working for the Internet Archive trying to keep it all alive. We don’t collect all YouTube videos. It’s just too big. But we try to collect the ones that are linked a lot or that are linked from Twitter pages, say. So we can’t collect everything, but we collect a lot.

And if we’re not collecting the right stuff, go to archive.org. There’s a “Save Page Now” feature where you can paste a URL, and people do that all the time. It is used about 80 times per second. So anyone can go and participate in making things available all the time. I just did this for my aunt’s obituary. I went to the website and made sure that the obituary from that funeral service was archived. I did it this morning. So you can also go and participate in creating web archives.

(SOUNDBITE OF MUSIC)

ZOMORODI: In a minute, a new challenge facing Brewster and the Internet Archive: a legal battle between them and the biggest book publishers. The question is whether e-book archiving is digital piracy or preserving the best of humanity for all. I’m Manoush Zomorodi and you’re listening to the TED Radio Hour on NPR. Stay with us.

It’s the TED Radio Hour from NPR. I’m Manoush Zomorodi. On the show today, for all eternity. And we were talking to Brewster Kahle, founder of the Internet Archive, a nonprofit that seeks to digitize everything we humans create, from websites to music to old movies and, of course, books.

(SOUNDBITE OF MUSIC)

KAHLE: The Library of Congress has about 28 million books. We’ve digitized maybe 6 or 7 million. We probably physically own on that order, so we still have a long way to go.

ZOMORODI: And it gets harder to keep up because e-books present very special problems. For example, you don’t actually own the e-book you downloaded.

KAHLE: So it turns out that the major publishers don’t sell e-books. They license them. So your e-book that’s on your Kindle or whatever, you don’t really own that, not in the same sense that you own a physical book. You cannot pass it on to your child. And anytime they want, they can change it or make it go away.

ZOMORODI: It is a matter of licensing. And Brewster and the Internet Archive started trying to get around that by buying physical copies of the books, scanning them, and making their own e-books to give them away.

KAHLE: So we started it in 2011. And at the beginning of the pandemic, four major publishers decided to sue the Internet Archive, saying that you are not allowed to digitize and lend.

ZOMORODI: What the archive calls equal access, those publishers say, is digital piracy.

KAHLE: And that lawsuit continues. We’ll probably hear from the district court next year, and there will probably be an appeal, but we’ll see. The big concept that I never really imagined would be at stake is digital ownership. When you buy a digital file, do you own it in the same sense that you own a physical item? You can’t just go and post it and give it to everyone. That’s understood. OK. But can you keep it? And what the big publishers are saying is, no, there’s never going to be digital ownership again. So it’s the exact opposite of what we were doing with the Internet in the early days, when we were trying to democratize access and democratize creation.

ZOMORODI: I actually went back to the TED archives and watched your talk from 2007 where you presented your vision. And you knew then that there would be conflicts, even if you didn’t know what they were.

(SOUNDBITE OF TED TALK)

KAHLE: Beyond all this, there is a political and social issue: as we go digital, will it be public or private? There are some big companies that have seen this vision and are doing large-scale digitization, but they are locking up the public domain. The question is, is that the world we really want to live in? What is the role of the public and the private as things move forward? How do we go about having a world where we have both libraries and publishing houses in the future, just as we benefited from them growing up? So, universal access to all knowledge. I think it could be one of humanity’s greatest achievements, like the man on the moon or Gutenberg or the Library of Alexandria. It may be something we are remembered for, for millennia, for having achieved.

I don’t think people have an idea of the heroics of not only the people at the Internet Archive, but now a thousand other organizations that we work with on the web collection, and about 500 libraries on book collections, and how hard they are trying to build this, so that the web we take for granted works to a certain extent, so that you can reach past versions. You just use it, because it’s woven into everything.

ZOMORODI: Which brings me to one last kind of existential question, Brewster. If everything digital eventually becomes obsolete, how do you archive the archive so it doesn’t become obsolete too?

KAHLE: Boy, libraries, you know, they’re being destroyed all the time, and the question is how. Often it is governments or large, powerful organizations like corporations that try to destroy them. So you want more than one copy in more than one place. Then you also want to make sure it keeps being used, so that it will be taken care of. Our collections are almost entirely on spinning disk, so we have to replace them every 5-10 years or they’re gone. So we need people to want it to stick around. Fortunately, there are many, many people, and many young people, who see this as a way forward.

ZOMORODI: It is not an easy road.

KAHLE: No.

ZOMORODI: (Laughter).

KAHLE: Building a library of everything is a challenge, but it starts with one web page, one book at a time. And if we see ourselves saving history together, we will all make it happen.

(SOUNDBITE OF MUSIC)

ZOMORODI: Brewster Kahle. He is the founder of the Internet Archive, and you can see his full talk on ted.com. Many thanks also to CM Ralph, artist and creator of the video game Caper In The Castro, and Adrienne Shaw, Professor of Media Studies and Production at Temple University.

Copyright © 2023 NPR. All rights reserved. For more information, visit our website terms of use and permissions pages at www.npr.org.

NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.
