Web archives have arrived, at least in the pages of high-profile publications such as the Washington Post and the New Yorker.
An especially fascinating exchange took place in mid-February. Gareth Millward, a research fellow in the Centre for History in Public Health at the London School of Hygiene and Tropical Medicine, published “I tried to use the Internet to do historical research. It was nearly impossible” with the Washington Post. In it, he explained the difficulties of navigating extremely large web archives: search queries returned useless results that were not sorted in an ideal fashion (or at all). Instead, he suggested, historians may need to find smaller, circumscribed corpuses or explore metadata.
The response by Andy Jackson, Web Archiving Technical Lead at the British Library, on the British Library’s Web Archive blog was equally illuminating. His piece, “Building a ‘Historical Search Engine’ is No Easy Thing,” is a must-read. He points out the different use cases that historians have: simply replicating Google (which excels at letting us know what we need to know in an extremely contemporary context) won’t make sense when querying large bodies of web-archived material. He walks us through the various steps of a search engine, and concludes by arguing that we need to think of Macroscopes rather than of search engines (sidenote: having just finished copyedits on a co-authored book subtitled The Historian’s Macroscope, I’m inclined to agree with this metaphor!).
These two pieces join a third high-profile piece, “The Cobweb: Can the Internet be Archived?” by Harvard historian Jill Lepore. This was a fascinating exploration of the current state and recent history of web archiving, and is well worth your time.
This paper was given at the American Historical Association’s annual meeting in New York City on January 5th, 2015. It was part of the Text Analysis, Visualization, and Historical Interpretation panel. My thanks to my co-presenters and especially Micki Kaufman who organized the panel.
The text that follows may not be exactly what I said, but is based on my speaking notes with a bit of memory filling in here and there.
Hello everybody, I’d like to begin with a somewhat provocative opening:
I believe that historians are unprepared to engage with the quantity of digital sources that will fundamentally transform their trade. Web archives are going to transform the work we do for a few main reasons, which are set out in the full presentation, “The Promise of WebARChive Files.”
By Peter Webster and Ian Milligan
The first stage of our project, Web Archives for Historians, has concluded. In just under a year, we’ve amassed a healthy bibliography of about twenty works that fall within our scope: works written by historians covering topics such as (a) reflections on the need for web preservation, and its current state both in different countries and globally; (b) how historians could, should or should not use web archives; and (c) examples of actual uses of web archives as primary sources.
We’ve probably reached the ceiling on this front, however! There aren’t that many historians who are actively working in this area (yet, we dare say). And so we now want Web Archives for Historians to transition into an active blog that will:
- aggregate content by historians or for historians on web archives (similar to the Web Archiving Roundtable) – some of this will come from our own blogs (Peter and Ian), but also from a list of blogs that we’ll be following;
- draw attention to talks and slides that we spot at scholarly conferences or in publication venues;
- carry commissioned posts (eventually).
Our mandate will be to include:
- examples of work done using web archives;
- historical method in the web archive;
- news of significant new projects, tools, data or web services;
- contemporary history using the live web (as core source material, rather than just incidentally).
We hope that you will join us by following along in your RSS reader, on Twitter, or just by popping by now and again.