Web archives have arrived, at least in the pages of high-profile publications such as the Washington Post and the New Yorker.
An especially fascinating exchange took place in mid-February. Gareth Millward, a research fellow in the Centre for History in Public Health at the London School of Hygiene and Tropical Medicine, published “I tried to use the Internet to do historical research. It was nearly impossible” with the Washington Post. In it, he explained the difficulties of navigating extremely large web archives: search queries returned useless results that were not sorted in an ideal fashion (or at all). He suggested that historians may instead need to find smaller, circumscribed corpuses or explore metadata.
The response by Andy Jackson, Web Archiving Technical Lead at the British Library, on the British Library’s Web Archive blog was equally illuminating. His piece, “Building a ‘Historical Search Engine’ is No Easy Thing,” is a must-read. He points out that historians have different use cases: simply replicating Google (which excels at telling us what we need to know in an extremely contemporary context) won’t make sense when querying large bodies of web-archived material. He walks us through the various stages of a search engine, and concludes by arguing that we need to think of Macroscopes rather than of search engines (sidenote: having just finished copyedits on a co-authored book subtitled The Historian’s Macroscope, I’m inclined to agree with this metaphor!).
These two pieces join a third high-profile piece, “The Cobweb: Can the Internet be Archived?” by Harvard historian Jill Lepore, published in the New Yorker. It is a fascinating exploration of the current state and recent history of web archiving, and is well worth your time.