So You’re a Historian Who Wants to Get Started in Web Archiving

By Ian Milligan (University of Waterloo)

(Cross-posted and adapted from an earlier post I wrote for the IIPC’s blog)

The web archiving community is a great one, but it can sometimes be a bit confusing to enter. Unlike the Digital Humanities community, which has developed aggregation services like DH Now, the web archiving community is a bit more dispersed. But fear not, there are a few places to visit to get a quick sense of what’s going on. Here I just want to give a quick rundown of how you can learn about web archiving on social media, from technical walkthroughs, and from blogs.

I’m sure I’m missing stuff – let us all know in the comments!

Social Media

A substantial amount of web archiving scholarship happens online. I use Twitter (I’m at @ianmilligan1), for example, as a key way to share research findings and ideas as my project comes together. I usually try to hashtag them with #webarchiving. This means that every tweet tagged with “#webarchiving” shows up in that hashtag’s timeline.

For best results, a Twitter client like Tweetdeck, Tweetbot, or Echofon can help you keep apprised of things. There may be Facebook groups – I actually don’t use Facebook (!) so I can’t provide much guidance there.

Technical Walkthroughs

You actually need some technical skills to work with web archives. For starters, if you just want to look at archived web pages, you can use the Wayback Machine. But if you’ve got a directory of ARC or WARC files (the container formats that web archives are stored in), or a librarian gives you a dump of them, you might want to work through some workshops or walkthroughs:

  • The warcbase wiki: I’ve been involved with Jimmy Lin, a computer scientist at the University of Waterloo, in trying to make his web archiving management and analytics platform a go-to resource for historians. Come check it out here, and leave comments.
  • The Archive Research Services Workshop: This is another top-notch workshop that comes with sample data, great instructions, and beyond. Definitely check it out to get your web archiving on.

Many of these require knowledge of a command line, but fear not – the Programming Historian is on it!
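
If a librarian does hand you a dump of files, a useful first step is simply to see what you have. Here is a minimal sketch in Python, using only the standard library, that walks a directory and counts ARC and WARC files along with their total size. The function name `inventory` and the list of suffixes are my own assumptions for illustration; they do not come from warcbase or any of the workshops above, and your files may be named differently.

```python
from collections import Counter
from pathlib import Path

# Assumed file suffixes for web archive containers; adjust to match your dump.
# Longer suffixes come first so "x.warc.gz" is not counted as something else.
ARCHIVE_SUFFIXES = (".warc.gz", ".warc", ".arc.gz", ".arc")


def inventory(directory):
    """Count archive files and total bytes per suffix under `directory`.

    Hypothetical helper for illustration: it only looks at file names,
    it does not open or validate the archives themselves.
    """
    counts = Counter()
    sizes = Counter()
    # rglob("*") also descends into subdirectories.
    for path in Path(directory).rglob("*"):
        if not path.is_file():
            continue
        name = path.name.lower()
        for suffix in ARCHIVE_SUFFIXES:
            if name.endswith(suffix):
                counts[suffix] += 1
                sizes[suffix] += path.stat().st_size
                break  # count each file once
    return counts, sizes
```

Calling `inventory("path/to/your/dump")` returns two counters keyed by suffix: one with the number of files and one with their combined size in bytes. It is only a starting point, but it tells you quickly whether you are dealing with a handful of files or a few terabytes before you move on to the walkthroughs.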


Blogs

I’m wary of listing blogs, because I will almost certainly leave some out. Please accept my apologies in advance and add your name in the comments below! But a few are on my recurring must-visit list (in addition to this one, of course!):

  • Web Archiving Roundtable: Every week, they have a “Weekly web archiving roundup.” I don’t always have time to keep completely caught up, but I visit roughly weekly and once in a while make sure to download all the linked resources. Being included here is an honour.
  • The UK Web Archive Blog: This blog is a must-have on my RSS feed, and it keeps me posted on what the UK team is doing with their web archive. They do great things, from inspiring outreach, to tools development (e.g. Shine), to researcher reflections. A lively cast of guest bloggers and regulars.
  • Web Science and Digital Libraries Research Group: If you use web archiving research tools, chances are you’ve used some stuff from the WebSciDL group! This fantastic blog has a lively group of contributors, showcasing conference reports, research findings, and beyond. Another must visit.
  • Web Archives for Historians: This blog, written by Peter Webster and myself, aims to bring together scholarship on how historians can use web archives. We have guest posts as well as cross-posts from our own sites.
  • Peter Webster’s Blog: Peter also has his own blog, which covers a diverse range of topics including web archives.
  • Ian Milligan’s Blog: It feels weird including my own blog here, but what the heck. I provide lots of technical background to my own investigations into web archives.
  • The Internet Archive Blog: Almost doesn’t need any more information! It’s actually quite a diverse blog, but a go-to place to find out about cool new collections (the million album covers for example) or datasets that are available.
  • The Signal: Digital Preservation Blog: A diverse blog that occasionally covers web archiving (you can actually find the subcategory here). Well worth reading – and citing, for that matter!
  • Kris’s Blog: Kristinn Sigurðsson runs a great technical blog here, very thought-provoking and important both for those who create web archives and for those who use them.
  • DSHR’s Blog: David Rosenthal’s blog on digital preservation has quite a bit about web archiving, and is always provocative and mind expanding.

Again, I am sure that I have missed some blogs so please accept my sincerest apologies.

In-Person Events

The best place to learn is at in-person events, of course, which are often announced on this blog or through many of the channels above! I hope that the IIPC blog can become a hub for these sorts of things.


I hope this is helpful for people who are starting out in this wonderful field. I’ve just provided a small slice: I hope that in the comments below people can offer other suggestions to help us all out!

About Ian Milligan

Ian Milligan is Associate Vice-President, Research Oversight and Analysis and professor of history at the University of Waterloo.
