One of the prime motivations behind Web Archives for Historians is our consciousness of how quickly the web changes and decays, and of the particular shape this gives to the archived web with which historians will need to work. However, just how this happens is not very well documented, and so I draw attention here to some introductory resources about the problem.
As historians, we need to know not only about the patterns in which content disappears, but also about the rate at which it disappears. One recent paper (2012) by Hany M. SalahEldeen and Michael L. Nelson looked at how quickly resources shared on social media about particular news events had disappeared, and found that:
after the first year of publishing, nearly 11% of shared resources will be lost and after that we will continue to lose 0.02% per day.
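To get a feel for what those two figures imply over longer spans, here is a minimal sketch in Python. It rests on assumptions of mine rather than on anything in the paper: that the 0.02% per day is a constant loss measured in percentage points of the original set of resources, and that a year is 365 days. The function name and constants are illustrative only, and the quoted sentence says nothing about how loss accumulates within the first year, so the sketch declines to estimate that period.

```python
# Naive linear reading of the quoted figures (an assumption, not the authors' model).
FIRST_YEAR_LOSS = 0.11      # "nearly 11%" lost after the first year
DAILY_LOSS_AFTER = 0.0002   # "0.02% per day" thereafter
DAYS_PER_YEAR = 365         # assumed length of a year


def estimated_fraction_lost(days_since_shared: int) -> float:
    """Estimate the fraction of shared resources lost after a number of days."""
    if days_since_shared < DAYS_PER_YEAR:
        # The quoted sentence gives no rate for the first year itself.
        raise ValueError("the quoted figures only apply from one year onwards")
    extra_days = days_since_shared - DAYS_PER_YEAR
    # Cap at 1.0: no more than everything can be lost.
    return min(1.0, FIRST_YEAR_LOSS + DAILY_LOSS_AFTER * extra_days)


if __name__ == "__main__":
    for years in (1, 2, 5, 10):
        lost = estimated_fraction_lost(years * DAYS_PER_YEAR)
        print(f"after {years} year(s): roughly {lost:.0%} lost")
```

On this reading, 0.02% per day amounts to about 7.3 percentage points per year on top of the first year's loss. A straight line is the simplest interpretation of the sentence, and it should be treated as a rough guide rather than a prediction.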
Taking a different approach, the British Library’s technical lead Andy Jackson sampled ten years’ worth of archived content in the UK Web Archive to plot not only the rate at which content disappeared, but also the degree to which it had changed between 2004 and 2014. Readers may be interested both in the methods Andy used and in some important caveats about how the selection of content in the archive may have influenced the trends. But the headline is that the fraction of content that is both still online and unchanged after those ten years is so small it can hardly be seen on the graph. Even for content that was archived only a year before, the proportion that is live and unchanged is less than 10%.
In their different ways, both studies point to the same issue: that the live web changes and disappears very quickly. Historians need both to grasp how it happens and to begin to think about what kind of archive this leaves us with.