The Historian of the Web: Crawler, Browser or Lurker?

[A special guest post by Valérie Schafer and Francesca Musiani.]

“They program us, we re-program them. They segment us, we move around. They accelerate, we linger. We can always be smarter than our machines.”
[Louise Merzeau, « Le Flâneur Impatient », Médium, Rythmes, n°41, 2014/4, p. 20-29]

In his blog post of January 22, 2015, “The Promise of WebARChive Files”, Ian Milligan noted:

Not only does the Agence nationale de la recherche project Web90 take this idea seriously – but its opposite, as well: “You can’t do justice to the World Wide Web if you do not consider the 1990s”, we argue.

Building on this core idea, the project aims to provide elements of reflection on the context of Web development in the Nineties (e.g. tariffs, strategies and offers put forward by ISPs, the birth of web design, the emergence of e-commerce and personal pages, the transition from the Minitel to the Web, the notorious legal controversies that ISPs and hosting services had to face, or national State-driven policies). Web90 also seeks to map the “French” Web (defined as the .fr domain, even though, as we are aware, this does not account for all French websites on its own, let alone French Web browsing patterns), and to reconstruct users’ Web browsing experience in light of such factors as the “graphic scenarios” that emerged as a result of the evolution of interfaces. These issues call for the simultaneous adoption of several methods that are very different but nonetheless complementary, and that are not equally suited to answering every question.

Web archives as big data…
This was the title of the conference organised in December 2014 by the Big UK Domain Data for the Arts and Humanities project. The information ‘deluge’ may appear less threatening for the scholar of the Web of the Nineties, despite significant growth in the number of domain names and hosts in the second half of the decade. However, what was already an abundance needs to be managed, as do the missing pieces – images or sites that were not preserved, or only very fleetingly or superficially so. The Digital Humanities and their tools will prove useful in facing the massive amounts of Web data, provided that historians are ready to enter the “black box” of tools and instruments, as Ian Milligan showed, and also the “black box” of Web archiving, as Axis 3 of the Web90 project shows. Indeed, beyond understanding the tools, historians need to engage with the nature of collecting procedures, their periodicity and their actors, as well as with the representations underpinning the constitution of archives.

Black boxes…
Let us take two examples. The first is developed in the article “Quand la communication devient patrimoine…” [When Communication Becomes Heritage], co-written by Camille Paloque-Berges and Valérie Schafer, forthcoming in Hermès. The article addresses, amongst other things, the vision of Web and digital heritage that informs the actions and strategies of the Archive Team. In stating, on its home page, that “History is our future… And we’ve been trashing our history”, and describing itself as “a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage”, Archive Team clearly signals its interest in “ordinary” forms of communication that are, nonetheless, already qualified as digital heritage.

Mélanie Dulong de Rosnay, a member of the Web90 team, has shown in Réseaux de production collaborative de connaissances, building on Elinor Ostrom’s theories, how the notion of common good is today “materialized” as a core feature of peer production networks. We can probably observe, in the community-based implementation of this preservation infrastructure, an alternative management of informational common goods as heritage, as well as a movement of re-appropriation by users.

Born-digital heritage, as well as digital heritage, as we have recently noted in our Call for Papers for the RESET journal, “[…] call for empirical investigations on both its publics – existing or expected/envisaged – and its promoters, producers, preservers. […] The controversies that these policies raise (e.g. those that concern the ‘right to be forgotten’ and the right to memory), as well as the interactions of public authorities with preservation institutions (or among these institutions themselves), are interesting to analyze for the light they shed on the socio-technical and political dimensions of ‘digital heritage’, as it becomes institutionalized. The practices and procedures contributing to the shaping and the legitimization of digital heritage entail a number of choices, trials, tests, intertwined ‘scales of action’, and a social ‘work’ undertaken by a variety of actors, including professional associations, amateurs, the public at large, libraries, museums, research groups volunteering to be in charge of specific archiving tasks or initiating preservation policies, international institutions or clusters of entities such as UNESCO or the International Internet Preservation Consortium”.

The second example is drawn from Louise Merzeau’s work. At the general assembly of the IIPC in May 2014, she showed the link between archiving models, epistemological models and research models, and the feedback loops between these elements. Entering the black box therefore seems crucial; however, historians will need to avoid several obstacles.

… and their temptations
Trying to transform historians into computer scientists is, in our opinion, an idea as risky as its opposite, i.e., the internalist and machine-focused approaches that informed the early days of computing history, when it was written by practitioners. However, failing to improve historians’ digital literacy would be just as disastrous.

Similarly, it seems very important to us not to blur different roles to the point of confusion: if historians and their colleagues from the other social sciences and humanities are not computer scientists, they are not archivists either. Institutional mediation (while frustrating at times, as it implies choices and gaps) guarantees sustainability and accessibility: a long-term vision that researchers’ own archiving practices can only partially satisfy, unless access to data and their deposit are completely re-thought in history and the other social sciences and humanities – not unlike what other communities have previously done (e.g. GenBank).

The naïve version of data-driven science, which has led some to believe that we are witnessing the “end of theory”, should also be avoided. The related risks have been underlined in other disciplines, for example by Bruno Strasser. As Antoine Prost remarked in his Twelve Lessons on History, there is no document without an underlying question. It is the questions asked by historians that turn the traces left by the past into sources and documents. Big data and computational methods are not always, or not entirely, suited to providing answers, and still less to formulating questions.

Web explorers… “Small is beautiful”!
In the Web90 project, so as to account for the set of conditions that shaped the Web experience of Internet users in the Nineties – especially their browsing habits – we have chosen to study, on the one hand, the general framework and the power of ‘massification’. On the other hand, and in parallel, we wish to ‘stay close to the archive’, and possibly to follow paths previously traced by others (e.g. directories or closed spaces such as Infonie), open doors such as ISP portals or the Yahoo! Directory, or sites that were recommended by the press or by guidebooks, such as the 1998 Guide du Routard de l’Internet. Of course, in this domain, Web archives are precious assets, but the word “browser” takes on here its full original etymological sense, encompassing our mobilization of printed sources (press archives, State-driven reports, guidebooks for the general public) as well as audio-visual archives, oral testimonies, or newsgroups. These sources invite historians to become lurkers around exchanges past. But very often, historians soon need to emerge from this “passive” status…

A search of Usenet newsgroups focused on female presence on the Web of the Nineties (carried out for this conference) has allowed us to trace a post by a “Guillermito El Loco”, followed by twenty-five other posts, on the subject: “The Web is by far too masculine. What should we do?” With the aim of encouraging feminine presence and visibility on the Web, he proposes to list “girl Web pages” authored by French women. Two days later, once he has contacted the women he wishes to list on his site, he reports that reactions are quite mixed: “For now, I had around thirty responses, four or five of them negative, sometimes with fairly violent reactions… it might actually put you off being a feminist!” The site, whose link is mentioned in the post, has luckily been preserved by the Internet Archive, and opens a door towards “feminine Web pages” and the profiles of their authors. Guillermito compiled an alphabetical database of these profiles and collected the links, a fairly good proportion of which are still accessible in the Wayback Machine. There is, we believe, no need to argue further for the potential of such a corpus – peculiar, modest but unique – for our subject of study.

“The historian of tomorrow will be a programmer, or will be no more”, stated historian Emmanuel Le Roy Ladurie in 1973. Soon after, he left for the Occitan village of Montaillou, whose day-to-day history he reconstructed while distancing himself from measures and statistics… Do we, can we, see in this anecdote – as argued by Valérie Schafer and Benjamin G. Thierry in a forthcoming article – a harbinger of the use of Web archives? The promise of digital archives and tools leaves the door wide open for historians as regards methodology and its plurality. We do not wish to assume anything about which approaches will ultimately be privileged. However, between quantitative and qualitative, subjectivism and scientific requirements, sampling or claims of comprehensiveness, will we witness ancient quarrels “reloaded”? Or, on the contrary, will the legacy of historiography allow us to move beyond these dichotomies… and beyond binary oppositions?
