The Internet Archive and the socio-technical construction of historical facts

Anat Ben-David, Adam Amram

Research output: Contribution to journal › Article › peer-review


This article analyses the socio-technical epistemic processes behind the construction of historical facts by the Internet Archive Wayback Machine (IAWM). Grounded in theoretical debates in Science and Technology Studies about digital and algorithmic platforms as “black boxes”, this article uses provenance information and other data traces provided by the IAWM to uncover specific epistemic processes embedded at its back-end, through a case study on the archiving of the North Korean web. In 2016, an error in the configuration of one of North Korea's name servers revealed that it contains 28 websites. However, the IAWM has snapshots of the majority of websites, which have been archived from as early as 2010. How did the IAWM accumulate knowledge about websites that are generally hidden to the world? Through our findings we argue that historical knowledge on the IAWM is generated by an entangled and iterative system comprised of proactive human contributions, routinely operated crawls and a reification of external, crowd-sourced knowledge devices. These turn the IAWM into a repository whose knowing of the past is potentially surplus, harbouring information which was unknown to each of the contributing actors at the time and place of archiving.

Original language: English
Pages (from-to): 179-201
Number of pages: 23
Journal: Internet Histories
Issue number: 1-2
State: Published - 3 Apr 2018

Bibliographical note

Publisher Copyright:
© 2018 Informa UK Limited, trading as Taylor & Francis Group.


Keywords

  • Internet Archive
  • North Korea
  • Wayback Machine
  • appraisal
  • black box
  • censorship
  • provenance

