About the Web Archives

The Web at Risk

The Web has revolutionized our access to information. Documents and publications that were once difficult to find are now readily available to anyone with an Internet connection. Federal, state and local government agencies and non-profit organizations now have an inexpensive means for distributing information to the public. When important historical events such as Hurricane Katrina or 9/11 take place, we can see the popular reaction unfold via blogs and personal web sites, and have an unprecedented view into popular culture. All of these materials will serve as valuable resources for researchers for years to come.

But ready access to these publications cannot be taken for granted. Web pages and documents are as easy to change or remove as they are to publish. When sites are redesigned, when new administrations take office, or when policies or organizations change, we witness the wholesale disappearance of information. State and local web publications are particularly at risk. Many of these documents are no longer issued in print, which challenges libraries to continue their historic role as cultural memory institutions in the digital environment.

As scholars increasingly cite web sources, verifying those sources becomes difficult or impossible once the cited pages change or disappear. Studies of web citations have shown that up to half of the citations in scholarly journal articles can stop working within four years. Even when a cited URL still returns a page, there is no guarantee that it shows the same content the author cited. Web content also faces the same long-term risk as other digital publications: file formats evolve and change.

In 2005, the National Digital Information Infrastructure and Preservation Program awarded a grant to the California Digital Library and its partners at New York University Libraries and the University of North Texas Library to provide librarians and archivists with tools to capture, curate and preserve web publications. One result of that grant is the Web Archiving Service, which produced the archives available here. Curators at the University of California Libraries, Stanford University Libraries and New York University Libraries, along with a growing number of other institutions, have used these tools to save web publications for researchers.


The Value and Potential of Archives

These archives will provide lasting access to the publications of California's state and local governments, as well as to a rich array of topics of value to researchers. Searching the archives not only provides a snapshot of each website at a point in time, but also lets researchers explore those resources in ways that are not possible on the live web.

The future holds interesting possibilities for web archives as new tools become available to allow large-scale data analysis on captured web content.


The Tools: The CDL Web Archiving Service

The California Digital Library plays an active role in developing the standards and tools that make web archiving possible. The Web Archiving Service, used to create and deliver these archives, was developed at the California Digital Library and relies on a number of open source tools developed by the Internet Archive with the support of the International Internet Preservation Consortium.

Further information and video demonstrations of the curatorial tools are available for those interested in using the service.

The CDL staff involved in developing the Web Archiving Service are:

  • Stephen Abrams, Senior Manager for Digital Preservation Technology
  • Trisha Cruse, Director of Digital Preservation
  • Scott Fisher, Web Archiving Programmer
  • Erik Hetzner, Web Archiving Programmer
  • John Kunze, Preservation Technologies Architect
  • Margaret Low, Senior Development Programmer
  • David Loy, Senior Development Programmer
  • Mark Reyes, Digital Preservation Programmer
  • Tracy Seneca, Web Archiving Service Manager
  • Perry Willett, Digital Preservation Services Manager

Additional input and review were provided by CDL's curatorial partners at:

  • The University of California, Santa Barbara
  • The University of California, Berkeley
  • The University of California, Davis
  • The University of California, Santa Cruz
  • The University of California, San Diego
  • The University of California, San Francisco
  • The University of California, Los Angeles
  • The University of California, Riverside
  • The University of California, Irvine
  • The University of North Texas
  • Stanford University
  • New York University

Support for the development of this service was provided by the National Digital Information Infrastructure and Preservation Program and the University of California.

Quick Stats

59 public archives
22 partners
5,550 web sites
701,283,897 documents
36.6 TB of data


The archives were built with the Web Archiving Service from the California Digital Library

Materials in this web archive are archived copies for private study, scholarship and research.
Copyright © 2007-2013 The Regents of The University of California.