Librarians build Web archives on swine flu, wildfires
Date: 2009-11-17
Contact: Dolores Davies
Phone: (858) 534-0667
Email: ddavies@ucsd.edu

UC San Diego librarians have been collaborating with a team of UC and other librarians to build a series of Web archives on critical subjects such as the swine flu epidemic and the devastating California wildfires of 2007. Other topics covered in the Web archives include the Guantánamo Bay records, the Myanmar cyclone of 2008, the California recall election of 2003, and State of California and San Diego County local government sites.

The archives were built with a new Web archiving service (WAS) developed by the University of California's California Digital Library (CDL), which has enabled UC San Diego and other university librarians to capture, curate, and preserve Web sites for the benefit of researchers and the general public. New archives are continually being developed and will be accessible to the public along with the current archives. The service allows scholars and other users to both access the archives and search and analyze the contents in ways not possible on the live Web. To date, UC San Diego and other UC librarians, along with librarians at other universities, have produced 21 Web archives, which include approximately 1,020 Web sites, nearly 68 million documents, and 4.2 terabytes of data.

"While the Internet has both revolutionized our access to information and greatly expanded the amount of information we have access to," said Annelise Sklar of the UC San Diego Libraries, "Web sites routinely change, move or disappear with little or no notice. This means that important information is at risk unless we take steps to preserve it."

According to Sklar, who participated in the UC and UC San Diego libraries' efforts to build the 2007 California wildfires archive and is currently building the swine flu epidemic archive, the ephemeral nature of the Web and the staggering amount of data and images that reside on it pose serious challenges to scholars as well as consumers trying to conduct serious research. While books and other printed works are not as instantly accessible as the Web, they are recorded works that can be handed down to future generations and generally are not in danger of disappearing unless misplaced or damaged. Web sites and other digital information are changed and updated constantly, with the average life span of a Web site estimated to be less than three months. Changing file and hardware formats also render many digital documents obsolete in less than four years.

In the past, a political science professor studying a political campaign or a series of elections might collect election direct mail pieces and other printed materials as well as consult a variety of library resources. With a great deal of political campaign activity migrating to the Web, efforts to capture these materials and their contents on a timely basis are extremely challenging. The Obama presidential campaign, for example, was conducted primarily online and will go down in history as the first national presidential campaign to fully utilize e-mail communications as well as social network sites like Facebook and Twitter to connect with younger audiences.

State and local Web publications are particularly at risk, said Sklar.  In many cases, these documents are no longer available in print, and libraries are challenged to continue their historic role as cultural memory institutions in the digital environment.

"Tools like the WAS give us the ability to preserve important elements of our cultural history," said Sklar.  "When important historical events such as Hurricane Katrina or 9/11 take place, we can see public reactions unfold via blogs, personal Web sites, and other internet outlets, giving us a very valuable window into popular culture.  All of these materials will serve as valuable resources for scholars and researchers for years to come."

The 2007 California wildfires archive documents the most devastating fire season in California's history. According to Sklar, the archive includes various State of California agency sites and federal government sites, as well as numerous news sites, blogs, and social networking sites. The archive contains 161 sites that can be browsed or searched easily.

Sklar began building the swine flu epidemic archive last spring and expects the archive to be available to scholars and members of the public by February 2010. The archive is a collection of government, news, scientific, and cultural Web sites relating to the 2009 H1N1 swine flu epidemic. To date, it includes 59 sites.

The California Digital Library's efforts to develop the Web archiving service have been supported by a grant from the Library of Congress' National Digital Information Infrastructure and Preservation Program. The grant was awarded to CDL and its partners at the New York University Libraries and the University of North Texas Library to provide librarians and archivists with the tools to capture, curate, and preserve Web publications. 

According to Tracy Seneca, Web archiving service manager for CDL, the archives will provide lasting access to the publications of the state of California at the state and local level, as well as access to a rich array of topics of value to researchers.

"Searching the archives not only provides a snapshot of each website in time, but also allows researchers to explore those resources in ways they could not do on the live Web," said Seneca. "The future holds interesting possibilities for Web archives as new tools become available to allow large-scale data analysis on captured Web content."

In addition to providing librarians with a WAS toolkit to build the Web archives, the California Digital Library plays an active role in the development of the Web archiving standards and tools that make Web archiving possible. The Web archiving service used to create and deliver the archives relies on a number of open source tools developed by the Internet Archive with the support of the International Internet Preservation Consortium.

To view the Web archives, visit http://webarchives.cdlib.org.

The UC San Diego Libraries, ranked among the top 25 public academic research libraries in the nation, play an integral role in advancing and supporting the university's research, teaching, patient care, and public service missions. The nine libraries that make up the UC San Diego Library system provide access to more than 7 million digital and print volumes, journals, and multimedia materials to meet the knowledge demands of scholars, students, and members of the public. Each day, more than 7,300 people stream through the university's nine libraries. The libraries' vast resources and services are accessed more than 87,500 times each day via the UC San Diego Libraries' Web site.