DataONE to deal with data deluge
Date: 2009-11-18
Contact: Patricia Cruse
Phone: (510) 987-9016
Email: patricia.cruse@ucop.edu

Researchers at the University of California have partnered with dozens of other universities and agencies to create DataONE, a global data access and preservation network for Earth and environmental scientists that will support breakthroughs in environmental research. DataONE (Data Observation Network for Earth) is one of two $20 million awards made this year as part of the National Science Foundation's (NSF) DataNet program. The collaborating universities and government agencies came together to address the mounting need to organize and serve up vast amounts of highly diverse, interrelated, but often incompatible scientific data. The resulting studies will range from research that illuminates fundamental environmental processes to work that identifies environmental problems and potential solutions.

The National Center for Ecological Analysis and Synthesis (NCEAS) at UC Santa Barbara, the Department of Computer Science and the Genome Center at UC Davis, and the California Digital Library at the UC Office of the President are integrally involved in the NSF DataONE initiative. Across these UC partners, the several-million-dollar award will drive advanced research in data acquisition, storage, mining, integration and visualization for DataONE. The resulting computing and processing "cyberinfrastructure" will be made permanently available to the broader UC community and to international science communities. DataONE is led by the University of New Mexico and includes additional partner organizations in the United States as well as in Europe, Africa, South America, Asia and Australia.

"Scientists have spent hundreds of years collecting environmental data — measuring temperature, counting fish and butterflies," says Stephanie Hampton, deputy director of NCEAS. "We already know quite a lot, when you estimate the volume of scientific data that must exist out there, but the challenge is to find those data sets and then put them together in a manner that helps to address the important questions for science and society. DataONE will be that portal for environmental data."

The DataONE team will study how a vast digital data network can provide secure and permanent access to data well into the future, and how it can encourage scientists to share their data. The team will help define standards for data and data citation, and will create tools for organizing, managing and publishing data.

As one of five DataNet collaborations envisioned by the NSF, DataONE will build a set of geographically distributed Coordinating Nodes that will play an important role in facilitating all activities of the global network. The first three Coordinating Nodes will be at UC Santa Barbara (housed at the Davidson Library), the University of New Mexico, and the University of Tennessee/Oak Ridge National Laboratory.

"Institutions have made extensive investments in infrastructure for managing data at their local institutions and in discipline-specific consortia, but these systems generally don't interoperate," says Matthew Jones, director of informatics research and development at NCEAS. "DataONE will provide a critically needed interoperability layer that will allow scientists from diverse domains to collaborate on pressing environmental science challenges."

Scientific data integration and management are also central concerns for computer science researchers who develop methods and tools that support all stages of the data life cycle. "Effective annotation and integration of data, and efficient management of data lineage information are hot research topics in the database and scientific workflows communities," says Bertram Ludaescher, professor of computer science at UC Davis, whose team specializes in scientific workflow and data integration technologies and in the storage and querying of data provenance.

Libraries have traditionally played a critical role in preserving and providing access to scholarly materials and have recently begun to focus on the complex challenges associated with managing scientific data. "Libraries don't have the capacity to address these challenges individually. We need to partner with researchers, information technologists, and domain specialists to address these complex problems," says Patricia Cruse, director of the UC Curation Center at the California Digital Library.

DataONE includes experts from library, computer, and environmental sciences explicitly to bridge these worlds and to develop an infrastructure to serve science for many decades to come.

About the National Center for Ecological Analysis and Synthesis
NCEAS was established in 1995. The organization has hosted more than 4,000 scientists from over 50 countries and supported more than 430 collaborative projects in ecology and related fields. NCEAS scientists develop new techniques in informatics and apply general knowledge of ecological systems to specific issues, such as the loss of biotic diversity, global change and sustainability of marine ecosystems. NCEAS is among the top 1 percent of 38,000 institutions evaluated for scientific impact in environmental research. NCEAS is funded by the National Science Foundation, the state of California, the University of California and numerous other donors. For further information contact Stephanie Hampton, deputy director, NCEAS, at hampton@nceas.ucsb.edu or (805) 892-2505; or Matt Jones, director of informatics research and development, NCEAS, at jones@nceas.ucsb.edu.

About the UC Curation Center and the California Digital Library
The UC Curation Center (UC3) of the California Digital Library (CDL) was established in 2009. UC3 is a central preservation and curation service provider addressing the systemwide needs of the 10 campuses of the University of California, one of the pre-eminent public universities of the world. The California Digital Library provides digital library development and support for the University of California libraries and the communities they serve. For further information contact Patricia Cruse, director, UC Curation Center, at patricia.cruse@ucop.edu or (510) 987-9016.

About Professor Ludaescher
Bertram Ludaescher is a professor in the Department of Computer Science and a member of the Genome Center, both at UC Davis. Work in his Data & Knowledge Systems (DAKS) lab focuses on scientific workflow design and optimization, data provenance, knowledge representation, and data integration. He is involved in several collaborative R&D projects, including the DOE Scientific Data Management Center project (SciDAC/SDM) and NSF projects to develop workflow technology (Kepler-CORE) and cyberinfrastructure for bioinformatics and environmental observatory applications. Prof. Ludaescher received his M.S. in computer science from the University of Karlsruhe and his Ph.D. from the University of Freiburg, Germany. Until 2004 he was a research scientist at the San Diego Supercomputer Center and an adjunct faculty member in the Department of Computer Science and Engineering at UC San Diego. He can be contacted at ludaesch@ucdavis.edu.