University Offers Help to Researchers Wrestling with Digital Data Management

July 18, 2011

Preserving research data is nothing like it used to be.

When Charles Darwin was aboard the HMS Beagle in the 1830s, he used notebooks to record hand-written observations and measurements of the species he encountered.

In the past century, other researchers have used typewriters, carbon paper and dot matrix printers to document research results. They often left these physical records to libraries to keep safe for future generations.

Today's data management is more complicated.

"I've read that just one of the detectors at the Large Hadron Collider in Switzerland, the ATLAS detector, can generate a petabyte of data a day," said Michael McPherson, the University of Virginia's associate vice president and deputy chief information officer.

That's a million gigabytes of information, or about as much as 1,333 brand-new, top-of-the-line Apple MacBook Pro laptops can hold in their collective hard drives. Each day.
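The comparison can be checked with a little arithmetic. The sketch below assumes decimal units (1 petabyte = 1,000,000 gigabytes, as stated above) and backs out the per-laptop capacity implied by the 1,333 figure, which works out to roughly 750 GB. That capacity is inferred from the article's own numbers, not quoted from a product specification, and the function names are illustrative.

```python
# Decimal units, as the article uses: 1 petabyte = 1,000,000 gigabytes.
GB_PER_PB = 1_000_000
LAPTOPS = 1_333  # laptop count quoted in the article


def implied_capacity_gb(total_gb: int, laptops: int) -> float:
    """Per-laptop capacity (GB) implied by spreading total_gb over laptops."""
    return total_gb / laptops


def laptops_needed(total_gb: int, capacity_gb: int) -> int:
    """Whole laptops needed to hold total_gb, rounding up."""
    return -(-total_gb // capacity_gb)  # ceiling division on integers


per_laptop = implied_capacity_gb(GB_PER_PB, LAPTOPS)
print(f"Implied capacity per laptop: {per_laptop:.0f} GB")
# 750 GB is the assumed drive size inferred above, not a sourced figure.
print(f"Laptops needed at 750 GB each: {laptops_needed(GB_PER_PB, 750)}")
```

Dividing exactly gives about 1,333.3 laptops, so the article's "about 1,333" is a rounded figure; storing every last gigabyte would take one more machine.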

Managing, analyzing and preserving that data is the most extreme example of a modern data management problem, McPherson said, but it's the sort of issue all research institutions now face to some degree.

In response, the U.Va. Library is pioneering a new resource for researchers confronting data management challenges unique to the digital age.

The University's new Scientific Data Consulting Group works with faculty and graduate student researchers across disciplines to assess their data management needs, then creates and implements plans to store and preserve that data.

"Our single mission here is to improve the way research data is managed at the University," said Andrew Sallans, the library's head of strategic data initiatives, who heads the group.

Research data haphazardly saved on a hard drive – or worse, a disk stored in a desk drawer – might be recoverable now, but there's no guarantee it will be decades down the line, Sallans said.

"If you think about data, there's a whole life cycle to it," Sallans said. "You begin and collect data, and then you have to go in and process it and manage it, and then you analyze and publish results, and then ideally you archive it. That's as true for digital data as it is for other forms."

An important part of the group's work is helping researchers comply with new data-management standards required for some grant applications, McPherson said.

"Increasingly funding agencies, especially federal funding agencies, require that every grant proposal submitted have a data management plan, a plan that talks about what kind of data you're going to collect, what kinds of sensitivity those data have and how you plan to manage those data to make sure they don't get lost, so the results of federally funded research don't disappear because you have a disk crash," he said.

The National Institutes of Health and the National Science Foundation have already adopted data management requirements, though those requirements are somewhat open-ended, Sallans said. The standards are still evolving, and the idea behind the consulting group is to provide a single point of expertise for University researchers facing these issues, he said.
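As a rough illustration, the three elements McPherson lists (what kind of data will be collected, how sensitive it is, and how it will be kept from being lost) can be sketched as a simple record with a completeness check. This is a hypothetical structure, not the NIH's, the NSF's or the consulting group's actual template; the class, field names and sample answers are assumptions made for the example.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataManagementPlan:
    """Hypothetical record of the three plan elements described above."""
    data_types: str = ""    # what kind of data will be collected
    sensitivity: str = ""   # what kinds of sensitivity those data have
    preservation: str = ""  # how the data will be managed so they are not lost


def missing_sections(plan: DataManagementPlan) -> List[str]:
    """Return the names of any sections that are still blank."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Sample answers are illustrative only.
draft = DataManagementPlan(
    data_types="Field measurements recorded in spreadsheets",
    sensitivity="No personal or restricted information",
)
print(missing_sections(draft))
```

Here the draft leaves the preservation section blank, so the check reports that one section as missing.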

Researchers begin with an initial interview with the group.

"We spend about an hour sitting down with a researcher or whoever is responsible for data in that group, and we ask them a series of questions that help us get at what sort of data they have and what to do with it," Sallans said.

The group helps researchers plan for issues ranging from file formats and sizes to questions about who is responsible for the data and the best way to preserve it, Sallans said.

"It's a two-sided thing," he said. "It's for us to understand what they are doing and identify the best ways to provide some direct help for them, and it also helps them to be more aware of these issues."

Next, the group can help researchers put together data management plans that both comply with grant proposal requirements and lay groundwork for managing the data in a methodical and efficient way, Sallans said.

Michael Leon Tuite, a doctoral student in the Graduate School of Arts & Sciences' Department of Environmental Sciences, knew of the consulting group from his previous career as a library employee and made use of its services to put together the data management portion of his NSF proposal for post-doctoral research.

"There's almost no guideline whatsoever from NSF or from my professional organizations about how to do this," Tuite said. "The template that the consulting group had prepared, especially the format that it took – a series of questions – was perfect. It was an ideal solution to help me structure what it was I wanted to put together."

The consulting group can also help researchers implement their plans and deal with questions along the way. In the future, Sallans foresees large searchable digital repositories that could contain all of the data collected in a given field by different researchers at different institutions.

Though digital data preservation techniques are long established in some fields, such as environmental sciences and astronomy, the practice is expanding into the social sciences and humanities, and Sallans said he expects greater demand from those disciplines in the future.

"Science and engineering is just at the forefront," he said. "But everything is data, even if it isn't coming from a scientific instrument in the way it would be in some sciences or engineering."

The Scientific Data Consulting Group is also involved in creating DMPTool, a new online tool that helps institutions create their own data management plans.

Beyond the goal of ensuring researchers are able to comply with federal funding requirements, the consulting group helps the University fulfill an ethical requirement, McPherson said.

"Even if we didn't have compliance requirements set by the federal government, the right thing to do would be to assist faculty members and graduate students with dealing with these data they are collecting," he said. "More and more disciplines have become, and are becoming, data-intensive."

University researchers interested in contacting the Scientific Data Consulting Group can email scidac@virginia.edu.

– by Rob Seal
