Scholars often lament the knowledge that might have been preserved if the great Library of Alexandria had been better protected against the ravages of time and marauding armies.
While the University of Virginia’s own scholarly record is markedly safer from ransacking Romans, by 2011 its digital collection faced a similar threat: steady extinction through technological obsolescence.
APTrust – more formally, the Academic Preservation Trust – is a massive, UVA-led initiative meant to remove that threat. Scholars have been creating digital-only materials on a regular basis since the early 1990s, but as technology advances and the volume of digital content grows, these materials are in danger of becoming obsolete, inaccessible or lost altogether.
Over the last five years, APTrust has been building a large-scale solution that preserves digital scholarship by storing it across multiple technologies and physical locations.
“The idea for APTrust grew out of conversations between [former UVA Vice President and Chief Information Officer] James Hilton, [former University Librarian and Dean of Libraries] Karin Wittenborg and myself,” said Martha Sites, the current University Librarian, Dean of Libraries and executive lead of APTrust. “James was pursuing a national effort to look at what we call ‘deep dark preservation.’”
“Deep dark preservation” refers to the multiple layers of protection needed to effectively archive a digital file and the technology it runs on for future use.
Together with Sites, Hilton and Wittenborg helped grow APTrust into a consortium of 17 like-minded universities committed to the creation and management of a digital preservation repository.
APTrust’s primary goal is to package and preserve information in a way that makes it accessible to future generations. Today, any records that UVA or its partner institutions wish to preserve are sent to library specialists who carefully label them with a range of searchable data markers.
“Figuring out how to identify a piece of scholarship so that others can find it later is a significant challenge that involves specific ways of describing it. In old library days, we used the term ‘cataloging.’ We call it ‘metadata’ now,” Program Director Chip German said.
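To make the idea of searchable data markers concrete, here is a minimal sketch in Python. It is an illustration only, not APTrust’s actual metadata format: the `DepositRecord` class, its field names and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DepositRecord:
    """Hypothetical descriptive metadata for one deposited item."""
    title: str
    creator: str
    institution: str
    keywords: list[str] = field(default_factory=list)


def find(records, term):
    """Return records whose title, creator or keywords mention the term."""
    needle = term.lower()
    return [
        r for r in records
        if needle in r.title.lower()
        or needle in r.creator.lower()
        or any(needle in k.lower() for k in r.keywords)
    ]


# Invented sample records, for illustration only.
records = [
    DepositRecord("Field recordings, 1994", "A. Researcher", "UVA",
                  ["audio", "oral history"]),
    DepositRecord("Survey dataset", "B. Scholar", "UVA", ["statistics"]),
]

hits = find(records, "oral history")
```

The point of the sketch is that a file with no description can only be found by someone who already knows it exists, while a described file can be found by anyone who searches on a matching term.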
After digital material has been properly described, the next challenge is to preserve it in a way that will remain readable in the years to come.
“Because of this, APTrust is really a phased thing,” German said. “We are working first on the things that we can absolutely address quickly; currently we’re addressing the preservation of the original digital material and the description of that material. The next problem we’re working really hard on is how to ensure that future researchers will have the tools to understand and interact with those digital materials.”
This means that APTrust’s next phase of work will include finding better ways to preserve the software that makes such digital materials accessible. One way to do this is to build an “emulation environment,” in which old software can still run in a virtual setting. For example, if MP3 files are eventually replaced by other formats, APTrust may develop and preserve emulation environments that allow players like iTunes and Windows Media Player to keep operating.
To date, APTrust has preserved more than 16 terabytes of data from its partner institutions. Because its storage demands are growing rapidly, the group currently uses Amazon Web Services to store and safeguard all of its contents. Every piece of data is protected through multiple levels of redundancy.
Once a new file is properly packaged and labeled at a depositing institution such as UVA, it’s saved at two separate Amazon data centers, one in Virginia and one in Oregon. Within each center, a copy of the data is stored in each of three separate “availability zones.” These zones have independent power supplies, environmental controls and network connections, so if one is disrupted, the others will remain unharmed. Additionally, the Virginia and Oregon centers use different technologies to store their data, so APTrust files are secured across multiple platforms.
“Bicoastal storage protects the data from being wiped out all at once by things like natural disasters or even terrorist attacks,” Sites said. “Storage across multiple platforms helps protect against the failure of future and modern technologies.”
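The arithmetic behind that redundancy can be sketched in a few lines of Python. This is a simplified model, not APTrust’s software: it reads the description above as one copy per availability zone, and the zone numbering and function names are hypothetical.

```python
from itertools import product

# Two data centers and three zones per center, as described in the article.
# The zone numbers (1-3) are placeholders, not real Amazon identifiers.
REGIONS = ("virginia", "oregon")
ZONES_PER_REGION = 3


def plan_copies():
    """List every (region, zone) location that holds a copy of a file."""
    return list(product(REGIONS, range(1, ZONES_PER_REGION + 1)))


def surviving(copies, lost_region=None, lost_zone=None):
    """Return the copies left after a whole region or a single zone fails.

    lost_zone is a (region, zone) pair such as ("virginia", 2).
    """
    return [c for c in copies if c[0] != lost_region and c != lost_zone]


copies = plan_copies()
after_zone_failure = surviving(copies, lost_zone=("virginia", 2))
after_region_failure = surviving(copies, lost_region="oregon")
```

Under these assumptions a file starts with six copies. Losing a single zone leaves five, and losing an entire coast still leaves three.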
After considering many similar services, as well as the possibility of building its own cloud service, APTrust chose Amazon Web Services for now because it was the most cost-effective and flexible option.
“Part of the APTrust design philosophy is that we want it to be portable,” Sites said. “If we have to use something other than Amazon, our technologies are quickly capable of migrating to some other service or local data center.”
This shared preservation-storage space ensures that future researchers will have access to scholarship from all 17 institutions. The result is an extensive online archive built from many different perspectives.
“Each of these institutions has a different priority and different types of content to contribute,” Sites said. “That, in and of itself, is valuable. It creates a diverse collective of content here that can be the most representative of what valuable scholarship is and what is valuable for academic pursuits in the future.”