University researchers delve into world of digital archiving

A pioneering service run by six universities is creating a vast, free-to-access 'digital library', writes Karlin Lillington

From monks preserving books by the Greeks and Romans to modern library collections of letters from historical figures or photographs of a famous event, archiving special collections has a long tradition and a widely recognised cultural value.

But what about a database of archaeological data? Or jpeg images? Or a scholarly website? Or even three-dimensional maps or virtual-reality artworks?

In Britain, such items find a home in the government-funded Arts and Humanities Data Service (AHDS), an organisation spread across six universities and overseen by two higher-education bodies, the Joint Information Systems Committee and the Arts and Humanities Research Council.

"We're basically a digital library," says AHDS communications manager Alastair Dunning, who is based at King's College London. "People are creating databases, images, movies. They give them to us, and then we have lots of collections that can be used by others."

Items are archived under five thematic areas, each managed by a different university. These are: archaeology; history; visual arts; performing arts; and literature, languages and linguistics.

The collection is eclectic and wide-ranging. For example, the history collection includes maps, census data, statistics and numerical data. Collections that have recently gone online include all of the Stormont parliamentary debates and items on the transatlantic slave trade. The latter has 25,000 records relating to the journeys of slave ships.

Arts items include 40 years of images from performances of Shakespeare's plays, thousands of images of British medieval stained glass, and scans of medieval manuscripts.

The researchers who were scanning the manuscripts stumbled on an unexpected benefit of digitising: when they started to manipulate the images in Photoshop, the image-editing software, they found a new way of enhancing details that had become invisible to the eye.

The archive was established out of a concern that much digital material was being lost forever. "Lots of people create things that were just getting left on their desk, left behind when they changed jobs, or forgotten about," says Dunning. To avoid this, the AHDS works with academic institutions, small cultural institutions, museums and other organisations that might have valuable or useful collections of digital data.

Materials can also be acquired through different channels. One recent collection of archaeological data came from a town council in London which had decided to sell off its archaeological division. "The data suddenly became orphaned and was sent to us," Dunning says.

"Our real focus is the long-term preservation of data. If you put a book on a library shelf and come back 10 years later, there's a very good chance it will still be there. But if you put a data CD on the same shelf and come back in a decade, there's a good chance it won't work."

People send data collections to the AHDS, which converts them to neutral formats. It transfers data from proprietary formats, such as Microsoft Word's, to non-proprietary, open formats that are freely available, Dunning explains. Avoiding proprietary formats gives the best chance of information being accessible over the long term, he says. Databases, documents and images can all be moved to non-proprietary formats.
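
As a rough illustration of that kind of format migration, the short Python sketch below plans which files in a deposit would be converted and what they would become. It is only a sketch under stated assumptions: the article does not say which target formats the AHDS uses, so the mapping and the function name `plan_migration` are hypothetical, and the actual conversion of file contents is left out.

```python
from pathlib import PurePath

# Hypothetical mapping from proprietary to open formats. The article does not
# name the formats the AHDS actually converts to; these are illustrative only.
OPEN_FORMAT_FOR = {
    ".doc": ".odt",  # word-processor document -> OpenDocument Text
    ".xls": ".csv",  # spreadsheet -> comma-separated values
    ".psd": ".png",  # layered image -> PNG
}


def plan_migration(filenames):
    """Return (source, target) name pairs for a deposit.

    Files whose extension is not in the mapping keep their original name.
    This only plans the renaming; it does not convert any file contents.
    """
    plan = []
    for name in filenames:
        path = PurePath(name)
        target_suffix = OPEN_FORMAT_FOR.get(path.suffix.lower())
        if target_suffix is None:
            plan.append((name, name))
        else:
            plan.append((name, str(path.with_suffix(target_suffix))))
    return plan
```

Calling `plan_migration(["report.DOC", "index.html"])` would pair the Word document with an `.odt` target and leave the web page as it is, which mirrors the idea that only proprietary formats need moving.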

All of this is done on an altruistic rather than a commercial model, says Dunning. Donors sign a release that allows the material to be placed online. This can then be used by others free of charge, provided it is for non-commercial use. Permission must be obtained for commercial use and the AHDS refers the interested party back to the rights holder.

Another part of the remit is to encourage people to follow best practice in storing data. "People don't think about it until you start losing it," says Dunning. Material needs to be conserved properly with attributions, he says.

Nonetheless, copyright and licensing do pose problems. Unlike a book, digital information often has no obvious publisher to manage rights, and it can be unclear where copyright lies.

Digital archiving isn't entirely new. "There have been other kinds of archiving going on since the 1970s and 1980s because, of course, people were still using computers for work before the internet," Dunning says. However, the ability to incorporate interactive items and to make collections easily and widely accessible is a benefit of the internet.

As the British government moves towards expecting state-funded services to generate income, the AHDS is likely to face challenging decisions on whether it should commercialise at least some aspects of its service, and how it would do this.

One idea is to provide a paid-for archiving service for material that is not donated to the AHDS but needs expert IT management. Many small institutions or academic departments do not have the staff, infrastructure or expertise to archive digital materials, Dunning notes.

Would the AHDS consider creating a business around managing copyright material? Dunning says it goes against the spirit of the open-access archive. The AHDS would prefer to consider business models such as operating a private archive, he says.

So who uses the collection? "You'd think the collection might be very scholarly and academic, but there's so much information that there's really something for all kinds of groups and individuals," he says. "Along with scholars or educators, we get local history societies, tourists interested in researching a particular area, schoolchildren and people who just have really odd interests. The internet really brings in all types.

"I think there's so much that can be done to make information available online. Many think of the net as just a catalogue, but we can add many tools."

While you might think that every academic or scholar has to be familiar with the internet, Dunning says this is not the case.

"A lot of people still fear the internet. There is a significant minority who are really enthusiastic, but in most cases we are really doing a lot of education - explaining the advantages of putting collections online, calming worries [about copyright and piracy], explaining how important digital preservation is. It's all fragile and needs to be preserved."

One of their biggest challenges is "sustainability", Dunning says. "By that I mean that you can preserve data but in some cases there's not much point in having data and no service to go along with it.

"For example, if the data goes with maps on a website, you need to preserve the whole website, the search mechanisms, various tools, and so on. Likewise, there's no point in downloading a million archaeological items with no point of reference."

Another challenge is "paying for all of it". People accept the need for public funding for libraries, he says, but digital data is just as important. And while far more digital data is now being produced than books, very little is being done to consider how it can be preserved, managed and curated.

But just as libraries do not charge patrons for looking at books, Dunning prefers data to be free at the point of use.

Dunning was in Dublin recently to give a lecture as part of Trinity College Dublin's Long Room Hub initiative, which examines ways to expand scholarly and research connections in the arts and humanities.

Dunning says he hopes the AHDS might serve as an example of how such material can be archived and made available to the public.