Innovation: Preserving the digital future of the forgotten format

Mere decades can render audiovisual content inaccessible. We must ensure that content endures

Storage systems: One day your state-of-the-art archive could be part of a technologically impenetrable world. Photograph: iStock

As part of the negotiations to establish a Science Gallery in Venice a few years ago, I visited the Venetian State Archives, which are housed in a 13th-century Franciscan monastery just a few minutes’ walk from the famous Rialto bridge. The buildings contain an amazing 80km of shelving from floor to ceiling, completely filled with dusty volumes going back over a thousand years.

The Venetian Republic was a sovereign state until 1797 and dominated the Adriatic, the Balkans and Cyprus, but the archives also hold some material from northern and western Europe. They contain administrative documents in multiple languages, some now forgotten: government records, tax filings, records of births, marriages and deaths, court papers, materials from monasteries, papers from guilds, maps and urban plans.

The archives are being carefully digitised via an EU-supported project, the Venice Time Machine. The curators explained that some documents are now so fragile that they dare not open them. They have developed an innovative X-ray scanner that can detect traces of copper, iron and zinc in the handwritten ink on each individual page without the book having to be opened, and so forensically reconstruct the contents.

In 1086, the English and Welsh Domesday survey was completed, primarily for tax-collection purposes. On the 900th anniversary in 1986, the BBC, together with Acorn Computers, software company Logica and consumer goods giant Philips, published a new multimedia Domesday.

The BBC Domesday project was undertaken largely by schoolchildren, who submitted floppy disks with contributions about their local geography, history, amenities and social issues, augmented by photos and maps. The result, some million submissions in all, was then published on a pair of laser discs, which could only be read and browsed on a BBC Acorn computer equipped with an expensive Philips laser disc drive. But by 2002, concerns had grown that the entire BBC Domesday project could be lost, since the storage system, disc format and software had become obsolete after a mere 16 years. A major project by the Universities of Leeds and Michigan barely managed to resurrect the content by writing custom software to emulate the old system and then reformatting the content for modern technology.

9/11 footage

The demise of digital archival material was illustrated again just last month, with the 20th anniversary of 9/11. Many of the original graphics and much of the video footage were published at the time in a digital format designed by Adobe Inc, a software company specialising in digital publishing. The Adobe “Flash” system for web content had become widely deprecated by 2017. As a result, last month some archival content from major US news outlets was not easily accessible and had to be resurrected and rebuilt.

The changes in media technology and formats are in turn reflected in the development of consumer devices. There are videos on YouTube showing teenagers and millennials puzzled by old rotary telephones (youtube.com/watch?v=oHNEzndgiFI). Some websites present simulations of old TV sets: if you remember how to operate a set, they will play archival content from the 1960s, 1970s and 1980s, including old TV shows, adverts, cartoons and news (my60stv.com/). Touch screens are now so pervasive that young digital natives may struggle with old formats: there are videos of toddlers frustrated that tapping and gesturing on a printed magazine does not work the way it does on a tablet (youtube.com/watch?v=aXV-yaFmQNk).

Decommissioned data centres

Relentless innovation will doubtless continue to rapidly evolve the presentation of digital content. Augmented and virtual reality are now becoming mainstream, with a new generation of consumer smart glasses and head-up displays for computer-augmented guidance, social media and general entertainment. The metaverse beckons, in which our physical world seamlessly merges with virtual worlds. We may soon have all our digital content in the cloud, remotely stored by the internet giants in data centres across the planet, and thus no longer need any local digital storage.

In the long term, will the current internet giants survive and will our precious digital content be preserved when the data centres are inevitably decommissioned?

If you are old enough, maybe you still have a drawer of floppy disks, CD-ROMs and digital cassette tapes. But for how long will floppy disk drives, CD drives and tape readers be widely available? Even if you can still buy an old reader, will the data structures stored on these media be readily understood by today’s software? Are there not similar risks in the future for the data we preserve today on memory sticks and memory cards?

The researchers in the Venice Time Machine are going to extraordinary lengths to recover long-lost local histories of much of Europe from fragile manuscripts. In a thousand years, how can we ensure that our own digital culture and social artifacts will be available to historians? Innovation is needed not just to preserve digital content but also to archive the detailed description of all data formats and storage structures, so that future software can deduce and rebuild digital artifacts stored since the start of the computer era in the 1950s.