IGeLU 2018 programme
Feeding PDF files to Rosetta: favourite food?
Conference or Developers Day
Conference
Abstract
Do you have PDF files in your collections? Do you think they solve all your archival problems? Think again. PDF files are mighty difficult and can be broken in myriad ways. Nevertheless, PDF is the go-to format for text-based content in digital preservation, so the Digital Archive has to be able to detect file errors and repair them to ensure long-term availability. ZBW has been using Rosetta since 2010, and 10% of our ingested files are PDFs. We are currently dealing with 134,000 PDF files in our archive, in 16 different flavours (PDF 1.1–1.7, PDF/A, PDF/X, etc.). Rosetta checks file validity with the validation tool JHOVE, and JHOVE reports 22% of our PDF files as invalid. These files urgently need post-processing. We want to share our lessons learned and show what does and does not (yet) work in Rosetta, both when ingesting PDF files and when migrating them within Rosetta’s preservation planning module.
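A figure like "22% invalid" comes from tallying JHOVE's per-file verdicts. The sketch below shows one way to do that tally outside Rosetta. It is not ZBW's workflow or a Rosetta API. It assumes JHOVE's XML handler output, in which each file is a `repInfo` element with a `status` child whose text is, for example, "Well-Formed and valid", "Well-Formed, but not valid" or "Not well-formed". The sample data and helper names (`tally_status`, `invalid_share`) are hypothetical, and anything other than "Well-Formed and valid" is counted as invalid.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample of JHOVE XML handler output for three PDF files.
# The namespace is stripped during parsing, so the sketch does not
# depend on one particular JHOVE schema version.
SAMPLE = """<jhove xmlns="http://schema.openpreservation.org/ois/xml/ns/jhove">
  <repInfo uri="a.pdf"><status>Well-Formed and valid</status></repInfo>
  <repInfo uri="b.pdf"><status>Well-Formed, but not valid</status></repInfo>
  <repInfo uri="c.pdf"><status>Not well-formed</status></repInfo>
</jhove>"""


def local(tag: str) -> str:
    """Drop an XML namespace prefix such as '{uri}status'."""
    return tag.rsplit("}", 1)[-1]


def tally_status(xml_text: str) -> Counter:
    """Count JHOVE status values, one per repInfo (file) element."""
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()
    for elem in root.iter():
        if local(elem.tag) == "repInfo":
            # Take the text of the first direct 'status' child, if any.
            status = next((c.text for c in elem if local(c.tag) == "status"), None)
            counts[(status or "unknown").strip()] += 1
    return counts


def invalid_share(counts: Counter) -> float:
    """Share of files NOT reported as 'Well-Formed and valid'."""
    total = sum(counts.values())
    valid = counts.get("Well-Formed and valid", 0)
    return (total - valid) / total if total else 0.0


if __name__ == "__main__":
    counts = tally_status(SAMPLE)
    print(counts)
    print(f"{invalid_share(counts):.0%}")
```

For this three-file sample, two files fall outside "Well-Formed and valid", so the share printed is 67%. Matching on local tag names keeps the sketch tolerant of namespace differences between JHOVE releases. A real check should confirm the exact status strings your JHOVE version emits.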
Main Topic
Rosetta
Presenters
Yvonne Tunnat, ZBW
Presenter's job title
Preservation Manager
Co-presenters
Moderator: Dave Allen, State Library of Queensland, Australia
Co-presenter's job title
Lead, Enterprise Architecture