The aim of the Digital.Bodleian project was to migrate multiple silos of content, which the Bodleian Library had put online at various times over the previous 15 years, into a single storage architecture and a single delivery interface, so that the library could continue to support delivery of digitised content sustainably into the future.
The storage platform and user interface were built so that they could also be used for new digitised content, but the initial motivation was to have a place to preserve and deliver digitised content that the library already had online.
The project took approximately 3.5 years, but there was quite a long pause in the middle while the team worked on its hardware infrastructure, and made some general systems improvements across a range of projects, not just Digital.Bodleian.
The project is ongoing, and runs as a service of the Bodleian Library, managed by Bodleian Digital Library Systems and Services.
“The Bodleian has many millions of digitised images,” says Dr Matthew McGrattan, Collections Delivery Architect & Acting Head of Digital Research, Bodleian Digital Library Systems and Services. “The total count is continually changing, but we have between 2.5 million and 3 million images that we have digitised ourselves, and we store at least the same number of other images of our collections that were digitised elsewhere.”
The Bodleian Library’s Imaging Services department digitises content for internal projects; for externally funded scholarly projects (such as the Polonsky Digitization Foundation); and on-demand for scholars, and for commercial organisations such as publishers and print and broadcast media.
The categories vary. While much of the internal digitisation effort has concentrated on special collections material such as manuscripts and rare printed books, the archive also holds a relatively large number of images of printed books and scans from microfilm.
Projects are managed on a number of bases. On-demand digitisation and smaller projects are often managed by the Imaging Services department. Larger projects such as the Polonsky Digitization Foundation project, which was co-ordinated with the Vatican Library, may have a dedicated project manager and may make use of additional project management resource within Imaging Services (handling the photography) or within Bodleian Digital Library Systems and Services (handling software development, digital archiving and preservation, and image and metadata delivery).
Digital.Bodleian currently delivers around 250,000 images, and this number is expected to rise to around 700,000 in the next six months as the team migrates the bulk of the Polonsky Digitization Foundation images to Digital.Bodleian and picks up the remainder of the library’s legacy online collections.
The remainder of the library’s archive will be put online on a collection-by-collection basis, depending on the copyright status of the material, the quality of the source images, and the availability of catalogue records and machine-readable metadata.
Since the library began digitising its collections 20 years ago, the technology has changed: photography has moved from slow scanning backs on large format cameras to a mixture of medium format and 35mm-size full-frame digital SLRs. “This makes the image taking process quicker, but digitisation of special collections material is still a relatively cost and labour-intensive process because of the special care required when working with old or fragile material,” says Dr McGrattan. “In recent years we’ve improved the level of automation in our digitisation workflow, and we are working to roll out that workflow to a wider range of our digitisation efforts.”
For Digital.Bodleian, a mixture of large format scanning backs (PhaseOne and Betterlight), medium format digital backs (primarily PhaseOne P45 and P60s) and full-frame dSLRs has been used, with a smaller number of images coming from flatbed or film scanners.
The user interface makes use of the iNQUIRE software, developed in collaboration with Armadillo Systems. The team uses IIP image servers, which provide tiled images ‘on-the-fly’ from the lossless JPEG2000s in which the deliverable images are stored in the library’s online repository. The tiling image servers make use of the DeepZoom and IIIF (http://iiif.io) Image APIs to provide a seamless pan and zoom user experience, so that users of Digital.Bodleian can view high resolution images quickly and easily. The Bodleian Library also provides access to the metadata for its content using the IIIF Presentation API, and provides embeddable viewing via the Universal Viewer (developed by Digirati for the Wellcome Library and British Library).
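To illustrate how a pan-and-zoom viewer might ask a tiling image server for only the parts of an image it needs, here is a minimal Python sketch that builds tile request URLs in the general shape defined by the IIIF Image API (region, size, rotation, quality and format). It is not the Digital.Bodleian implementation. The base URL, the image dimensions and the function names are hypothetical, and the sketch assumes a version 2.x style server that accepts the `w,` size form and the `default` quality.

```python
import math


def iiif_info_url(base_url):
    """URL of the image information document that describes an image."""
    return f"{base_url}/info.json"


def iiif_tile_urls(base_url, width, height, tile_size=256, scale_factor=1):
    """List tile request URLs covering an image at one zoom level.

    base_url     -- hypothetical image base URI (server + prefix + identifier)
    width/height -- full pixel dimensions of the source image
    tile_size    -- edge length of the tiles the viewer displays
    scale_factor -- 1 for full resolution, 2 for half resolution, and so on
    """
    urls = []
    # Each tile covers tile_size * scale_factor pixels of the source image.
    step = tile_size * scale_factor
    for y in range(0, height, step):
        for x in range(0, width, step):
            # Clip the region at the right and bottom edges of the image.
            region_w = min(step, width - x)
            region_h = min(step, height - y)
            # Width of the returned tile after scaling down.
            out_w = math.ceil(region_w / scale_factor)
            urls.append(
                f"{base_url}/{x},{y},{region_w},{region_h}/{out_w},/0/default.jpg"
            )
    return urls


if __name__ == "__main__":
    # Hypothetical image: 600 x 400 pixels on an example server.
    base = "https://example.org/iiif/sample-image"
    print(iiif_info_url(base))
    for url in iiif_tile_urls(base, 600, 400):
        print(url)
```

A viewer would typically read the image dimensions and supported tile sizes from the `info.json` document first, then request only the tiles that fall inside the area currently on screen.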
“The user interface provides the ability for the general enthusiast to browse our content, without necessarily requiring any specialist academic knowledge of our collections,” he says. “Experts who know the collections can, of course, search the metadata or access material by shelfmark or other identifier. Both get access to very high resolution images. Our use of the IIIF APIs (http://iiif.io) provides interoperability, and our content can be embedded elsewhere using the Universal Viewer, or combined with other institutions’ content using the IIIF APIs.”
The Bodleian Library’s initial target with Digital.Bodleian was to migrate approximately 150,000 images that it already had online to a single platform. That goal has largely been met, says Dr McGrattan, although approximately 20,000 images from its legacy online collections are yet to be migrated.
In addition, the team had planned to have the 500,000 images digitised for the Polonsky Digitization Foundation in Digital.Bodleian by the end of that project, and it remains on target.
“We have successfully set up a pipeline that makes it easy for us to migrate newly digitised content into Digital.Bodleian, and we have been quicker than we had originally hoped to provide full IIIF access to our digitised content,” he says.
The next steps are to complete the migration of legacy digitised content, to migrate the remaining Polonsky Digitization Foundation images, and to continue to develop Digital.Bodleian as a sustainable service. To achieve this the team are continuing to explore additional technologies and ways to make Digital.Bodleian a better service – particularly for mobile and tablet devices – and to make as much use of interoperability with peer institutions via IIIF as possible. The team also continues to work with Imaging Services and with external donors and funding agencies to digitise more material.
There were two main challenges for Digital.Bodleian: first, to provide a robust hardware and software infrastructure to store and deliver digitised content; and second, to build a sustainable pipeline for ingesting new content into that infrastructure.
“Both challenges were overcome through making best use of our in-house technical expertise in BDLSS – particularly in our Digital Research software development team, and systems administration team,” he says. “And by making use of commercial partnerships (with Armadillo Systems for iNQUIRE, and with Intranda GmbH for our digitisation workflow) where necessary, and open source software (such as the Universal Viewer, and IIP Image) where possible.”
Work on Digital.Bodleian is ongoing, and there are plans to migrate systems to an even faster hardware platform, while continuing to explore improvements in the software ‘stack’ that underlies Digital.Bodleian, and the range of services that Digital.Bodleian can provide.
In Focus - Digitising Collections
This case study is part of an In Focus feature on Digitising Collections.