As a Royal Warrant holder, Max Communications continuously strives to maintain and, where possible, improve its offering.
It achieves this through innovation and the early adoption of new technologies, and by listening closely to its customers. In recent years, in direct response to customer requests, Max Communications has diversified into end-to-end digitisation services that ensure archive holders maximise and protect the full value of their digital assets. This article briefly introduces three of these services, all of which will be showcased at this year’s Museums + Heritage Show:
SOTERIA offers comprehensive digital preservation and storage services for archives. It brings together the expertise of two established and well-regarded companies in the archives management sector: The Stockroom, a FACT-certified storage solutions provider, and Max.
In Max Communications’ experience, digital preservation can seem such a large and forbidding problem that customers are uncertain where to begin. This in turn makes it difficult to convey to colleagues and stakeholders why it must be tackled. In this situation, it is useful to consider why digital preservation is important, both in terms of the benefits if it is adopted and the risks if it isn’t.
Here are a few examples:
- Guarantees long-term accessibility and usability of content
- Guarantees long-term safeguarding of digital assets
- Mitigates corruption, obsolescence or loss of content
- Supports compliance obligations
- Helps answer questions of provenance
- Facilitates new uses for the digital content in the future as these uses emerge
- Facilitates new opportunities and partnerships in the future as they emerge
- Maximises the benefits of future technological, legislative or procedural changes
- Protects an organisation’s investment in its digital strategy
Max Communications offers a three-tiered approach to accommodate archives of all sizes. The advanced option uses Archivematica, an open-source digital preservation system from Artefactual.
Max’s Archive Management Service, DRYAD, is built on AtoM, a modern, web-based archive management system used by hundreds of archives around the world. AtoM is open-source software, freely available under a General Public Licence.
Although AtoM is open source and freely available, considerable effort and expertise are required to make it operational and fit for purpose. To help customers achieve a seamless transition, DRYAD therefore provides a complete AtoM solution that includes installation, data migration, hosting, training and ongoing support. DRYAD also offers plug-ins and applications for specific tasks not covered by the core AtoM platform, including Max’s Crosswalker tool, which is typically used to migrate data from other software such as CALM.
DRYAD is also the means by which digitally preserved material can be retrieved from SOTERIA.
THEMIS is both a production tool and a project management tool. Fully hosted and managed by Max, it ensures that every stage of a digitisation project is managed within a single platform. It also lets customers approach digitisation projects differently where appropriate: for instance, THEMIS makes it possible to capture and ingest digitised images before the material has been catalogued and indexed. In this sense, Max believes THEMIS is changing the way it approaches digitisation, opening up access to collections far more quickly and efficiently.
For structured data, Max has developed automated algorithms that extract specific fields from templated documents based on their spatial relationship to “marker” text floats. Text floats are the blocks of text, together with their bounding-box coordinates, that OCR programs such as Tesseract produce. For example, if a series of printed forms has the words “Invoice Number” at the top of a column, individual text blocks that fall within a given range below this marker float can be identified as invoice numbers.
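The marker-float idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Max’s algorithm: the `TextFloat` type, the `extract_column_below_marker` function and its `max_depth` and `x_tolerance` parameters are all hypothetical. It assumes the OCR output has already been grouped into text blocks with pixel bounding boxes (Tesseract’s word-level output, for example from `pytesseract.image_to_data`, would first need merging into phrases such as “Invoice Number”). A float is treated as part of the marker’s column if it starts below the marker and overlaps its horizontal span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextFloat:
    """A block of OCR text and its bounding box, in pixels (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height


def extract_column_below_marker(
    floats: list[TextFloat],
    marker: str,
    max_depth: int = 1000,
    x_tolerance: int = 20,
) -> list[str]:
    """Return the text of floats lying in the column headed by the marker text.

    A float qualifies if its top edge is between the marker's bottom edge and
    max_depth pixels below it, and it overlaps the marker's horizontal span
    widened by x_tolerance on each side. Results are ordered top to bottom.
    """
    # Use the first float whose text matches the marker (case-insensitive).
    markers = [f for f in floats if f.text.strip().lower() == marker.lower()]
    if not markers:
        return []
    m = markers[0]
    lo, hi = m.left - x_tolerance, m.right + x_tolerance
    hits = [
        f for f in floats
        if f is not m
        and m.bottom <= f.top <= m.bottom + max_depth
        and f.right >= lo and f.left <= hi
    ]
    return [f.text for f in sorted(hits, key=lambda f: f.top)]


# Hypothetical floats from one scanned form: two column headers and their values.
page = [
    TextFloat("Invoice Number", 100, 50, 140, 20),
    TextFloat("Date", 300, 50, 60, 20),
    TextFloat("INV001", 105, 90, 70, 18),
    TextFloat("12/03/1998", 300, 90, 90, 18),
    TextFloat("INV002", 102, 120, 70, 18),
]

print(extract_column_below_marker(page, "Invoice Number"))
# ['INV001', 'INV002']
```

Matching on horizontal overlap with a tolerance, rather than exact alignment, allows for small horizontal shifts between scans of the same template; a production version would also need to cope with OCR errors in the marker text itself.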
Once data has been separated into fields, content-specific heuristic checks can be run against the expected format. For example, analysis may show that invoice numbers should follow a specific pattern, such as XXX000, and THEMIS can then flag records that don’t match it for review and QA.
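A format check of this kind can be expressed as a regular expression. The sketch below assumes that “XXX000” means three capital letters followed by three digits; the `flag_for_review` function, the record layout and the `invoice_number` field name are hypothetical, chosen only for illustration.

```python
import re

# Assumption: "XXX000" means three capital letters followed by three digits.
INVOICE_NUMBER = re.compile(r"[A-Z]{3}[0-9]{3}")


def flag_for_review(records: list[dict[str, str]], field: str, pattern: re.Pattern) -> list[dict[str, str]]:
    """Return the records whose field value does not fully match pattern."""
    # A missing field is treated as an empty string, which fails the match.
    return [r for r in records if not pattern.fullmatch(r.get(field, "").strip())]


# Hypothetical extracted records; the last two contain typical OCR confusions.
records = [
    {"id": "p1-r1", "invoice_number": "INV001"},
    {"id": "p1-r2", "invoice_number": "1NV002"},  # letter I read as digit 1
    {"id": "p1-r3", "invoice_number": "INV0O3"},  # digit 0 read as letter O
]

flagged = flag_for_review(records, "invoice_number", INVOICE_NUMBER)
print([r["id"] for r in flagged])
# ['p1-r2', 'p1-r3']
```

Using `fullmatch` rather than `search` means a value with extra characters, such as `INV0012`, is also flagged.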
As part of its solution, Max has developed a number of methodologies for the QA and post-production of OCRed material. Max recognises that the percentage-accuracy figures reported by programs such as ABBYY FineReader and Tesseract only estimate the proportion of presumed-correct interpretations, based on the number of definite failures; text that has not been flagged is not necessarily correct. Therefore, to improve OCR results, Max develops content-specific heuristic strategies and routines. It also uses expandable dictionaries of proper nouns and jargon, especially for unstructured content.
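One way to apply an expandable dictionary is to flag OCR tokens that are not in the word list and suggest near matches. The sketch below is an illustration under stated assumptions, not Max’s routine: the `ExpandableDictionary` class and `check_ocr_text` function are hypothetical, and near matches come from Python’s standard `difflib.get_close_matches` with an arbitrarily chosen `cutoff` of 0.8. It shows a typical “rn” for “m” misreading being caught, and a genuine new proper noun being added so that it is no longer flagged.

```python
import difflib
import re


class ExpandableDictionary:
    """Hypothetical word list of known terms (proper nouns, jargon) that grows over time."""

    def __init__(self, words: list[str]) -> None:
        # Key on lower case for matching, but keep the original spelling for suggestions.
        self._terms = {w.lower(): w for w in words}

    def add(self, word: str) -> None:
        """Record a term a reviewer has confirmed as genuine."""
        self._terms[word.lower()] = word

    def __contains__(self, word: str) -> bool:
        return word.lower() in self._terms

    def suggest(self, word: str, n: int = 3, cutoff: float = 0.8) -> list[str]:
        """Return up to n known terms similar to word, best match first."""
        matches = difflib.get_close_matches(word.lower(), list(self._terms), n=n, cutoff=cutoff)
        return [self._terms[m] for m in matches]


def check_ocr_text(text: str, dictionary: ExpandableDictionary) -> dict[str, list[str]]:
    """Map each alphabetic token not in the dictionary to candidate corrections."""
    unknown: dict[str, list[str]] = {}
    for token in re.findall(r"[A-Za-z]+", text):
        if token not in dictionary and token not in unknown:
            unknown[token] = dictionary.suggest(token)
    return unknown


terms = ExpandableDictionary(["the", "estate", "of", "Lord", "Ashcombe", "in"])
page_text = "The estate of Lord Ashcornbe in Tenterden"

print(check_ocr_text(page_text, terms))
# {'Ashcornbe': ['Ashcombe'], 'Tenterden': []}

terms.add("Tenterden")  # a reviewer confirms this place name is genuine
print(check_ocr_text(page_text, terms))
# {'Ashcornbe': ['Ashcombe']}
```

Tokens with suggestions point to likely misreadings, while tokens with none are candidates either for manual review or for adding to the dictionary.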