Digitizing large collections of Cultural Heritage resources and providing tools for their management, analysis and visualization are critical to Digital Cultural Heritage (DCH) research. A key element in achieving this goal is user-friendly software that offers an abstract interface for interacting with a variety of digital content types. To address these needs, the Medici content management system has been developed as a Web 2.0 environment that integrates analysis tools for the auto-curation of uncurated digital data, allowing automatic processing of input datasets and visualization of both data and collections. It offers a simple user interface for dataset preprocessing, previewing, automatic metadata extraction, user input of metadata and provenance support, storage, archiving and management, representation and reproduction. It is a scalable, flexible, robust distributed framework with wide data format support (including 3D models, GigaPan images and Reflectance Transformation Imaging, RTI) and metadata functionality.

Datasets and files are uploaded to the web server using regular HTML forms, both when creating a new dataset (to upload its first file) and when viewing an existing dataset (to add files to it). Files can also be uploaded in other ways, including as individual files that do not belong to any dataset. Both the preprocessors and the scripts running in users’ browsers (i.e. previewers) communicate with the server through a REST API. Preprocessors are used both to extract metadata from datasets and files and to generate previews for them; examples include an OBJ and X3D to 3D extractor and a PTM/RTI to 3D extractor. Extractors generally use integrated libraries or external system calls to third-party software to process dataset files, either to generate previews or to extract metadata. The third-party software includes, for example, 3D model and video processing programs that are installed in the extractor’s environment and invoked through Java command-line calls.
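
As an illustration of the external-call pattern described above, the following minimal sketch shows how an extractor might invoke a third-party program and verify that it produced a preview file. It is written in Python for brevity, whereas Medici makes such calls from Java. The names `run_external_tool` and `ExtractionError`, as well as the default timeout, are assumptions introduced for this sketch and are not part of Medici.

```python
import subprocess
from pathlib import Path
from typing import Sequence


class ExtractionError(RuntimeError):
    """Raised when the external tool cannot run, fails, or writes no output."""


def run_external_tool(
    command: Sequence[str],
    expected_output: Path,
    timeout_s: float = 300.0,  # assumed default, not taken from Medici
) -> Path:
    """Run a third-party program and return the path of the file it produced."""
    try:
        result = subprocess.run(
            list(command),
            capture_output=True,  # keep stdout/stderr for error reporting
            text=True,
            timeout=timeout_s,
            check=False,          # the exit code is inspected below
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # OSError covers a missing or non-executable program.
        raise ExtractionError(f"could not run {command[0]}: {exc}") from exc

    if result.returncode != 0:
        raise ExtractionError(
            f"{command[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    if not expected_output.is_file():
        raise ExtractionError(f"{command[0]} produced no file at {expected_output}")
    return expected_output
```

A call such as `run_external_tool(["converter", "model.obj", "preview.x3d"], Path("preview.x3d"))` would run a hypothetical `converter` program and return the preview path, or raise `ExtractionError` if the program is missing, times out, exits with a non-zero code, or writes nothing.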

Medici can accept, persist and process two kinds of metadata for each dataset or file: (a) automatically generated metadata and (b) community-generated metadata (CIDOC-CRM). The system uses MongoDB, a NoSQL database, as its database management system (DBMS).
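
To make the distinction between the two kinds of metadata concrete, the sketch below builds a single document-style record that holds both. The field names (`file_id`, `extracted`, `community`), the helper functions, and the use of the CIDOC-CRM property `P3_has_note` as the example are all assumptions chosen for illustration; they do not reflect Medici's actual schema.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Mapping


def new_metadata_record(file_id: str) -> Dict[str, Any]:
    """Create an empty record for one file (hypothetical layout)."""
    return {
        "file_id": file_id,
        "extracted": {},   # automatically generated, keyed by extractor name
        "community": [],   # user-contributed statements
    }


def add_extracted(record: Dict[str, Any], extractor: str, values: Mapping[str, Any]) -> None:
    """Store the values produced by one extractor, replacing any earlier run."""
    record["extracted"][extractor] = dict(values)


def add_community(record: Dict[str, Any], author: str, prop: str, value: str) -> None:
    """Append a user-supplied statement, labelled with a CIDOC-CRM property name."""
    record["community"].append(
        {
            "author": author,
            "property": prop,
            "value": value,
            "created_at": datetime.now(timezone.utc).isoformat(),
        }
    )


record = new_metadata_record("file-001")
add_extracted(record, "image-extractor", {"width": 1024, "height": 768})
add_community(record, "alice", "P3_has_note", "Fragment of a fresco")
```

Because the record is a plain nested dictionary containing only strings, numbers, and lists, it maps directly onto a document in a store such as MongoDB. Keeping the two kinds of metadata under separate keys lets an extractor be re-run without touching user contributions.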