Data and metadata structures for time-resolved and multidimensional STEM
Scanning transmission electron microscopy (STEM) has become an increasingly versatile and sophisticated tool for studying materials at the atomic scale, owing to advances in in situ capabilities, novel imaging and spectroscopy modalities, and ultrafast detectors. The large multidimensional datasets it produces are enormously rich in quantitative information about the sample, but they call for new approaches to data and metadata management. At present, the multitude of proprietary data formats developed by instrument manufacturers hinders easy access to the raw data, and each format has its own metadata representation. In light of the FAIR (Findable, Accessible, Interoperable, Reusable) principles, it is becoming increasingly important to standardize how (meta)data are represented.
The goal of this project is to develop universal, instrument- and experiment-independent TEM data and metadata formats.
The formats must be self-descriptive and complete, yet flexible enough to be extended to novel experimental modalities. The data must interface easily with open-source Python tools that are widely used in the microscopy community, such as HyperSpy, LiberTEM, Nion Swift and pycroscopy. Finally, the aim is to integrate dataset metadata into the NOMAD materials database in order to make experimental datasets searchable, freely accessible and reusable. Conversion tools will be written to translate proprietary formats into the open format. Currently, HDF5-based formats such as NeXus are envisioned because of their inherently nested structure, as shown in Figure 1. Developing a data format only makes sense if the community adopts it. Therefore, in a collaboration funded through the BiGmax network of the Max Planck Society, we are working with the groups of Christoph Koch and Claudia Draxl at Humboldt University of Berlin to develop a joint standard for complex microscopy datasets and their integration into NOMAD.
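To make the idea of a nested, self-descriptive HDF5 layout more concrete, the following is a minimal sketch, assuming the h5py and NumPy packages, of how a 4D-STEM scan (scan_y, scan_x, det_y, det_x) might be stored in a NeXus-style tree. It uses the NeXus conventions of tagging groups with an `NX_class` attribute and marking the plotted dataset with the `signal` and `axes` attributes of an `NXdata` group. The function `write_4dstem_nexus_like`, the group layout and the metadata fields (`beam_energy`, `camera_length`) are hypothetical illustrations, not the format under development and not an agreed community standard.

```python
import h5py
import numpy as np


def write_4dstem_nexus_like(root, data, scan_step_nm, camera_length_mm, beam_energy_kev):
    """Store a 4D-STEM array in a NeXus-style group tree under an open h5py group.

    Hypothetical layout for illustration only: group names and metadata
    fields are assumptions, not a published application definition.
    """
    root.attrs["default"] = "entry"
    entry = root.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"
    entry.attrs["default"] = "data"

    # Instrument metadata sits beside the raw data, each value carrying its units.
    instrument = entry.create_group("instrument")
    instrument.attrs["NX_class"] = "NXinstrument"
    source = instrument.create_group("source")
    source.attrs["NX_class"] = "NXsource"
    energy = source.create_dataset("beam_energy", data=beam_energy_kev)
    energy.attrs["units"] = "keV"
    detector = instrument.create_group("detector")
    detector.attrs["NX_class"] = "NXdetector"
    camera_length = detector.create_dataset("camera_length", data=camera_length_mm)
    camera_length.attrs["units"] = "mm"

    # NXdata: the signal plus coordinate axes, so generic readers can interpret it.
    nxdata = entry.create_group("data")
    nxdata.attrs["NX_class"] = "NXdata"
    nxdata.attrs["signal"] = "counts"
    # "." marks detector dimensions that have no coordinate dataset here.
    nxdata.attrs["axes"] = np.array(
        ["scan_y", "scan_x", ".", "."], dtype=h5py.string_dtype()
    )
    # One chunk per probe position, so a single diffraction pattern can be read cheaply.
    nxdata.create_dataset(
        "counts", data=data, chunks=(1, 1) + data.shape[2:], compression="gzip"
    )
    for name, n in (("scan_y", data.shape[0]), ("scan_x", data.shape[1])):
        axis = nxdata.create_dataset(name, data=np.arange(n) * scan_step_nm)
        axis.attrs["units"] = "nm"


if __name__ == "__main__":
    # In-memory file (core driver, no backing store) so nothing is written to disk.
    frames = np.zeros((4, 3, 8, 8), dtype=np.uint16)
    with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
        write_4dstem_nexus_like(f, frames, 0.5, 100.0, 200.0)
        f.visit(print)  # lists every group and dataset path in the tree
```

The chunking choice assumes that analysis tools mostly access one probe position at a time; a different access pattern (for example, virtual imaging over all scan positions) could favor another chunk shape.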