Report on Metadata Interoperability Workshop at the EBDV Data Week 2021

As part of the European Big Data Value Data Week, 32 metadata experts from academia and industry attended the Workshop on Metadata Interoperability on 27 May 2021. The workshop was organised by Rigo Wenning and supported by the European H2020 projects MOSAICrOWN (Multi-Owner data Sharing for Analytics and Integration respecting Confidentiality and OWNer control) and TRAPEZE.

The EBDV Data Week is the spring gathering of the European Big Data Value and Industrial AI research and innovation community. The 2021 Data Week was held online over three days. The well-established event continued in the tradition of promoting opportunities, sharing knowledge and fostering ecosystem development. 

The Metadata Interoperability workshop set out to explore ways of making metadata interoperable while ensuring appropriate data protection. Interoperable metadata is crucial to enabling the data value chains that build up the data economy. Data protection is key to increasing the sharing and processing of data without privacy risks. The workshop chair Rigo Wenning (W3C/ERCIM) introduced a panel of five speakers who presented current research and developments tackling the metadata interoperability challenge.

Pierre-Antoine Champin from the University of Lyon and W3C/ERCIM introduced “Linked Data: principles and perspectives”. He explained how heterogeneous data can be captured in graphs, and why URLs (or IRIs) are a good solution for disambiguating the labels in those graphs. He further explained that Linked Data, as a layer providing interoperability, does not require a change in the underlying metadata production chain. He also briefly presented work in progress at the World Wide Web Consortium (W3C), which includes Decentralized Identifiers (DIDs), content negotiation by profile, and RDF-star (an extension that makes RDF more flexible by allowing metadata on edges).
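The graph idea described in this talk can be illustrated with a small sketch. It is not taken from the presentation: the `example.org` identifiers, the `job#` vocabulary and the `assertedBy` property are hypothetical, and plain Python tuples stand in for an RDF library. Only the Dublin Core terms namespace is a real vocabulary. The sketch shows two sources that both use the word “title” with different meanings, and how full IRIs keep them apart once the graphs are merged. The last part loosely imitates the RDF-star idea of attaching metadata to an edge.

```python
# Hypothetical namespaces, for illustration only (DCT is the real Dublin Core terms namespace).
EX = "https://example.org/id/"
DCT = "http://purl.org/dc/terms/"
JOB = "https://example.org/vocab/job#"

# Source A: a catalogue where "title" is the name of a document.
source_a = {
    (EX + "doc1", DCT + "title", "Metadata Interoperability"),
    (EX + "doc1", DCT + "creator", EX + "alice"),
}

# Source B: a staff directory where "title" is a job title.
source_b = {
    (EX + "alice", JOB + "title", "Data Steward"),
}

# Merging two graphs is a set union of their triples; neither source
# has to change how it produces its own records.
merged = source_a | source_b


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# The two meanings of "title" stay distinct because their IRIs differ.
doc_titles = objects(merged, EX + "doc1", DCT + "title")
job_titles = objects(merged, EX + "alice", JOB + "title")

# Rough analogy to RDF-star: a whole triple (an edge) is used as a key
# so that metadata can be stated about the edge itself.
edge_metadata = {
    (EX + "alice", JOB + "title", "Data Steward"): {EX + "assertedBy": "staff directory"},
}

print(doc_titles, job_titles)
```

A real deployment would more likely use an RDF library and a serialisation such as Turtle or JSON-LD; the tuples here only make the principle visible.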

Víctor Rodríguez Doncel from Universidad Politécnica de Madrid presented “Metadata operations in Lynx”. Lynx is a European H2020 project that has built a service platform for ontologies applied to different use cases, such as labour law (https://www.lynx-project.eu/). This was achieved by developing a multilingual legal knowledge graph built on an RDF data model, enriched with annotations, and compliant with the NLP Interchange Format (NIF). As a lesson learned, he mentioned the huge effort invested in building the data model. In the future, he will concentrate on methods for importing and exporting data and on a more pragmatic approach to internal operations. This raised the question of how to make data models reusable and to whom public data models should be reported.

Albert Zilverberg from ATB Bremen GmbH gave an example from the automotive industry with his talk on “Standardization challenges in cross-sectorial data streams”. The European CROSS-CPP project developed an ecosystem for services based on integrated cross-sectorial data streams (https://www.cross-cpp.eu/). The goal was to give data customers access to cyber-physical products (CPP) data streams to build sectorial and cross-sectorial services. This allows data owners to exploit their CPP CPS data, which is their most valuable asset. In an environment of brand-specific data formats, data customers need a single access point that exposes CPP data through one interface. One solution is the common industrial data model (CIDM), which provides one common standard for all kinds of CPP data. He then presented the CIDM specifications in detail. They are designed in a layered structure that takes into account different types of sensor signals, CIDM measurement channels, etc. CIDM data packages contain complex metadata, and he discussed the question of what level of harmonization is achievable with CIDM.

Svetla Boytcheva from Sirma AI (Ontotext) spoke about “Metadata in the health care sector”. The EU project EXAMODE (Extreme scale analytics via multimodal ontology discovery and enhancement, https://www.examode.eu) develops prediction and analysis tools for clinical settings and research. Clinical data is highly heterogeneous. The project investigates a digital workflow for a hospital information system using the health interoperability standard HL7 Int, semantic data interoperability standards such as RDF, RDF-star, OWL and SKOS, and technical interoperability standards like JSON and JSON-LD. Ontologies and standard classifications are the key challenges in combining heterogeneous data. New ontologies had to be developed in order to integrate the existing ones in a portal, and this has been done for four different diseases.

Piero Bonatti from Università di Napoli Federico II gave a presentation entitled “Metadata, Policy and Reasoning”, summarising the work carried out in the European TRAPEZE project. The goal of the project is to give users control over their data by assuring transparency while legal compliance is checked automatically. He presented the architecture of the TRAPEZE method, in which privacy policies and consent are treated as metadata. He explained the many requirements for data usage policies and how these policies are being developed to satisfy all of them, leveraging OWL 2 and JSON.
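To make the idea of policies and consent as metadata more concrete, here is a deliberately simplified sketch. It does not reproduce the TRAPEZE policy language or its OWL 2 reasoning: the attribute names (`purpose`, `data_category`, `recipient`), their values and the `complies` function are all assumptions made for illustration. It only shows the general shape of an automatic compliance check, in which a requested data usage is permitted when every attribute falls within what the consent allows.

```python
import json

# Hypothetical consent record attached to a data set as JSON metadata.
consent_json = """
{
  "purpose": ["research", "billing"],
  "data_category": ["heart_rate"],
  "recipient": ["hospital"]
}
"""
consent = json.loads(consent_json)


def complies(usage, consent):
    """Return True if every consent attribute permits the value the usage requests.

    A usage that leaves an attribute unspecified is rejected, because
    None is never among the allowed values.
    """
    return all(usage.get(attr) in allowed for attr, allowed in consent.items())


# Two hypothetical usage requests, described with the same attributes.
research_use = {"purpose": "research", "data_category": "heart_rate", "recipient": "hospital"}
marketing_use = {"purpose": "marketing", "data_category": "heart_rate", "recipient": "hospital"}

print(complies(research_use, consent))   # permitted by the consent
print(complies(marketing_use, consent))  # purpose not covered by the consent
```

A flat membership test like this cannot express hierarchies (for example, that a specific kind of research is covered by consent to research in general); handling such cases is one reason an ontology language like OWL 2 is useful for this task.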

The presentations were followed by a lively discussion. The panellists pointed out that the considerable and impressive efforts put into each project clearly show the need to exchange methods and practices and to share results on metadata work. The panellists and workshop participants were invited to contact the speakers to discuss current and future challenges, solutions, best practices and developments in metadata interoperability.

The sessions have been recorded and will be available on BDVA’s YouTube channel at https://www.youtube.com/channel/UC5XVReZ5BY4pcsWJY0nJGvw as well as on the Data Week’s website at https://www.big-data-value.eu/dw21-agenda/

by Peter Kunz