-
There is a lot of focus on the semantic web, with annotated datasets serialized as RDF, JSON-LD, or XML. However, I don't think we can expect every scientist to search through ontologies and connect them. This is rather a task for data stewards, but the unfortunate reality is that most groups don't have access to one and won't for the foreseeable future. I therefore believe that we need a way to Find, Access, Interoperate, and Reuse not only the definitions but also collections of them in schemas/forms.
-
One of the goals I have for MADICES-2025 is to understand how I need to annotate my NetCDF datasets so that the annotations are useful downstream. By this I mean the mechanics of it, not figuring out the ontology labels etc. I am happy to provide some example files and prepare a spreadsheet matching data headers to ontology entries. These annotations could be implemented in either
-
My aim is similar to Peter's: how can I best annotate public or published entries (e.g. samples, devices, their relationships and attached measurements) in datalab instances so that they can be used downstream, even if semantic annotations are missing or incomplete?
-
We are also keen to discuss interoperability between ELNs! There are clearly a few different efforts here. On the open-source side there is the ELN Consortium, with the ELN File Format, and at the industry conferences I go to I'm seeing a lot of discussion about the Allotrope Foundation and about various companies working with the large-scale commercial ELNs to provide an archive format for moving data between them. We are also interested in discussing common templates across ELNs.
-
We (https://github.com/open-reaction-database, ORD) are interested in this topic. The ORD is an open-access repository and schema for chemical reaction data, particularly experimental data on small-molecule reactions. We use Protocol Buffers (similar in purpose to XML) for storing and exchanging reaction datasets, but also provide an object-relational mapper (ORM) to unpack the data into a PostgreSQL database. Our interest is in working with other reaction data formats to ensure interoperability, and also in thinking about how we could fit in with upstream and downstream data standards and applications.
-
We really need a dedicated working group focused on advancing standardization efforts to improve the interoperability of systems, particularly with respect to data formats and associated tools. As many of you have already mentioned, either directly or indirectly, we are facing a fragmented landscape. Moreover, placing the burden of dataset annotation solely on individual scientists is not a realistic approach. In this context, I believe it is essential for the working group to:
Based on this analysis, the group should then develop informed recommendations regarding formats and tools for future adoption. I would be glad to contribute to this effort, and I am willing to take a leading role in organizing and steering the working group.
-
We should acknowledge that the ELNs have different goals, with different advantages and disadvantages (like rock-paper-scissors, none is superior to all). If the ELNs have different functionalities, they necessarily have different requirements when it comes to exchange formats, extractors, ... As such, an exchange module for all ELNs likely does not exist. We should accept these differences and not force alignment where it is difficult or impossible. We should also acknowledge that there are different budgets: some ELNs have one person doing development, design, teaching, ..., while on the other hand we have multi-billion-euro companies. What can be achieved by one might not be feasible for the other, including participating in all meetings and working groups. There is also a difference in data structure: some follow RDF, some do not, and there are all 256 shades in between.
I think we are all aware of xkcd 927 and explicitly want to avoid creating new formats. But that also means that there are already 14 formats, and people choose different ones for different reasons, and that is OK. I am aware of three ELN exchange formats, in alphabetical order: Allotrope (multi-billion company, quite strong RDF), .eln (no funding, open consortium, semantic annotation possible but not necessary), and the openBIS suggestion from Andreas (quite strong RDF). Please correct me if there are more or if you do not agree with my categorization.
What often triggers me in these discussions are general statements that focus on one's own solution, e.g. "FAIR research data is important and hence we have to use my product XYZ" (I haven't heard this specific argument and do not want to put anybody on the spot), because they imply that other products do not share the same goal. I think we share almost 100% of our goals but try different paths to get there. We should openly learn and ask: why did you not choose this or that solution? But let's accept the differences and refrain from such statements.
The open discussion should also lead to an intermediate review: as a community we have been in this RDM process for ~5 years. Some things were successful, others not (e.g. my TAPIR project on extractors). We should try to identify the reasons for success or failure (e.g. an unpolished product due to insufficient funding) and learn from them. That could lead to some best practices for RDM project developers, which I would find really interesting. Peace.
-
MADICES 2024 Recap
We delved into the realm of semantic annotation, emphasizing the adoption of RDF serialization formats for data representation, using JSON-LD as a working example, to enhance the interpretability and reusability of research datasets. Participants explored methodologies for incorporating metadata, handling missing values, and standardizing units for measurements. Tools and guidelines were developed to streamline the annotation process and ensure compatibility across different knowledge domains.
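To make the JSON-LD working example concrete, here is a minimal sketch of a measurement annotated with a unit and a missing value, using only the Python standard library. It assumes QUDT as the unit vocabulary and assumes a convention where a missing value is left out of the node rather than encoded as a sentinel such as -999. The context, the helper names `annotate_measurement` and `to_jsonld`, and the blank-node identifiers are hypothetical illustrations, not the tools or guidelines developed at MADICES 2024.

```python
import json

# Hypothetical JSON-LD context: "qudt" and "unit" are prefixes for the QUDT
# schema and QUDT unit vocabulary (an assumed vocabulary choice); "value" and
# "hasUnit" are short terms mapped onto QUDT properties.
CONTEXT = {
    "qudt": "http://qudt.org/schema/qudt/",
    "unit": "http://qudt.org/vocab/unit/",
    "value": {"@id": "qudt:numericValue"},
    # "@type": "@id" tells a JSON-LD processor to treat the value as an IRI.
    "hasUnit": {"@id": "qudt:unit", "@type": "@id"},
}


def annotate_measurement(name, value, unit_code):
    """Return a JSON-LD node for one measurement (hypothetical helper).

    A missing value (None) is omitted from the node instead of being replaced
    with a sentinel number; this is an assumed convention, not a standard.
    """
    node = {"@id": f"_:{name}", "hasUnit": f"unit:{unit_code}"}
    if value is not None:
        node["value"] = value
    return node


def to_jsonld(measurements):
    """Wrap (name, value, unit_code) tuples in a single JSON-LD document."""
    return {
        "@context": CONTEXT,
        "@graph": [annotate_measurement(*m) for m in measurements],
    }


doc = to_jsonld([("temperature", 21.5, "DEG_C"), ("pressure", None, "PA")])
print(json.dumps(doc, indent=2))
```

Because the unit is given as a compact IRI (`unit:DEG_C`), a JSON-LD processor would expand it to the full QUDT IRI, so downstream tools can resolve the unit without a lookup table shared out of band.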
MADICES 2025 Focus Area
We want to focus on further refinement of protocols and guidelines for semantic annotation and interoperable data exchange. Recent discussions with members at the Future Labs Live 2024 event in Basel highlighted the interest in semantic interoperability of research data and the need for standards and tooling that make semantic annotation easier and reduce the burden of, and barrier to, semantically annotating datasets.
Please include links, comments, and discuss plans for this focus area below.