An alternative approach that I see some logic in is to not store the abstract at all. Zotero does not seem to properly re-display JATS XML abstracts from the Crossref metadata. For example: doi:10.1101/2020.10.07.329755 and 10.1371/journal.pgen.1009665. Both the preprint DOI (first) and the journal DOI (second) have JATS XML in the abstract in the Crossref metadata. Zotero re-displays these abstracts but replaces JATS tags with newlines, making the text obviously mis-formatted. This raises the question of how many human readers actually rely on content derived from JATS XML abstracts in Crossref metadata. If this is an incomplete data channel that human readers do not rely on, then it can be treated as optional until some point in the future when human readers do rely on a more complete and reliable data channel.
-
Converting a JATS abstract into commonmeta's internal format for description text drops many formatting features that are supported by Crossref deposit XML for abstracts: https://www.crossref.org/documentation/schema-library/markup-guide-metadata-segments/abstracts/ Examples include paragraphs, JATS subsections, unordered lists, hyperlinks, and MathML, to name a few.
A concrete example of abstract formatting that is lost is the unordered list that appears in the abstract of:
https://popgen.es/D9qSdCY6GPrxthT3ZnFouEU35ow/1.1/
The JATS XML for this Baseprint document has the JATS `<list>` and `<list-item>` elements in addition to paragraph elements: https://archive.softwareheritage.org/swh:1:cnt:3b6e8fffb09ff7e0dad64930e07cffd1b7762407
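
To make the loss concrete, here is a minimal sketch of what happens to a list when only a small allow-list of inline tags survives. This is not commonmeta-py's actual code: the `ALLOWED_TAGS` set, the `AllowListStripper` class, and the `strip_to_allow_list` helper are all hypothetical names, and the sample abstract is made up for illustration. It only uses Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Assumed allow-list of inline tags, for illustration only.
ALLOWED_TAGS = {"b", "br", "code", "em", "i", "sub", "sup", "strong"}


class AllowListStripper(HTMLParser):
    """Keep text and allow-listed tags; drop every other tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attributes are discarded in this sketch.
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # <br> is a void element, so it gets no closing tag.
        if tag in ALLOWED_TAGS and tag != "br":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)


def strip_to_allow_list(markup: str) -> str:
    """Hypothetical helper: return markup with non-allow-listed tags removed."""
    parser = AllowListStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


# Made-up JATS-like abstract with a paragraph and an unordered list.
abstract = (
    "<p>Key points:</p>"
    "<list><list-item><p>first <i>item</i></p></list-item>"
    "<list-item><p>second item</p></list-item></list>"
)
print(strip_to_allow_list(abstract))
# prints: Key points:first <i>item</i>second item
```

The italic survives, but the paragraph and list boundaries are gone, so the items run together into one line of text.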
Looking at the commonmeta-py sanitize function, it looks like the only formatting in an abstract that is currently representable by commonmeta is the HTML tags `b`, `br`, `code`, `em`, `i`, `sub`, `sup`, and `strong`. This list of commonmeta-supported HTML tags could be expanded to include, say, `li`, `p`, `a`, and MathML tags. But JATS subsections will be trickier because they have no HTML equivalent (you'd have to do something like represent them as `div` with some predefined `class`, or maybe just make up a custom HTML tag).
Another tricky problem would be converting JATS titles of abstracts and subsections, since it's not clear which HTML `h1`, `h2` tags to use, or perhaps `div`s with a special `class`, or yet another custom HTML tag.
But there is a bigger, deeper question: if one has JATS abstracts and Crossref deposit is the primary (or only) target format to convert to, why not convert directly instead of losing information by converting to an intermediate representation that is not JATS-oriented, when Crossref XML supports JATS and MathML XML?
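
As a rough sketch of what "convert directly" could look like, the snippet below takes an un-namespaced JATS `<abstract>` fragment and moves every element into the JATS namespace with a `jats:` prefix, which is the form Crossref deposit XML uses for abstracts. The helper name `to_prefixed_jats` and the sample fragment are hypothetical, MathML handling is left out, and whether a given element is accepted in a deposit should be checked against the Crossref abstract documentation linked above.

```python
import xml.etree.ElementTree as ET

# JATS namespace URI as used by the Crossref deposit schema for abstracts.
JATS_NS = "http://www.ncbi.nlm.nih.gov/JATS1"
ET.register_namespace("jats", JATS_NS)


def to_prefixed_jats(abstract_xml: str) -> str:
    """Hypothetical helper: put un-namespaced elements into the JATS namespace."""
    root = ET.fromstring(abstract_xml)
    for el in root.iter():
        # Elements that already carry a namespace (e.g. MathML) are left alone.
        if not el.tag.startswith("{"):
            el.tag = f"{{{JATS_NS}}}{el.tag}"
    return ET.tostring(root, encoding="unicode")


# Made-up abstract fragment with a paragraph and an unordered list.
source = (
    "<abstract><p>Key points:</p>"
    "<list><list-item><p>first item</p></list-item>"
    "<list-item><p>second item</p></list-item></list></abstract>"
)
print(to_prefixed_jats(source))
```

Nothing is dropped here: the list and paragraph structure pass through unchanged, only the element names gain the `jats:` prefix.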
My thinking is that conversion to an intermediate representation has pros and cons. In the above example, the pros do not outweigh the cons. The pros count for little if the primary target is Crossref XML, and there are two cons: either (1) many XML elements and schema features are dropped, or (2) new dev work is needed to add support to commonmeta for all these various JATS XML-specific details.