
Output xsd:double #2918

Open
dukesook opened this issue Oct 2, 2024 · 4 comments

@dukesook

dukesook commented Oct 2, 2024

I'm trying to write an xsd:double literal:

```python
from rdflib import Graph, Literal, Namespace, URIRef, XSD

g = Graph()
example = Namespace("http://example.org/")
subject = URIRef("http://example/24")

# Typed literal whose lexical form should survive serialization
double_literal = Literal("12345.678912345", datatype=XSD.double)
g.add((subject, example["foo"], double_literal))

print(g.serialize(format='turtle'))
```

Output:

```turtle
@prefix ns1: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example/24> ns1:foo 1.234568e+04 .
```

However, instead of "12345.678912345"^^xsd:double, I'm getting 1.234568e+04. I want to explicitly show the datatype and not lose any precision.
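A minimal, stdlib-only illustration of the precision loss described here (it does not call RDFLib; it only assumes the serializer formats the value with Python's default "%e", as discussed later in this thread): reading the serialized token back yields a different number from the one in the original lexical form.

```python
# Lexical form stored in the Literal
lexical = "12345.678912345"
value = float(lexical)

# Scientific notation with Python's default precision (6 digits after the point)
serialized = "%e" % value
print(serialized)  # 1.234568e+04

# Parsing the serialized token does not recover the original value
round_tripped = float(serialized)
print(round_tripped)           # 12345.68
print(round_tripped == value)  # False
```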

@dukesook changed the title from "Handle xsd:double" to "Output xsd:double" on Oct 2, 2024
@ashleysommer
Contributor

Thanks for filing this issue, @dukesook.
I've looked into it, and what I found is actually alarming.

When any literal is serialized to Turtle (aka N3) syntax, a method called _literal_n3 is called.
It looks like _literal_n3() accepts an argument that serializes in "plain" mode, which "shortens" the representation of literals in a Turtle document. That includes dropping datatypes where they are not required and, for floats and doubles, rounding them to 6 digits after the decimal point in scientific notation.
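The two modes can be modelled very roughly as follows. This is a hypothetical, simplified sketch (the function literal_to_turtle and the constant XSD_DOUBLE are illustrative names, not RDFLib's code), and it only handles xsd:double. In plain mode the datatype can be dropped because a bare Turtle number with an exponent is already read as xsd:double; the cost is the default "%e" precision.

```python
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def literal_to_turtle(lexical: str, datatype: str, use_plain: bool) -> str:
    """Hypothetical model of plain vs. full literal formatting (not RDFLib code)."""
    if use_plain and datatype == XSD_DOUBLE:
        # Plain mode: bare double token, datatype implied by the exponent,
        # precision limited by Python's default "%e" formatting
        return "%e" % float(lexical)
    # Full mode: quoted lexical form plus an explicit datatype IRI
    return '"%s"^^<%s>' % (lexical, datatype)


print(literal_to_turtle("12345.678912345", XSD_DOUBLE, use_plain=True))
print(literal_to_turtle("12345.678912345", XSD_DOUBLE, use_plain=False))
```

The first call loses digits; the second keeps the lexical form exactly as written, at the cost of a longer token.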

From rdflib/rdflib/term.py, lines 1528 to 1530 in 0b69f4f:

```python
Only limited precision available for floats:
>>> Literal(0.123456789)._literal_n3(use_plain=True)
'1.234568e-01'
```

I don't know why anyone would deliberately want to write incorrect data to the output file.
But it looks like RDFLib's built-in Turtle serializer always uses "plain" mode for every literal it writes out:

```python
if isinstance(node, Literal):
    return node._literal_n3(
        use_plain=True,
        qname_callback=lambda dt: self.getQName(dt, _GEN_QNAME_FOR_DT),
    )
```

I don't know why it does this by default, or why it's not configurable, but it seems RDFLib has worked this way since before the Turtle serializer was moved into its own module in the RDFLib codebase in RDFLib v3.0.0 in 2010 (commit 0ffb306).

@ashleysommer
Contributor

ashleysommer commented Oct 3, 2024

I found where the 6 decimal places come from.
It's Python's built-in scientific-notation format for floats (and doubles):
https://docs.python.org/3/library/string.html#formatspec

When you use the "%e" format (or ":e" in f-strings), Python converts the number to scientific notation with "p" digits after the decimal point. If "p" is not given, it defaults to 6.
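A quick check of that default, using only Python's standard formatting (no RDFLib involved). The values are the ones from this issue and from the term.py doctest above.

```python
x = 12345.678912345

# No precision given: 6 digits after the decimal point
print("%e" % x)           # 1.234568e+04
print(f"{x:e}")           # 1.234568e+04
print(f"{0.123456789:e}") # 1.234568e-01

# Explicit precision p: p digits after the decimal point
print(f"{x:.3e}")         # 1.235e+04
print(f"{x:.13e}")        # 1.2345678912345e+04
```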

We can't simply change it to something higher like f"{float(value):.9e}", because then a simpler double like 3.14 would serialize to "3.140000000e+00".
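One possible way around the padding problem, sketched here as an assumption rather than anything RDFLib does or has planned: try increasing precisions and keep the first one whose output parses back to exactly the same double. The helper name shortest_double_lexical is hypothetical. It relies on the fact that 17 significant digits (precision 16 in "e" format) always round-trip an IEEE 754 double, so the loop is bounded.

```python
import math


def shortest_double_lexical(x: float) -> str:
    """Hypothetical helper: shortest "e"-format string that parses back to x."""
    if not math.isfinite(x):
        # inf and nan have no bare Turtle double token
        raise ValueError("inf and nan need the quoted, typed literal form")
    for p in range(17):
        candidate = f"{x:.{p}e}"
        if float(candidate) == x:
            return candidate
    return f"{x:.16e}"  # not reached in practice: precision 16 always round-trips


# A fixed precision pads short values with zeros ...
print(f"{3.14:.9e}")                             # 3.140000000e+00
# ... whereas the shortest round-tripping precision does not
print(shortest_double_lexical(3.14))             # 3.14e+00
print(shortest_double_lexical(12345.678912345))  # 1.2345678912345e+04
```

Note that this preserves the numeric value, not the original lexical form: "12345.678912345" comes out as "1.2345678912345e+04". Only the quoted, datatype-explicit form keeps the string exactly as written.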

@ashleysommer
Contributor

Fixing this would be considered a breaking change, so it is another candidate for the overhaul planned for the big RDFLib 8 release later this year.

@dukesook
Author

dukesook commented Oct 3, 2024

@ashleysommer
Thanks for looking into this issue. I look forward to it getting fixed.


4 participants
@ashleysommer @dukesook and others