
Schema.org moving from HTTP Content Negotiation to JSON-LD 1.1 "Link:" header for context file #85

Open
danbri opened this issue May 21, 2020 · 9 comments

@danbri

danbri commented May 21, 2020

This happened faster than planned due to a DoS attack this week; details are in schemaorg/schemaorg#2578 (comment)

Schema.org no longer publishes a JSON-LD context file using HTTP content negotiation. Our homepage URL always returns HTML. This affects the parsing of all JSON-LD that expects to get a context definition from URLs "http://schema.org", "https://schema.org", "http://schema.org/", "https://schema.org/".

The URL of our context file is https://schema.org/docs/jsonldcontext.jsonld

We will shortly update the site to declare this URL via a Link header (see above issue for details).

I am filing this issue

  • Firstly to give you background knowledge in case people report JSON-LD parsing problems here
  • To encourage implementation of the JSON-LD 1.1 "Link" header discovery mechanism which AFAIK from my quick tests isn't yet supported in RDFLib
  • To encourage discussion of caching / robustness, since there is no guarantee that this file will remain accessible 24x7 indefinitely.
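
On the caching point, one possible approach is to keep a local copy of the context file and fall back to it whenever the network fetch fails. The following is only a minimal sketch under stated assumptions: the helper names `fetch_json` and `load_context_cached`, the single-file cache layout, and the injectable `fetch` callable are hypothetical and are not part of RDFLib.

```python
import json
import urllib.request
from pathlib import Path

CONTEXT_URL = "https://schema.org/docs/jsonldcontext.jsonld"


def fetch_json(url, timeout=10):
    """Download and decode a JSON document (plain stdlib, no retries)."""
    request = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def load_context_cached(url, cache_file, fetch=fetch_json):
    """Return the context at `url`, refreshing `cache_file` when possible.

    Hypothetical helper: if the fetch fails for any reason, the last cached
    copy is returned instead; with no cached copy, the error is re-raised.
    """
    cache_file = Path(cache_file)
    try:
        context = fetch(url)
    except Exception:
        if cache_file.exists():
            return json.loads(cache_file.read_text(encoding="utf-8"))
        raise
    # Fetch succeeded: refresh the cached copy for later offline use.
    cache_file.write_text(json.dumps(context), encoding="utf-8")
    return context
```

A real deployment would probably also want an expiry policy, so that a healthy server is not hit on every parse; that is left out here.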
@danbri
Author

danbri commented May 21, 2020

The main Schema.org site should have the headers discussed now, i.e.

  • access-control-allow-credentials: true
  • access-control-allow-headers: Accept
  • access-control-allow-origin: *
  • link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
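
For implementers, a client-side reading of that `link` header might look roughly like the sketch below. It is not RDFLib code: the function name `find_jsonld_alternate` is hypothetical, and the parsing is deliberately simplified (it assumes well-formed header values and does not cover every corner of the Link header grammar). It picks the first target with `rel="alternate"` and `type="application/ld+json"` and resolves it against the URL that was requested.

```python
import re
from urllib.parse import urljoin


def find_jsonld_alternate(link_header, base_url):
    """Return the absolute URL of the JSON-LD alternate, or None.

    Simplified sketch: splits the header on commas that precede '<',
    then reads `name=value` parameters after each <target>.
    """
    for part in re.split(r",\s*(?=<)", link_header):
        match = re.match(r"\s*<([^>]*)>(.*)", part, re.S)
        if not match:
            continue
        target, params_text = match.groups()
        params = {}
        for pm in re.finditer(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))', params_text):
            value = pm.group(2) if pm.group(2) is not None else pm.group(3)
            params[pm.group(1).lower()] = value
        rels = params.get("rel", "").lower().split()
        if "alternate" in rels and params.get("type", "").lower() == "application/ld+json":
            # Relative targets such as /docs/... resolve against the request URL.
            return urljoin(base_url, target)
    return None


header = '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
print(find_jsonld_alternate(header, "https://schema.org/"))
```

A loader would only need to consult this header when the response for the context URL is not itself JSON, which is the situation described for the schema.org homepage.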

@danbri
Author

danbri commented Jul 2, 2020

@hsolbrig can you suggest a workaround, at least for short-term use? Can we pass in the context when invoking the parser (by URL or by content)? /cc @Gnomus042
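
To illustrate the "by content" option without relying on any particular parser feature, the sketch below preprocesses the document so that its schema.org context reference is replaced with inline context content before parsing. The function name `embed_context` is hypothetical, and the one-line `@vocab` context in the usage lines is only a stand-in assumed for illustration; in practice the content would come from a saved copy of the real context file.

```python
import copy

# The four homepage URLs that no longer return a context via content negotiation.
SCHEMA_ORG_HOMEPAGE_URLS = frozenset({
    "http://schema.org",
    "https://schema.org",
    "http://schema.org/",
    "https://schema.org/",
})


def embed_context(document, context_document):
    """Return a copy of `document` with schema.org context URLs inlined.

    `context_document` is the parsed context file; its "@context" member
    (or the whole dict, if that member is absent) is embedded.
    """
    inline = context_document.get("@context", context_document)
    result = copy.deepcopy(document)
    current = result.get("@context")
    if isinstance(current, str) and current in SCHEMA_ORG_HOMEPAGE_URLS:
        result["@context"] = inline
    elif isinstance(current, list):
        result["@context"] = [
            inline if isinstance(item, str) and item in SCHEMA_ORG_HOMEPAGE_URLS else item
            for item in current
        ]
    return result


# Stand-in context content, assumed here for illustration only.
stub_context = {"@context": {"@vocab": "http://schema.org/"}}
doc = {"@context": "https://schema.org", "@type": "LocalBusiness", "name": "La Tour Eiffel"}
print(embed_context(doc, stub_context))
```

The resulting document no longer needs any network access to resolve its top-level context, at the cost of being larger and of freezing the context at whatever version was saved.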

@hsolbrig hsolbrig self-assigned this Jul 13, 2020
@westurner

westurner commented Sep 18, 2020

Is there no way to do this without requiring a custom HTTP header? Why is that part of the data specified out-of-band from the rest of the document?

(edit) Static files (with no HTTP server configuration dependency) are most scalable and archivable.

@nicholascar
Member

nicholascar commented Sep 18, 2020

@rob-metalinkage, is this going to cause problems for JSON -> JSON-LD expansion due to the separate context?

@nicholascar
Member

@danbri, @westurner, @hsolbrig RDFLib maintainers are assembling volunteers to complete this tool's JSON-LD 1.1 implementation and then to merge it into RDFLib core. That should make it easier for all to just "do" JSON-LD with RDFLib.

@rob-metalinkage

@nicholascar I don't think it causes any extra problems, as using just a model namespace to perform JSON -> JSON-LD expansion is unsafe anyway.

The patterns that appear in the wild seem to be:

Data model = X
context URI = .json

i.e. there is no way to discover for a model X the relevant context file.

Or the requirement to perform content negotiation is based on a model

Datamodel = X
Context = X (Accept ld+json)

This is being taken off the table as a bad idea according to this issue, but IMHO it has a deeper problem: if your data model is described in OWL, then ld+json should be the data model serialised as JSON-LD, not necessarily a JSON-LD context for the model.

The options for a canonical mechanism to discover the actual URL of a context seem to be:
a) returns Link header for alternates
b) supports a Profile "alt" which can be accessed by either a header or a URL parameter <X?_profile=jsoncontext>, where the profile jsoncontext is registered and well-known (dx-prof-conneg)

If dx-prof-conneg supports the same Link syntax, and if a resource chooses to embed the Link headers for all the available profiles and serialisations from the "alt" view by default, then the two approaches are consistent, I think.

I'd always choose the latter, as JSON context is not the only resource I'd want to be able to discover about a model. JSON-schema is also valuable, and SHACL and HTML and maybe other forms.

@westurner

Maybe I'm misunderstanding? From https://www.w3.org/TR/json-ld11/#the-context ::

Contexts can either be directly embedded into the document (an embedded context) or be referenced using a URL. Assuming the context document in the previous example can be retrieved at https://json-ld.org/contexts/person.jsonld, it can be referenced by adding a single line and allows a JSON-LD document to be expressed much more concisely as shown in the example below:

{
 "@context": "https://json-ld.org/contexts/person.jsonld",

@rob-metalinkage

@westurner you are right, it doesn't necessarily need a custom header, but a few things need care here:

  1. The agent that is "adding a single line" somehow has to know the URL "https://json-ld.org/contexts/person.jsonld".

We can say it is up to client code to tell RDFLib exactly what to include, and maybe not think about this, but this issue is about other approaches, such as trying to resolve a namespace like schema.org to get a context.

  2. Contexts may include other contexts, so the behaviour needs to be explicit about exactly how to handle potential conflicts (prefix strings bound to different URIs) and default namespaces (@value, @base). Having explored this, I find the JSON-LD documentation extremely hard to follow and lacking basic examples, and RDFLib is silent. IMHO RDFLib should encapsulate and explain basic practices here, so that getting started does not require interpreting the JSON-LD specification.

  3. There seem to be quite a lot of ways to bundle a set of object descriptions in JSON-LD, including arrays, @graph constructs, container objects etc. The JSON-LD serialiser probably needs to be able to handle these if we want to deliver a serialisation for use in a specific context, such as meeting an API payload requirement. The JSON-LD framing spec makes this clear - see Support JSON-LD Framing #95
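
As a concrete illustration of the conflict case (prefix strings bound to different URIs), the sketch below merges a sequence of simple inline contexts in order and reports every term whose definition changes. In a JSON-LD `@context` array, contexts are processed in order and later term definitions override earlier ones; the rest is an assumption for illustration. The function name `merge_term_definitions` is hypothetical, and the sketch ignores remote contexts, scoped contexts, protected terms and `null` resets.

```python
def merge_term_definitions(contexts):
    """Merge simple dict contexts in order; later definitions win.

    Returns (merged, conflicts), where each conflict is a tuple
    (term, previous_definition, new_definition, index_of_overriding_context).
    """
    merged = {}
    conflicts = []
    for index, context in enumerate(contexts):
        for term, definition in context.items():
            if term in merged and merged[term] != definition:
                conflicts.append((term, merged[term], definition, index))
            merged[term] = definition
    return merged, conflicts


first = {"dc": "http://purl.org/dc/elements/1.1/", "name": "http://schema.org/name"}
second = {"dc": "http://purl.org/dc/terms/"}

merged, conflicts = merge_term_definitions([first, second])
print(merged["dc"])
for term, old, new, index in conflicts:
    print(f"{term!r} rebound from {old} to {new} by context #{index}")
```

Surfacing the overrides like this, rather than applying them silently, is one way a library could make the behaviour explicit.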

@datamusee

I think the following code (which fails to load the schema.org context) is related to the present problem, but I don't understand the workaround:

```python
from rdflib import Graph

# Sample document that references the schema.org context by its homepage URL.
jsonldSample = """
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "La Tour Eiffel",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Paris",
    "addressRegion": "75007",
    "streetAddress": "Champ de Mars, 5 Avenue Anatole France"
  },
  "description": "Monument emblématique de Paris, la tour Eiffel est une tour de fer puddlé de 324 mètres de hauteur construite par Gustave Eiffel à l’occasion de l’Exposition Universelle de 1889 et qui célébrait le premier centenaire de la Révolution française.",
  "url": "https://www.toureiffel.paris",
  "image": "https://www.toureiffel.paris/sites/default/files/2017-10/monument-landing-header-bg_0.jpg",
  "pricerange": "de 2,5 à 25 euros",
  "telephone": "08 92 70 12 39"
}
"""

# Parsing has to dereference "https://schema.org" to obtain the context;
# this is the step that fails now that the homepage always returns HTML.
g = Graph().parse(data=jsonldSample, format='json-ld')
print(g.serialize(format='json-ld', indent=4))
print(g.serialize(format='nt'))
```
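
One short-term workaround consistent with the issue description is to point the document at the context file URL given at the top of this issue, instead of the homepage, before parsing. The sketch below does only the rewriting step with the standard library. The helper names `point_at_context_file` and `patch_jsonld_text` are hypothetical, and whether a given RDFLib version then fetches and applies the context file successfully is an assumption that is not verified here.

```python
import json

SCHEMA_ORG_CONTEXT_URL = "https://schema.org/docs/jsonldcontext.jsonld"
SCHEMA_ORG_HOMEPAGE_URLS = frozenset({
    "http://schema.org",
    "https://schema.org",
    "http://schema.org/",
    "https://schema.org/",
})


def _fix_context_value(value):
    """Swap homepage URLs for the context file URL in one @context value."""
    if isinstance(value, str) and value in SCHEMA_ORG_HOMEPAGE_URLS:
        return SCHEMA_ORG_CONTEXT_URL
    if isinstance(value, list):
        return [_fix_context_value(item) for item in value]
    return value  # inline (dict) contexts and other URLs are left untouched


def point_at_context_file(node):
    """Recursively rewrite "@context" references in parsed JSON-LD."""
    if isinstance(node, dict):
        return {
            key: _fix_context_value(val) if key == "@context" else point_at_context_file(val)
            for key, val in node.items()
        }
    if isinstance(node, list):
        return [point_at_context_file(item) for item in node]
    return node


def patch_jsonld_text(text):
    """Apply the rewrite to a JSON-LD string and return a JSON-LD string."""
    return json.dumps(point_at_context_file(json.loads(text)), ensure_ascii=False)


sample = '{"@context": "https://schema.org", "@type": "LocalBusiness", "name": "La Tour Eiffel"}'
print(patch_jsonld_text(sample))
```

The patched text could then be handed to the parser as before, for example `Graph().parse(data=patch_jsonld_text(jsonldSample), format='json-ld')`.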

docuracy added a commit to docuracy/Locolligo that referenced this issue Jan 25, 2022
Updated schema.org context document address in the light of RDFLib/rdflib-jsonld#85