Merge pull request #244 from dachafra/spec

dachafra · web-flow · commit d486c7fa0552 · 2025-10-10T12:52:54.000+02:00
Solving spec problems
diff --git a/spec/dev.html b/spec/dev.html
@@ -194,9 +194,9 @@
 
 <section id="definitions" data-include="section/definitions.md" data-include-format="markdown"></section>
 
-<section id="rdfTerminology" class="informative" data-include="section/rdfTerminology.md" data-include-format="markdown"></section>
+<section id="rdfTerminology" class="informative" data-include="section/terminology.md" data-include-format="markdown"></section>
 
-<section id="xsdTerminology" class="informative" data-include="section/xsdTerminology.md" data-include-format="markdown"></section>
+<!--<section id="xsdTerminology" class="informative" data-include="section/xsdTerminology.md" data-include-format="markdown"></section>-->
 
 </body>
 
diff --git a/spec/section/graphmap.md b/spec/section/graphmap.md
@@ -7,7 +7,7 @@ Any [=subject map=] or [=predicate-object map=] MUST have zero or more associate
 1. using the `rml:graphMap` property, whose value MUST be a [=graph map=],
 2. using the [=constant shortcut property=] `rml:graph`.
 
-[=Graph maps=] are themselves [=term maps=]. When [=RDF triples are generated=], the set of target graphs is determined by taking into account any [=graph maps=] associated with the [=subject map=] or [=predicate-object map=].
+[=Graph maps=] are themselves [=term maps=]. When [=RDF triples=] are generated, the set of target graphs is determined by taking into account any [=graph maps=] associated with the [=subject map=] or [=predicate-object map=].
 
 If a [=graph map=] generates the special IRI `rml:defaultGraph`, then the target graph is the [=default graph=] of the [=output dataset=].
 
diff --git a/spec/section/introduction.md b/spec/section/introduction.md
@@ -1,25 +0,0 @@
-# Base IRIs
-The base IRI of the [=mapping document=] is used to resolve relative [=IRIs=] in the RML document following the specification of the Turtle serialisaiton.
-
-## Base IRI for mapping rules
-
-The [=base IRI=] of the [=Triples Map=] is used in resolving relative [=IRIs=] produced by the [=RML mapping=].
-
-
-<pre class="ex-mapping nohighlight">
-# Triples Map that has a declared base IRI
-<#TriplesMap>
-    a rml:TriplesMap;
-    rml:baseIri <http://example.com/> .
-</pre>
-
-The [=base IRI=] MUST be a valid [=IRI=]. It SHOULD NOT contain question mark ("`?`") or hash ("`#`") characters and SHOULD end in a slash ("`/`") character.
-
-To obtain an absolute [=IRI=] from a relative [=IRI=], the term generation rules of RML use simple string concatenation, rather than the more complex algorithm for resolution of relative URIs defined in Section 5.2 of [RFC3986]. This ensures that the original database value can be reconstructed from the generated absolute [=IRI=]. Both algorithms are equivalent if all of the following are true:
-
-    1. The base IRI does not contain question marks or hashes,
-    2. the base IRI ends in a slash,
-    3. the relative [=IRI=] does not start with a slash, and
-    4. the relative [=IRI=] does not contain any "`.`" or "`..`" path segments.
-
-
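The removed text above states that RML resolves relative IRIs by simple string concatenation and that this matches RFC 3986 resolution only under four conditions. The following is a minimal sketch of that comparison, not part of the commit. It assumes Python's `urllib.parse.urljoin` as a stand-in for the RFC 3986 Section 5.2 algorithm; the helper names `concat_iri` and `rfc3986_iri` and the example IRIs are hypothetical.

```python
from urllib.parse import urljoin


def concat_iri(base_iri: str, relative: str) -> str:
    """RML-style resolution: prepend the base IRI to the relative IRI."""
    return base_iri + relative


def rfc3986_iri(base_iri: str, relative: str) -> str:
    """Reference resolution as implemented by urllib (assumed stand-in for RFC 3986, Section 5.2)."""
    return urljoin(base_iri, relative)


# All four conditions hold: base ends in a slash, has no "?" or "#",
# and the relative IRI has no leading slash and no dot segments.
same = concat_iri("http://example.com/", "person/42") == rfc3986_iri(
    "http://example.com/", "person/42"
)

# Condition 3 violated: the relative IRI starts with a slash.
leading_slash = (
    concat_iri("http://example.com/data/", "/person/42"),
    rfc3986_iri("http://example.com/data/", "/person/42"),
)

# Condition 4 violated: the relative IRI contains a ".." path segment.
dot_segment = (
    concat_iri("http://example.com/data/", "../person/42"),
    rfc3986_iri("http://example.com/data/", "../person/42"),
)
```

With concatenation the original value (`/person/42`, `../person/42`) remains a literal suffix of the result, which is the reconstruction property the removed paragraph describes; RFC 3986 resolution rewrites the path and loses it.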
diff --git a/spec/section/joinconditions.md b/spec/section/joinconditions.md
@@ -55,9 +55,9 @@ The following RDF triples are generated by the [=RML mapping=] above.
 
 A <dfn data-lt="join">join condition</dfn> is represented by a resource that has exactly one value for each of the following two properties:
 
-* a child map (`rml:childMap`) property, whose value is a [=child map=].<br> A <dfn>child map</dfn> (`rml:ChildMap`) is an [=expression map=] which MUST be evaluated against the [=logical source=] of the [=triples map=] that contains the [=referencing object map=], i.e. the current [=triples map=], or it should have a [=constant value=].
+* a child map (`rml:childMap`) property, whose value is a [=child map=], or a [=constant-valued expression map=].<br> A <dfn>child map</dfn> (`rml:ChildMap`) is an [=expression map=] which MUST be evaluated against the [=child logical source=]. The <dfn>child logical source</dfn> is the [=logical source=] of the [=triples map=] that contains the [=referencing object map=], i.e. the current [=triples map=].
 
-* a parent map (`rml:parentMap`) property, whose value is a [=parent map=].<br> A <dfn>parent map</dfn> (`rml:ParentMap`) is an [=expression map=], which MUST be evaluated against the [=logical source=] of the [=referencing object map=]'s [=parent triples map=], i.e. the referenced [=triples map=], or it should have a [=constant value=].
+* a parent map (`rml:parentMap`) property, whose value is a [=parent map=], or a [=constant-valued expression map=].<br> A <dfn>parent map</dfn> (`rml:ParentMap`) is an [=expression map=], which MUST be evaluated against the [=parent logical source=]. The <dfn>parent logical source</dfn> is the [=logical source=] of the [=referencing object map=]'s [=parent triples map=], i.e. the referenced [=triples map=].
 
 If the [=logical source=] of the [=triples map=] that contains the [=referencing object map=] and the [=logical source=] of the [=referencing object map=]'s [=parent triples map=] are not [=effectively equal=], then the referencing object map MUST have one or more [=join conditions=].
 
diff --git a/spec/section/output.md b/spec/section/output.md
@@ -1,8 +1,11 @@
 # The Output Dataset
+The <dfn>output dataset</dfn> of an [=RML mapping=] is an [=RDF dataset=] that contains the [=generated RDF triples=] for each of the [=triples maps=] of the [=RML mapping=]. 
+The [=output dataset=] MUST NOT contain any other [=RDF triples=] or [=named graphs=] besides these. 
+However, [=RML processors=] MAY provide access to datasets that contain additional triples or graphs beyond those in the [=output dataset=], such as inferred triples or provenance information.
 
-The <dfn>output dataset</dfn> of an [=RML mapping=] is an [=RDF dataset=] that contains the [=generated RDF triples=] for each of the [=triples maps=] of the [=RML mapping=]. The [=output dataset=] MUST NOT contain any other [=RDF triples=] or [=named graphs=] besides these. However, [=RML processors=] MAY provide access to datasets that contain additional triples or graphs beyond those in the [=output dataset=], such as inferred triples or provenance information.
-
-Conforming [=RML processors=] MAY rename [=blank nodes=] when providing access to the [=output dataset=]. This means that client applications may see actual [=blank node identifiers=] that differ from those produced by the [=RML mapping=]. Client applications SHOULD NOT rely on the specific text of the blank node identifier for any purpose.
+Conforming [=RML processors=] MAY rename [=blank nodes=] when providing access to the [=output dataset=]. 
+This means that client applications may see actual [=blank node identifiers=] that differ from those produced by the [=RML mapping=]. 
+Client applications SHOULD NOT rely on the specific text of the blank node identifier for any purpose.
 
 <aside class="note">
 RDF syntaxes and RDF APIs generally represent [=blank nodes=] with [=blank node identifiers=]. But the characters allowed in [=blank node identifiers=] differ between syntaxes, and not all characters occurring in the values produced by a [=term map=] may be allowed, so a bijective mapping function from values to valid [=blank node identifiers=] may be required. The details of this mapping function are implementation-dependent, and [=RML processors=] may have to use different functions for different output syntaxes or access interfaces. Strings matching the regular expression `[a-zA-Z_][a-zA-Z_0-9-]*` are valid [=blank node identifiers=] in all W3C-recommended RDF syntaxes (as of this document's publication).
@@ -11,3 +14,137 @@ RDF syntaxes and RDF APIs generally represent [=blank nodes=] with [=blank node
 <aside class="note">
 [=RDF datasets=] may contain empty [=named graphs=]. RML cannot generate such [=output datasets=].
 </aside>
+
+## The Generated RDF Triples of a Triples Map
+
+
+This subsection describes the normative process by which [=RDF triples=] are generated from a [=Triples Map=]. This process contributes [=RDF triples=] to the [=output dataset=]. Each generated triple MUST be placed into one or more graphs of the output dataset.
+
+The generated RDF triples are determined by the following algorithm. [=RML Processors=] MAY employ alternative implementations to compute the generated [=RDF triples=], provided that the resulting output dataset is semantically equivalent to the one obtained by this algorithm.
+
+
+Let:
+
+- **sm** be the [=subject map=] of the [=Triples Map=].
+- **records** be the set of logical records obtained by evaluating the [=logical source=] of the [=Triples Map=] using its declared [=reference formulation=].
+- **classes** be the set of class [=IRIs=] defined in **sm** (via `rml:class`).
+- **sgm** be the set of [=graph maps=] attached to **sm**.
+
+For each logical record **record** in **records**, apply the following steps:
+
+1. Let **subject** be the [=RDF term=] resulting from applying **sm** to **record**.
+2. Let **subject_graphs** be the set of [=RDF terms=] resulting from applying each graph map in **sgm** to **record**.
+3. For each class [=IRI=] in **classes**, add a triple to the [=output dataset=] as follows:
+
+   | Component | Value |
+   |------------|--------|
+   | Subject | **subject** |
+   | Predicate | `rdf:type` |
+   | Object | class IRI |
+   | Target graphs | If **sgm** is empty → `rml:defaultGraph`; otherwise → **subject_graphs** |
+
+4. For each [=predicate-object map=] of the Triples Map, apply the following steps:
+
+   - Let **predicates** be the set of [=RDF terms=] resulting from applying each predicate map of the predicate-object map to **record**.
+   - Let **objects** be the set of [=RDF terms=] resulting from applying each object map (excluding [=referencing object maps=]) to **record**.
+   - Let **pogm** be the set of graph maps of the predicate-object map.
+   - Let **predicate_object_graphs** be the set of RDF terms resulting from applying each graph map in **pogm** to **record**.
+
+   For each possible combination `<predicate, object>`, where *predicate* ∈ **predicates** and *object* ∈ **objects**, add a triple to the output dataset as follows:
+
+   | Component | Value |
+   |------------|--------|
+   | Subject | **subject** |
+   | Predicate | *predicate* |
+   | Object | *object* |
+   | Target graphs | If both **sgm** and **pogm** are empty → `rml:defaultGraph`; otherwise → union(**subject_graphs**, **predicate_object_graphs**) |
+
+
+For each [=referencing object map=] of a [=predicate-object map=] in the [=Triples Map=] apply the following steps:
+
+- Let **psm** be the [=subject map=] of the [=parent Triples Map=] referenced by the [=referencing object map=].
+- Let **pogm** be the set of [=graph maps=] of the [=predicate-object map=].
+- Let **joined_records** be the result of evaluating the [=join conditions=] defined by the [=referencing object map=], combining records from both the child and parent logical sources.
+
+For each pair `<child_record, parent_record>` in **joined_records**, apply the following steps:
+
+1. Let **subject** be the [=RDF term=] resulting from applying **sm** to **child_record**.
+2. Let **predicates** be the set of [=RDF terms=] resulting from applying each [=predicate map=] of the [=predicate-object map=] to **child_record**.
+3. Let **object** be the [=RDF term=] resulting from applying **psm** to **parent_record**.
+4. Let **subject_graphs** be the set of RDF terms resulting from applying each graph map in **sgm** to **child_record**.
+5. Let **predicate_object_graphs** be the set of RDF terms resulting from applying each graph map in **pogm** to **child_record**.
+
+For each *predicate* in **predicates**, add a triple to the output dataset as follows:
+
+| Component | Value |
+|------------|--------|
+| Subject | **subject** |
+| Predicate | *predicate* |
+| Object | **object** |
+| Target graphs | If both **sgm** and **pogm** are empty → `rml:defaultGraph`; otherwise → union(**subject_graphs**, **predicate_object_graphs**) |
+
+
+### Adding Triples to the Output Dataset
+
+“Add triples to the output dataset” is a process that takes the following inputs:
+
+| Input | Description                                     |
+|--------|-------------------------------------------------|
+| **Subject** | an [=IRI=], a [=URI=], a [=blank node=], or empty |
+| **Predicate** | an [=IRI=], a [=URI=], or empty                 |
+| **Object** | an [=RDF term=] or empty                        |
+| **Target graphs** | a set of zero or more [=IRIs=]                  |
+
+Execute the following steps:
+
+1. If **Subject**, **Predicate**, or **Object** is empty, **abort** these steps.  
+2. Otherwise, generate an [=RDF triple=] `<Subject, Predicate, Object>`.  
+3. If the set of target graphs includes `rml:defaultGraph`, add the triple to the [=default graph=] of the [=output dataset=].  
+4. For each [=IRI=] in the set of target graphs not equal to `rml:defaultGraph`, add the triple to the [=named graph=] identified by that [=IRI=] in the [=output dataset=].  
+   - If the [=named graph=] does not yet exist, create it.  
+5. RDF graphs MUST NOT contain duplicate triples. Adding the same triple to a graph more than once has the same effect as adding it once.  
+6. The scope of blank nodes is limited to the output dataset being generated.
+
+### Generated RDF Term of a Term Map
+
+
+A [=term map=] defines how an [=RDF term=] is generated from the evaluation of a [=logical iteration=] over a [=logical source=].  
+The result of evaluating a term map for a given logical record can be one of the following:
+
+- **Empty**, if any referenced value of the [=term map=] evaluates to a null, empty or missing value (each data format defines these values in the [RML-IO-Registry](https://w3id.org/kg-construct/rml-io-registry/));  
+- **An [=RDF term=]**, when evaluation produces a valid [=RDF term=] according to the [=term generation rules=];  
+- **A data error**, when a valid RDF term cannot be produced.
+
+The [=generated RDF term=] of a [=term map=] for a given logical record is determined as follows:
+
+1. If the term map is a **constant-valued term map**, then the generated RDF term is the term map’s constant value.
+2. If the term map is a **reference-valued term map**, then the generated RDF term is determined by evaluating the [=reference value=] expression over the logical record and applying the *term generation rules* to the resulting value.
+3. If the term map is a **template-valued term map**, then the generated RDF term is determined by evaluating the [=template value=] against the logical record and applying the *term generation rules* to the resulting value.
+
+The <dfn>term generation rules</dfn> define how a concrete RDF term is generated from a given value:
+
+1. **If the value is null, empty or missing**, then no RDF term is generated.
+
+2. **If the term type is `rml:IRI`:**
+   1. Let *value* be the [=natural RDF lexical form=] corresponding to the evaluated value.
+   2. If *value* is a valid [absolute IRI](https://datatracker.ietf.org/doc/html/rfc3987#section-2.2) [[RFC3987]], then return an [=IRI=] generated from *value*.
+   3. Otherwise, prepend *value* with the [=base IRI=]. If the result is a valid [absolute IRI](https://datatracker.ietf.org/doc/html/rfc3987#section-2.2) [[RFC3987]], then return that [=IRI=].
+   4. Otherwise, raise a **data error**.
+
+3. **If the term type is `rml:URI`:**
+   1. Let *value* be the [=natural RDF lexical form=] corresponding to the evaluated value.
+   2. If *value* is a valid [=absolute URI=] [[RFC3986]], then return a [=URI=] generated from *value*.
+   3. Otherwise, prepend *value* with the [=base IRI=]. If the result is a valid [=absolute URI=] [[RFC3986]], then return that [=URI=].
+   4. Otherwise, raise a **data error**.
+
+4. **If the term type is `rml:BlankNode`:**
+   - Return a blank node that is unique in the target graph.
+
+5. **If the term type is `rml:Literal`:**
+   - If the term map declares a [=language tag=], then return a literal with that language tag and the natural RDF lexical form corresponding to *value*.
+   - Otherwise, if the term map declares a non-empty [=datatype=] different from the natural RDF datatype corresponding to the value’s implicit datatype, then return an RDF literal with the specified datatype.
+   - Otherwise, return the natural RDF literal corresponding to *value*.
+
+
+
+
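The target-graph rule and the "Adding Triples to the Output Dataset" steps added in this hunk can be sketched as follows. This is a minimal illustration under stated assumptions, not the specification's reference implementation: RDF terms are plain strings, an empty term is `None`, and the names `OutputDataset`, `select_target_graphs` and `add_predicate_object_triples` are hypothetical.

```python
from itertools import product

DEFAULT_GRAPH = "rml:defaultGraph"  # stands in for the special IRI


class OutputDataset:
    """Hypothetical container: one default graph plus named graphs keyed by IRI."""

    def __init__(self):
        self.default_graph = set()
        self.named_graphs = {}

    def add_triple(self, subject, predicate, obj, target_graphs):
        # Step 1: abort if any component is empty (modelled as None).
        if subject is None or predicate is None or obj is None:
            return
        # Step 2: generate the triple.
        triple = (subject, predicate, obj)
        for graph in target_graphs:
            if graph == DEFAULT_GRAPH:
                # Step 3: rml:defaultGraph targets the default graph.
                self.default_graph.add(triple)
            else:
                # Step 4: create the named graph on first use.
                self.named_graphs.setdefault(graph, set()).add(triple)
        # Step 5 holds because each graph is a set: duplicates collapse.


def select_target_graphs(sgm, pogm, subject_graphs, predicate_object_graphs):
    """Default graph if no graph maps are declared, otherwise the union."""
    if not sgm and not pogm:
        return {DEFAULT_GRAPH}
    return set(subject_graphs) | set(predicate_object_graphs)


def add_predicate_object_triples(dataset, subject, predicates, objects, graphs):
    """One triple per <predicate, object> combination."""
    for predicate, obj in product(predicates, objects):
        dataset.add_triple(subject, predicate, obj, graphs)
```

Modelling each graph as a set makes the duplicate rule fall out of the data structure rather than requiring an explicit check. Blank node scoping (step 6) is not modelled here.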
diff --git a/spec/section/overview.md b/spec/section/overview.md
@@ -72,7 +72,7 @@ Input source: album.json
     "Description": "A collection of stunning cityscape images.",
     "CreatedDate": "2023-10-01",
     "DateFormat": "date",
-    "Author": "John Doe",
+    "Author": "Zoë Krüger",
     "Images": [
       {
         "ID": 116,
diff --git a/spec/section/terminology.md b/spec/section/terminology.md
@@ -1,4 +1,6 @@
-# RDF Terminology
+# Terminology
+
+## RDF Terminology
 
 This section lists some terms normatively defined in [[RDF11-CONCEPTS]] and used in RML:
 
@@ -24,3 +26,21 @@ This section lists some terms normatively defined in [[RDF11-CONCEPTS]] and used
 - <dfn><a data-cite="RDF11-CONCEPTS#dfn-property">property</a></dfn>
 - <dfn><a data-cite="RDF11-CONCEPTS#dfn-resource">resource</a></dfn>
 - <dfn><a data-cite="RDF11-CONCEPTS#dfn-subject">subject</a></dfn>
+
+
+## XML Schema Definition Language (XSD) Terminology
+
+This section lists some terms normatively defined in [[XMLSCHEMA11-2]] and used in RML:
+
+- <dfn><a data-cite="XMLSCHEMA11-2#datatype">XSD Datatype</a></dfn>
+- <dfn data-lt="XSD Canonical mapping"><a data-cite="XMLSCHEMA11-2#canonical-lexical-representation">Canonical mapping</a></dfn>
+
+
+## Uniform Resource Identifier Terminology
+
+This section lists some terms normatively defined in [[RFC3986]] and used in RML:
+
+- <dfn><a data-cite="RFC3986#section-1.1.3">URI</a></dfn>
+- <dfn><a data-cite="RFC3986#section-5.2">relative URIs</a></dfn>
+- <dfn><a data-cite="RFC3986#section-4.3">absolute URI</a></dfn>
+- <dfn><a data-cite="RFC3986#section-2.1">Percent-encode</a></dfn>
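The [=absolute URI=] term imported here is used by the term generation rules added in `output.md` for the `rml:IRI` and `rml:URI` term types: use the value if it is absolute, otherwise prepend the base IRI, otherwise raise a data error. The sketch below illustrates that control flow only. Its validity check (`looks_absolute`) is a deliberately rough assumption, a scheme test plus a whitespace test, and not an RFC 3986 or RFC 3987 validator; `DataError` and `generate_iri` are hypothetical names.

```python
from urllib.parse import urlsplit


class DataError(ValueError):
    """Raised when no valid term can be produced (hypothetical name)."""


def looks_absolute(value: str) -> bool:
    # Rough approximation only: no whitespace and a scheme is present.
    if any(ch.isspace() for ch in value):
        return False
    return bool(urlsplit(value).scheme)


def generate_iri(value, base_iri: str):
    # Rule 1: null, empty or missing values generate no term.
    if value is None or value == "":
        return None
    # Already absolute: use the value as is.
    if looks_absolute(value):
        return value
    # Otherwise prepend the base IRI (plain string concatenation).
    candidate = base_iri + value
    if looks_absolute(candidate):
        return candidate
    # Otherwise raise a data error.
    raise DataError(f"cannot generate an IRI from {value!r}")
```

A value such as `"John Doe"` fails both checks in this sketch because of the embedded space, so it ends in a data error rather than a malformed IRI.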
diff --git a/spec/section/termmap.md b/spec/section/termmap.md
diff --git a/spec/section/tooling.md b/spec/section/tooling.md
diff --git a/spec/section/xsdTerminology.md b/spec/section/xsdTerminology.md

Original file line number	Diff line number	Diff line change
`@@ -72,7 +72,7 @@ Input source: album.json`
`72`	`72`	`"Description": "A collection of stunning cityscape images.",`
`73`	`73`	`"CreatedDate": "2023-10-01",`
`74`	`74`	`"DateFormat": "date",`
`75`		`- "Author": "John Doe",`
	`75`	`+ "Author": "Zoë Krüger",`
`76`	`76`	`"Images": [`
`77`	`77`	`{`
`78`	`78`	`"ID": 116,`