Nested triples maps #6
Related discussion: kg-construct/rml-questions#11 |
Copying the conclusion of kg-construct/rml-questions#11 here: what makes this trivial in, e.g., SPARQL-Anything is that values from the same iteration can be linked through some kind of 'iteration identifier' that is independent of the actual data values.
Maybe it's a good idea to assign each iteration some kind of ID by default, and to have some way to refer to that ID in the mapping? Or maybe even some additional functionality, so we can do things like 'do join condition with the previous iteration' (in cases where the order in a data source is actually meaningful), 'do join condition with the parent iteration' (in cases where data sources actually have a hierarchy), etc.? That way, we keep a single construct in the RML language (i.e., join conditions), and the engines can optimize however they want. (Nothing says we can't provide shortcuts as well, of course.) In fact, in JSONPath for example, part of the spec is to return the normalized path expression as an alternative to the actual values, which we could see as the iteration ID? (see the bottom of https://goessner.net/articles/JsonPath/ ) |
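To make the 'normalized path as iteration ID' idea concrete, here is a minimal Python sketch. It is not an RML engine and not a JSONPath implementation: it hard-codes the iterator `$.parent[*].child[*]` and builds the bracket-notation paths by hand. The function name `iterate_children` and the sample values are assumptions for illustration; only the field names `parentID` and `childID` come from this thread.

```python
from typing import Any, Iterator


def iterate_children(doc: dict[str, Any]) -> Iterator[tuple[str, str, dict[str, Any]]]:
    """Yield (iteration_id, parent_iteration_id, child) for $.parent[*].child[*].

    The iteration ID is a normalized path in bracket notation, so it depends
    only on the position in the document, never on the data values.
    """
    for i, parent in enumerate(doc.get("parent", [])):
        parent_iteration_id = f"$['parent'][{i}]"
        for j, child in enumerate(parent.get("child", [])):
            yield f"{parent_iteration_id}['child'][{j}]", parent_iteration_id, child


# Hypothetical sample data, shaped like the iterators discussed in this thread.
doc = {
    "parent": [
        {"parentID": "p1", "child": [{"childID": "c1"}, {"childID": "c2"}]},
        {"parentID": "p2", "child": [{"childID": "c3"}]},
    ]
}

rows = list(iterate_children(doc))
for iteration_id, parent_iteration_id, child in rows:
    print(iteration_id, parent_iteration_id, child["childID"])
```

Because the parent's iteration ID is a prefix of the child's, a 'join with the parent iteration' can be expressed without any key in the data itself.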
I often need this with JSON sources. When I tried RML, I managed it by preprocessing the JSON and adding ID keys ("_id") to each object, which I can then reference in the mapping to find the appropriate parent. Here is the RML that uses those id keys I added. |
Hi @bjdmeest, I'm not sure I get the whole picture here as I did not follow each of the issues, but I would say yes. The pushDown can be used not only in the rml:iterator but also in any sub-iteration made in a nested term map. Yet if you could send me an example that would help me be more specific. |
Hmmm, reiterating on this: it is getting very complex (see the pseudocode below). It is probably better to take this into account when thinking about joins (as being discussed at https://github.com/kg-construct/rml-fno-spec/issues/2 ).
- triplesmap
  - logicalsource
    - iterator: $.parent[*]
  - subjectmap: "ex:{parentID}"
    class: "ex:Parent"
- triplesmap2
  - logicalsource
    - iterator: $.parent[*].child[*]
  - subjectmap: "ex:{childID}"
  - po
    - predicate: "ex:nestedName"
      object:
        function: joinvalues
        parameters:
          value1: "{childID}_"
          value2:
            referencingobject:
              parenttriplesmap: triplesmap
              # joinOnSameParentIteration
              # from ParentIteration, take value "parentID" |
Ok, I think I got it. So the answer is no, pushDown will not be sufficient in this case. I'll try to explain, but I'm afraid it's not going to be clear ;). When you evaluate the iterator [...] The pushDown makes it possible to work one hierarchical level at a time, but it has a limitation. The example below will mix up all children of a given parent: the subject map will generate all terms "ex:{childID}" of a given parent, and those will be mixed up with all (predicate, object) pairs for the same parent.
- triplesmap2
  - logicalsource
    - iterator: $.parent[*]
    - pushdown:
      - reference: $.parent[*]
      - as: theParent
  - subjectmap:
      reference: $.child[*]
      nestedTermMap: "ex:{childID}"
  - po
    - predicate: "ex:nestedName"
      object:
        - reference: $.child[*]
        - pushdown:
          - reference: $.theParent
          - as: theParent
        - nestedTermMap:
          - parenttriplesmap: triplesmap
            ... # some join condition involving theParent and parentID
I hope this answers your question; I'm still not sure it does ;). |
It does for me :). I assume a possible approach to tackle this is the Fields approach, where you basically create your own iterations so you can make sure that they remain 'in sync'? Another solution would be to do everything via join conditions, but then we need some kind of 'iteration identifier' (and in this case, nested iteration identifiers, so (grand)children can refer to (grand)parent iterations), so you can join on the iteration instead of on data values, cf. my previous comment. Maybe it's a good idea to pursue both? |
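A rough Python sketch of the 'join on iteration instead of on data values' option, assuming nested iteration identifiers are index tuples (so a child at `(i, j)` has parent `(i,)`). The helper names, the tuple encoding, and the sample values are assumptions for illustration; the generated `ex:nestedName` value follows the `joinvalues` pseudocode earlier in the thread.

```python
from typing import Any

Doc = dict[str, Any]
IterationId = tuple[int, ...]


def parent_iterations(doc: Doc) -> dict[IterationId, Doc]:
    """Iterations of $.parent[*], keyed by a one-level iteration ID."""
    return {(i,): parent for i, parent in enumerate(doc["parent"])}


def child_iterations(doc: Doc) -> dict[IterationId, Doc]:
    """Iterations of $.parent[*].child[*], keyed by a nested iteration ID."""
    return {
        (i, j): child
        for i, parent in enumerate(doc["parent"])
        for j, child in enumerate(parent.get("child", []))
    }


def nested_names(doc: Doc) -> list[tuple[str, str, str]]:
    """Build ex:nestedName triples by joining each child to its parent iteration."""
    parents = parent_iterations(doc)
    triples = []
    for child_id, child in child_iterations(doc).items():
        # Join condition: drop the last index to get the parent iteration ID.
        parent = parents[child_id[:-1]]
        value = f"{child['childID']}_{parent['parentID']}"
        triples.append((f"ex:{child['childID']}", "ex:nestedName", value))
    return triples


# Hypothetical data: the same childID occurs under two different parents,
# and the children carry no key pointing back at their parent.
doc = {
    "parent": [
        {"parentID": "p1", "child": [{"childID": "c1"}, {"childID": "c2"}]},
        {"parentID": "p2", "child": [{"childID": "c1"}]},
    ]
}

triples = nested_names(doc)
for triple in triples:
    print(triple)
```

In this sample no data value links a child to its parent, so a value-based join condition has nothing to match on, while the iteration-based join still pairs each child with the right parent.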
Additional use case in favor of 'iteration identifiers': being able to get the 'accessing' reference formulation per reference is needed at RMLio/yarrrml-parser#184. So, I can imagine for CSV, per iteration you need to be able to identify that iteration (in this case, the So a CSV file like below
Could have a CSV iteration like below
Could actually have following references (relying on https://w3c.github.io/csvw/metadata/#uri-template-properties)
For JSONPath, you could include the actual used path for each iteration and reference. It won't create the most elegant mappings, but it gives users a lot of context to hack stuff together. |
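As an illustration of what per-iteration context for CSV could look like, here is a small Python sketch that attaches `_row` and `_sourceRow` to each record, borrowing those two variable names from the CSVW URI template properties linked above. The CSV content, the column names, and the function `csv_iterations` are hypothetical and are not the example from the original comment. The sketch also assumes a single header row and no line breaks inside quoted fields, since it takes the source row from the reader's physical line counter.

```python
import csv
import io
from typing import Iterator

# Hypothetical sample file with one header row.
SAMPLE = "id,name\n1,Alice\n2,Bob\n"


def csv_iterations(text: str) -> Iterator[dict[str, object]]:
    """Yield one dict per CSV record, extended with iteration identifiers.

    _row       : position of the record among the data rows (1-based)
    _sourceRow : line number of the record in the source file (header is line 1)
    """
    reader = csv.DictReader(io.StringIO(text))
    for row_number, record in enumerate(reader, start=1):
        # line_num counts the lines read from the source so far.
        yield {"_row": row_number, "_sourceRow": reader.line_num, **record}


iterations = list(csv_iterations(SAMPLE))
for iteration in iterations:
    print(iteration)
```

A mapping could then reference `_row` the same way it references a regular column, which gives each iteration an identifier that does not depend on the data values.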
issue: it is not possible to scope joins so that they only occur within a single iteration
suggestion: add rml:subTriplesMap (or rml:nestedTriplesMap) as a property of rml:RefObjectMap. This will allow engines to resolve joins within a single iteration, covering the frequently occurring case where certain related objects only occur within the sub-hierarchy / sub-graph of other objects. This will also improve the streamability of mappings.
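The evaluation strategy this suggestion would enable can be sketched in Python as follows. This is only an illustration of scoping a join to one iteration, not an implementation of the proposed rml:subTriplesMap; the function name, the predicate `ex:hasParent`, and the sample data are assumptions.

```python
from typing import Any, Iterable, Iterator

Triple = tuple[str, str, str]


def map_parent_stream(parents: Iterable[dict[str, Any]]) -> Iterator[Triple]:
    """Map parents one at a time; children are resolved inside the parent iteration."""
    for parent in parents:
        parent_iri = f"ex:{parent['parentID']}"
        yield (parent_iri, "rdf:type", "ex:Parent")
        # The nested map only sees this parent's children, so no lookup table
        # over the whole source is needed and earlier parents can be discarded.
        for child in parent.get("child", []):
            yield (f"ex:{child['childID']}", "ex:hasParent", parent_iri)


# Hypothetical data, passed as an iterator to mimic a streamed source.
doc = {
    "parent": [
        {"parentID": "p1", "child": [{"childID": "c1"}, {"childID": "c2"}]},
        {"parentID": "p2", "child": [{"childID": "c3"}]},
    ]
}

triples = list(map_parent_stream(iter(doc["parent"])))
for triple in triples:
    print(triple)
```

Since each parent is fully processed before the next one is read, the input can be consumed as a stream, which is the streamability benefit mentioned in the suggestion.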