Nested triples maps #6

dachafra · 2020-10-21T18:24:28Z

issue: not possible to scope joins that will only occur within the scope of a single iteration

suggestion: add rml:subTriplesMap (or rml:nestedTriplesMap) as a property of rml:RefObjectMap. This would allow engines to resolve joins within a single iteration, covering the frequent case where certain related objects only occur within the sub-hierarchy / sub-graph of other objects. It would also improve the streamability of mappings.
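To make the difference concrete, here is a minimal Python sketch (plain Python, not an RML engine; the input shape and the keys `items`, `links`, `id`, `itemRef` and `name` are hypothetical). The local IDs are only unique within one parent, so a global join on values matches across parents, while a join scoped to a single parent iteration does not:

```python
# Hypothetical input: "id" / "itemRef" are local IDs, only unique within one parent.
doc = {
    "parent": [
        {"items": [{"id": 1, "name": "a"}], "links": [{"itemRef": 1}]},
        {"items": [{"id": 1, "name": "b"}], "links": [{"itemRef": 1}]},
    ]
}


def global_join(doc):
    """Join every link with every item that has a matching id, across all parents."""
    items = [i for p in doc["parent"] for i in p["items"]]
    links = [l for p in doc["parent"] for l in p["links"]]
    return [(l["itemRef"], i["name"]) for l in links for i in items if l["itemRef"] == i["id"]]


def scoped_join(doc):
    """Join links with items only inside the same parent iteration."""
    result = []
    for p in doc["parent"]:
        for l in p["links"]:
            for i in p["items"]:
                if l["itemRef"] == i["id"]:
                    result.append((l["itemRef"], i["name"]))
    return result
```

The scoped version also only needs one parent in memory at a time, which is where the streamability benefit would come from.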

bjdmeest · 2022-03-02T07:23:20Z

Related discussion: kg-construct/rml-questions#11

bjdmeest · 2022-03-02T07:57:14Z

copying the conclusion of kg-construct/rml-questions#11 here: what makes this trivial in, e.g., SPARQL-Anything is the fact that values from the same iteration can be linked with some kind of 'iteration identifier' (that is independent of the actual data values).

The issue is that a join needs some kind of join condition, whereas here, the join condition is basically 'be the same iteration'. If there were a way to define this in RML, there would be no need for the current join condition.

Maybe it's a good idea to assign each iteration some kind of ID by default, and have some way to refer to that ID in the mapping? Or maybe even add some functionality so we can do things like 'do join condition with the previous iteration' (in cases where order in a data source is actually meaningful), 'do join condition with the parent iteration' (in cases where data sources actually have a hierarchy), etc.? That way, we keep a single construct in the RML language (i.e., join conditions), and engines can optimize however they want. (Nothing stops us from providing shortcuts as well, of course.)
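A minimal Python sketch of what engine-assigned iteration IDs could look like. The `Iteration` class, its fields and the helper names are hypothetical, not part of RML. The point is that both joins below match on identifiers only; the children never carry a `parentID` value of their own:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Iteration:
    id: int                   # hypothetical engine-assigned iteration identifier
    parent_id: Optional[int]  # identifier of the enclosing iteration, if any
    value: Any                # the data the iterator produced


def iterate(doc):
    """Iterate $.parent[*] and, nested in it, each parent's child[*], assigning IDs."""
    parents, children = [], []
    next_id = 0
    for p in doc["parent"]:
        parent_it = Iteration(next_id, None, p)
        next_id += 1
        parents.append(parent_it)
        for c in p["child"]:
            children.append(Iteration(next_id, parent_it.id, c))
            next_id += 1
    return parents, children


def join_on_parent_iteration(children, parents):
    """'Do join condition with the parent iteration': match on identifiers, not data values."""
    by_id = {p.id: p for p in parents}
    return [(c, by_id[c.parent_id]) for c in children]


def join_on_previous_iteration(iterations):
    """'Do join condition with the previous iteration': pair each iteration with its predecessor."""
    return [(cur, prev) for prev, cur in zip(iterations, iterations[1:])]


doc = {
    "parent": [
        {"parentID": "p1", "child": [{"childID": "c1"}, {"childID": "c2"}]},
        {"parentID": "p2", "child": [{"childID": "c3"}]},
    ]
}
parents, children = iterate(doc)
pairs = [(c.value["childID"], p.value["parentID"])
         for c, p in join_on_parent_iteration(children, parents)]
```

Here `pairs` links each child to its own parent purely through the assigned IDs.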

In fact, in JSONPath, for example, part of the spec is to return the normalized path expression as an alternative to the actual values, which we could see as the iteration ID? (see the bottom of https://goessner.net/articles/JsonPath/ )
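As a rough illustration, hand-rolled rather than using a JSONPath library (so only the fixed iterator `$.parent[*].child[*]` is supported, and the helper names are hypothetical), a normalized path in bracket notation for each child iteration already contains the path of its parent iteration as a prefix, which is enough to join on:

```python
def iterate_with_paths(doc):
    """Evaluate the fixed iterator $.parent[*].child[*], yielding (normalized path, value)."""
    for i, parent in enumerate(doc["parent"]):
        for j, child in enumerate(parent["child"]):
            yield f"$['parent'][{i}]['child'][{j}]", child


def parent_path(path):
    """Strip the trailing ['child'][j] steps to get the enclosing parent iteration's path."""
    return path[: path.rindex("['child']")]


doc = {"parent": [{"parentID": "p1", "child": [{"childID": "c1"}, {"childID": "c2"}]},
                  {"parentID": "p2", "child": [{"childID": "c3"}]}]}

# Index parent iterations by their own normalized paths, then join children on the prefix.
parents_by_path = {f"$['parent'][{i}]": p for i, p in enumerate(doc["parent"])}
joined = [(child["childID"], parents_by_path[parent_path(path)]["parentID"])
          for path, child in iterate_with_paths(doc)]
```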

bjdmeest · 2022-03-02T08:03:51Z

I'm now wondering how much #20 (access fields in parent iteration) is actually this (join on iteration) + #29 (join to get literal instead of term). @frmichel, do you think pushDown can be seen as syntactic sugar over 'join on parent iteration + get literal from join'?

justin2004 · 2022-03-02T14:11:35Z

do join condition with the parent iteration

@bjdmeest

I need that often with JSON sources. When I tried to use RML, I was able to do it by preprocessing the JSON and adding id keys ("_id") to each object; then I can reference them in the mapping to find the appropriate parent.

e.g. I changed this to this.

Here is the RML that uses those id keys I added.
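The JSON and RML from that comment are not reproduced here. A minimal Python sketch of that kind of preprocessing, assuming (hypothetically) that each object also records its enclosing object's id in a `_parent_id` key:

```python
import itertools


def add_ids(node, counter=None, parent_id=None):
    """Recursively give every JSON object an "_id" and, below the root, a "_parent_id".

    "_parent_id" is a hypothetical extra key; it points at the nearest enclosing object,
    skipping over any arrays in between.
    """
    if counter is None:
        counter = itertools.count()
    if isinstance(node, dict):
        node_id = next(counter)
        children = list(node.values())  # snapshot before adding the new keys
        node["_id"] = node_id
        if parent_id is not None:
            node["_parent_id"] = parent_id
        for value in children:
            add_ids(value, counter, node_id)
    elif isinstance(node, list):
        for item in node:
            add_ids(item, counter, parent_id)
    return node
```

In the mapping, a join condition with child reference `_parent_id` and parent reference `_id` would then link each child to its parent.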

frmichel · 2022-03-02T16:13:07Z

I'm now wondering how much kg-construct/rml-cc#5 (access fields in parent iteration) is actually this (join on iteration) + kg-construct/rml-core#29 (join to get literal instead of term). @frmichel, do you think pushDown can be seen as syntactic sugar over 'join on parent iteration + get literal from join'?

Hi @bjdmeest, I'm not sure I get the whole picture here, as I did not follow each of the issues, but I would say yes. The pushDown can be used not only in the rml:iterator but also in any sub-iteration made in a nested term map. Still, an example would help me be more specific.

bjdmeest · 2022-03-04T11:14:49Z

Hmmm, reiterating on this: it is getting very complex (see below for some pseudocode). It's probably better to take this into account when thinking about joins (as being discussed at https://github.com/kg-construct/rml-fno-spec/issues/2 ).

- triplesmap
  - logicalsource
    - iterator: $.parent[*]
  - subjectmap: "ex:{parentID}"
    class: "ex:Parent"
- triplesmap2
  - logicalsource
    - iterator: $.parent[*].child[*]
  - subjectmap: "ex:{childID}"
  - po
    - predicate: "ex:nestedName"
      object:
        function: joinvalues
        parameters:
          value1: "{childID}_"
          value2:
            referencingobject:
              parenttriplesmap: triplesmap
              # joinOnSameParentIteration
              # from ParentIteration, take value "parentID"
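For reference, a minimal Python sketch of the output this pseudocode is after, with the hypothetical `joinvalues` function taken to be plain string concatenation. The nested loop keeps the parent iteration in scope, which is what a flat `$.parent[*].child[*]` iterator loses:

```python
def nested_names(doc):
    """Produce (subject, predicate, object) triples for triplesmap2's ex:nestedName.

    Assumes joinvalues(value1, value2) is plain concatenation (hypothetical).
    """
    triples = []
    for parent in doc["parent"]:           # outer iteration: $.parent[*]
        for child in parent["child"]:      # inner iteration: child[*] of *this* parent
            value1 = f"{child['childID']}_"
            value2 = parent["parentID"]    # "from ParentIteration, take value parentID"
            triples.append((f"ex:{child['childID']}", "ex:nestedName", value1 + value2))
    return triples


doc = {"parent": [{"parentID": "p1", "child": [{"childID": "c1"}]},
                  {"parentID": "p2", "child": [{"childID": "c2"}, {"childID": "c3"}]}]}
```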

frmichel · 2022-03-09T20:00:49Z

Ok I think I got it. So the answer is no, pushDown will not be sufficient in this case. I'll try to explain but I'm afraid that's not gonna be clear ;).

When you evaluate the iterator $.parent[*].child[*] on input documents, you mix up all parents and all children. You don't keep the association of parents to children. So this has to be done one hierarchical level at a time: first an iterator on parent[], then on child[].

The pushDown makes it possible to work one hierarchical level at a time, but it has a limitation:
- The first iteration level can be set in the logical source, so that the fields pushed down from the logical source are available to all term maps (subject, predicate, and object).
- The next levels are set in (nested) term maps. But the fields pushed down from a (nested) term map are available only to subsequent nested term maps, that is, within the context of either a single subject map or a single object map, but not both at the same time. Since the iterations in the subject map and the object map are not "in sync", that fails.

The example below will mix up all children of a given parent: the subject map will generate all terms "ex:{childID}" of a given parent, and those will be mixed up with all (predicate, object) couples of the same parent.

- triplesmap2
  - logicalsource
    - iterator: $.parent[*]
    - pushdown:
      - reference: $.parent[*]
      - as: theParent
  - subjectmap:
      reference: $.child[*]
      nestedTermMap: "ex:{childID}"
  - po
    - predicate: "ex:nestedName"
      object:
        - reference: $.child[*]
        - pushdown:
          - reference: $.theParent
          - as: theParent
        - nestedTermMap: 
           - parenttriplesmap: triplesmap
           ... # some join condition involving theParent and parentID
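The mix-up can be reproduced in a few lines of Python (a sketch of the behaviour described above, not of an actual engine; the `name` field is hypothetical): when the subject map and the object map iterate over `$.child[*]` independently within the same parent, every generated subject is paired with every generated object.

```python
from itertools import product


def out_of_sync(parent):
    """Subject map and object map each iterate $.child[*] on their own."""
    subjects = [f"ex:{c['childID']}" for c in parent["child"]]
    objects = [c["name"] for c in parent["child"]]
    # Nothing ties the i-th subject to the i-th object, so every combination is produced.
    return [(s, "ex:nestedName", o) for s, o in product(subjects, objects)]


def in_sync(parent):
    """One shared iteration over $.child[*] for both subject and object."""
    return [(f"ex:{c['childID']}", "ex:nestedName", c["name"]) for c in parent["child"]]


parent = {"parentID": "p1",
          "child": [{"childID": "c1", "name": "A"}, {"childID": "c2", "name": "B"}]}
```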

I hope this answers your question, though I'm still not sure it does ;).

bjdmeest · 2022-03-14T10:10:05Z

It does for me :). A possible approach to tackle this is the Fields approach, I assume? There, you basically create your own iterations, so you can make sure that they remain 'in sync'. Another solution would be to do everything via join conditions, but then we need some kind of 'iteration identifier' (and in this case, nested iteration identifiers, so that (grand)children can refer to (grand)parent iterations), so you can join on iteration instead of on data values, cf. my previous comment. Maybe it's a good idea to pursue both?

bjdmeest · 2023-02-09T10:43:06Z

Additional use case in favor of 'iteration identifiers': being able to get the 'accessing' reference formulation per reference is needed in RMLio/yarrrml-parser#184.

So, I can imagine that for CSV, per iteration, you need to be able to identify that iteration (here, the row index would be enough), and per reference, you need to identify that reference formulation (here, the combination of row index and column index would be enough).

So a CSV file like the one below

lastname	firstname
De Meester	Ben
Chaves	David

could have a CSV iteration like the one below

firstname
David

and could actually have the following references (relying on https://w3c.github.io/csvw/metadata/#uri-template-properties)

_sourceRow	firstname	firstname_sourceColumn
2	David	1
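A minimal Python sketch of how an engine could attach such identifiers while iterating a tab-separated file with the standard `csv` module. The function name is hypothetical, and the `_sourceRow` / `<ref>_sourceColumn` keys simply mirror the table above. The thread's numbers appear to count from 0 (header on row 0, `firstname` in column 1), so the sketch follows that convention to reproduce the table, whereas CSVW itself numbers rows and columns from 1.

```python
import csv
import io

DATA = "lastname\tfirstname\nDe Meester\tBen\nChaves\tDavid\n"


def iterations_with_ids(text, references):
    """Yield one dict per CSV row, adding 0-based source row/column identifiers per reference."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    for row_index, row in enumerate(reader, start=1):  # header occupies source row 0
        record = {"_sourceRow": row_index}
        for ref in references:
            col_index = header.index(ref)
            record[ref] = row[col_index]
            record[f"{ref}_sourceColumn"] = col_index
        yield record
```

For the "David" iteration this yields `_sourceRow` 2, `firstname` "David" and `firstname_sourceColumn` 1, matching the table.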

For JSONPath, you could include the actual path used for each iteration and reference, e.g., iteration $.persons[*] with reference * would give, for the first iteration, identifier $.persons[0] and reference identifiers lastname and firstname.
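A minimal Python sketch of that idea, hand-rolled for the single iterator `$.persons[*]` with reference `*` (the function name and the output shape are hypothetical): each iteration gets its path as identifier, and each reference gets its key plus the full path it was read from.

```python
def iterate_persons(doc):
    """Iterator $.persons[*] with reference *: yield the iteration identifier plus,
    per matched key, the reference identifier mapped to (value, full path)."""
    for i, person in enumerate(doc["persons"]):
        iteration_id = f"$.persons[{i}]"
        references = {key: (value, f"{iteration_id}.{key}") for key, value in person.items()}
        yield iteration_id, references


doc = {"persons": [{"lastname": "De Meester", "firstname": "Ben"},
                   {"lastname": "Chaves", "firstname": "David"}]}
first_id, first_refs = next(iterate_persons(doc))
```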

It won't create the most elegant mappings, but it gives users a lot of context to hack stuff together.

dachafra added the rml (rml issues) and representation (representation issues) labels Oct 21, 2020

dachafra assigned bjdmeest Mar 14, 2022

bjdmeest mentioned this issue Feb 9, 2023 in Iteration and reference 'identifiers' #43 (Open)
