🤯 Cognitarium: Remove blank node filtering from messages #494

Closed
amimart opened this issue Feb 24, 2024 · 2 comments · Fixed by #499
Labels: bug (Something isn't working)

@amimart
Member

amimart commented Feb 24, 2024

Remove the ability to target blank nodes through the cognitarium message interface when using the triple pattern types VarOrNode and VarOrNodeOrLiteral.

Description

Blank nodes have an internal identifier that should not be used externally; they should only represent a link between triples, without the possibility of targeting them directly from external interfaces. Moreover, I think their internal value should not be exposed; the IdentifierIssuer could be used for this purpose.

Proposal

I propose removing from VarOrNode and VarOrNodeOrLiteral the possibility of referencing a blank node by its name, and renaming blank nodes in a deterministic manner when exposing query results.

I think this matter should be addressed in conjunction with #434.

@amimart amimart added the bug Something isn't working label Feb 24, 2024
@github-project-automation github-project-automation bot moved this to 📋 Backlog in 💻 Development Feb 24, 2024
@ccamel
Member

ccamel commented Feb 26, 2024

Ok this is a tricky one 😅

First, let's agree on the definition of a blank node:

Blank nodes in RDF graphs can be used to represent values known to exist but whose identity remains unknown.

Now, to dissect the scenarios:

  1. Regarding querying

As per our initial understanding, blank nodes, by nature, should not be identifiable by name due to their indeterminate identities. Nevertheless, they can be indirectly addressed using a distinct syntax (this is the purpose of BlankNode defined in our contract):

SELECT ?a ?b
WHERE {
    ?a :predicate _:my_blank_node .
    _:my_blank_node :otherPredicate ?b .
}

In this example, my_blank_node functions merely as a placeholder, not an actual identifier of the node. This behaviour is designed in our contract by the following type, where the String denotes the name of the placeholder:

    /// # BlankNode
    /// An RDF [blank node](https://www.w3.org/TR/rdf11-concepts/#dfn-blank-node).
    BlankNode(String),
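
To make the placeholder idea concrete, here is a minimal sketch of how the two triple patterns of the query above could be expressed with message types. The enums below are simplified, hypothetical stand-ins (string payloads instead of IRI and literal types) and are not the contract's actual definitions; `example_patterns` is an illustrative helper only.

```rust
// Simplified, hypothetical stand-ins for the message types discussed here.
// String payloads replace the richer IRI and literal types for brevity.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    NamedNode(String),
    /// The String is the name of the placeholder, not a stored identifier.
    BlankNode(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum VarOrNode {
    Variable(String),
    Node(Node),
}

#[derive(Debug, Clone, PartialEq)]
pub enum VarOrNodeOrLiteral {
    Variable(String),
    Node(Node),
    Literal(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct TriplePattern {
    pub subject: VarOrNode,
    pub predicate: VarOrNode,
    pub object: VarOrNodeOrLiteral,
}

/// Builds the two patterns of the query above:
///   ?a :predicate _:my_blank_node .
///   _:my_blank_node :otherPredicate ?b .
pub fn example_patterns() -> Vec<TriplePattern> {
    // The same placeholder name is reused so both patterns refer to one node.
    let placeholder = Node::BlankNode("my_blank_node".to_string());
    vec![
        TriplePattern {
            subject: VarOrNode::Variable("a".to_string()),
            predicate: VarOrNode::Node(Node::NamedNode(":predicate".to_string())),
            object: VarOrNodeOrLiteral::Node(placeholder.clone()),
        },
        TriplePattern {
            subject: VarOrNode::Node(placeholder),
            predicate: VarOrNode::Node(Node::NamedNode(":otherPredicate".to_string())),
            object: VarOrNodeOrLiteral::Variable("b".to_string()),
        },
    ]
}
```

The only role of "my_blank_node" in this sketch is to link the object of the first pattern to the subject of the second; nothing here implies it matches an identifier held in the store.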

Of course, you could say that this is equivalent to using a variable:

SELECT ?a ?b
WHERE {
    ?a :predicate ?my_variable_for_blank_node .
    ?my_variable_for_blank_node :otherPredicate ?b .
}

Although similar, the distinction lies in the scope of a blank node "selector," which narrows the query results to exclusively blank node resources, as opposed to a general variable that encompasses all types of resources (named, blank, and literals).
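
A rough sketch of that distinction, following the semantics described in this comment: a variable accepts any kind of term, while a blank node selector accepts blank nodes only. `Term`, `Selector` and `selector_accepts` are hypothetical names for illustration and do not come from the contract's query engine.

```rust
/// Kinds of RDF terms a stored triple position can hold (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Named(String),
    Blank(String),
    Literal(String),
}

/// What a query can put in a triple pattern position (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
pub enum Selector {
    /// A general variable such as ?my_variable_for_blank_node.
    Variable(String),
    /// A blank node placeholder such as _:my_blank_node.
    BlankNode(String),
}

/// Returns true when a stored term may be bound to the selector.
/// The placeholder name is never compared with the stored identifier;
/// only the kind of the term matters.
pub fn selector_accepts(selector: &Selector, term: &Term) -> bool {
    match selector {
        Selector::Variable(_) => true,
        Selector::BlankNode(_) => matches!(term, Term::Blank(_)),
    }
}
```

Under this assumption the placeholder behaves like a variable restricted to blank nodes, and the internal identifier of a matched node never has to be known by the caller.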

From this perspective, and with respect to the SPARQL specification, my inclination is to retain the BlankNode term in the contract.

  2. Regarding results

The management of blank nodes raises a valid question: what identifiers are assigned to them in the results? The identifiers should be unique per result set and consistent for the same query within the same context.

If what you want is to keep their values hidden, I totally agree, considering that blank node identifiers are inherently ephemeral and meant to be specific to a particular serialization. Therefore, a serialization strategy that reassigns blank node identifiers within its context is necessary. This could be achieved by implementing a serialization-specific counter, or any similar approach.
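
As an illustration of the serialization-specific counter idea, here is a minimal sketch. `BlankNodeRelabeler`, `label_for` and the "b0", "b1" label format are hypothetical choices, not the contract's implementation; the sketch also assumes results are serialized in a stable order, since labels are issued in first-seen order.

```rust
use std::collections::HashMap;

/// Issues fresh labels for blank nodes within one serialization context.
/// One instance is meant to live for a single result set only.
#[derive(Default)]
pub struct BlankNodeRelabeler {
    /// Maps an internal identifier to the label exposed in the results.
    labels: HashMap<String, String>,
}

impl BlankNodeRelabeler {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns the exposed label for an internal identifier, issuing a new
    /// one ("b0", "b1", ...) the first time the identifier is seen.
    pub fn label_for(&mut self, internal_id: &str) -> String {
        // The counter is simply the number of labels issued so far.
        let next = self.labels.len();
        self.labels
            .entry(internal_id.to_string())
            .or_insert_with(|| format!("b{next}"))
            .clone()
    }
}
```

Within one result set the same internal identifier always maps to the same label, so links between triples are preserved, while the internal value itself is never exposed. A fresh relabeler per serialization keeps the labels specific to that context.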

@amimart
Member Author

amimart commented Feb 26, 2024

> First, let's agree on the definition of a blank node:
>
> Blank nodes in RDF graphs can be used to represent values known to exist but whose identity remains unknown.

Thanks for clearing up the confusion I was having 🙏. I understand the need to keep it as a way to express unnamed relations when querying.

On the implementation side, this is an issue, as blank nodes are currently handled in the same way as named nodes in the query engine.

> 2. Regarding results
>
> The management of blank nodes raises a valid question: what identifiers are assigned to them in the results? The identifiers should be unique per result set and consistent for the same query within the same context.
>
> If what you want is to keep their values hidden, I totally agree, considering that blank node identifiers are inherently ephemeral and meant to be specific to a particular serialization. Therefore, a serialization strategy that reassigns blank node identifiers within its context is necessary. This could be achieved by implementing a serialization-specific counter, or any similar approach.

Totally agree, this would be way better.

This was referenced Feb 27, 2024
@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in 💻 Development Feb 29, 2024