-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/blank node identifier #499
Conversation
WalkthroughThe updates across various modules in the OKP4 Cognitarium contracts focus on enhancing blank node handling, refining triple pattern structures, and improving query and storage mechanisms. These changes aim to ensure unique blank node identification per insertion and remove the ability to directly reference blank nodes through message interfaces, aligning with the objectives to prevent blank node conflicts and remove blank node filtering from messages. Changes
Assessment against linked issues
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
size-limit report 📦
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #499 +/- ##
==========================================
- Coverage 96.29% 96.22% -0.08%
==========================================
Files 55 55
Lines 12693 12774 +81
==========================================
+ Hits 12223 12292 +69
- Misses 470 482 +12 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 9
Configuration used: CodeRabbit UI
Files selected for processing (17)
- contracts/okp4-cognitarium/src/contract.rs (41 hunks)
- contracts/okp4-cognitarium/src/msg.rs (7 hunks)
- contracts/okp4-cognitarium/src/querier/engine.rs (28 hunks)
- contracts/okp4-cognitarium/src/querier/mapper.rs (1 hunks)
- contracts/okp4-cognitarium/src/querier/mod.rs (1 hunks)
- contracts/okp4-cognitarium/src/querier/plan.rs (3 hunks)
- contracts/okp4-cognitarium/src/querier/plan_builder.rs (21 hunks)
- contracts/okp4-cognitarium/src/querier/variable.rs (9 hunks)
- contracts/okp4-cognitarium/src/rdf/mapper.rs (1 hunks)
- contracts/okp4-cognitarium/src/state/blank_nodes.rs (1 hunks)
- contracts/okp4-cognitarium/src/state/mod.rs (1 hunks)
- contracts/okp4-cognitarium/src/state/triples.rs (5 hunks)
- contracts/okp4-cognitarium/src/storer/engine.rs (8 hunks)
- contracts/okp4-dataverse/src/contract.rs (2 hunks)
- contracts/okp4-dataverse/src/registrar/registry.rs (2 hunks)
- docs/okp4-cognitarium.md (5 hunks)
- packages/okp4-rdf/src/normalize.rs (9 hunks)
Files not summarized due to errors (1)
- contracts/okp4-cognitarium/src/contract.rs: Error: Message exceeds token limit
Additional comments: 62
contracts/okp4-cognitarium/src/state/mod.rs (1)
- 1-1: The addition of the
blank_nodes
module and its public re-exportation aligns with the project's structure and naming conventions.contracts/okp4-cognitarium/src/querier/mod.rs (1)
- 9-9: Exporting
ResolvedVariables
from thevariable
module enhances the module's usability by other components within the project.contracts/okp4-cognitarium/src/state/blank_nodes.rs (1)
- 3-4: Introducing
BLANK_NODE_IDENTIFIER_COUNTER
as au128
counter for blank node identifiers aligns with the PR's objectives to ensure unique identification, leveraging theItem
storage mechanism effectively.contracts/okp4-cognitarium/src/querier/mapper.rs (1)
- 1-4: The simplification of
mapper.rs
by removing specific functions and dependencies onPredicate
,Subject
, and related logic aligns with the PR's objectives to streamline the querying process and overhaul blank node handling.contracts/okp4-dataverse/src/registrar/registry.rs (1)
- 7-8: Renaming
VarOrNode
toVarOrNamedNode
clarifies the distinction between blank and named nodes, aligning with the PR's objectives to improve the handling and querying of blank nodes.contracts/okp4-cognitarium/src/rdf/mapper.rs (1)
- 23-31: Updating the
TryFrom
implementation forProperty
to usemsg::IRI
instead ofmsg::Node
enhances the clarity and correctness of the conversion logic, aligning with the PR's objectives to refine the handling of IRIs and properties.contracts/okp4-cognitarium/src/querier/plan.rs (2)
- 25-34: The updates to
get_var_index
and the addition ofget_bnode_index
enhance the querying logic's efficiency and clarity, aligning with the PR's objectives to improve the handling of blank nodes.- 156-163: Updating
PatternValue
variants forsubject
andobject
in theQueryNode
struct aligns with the PR's objectives to refine the querying process and the handling of blank nodes, contributing to the querying logic's efficiency and clarity.contracts/okp4-cognitarium/src/state/triples.rs (4)
- 80-80: Changing the conversion method for
Subject::Blank
fromas_bytes()
toto_be_bytes()
aligns with the PR's objectives to ensure unique identification and improve performance.- 111-111: Updating the hashing method for
Object::Blank
to includeto_be_bytes().as_slice()
aligns with the PR's objectives to ensure unique identification and improve performance.- 134-135: Changing the type of
BlankNode
fromString
tou128
and introducing theBLANK_NODE_SIZE
constant aligns with the PR's objectives to ensure unique identification and improve performance.- 195-195: Updating test cases to use
u128
values forObject::Blank
ensures consistency with the changes made to the handling of blank nodes.contracts/okp4-cognitarium/src/storer/engine.rs (10)
- 3-4: The addition of
BLANK_NODE_IDENTIFIER_COUNTER
andBLANK_NODE_SIZE
to the imports is aligned with the PR objectives to handle blank nodes uniquely. Ensure these constants are properly defined elsewhere in the codebase and their values make sense in the context of blank node management.- 20-20: The introduction of
blank_node_id_issuer
inStoreEngine
is a crucial part of the new blank node handling mechanism. It's important to ensure that theIdentifierIssuer
is used consistently throughout the codebase for managing blank node identifiers.- 28-34: Initializing
blank_node_id_issuer
in thenew
method with a blank prefix and the loadedblank_node_id_counter
is a good practice. However, ensure that the counter is persisted correctly across transactions to maintain uniqueness of identifiers.- 61-69: The update to
rio_to_triple
to include anid_issuer
parameter and its usage within the function is aligned with the objective of managing blank nodes effectively. Ensure that all calls to this function are updated to pass theIdentifierIssuer
instance.- 171-171: Saving the
blank_node_id_issuer.counter
in thesave
method is essential for persisting the state of the identifier issuer. Verify that this operation is atomic and consistent, especially in concurrent environments, to avoid identifier conflicts.- 187-198: The modifications to
rio_to_triple
,rio_to_subject
, andrio_to_object
to utilize theIdentifierIssuer
for blank nodes are well-implemented. These changes are central to achieving the PR's objectives regarding blank node management. Ensure that the error handling for unsupported RDF star syntax is consistent with the application's overall error handling strategy.- 212-214: The handling of blank nodes in
rio_to_subject
using theIdentifierIssuer
to generate unique identifiers is a key improvement. This approach ensures that blank nodes are uniquely identifiable, aligning with the PR's goals.- 230-241: Similarly, the handling of blank nodes in
rio_to_object
is consistent with the changes inrio_to_subject
. This consistency is important for maintaining a uniform approach to blank node management across the system.- 278-278: The use of
BLANK_NODE_SIZE
for calculating the size of blank subjects is a good practice. Ensure thatBLANK_NODE_SIZE
is set to a reasonable default that accurately reflects the storage requirements for blank nodes.- 296-296: Similarly, using
BLANK_NODE_SIZE
for calculating the size of blank objects ensures consistency in how the size of different components is calculated. This consistency is crucial for accurate storage and performance metrics.contracts/okp4-cognitarium/src/querier/variable.rs (5)
- 4-4: The addition of
use okp4_rdf::normalize::IdentifierIssuer
is necessary for the new handling of blank nodes. This import should be used wherever blank node identifiers need to be generated or managed.- 20-20: Changing the handling of
Object::Blank
to dereference the node is a logical step given the new approach to managing blank nodes. Ensure that this change is consistently applied whereverObject::Blank
is used.- 45-45: Similarly, the handling of
Subject::Blank
has been updated to dereference the node. This change is consistent with the handling ofObject::Blank
and aligns with the new approach to blank node management.- 59-65: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [52-75]
The modification of the
as_value
method to accept anIdentifierIssuer
parameter is crucial for generating values for blank nodes. This change ensures that blank nodes can be uniquely identified in queries, aligning with the PR's objectives. Ensure that all calls toas_value
are updated to pass theIdentifierIssuer
instance.
- 159-161: The adjustments to test cases to use numeric values for
Subject::Blank
andObject::Blank
and to reflect changes in value generation usingIdentifierIssuer
are appropriate. These changes ensure that the tests accurately represent the new handling of blank nodes. Verify that all relevant test cases have been updated accordingly.Also applies to: 230-232, 254-256
contracts/okp4-dataverse/src/contract.rs (1)
- 134-135: The change from
VarOrNode
toVarOrNamedNode
for thepredicate
in theTriplePattern
structure within the test module aligns with the PR objectives to refine the handling of blank nodes. This modification ensures that predicates are explicitly named nodes, which supports the overall goal of improving the management and querying of blank nodes in the graph database context. This change is consistent and appears correctly implemented in the context of the provided code. However, it's important to verify that all references and usages ofTriplePattern
throughout the test suite and potentially in the production code have been updated to reflect this change, ensuring no broken functionality or unintended behavior.contracts/okp4-cognitarium/src/msg.rs (5)
- 54-54: The predicate structure in
ExecuteMsg::DeleteData
has been updated to use a named node with a prefixed IRI. This change aligns with the PR's objective to modify the handling of blank nodes and predicates. Ensure that all references and usages of this structure are updated accordingly to prevent any runtime errors.- 70-72: The introduction of
TripleDeleteTemplate
for specifying triple templates to delete is a significant change. This aligns with the PR's objective to improve the handling of blank nodes and streamline deletion operations. Ensure that the implementation ofTripleDeleteTemplate
correctly handles all possible variations of subjects, predicates, and objects, including blank nodes, named nodes, and literals.- 487-497: The
TripleDeleteTemplate
struct has been correctly defined to include subjects, predicates, and objects asVarOrNamedNode
andVarOrNamedNodeOrLiteral
. This definition is crucial for the new handling of blank nodes and ensures flexibility in specifying triples for deletion. Review the implementation to ensure that it correctly handles the serialization and deserialization of these fields, especially considering the introduction of theVarOrNamedNodeOrLiteral
enum.- 499-509: The
TripleConstructTemplate
struct is introduced to specify triples for construct queries. This change is consistent with the PR's objectives and provides a clear structure for constructing triples. Ensure that the implementation ofTripleConstructTemplate
correctly handles the serialization and deserialization of subjects, predicates, and objects, considering the use ofVarOrNode
andVarOrNodeOrLiteral
. This is crucial for the correct functioning of construct queries.- 563-577: The addition of the
VarOrNamedNodeOrLiteral
enum is a significant improvement, providing flexibility in specifying subjects, predicates, and objects in triples. This change is crucial for the new handling of blank nodes and enhances the expressiveness of the system. Ensure that the implementation correctly handles all cases of this enum, especially in the context of serialization and deserialization.packages/okp4-rdf/src/normalize.rs (9)
- 41-44: The initialization of
canonical_issuer
with au128
counter is a significant change aligning with the PR's objective to handle larger ranges of identifiers. This change is crucial for ensuring the uniqueness of blank node identifiers across different insertions.- 75-75: Reinitializing
canonical_issuer
with au128
counter during thereset
method ensures consistency in handling blank node identifiers throughout the normalization process. This consistency is vital for maintaining the integrity of blank node identifiers across operations.- 130-130: The use of
get_or_issue
withinlabel_unique_nodes
method to ensure that each unique node receives a canonical identifier is aligned with the PR's objectives. This approach effectively prevents conflicts and ensures that blank nodes are exposed with unique, ephemeral identifiers.- 152-153: Creating a new
IdentifierIssuer
with au128
counter for scoped issuers withincompute_n_degree_hashes
method is a thoughtful approach. It ensures that temporary identifiers within the scope of n-degree hash computation are handled consistently with the overall changes.- 163-163: The use of
get_or_issue
for nodes within thecompute_n_degree_hashes
method ensures that all nodes, whether they are part of the n-degree hash computation or not, receive a canonical identifier. This consistency is crucial for the integrity of the normalization process.- 218-218: Initializing a new
IdentifierIssuer
for choosing the best path incompute_n_degree_hash
method aligns with the PR's objectives. This approach ensures that the selection of the best path for n-degree hashes is handled with the same consistency and uniqueness as other parts of the normalization process.- 236-236: The introduction of
get_str_or_issue
method within thecompute_n_degree_hash
method to handle identifier issuance is a significant improvement. It ensures that identifiers are issued in a consistent manner, aligning with the new handling of blank nodes.- 248-248: Reusing
get_str_or_issue
for recursion within thecompute_n_degree_hash
method demonstrates a consistent approach to identifier issuance. This reuse is crucial for maintaining the integrity and uniqueness of identifiers throughout the normalization process.- 333-370: The changes to the
IdentifierIssuer
struct, including the transition of thecounter
field fromu64
tou128
and the introduction of new methodsget_n_or_issue
andget_str_or_issue
, are well-aligned with the PR's objectives. These changes are crucial for ensuring the uniqueness of blank node identifiers and avoiding conflicts across different insertions. The implementation is clear, and the methods are well-documented, which enhances maintainability.contracts/okp4-cognitarium/src/querier/plan_builder.rs (3)
- 2-3: The introduction of
VarOrNamedNode
alongsideVarOrNode
andVarOrNodeOrLiteral
indicates a refinement in type distinctions for handling different kinds of nodes. This change aligns with the PR's objective to improve the handling of blank nodes by making type distinctions clearer. However, ensure that all usages of these types across the codebase are updated to reflect these changes, especially in contexts where the distinction between named nodes and blank nodes is critical.- 92-100: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [76-97]
The
build_from_bgp
function's logic for determining the type of join (ForLoopJoin vs. CartesianProductJoin) based on the intersection of bound variables is a critical part of query optimization. Ensure that the logic correctly identifies cases where a ForLoopJoin is more efficient than a CartesianProductJoin and vice versa. This is crucial for performance, especially in complex queries.
- 609-625: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [483-668]
The
build_plan
test cases demonstrate the system's ability to construct query plans from various triple patterns, including those with blank nodes and named nodes. It's important to ensure comprehensive coverage of all new logic introduced in this PR, including the handling of blank nodes in different contexts (e.g., as subjects, predicates, and objects). Additionally, verify that the performance improvements mentioned in the PR objectives are reflected in these test cases, possibly through benchmark tests.docs/okp4-cognitarium.md (1)
- 747-780: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [539-769]
The introduction of
TripleConstructTemplate
andTripleDeleteTemplate
entities is a significant change that directly supports the PR's objectives to enhance the handling of RDF triples, especially in the context of construct and delete operations. These entities allow for more precise specification of triples to be constructed or deleted, which is crucial for the improved handling of blank nodes and overall system performance.It's essential to ensure that the documentation for these entities is clear and comprehensive. The descriptions provided for the properties of these entities (
object
,predicate
,subject
) are concise and accurately reflect their roles within the triple patterns. This clarity helps users understand how to use these new entities effectively in their queries.The documentation for
TripleConstructTemplate
andTripleDeleteTemplate
entities is well-written and aligns with the PR's objectives. It provides clear guidance on how to specify triples for construct and delete operations, which is crucial for leveraging the system's enhanced capabilities.contracts/okp4-cognitarium/src/querier/engine.rs (5)
- 78-84: The
construct_triples
function simplifies the creation of triples from a query plan. This change appears straightforward and correctly utilizes theResolvedTripleIterator
. Ensure that the templates provided are correctly formatted and validated before being passed to this function.- 997-1005: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1000-1039]
The test setup includes saving a blank node identifier counter to the storage. This is a critical part of ensuring that blank nodes are handled correctly in the tests. Verify that the counter is correctly initialized and used in subsequent operations to maintain unique identifiers for blank nodes. Additionally, ensure that the test data and the store setup are correctly initialized to reflect the changes made in the main code.
- 1378-1407: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1381-1433]
The
ForLoopJoinIterator
test cases are designed to verify the correct behavior of the iterator in various scenarios. It's important to ensure that the test cases cover all relevant situations, including empty inputs and combinations of left and right inputs. Additionally, verify that the merging of variables is done correctly and that the expected results match the actual behavior of the iterator.
- 1442-1476: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1445-1507]
The
CartesianProductJoinIterator
test cases aim to validate the iterator's behavior in handling Cartesian products of inputs. Ensure that the test cases are comprehensive and cover different combinations of left and right inputs, including empty cases. The merging of variables and the construction of expected results should be carefully checked to ensure accuracy.
- 1516-1527: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1519-1603]
The
triple_pattern_iter_compute_io
test function checks the computation of input and output for triple patterns. This is crucial for ensuring that the triple pattern iterator works correctly with various subject, predicate, and object configurations. Pay special attention to the handling of blank variables and ensure that the expected outcomes are correctly defined for each test case.contracts/okp4-cognitarium/src/contract.rs (11)
- 27-27: Initialization of
BLANK_NODE_IDENTIFIER_COUNTER
with0u128
ensures that blank node identifiers start from a known state. This is crucial for the new handling mechanism of blank nodes to function correctly.- 53-60: The expanded import list, including
TripleDeleteTemplate
andTripleConstructTemplate
, reflects the PR's objectives to introduce new types for delete and construct operations. This change is expected to support the differentiation between operations and enhance the system's ability to handle blank nodes effectively.- 94-145: The
delete
function's logic to handle both empty and non-emptydelete
vectors usingEither
(from theeither
crate) is a clever way to unify the handling of direct deletion templates and derived deletions from thewhere
clause. This approach simplifies the code and makes it more maintainable. However, ensure that the downstream handling ofLeft
andRight
variants ofEither
is correctly implemented to avoid runtime errors.- 131-141: The handling of deletions without a
where
clause by directly resolving templates against an empty set of variables is a significant performance optimization. It bypasses unnecessary query execution for straightforward deletion cases. This change aligns with the PR's objectives to improve performance.- 176-182: Introduction of
IdentifierIssuer
in the query module suggests an enhancement in handling blank nodes, specifically in the context of thedescribe
operation. This aligns with the PR's goal to ensure unique identification of blank nodes, preventing conflicts and ensuring consistency across operations.- 250-259: The logic in the
describe
function to dynamically construct triples based on the query's resource type (variable or named node) demonstrates flexibility in handling different query scenarios. This adaptability is crucial for a graph database context, especially when dealing with complex RDF data structures.- 295-313: The use of
IdentifierIssuer
in theconstruct
function to ensure unique identifiers for blank nodes is a critical improvement. It directly addresses the PR's objective to manage blank nodes more effectively. This change enhances the system's ability to handle RDF data consistently and without conflicts.- 336-345: The utility functions, particularly
map_select_solutions
andconstruct_atoms
, show a deep integration of the new blank node handling mechanism (IdentifierIssuer
) into the system. This integration is essential for achieving the PR's objectives related to blank node management and performance improvements.- 495-498: The test
proper_initialization
verifying the initial state ofBLANK_NODE_IDENTIFIER_COUNTER
is a good practice. It ensures that the system's state aligns with expectations right from the start, which is crucial for the reliable operation of the blank node handling mechanism.- 620-652: The test
proper_insert_blank_nodes
effectively validates the increment ofBLANK_NODE_IDENTIFIER_COUNTER
upon inserting data containing blank nodes. This test is essential for ensuring the robustness of the blank node identification system introduced in this PR.- 2192-2326: The
proper_construct
test cases demonstrate the system's ability to handle complex RDF data, including blank nodes, using the new constructs introduced in this PR. These tests are crucial for validating the functionality and performance improvements related to blank node handling and query execution.
|
||
#[cfg(test)] | ||
mod tests { | ||
use crate::msg::Literal::Simple; | ||
use crate::msg::Node::{BlankNode, NamedNode}; | ||
use crate::msg::IRI::{Full, Prefixed}; | ||
use crate::msg::{ | ||
InstantiateMsg, StoreLimitsInput, TriplePattern, VarOrNode, VarOrNodeOrLiteral, | ||
}; | ||
use crate::msg::{InstantiateMsg, StoreLimitsInput}; | ||
use cosmwasm_std::Uint128; | ||
use schemars::_serde_json; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [618-658]
The tests for default deserialization of StoreLimitsInput
and InstantiateMsg
are crucial for ensuring that the system's limitations and instantiation parameters are correctly handled. These tests validate the default values for various store limits, which is important for the system's robustness and reliability. Ensure that additional tests are added to cover the new changes introduced in this PR, especially regarding the handling of blank nodes and the new structs and enums.
Would you like assistance in adding more comprehensive tests to cover the new changes introduced in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great! ❤️
Closes #434
Closes #494
Primarily intended to implement changes discussed in the issues mentioned just above: use internal identifiers for blank to avoid conflicts across insertions, and expose blank nodes in queries as ephemeral and generated identifiers. The impacts being structural the changes present in this PR ended up to be huge, I'll try to explain them.
Blank nodes identifiers
Internally the blank nodes are persisted as
u128
, unicity being ensured by incrementing a counter, theIdentifierIssuer
present in theokp4-rdf
package has slightly been modified to ease this purpose.When querying triples, to map back the blank nodes to an ephemeral identifier the
IdentifierIssuer
is also used, generating identifiers prefixed byb
(e.g._:b0
,_:b1
, etc...).To avoid conflict with provided blank nodes in a
Construct
, those are renamed using the same strategy but with a different prefix:a
(e.g._:a0
,_:a1
, etc...).API changes
TriplePattern
The predicate part of
TriplePattern
in where clauses can not longer take a blank node as value, being impossible by spec, and in the state.DeleteData
The
delete
field containing triple patterns to be deleted has now its own type (i.e.TripleDeleteTemplate
instead ofTriplePattern
), as in this selection we must not be able to select blank nodes.Construct
For consistency purposes, and because "template" means construction conversely of "pattern" which denotes a filter, the
construct
field has now its own type (i.e.TripleConstructTemplate
), this brings no breaking change as it is identical thanTriplePattern
in its content.Behaviour changes
Regarding the
DeleteData
execution message, to allow the deletion of provided triples the where clause is not mandatory as soon as the delete selection contains not variables.Performances
The operations of
DeleteData
,Construct
andDescribe
doesn't rely anymore on theQueryEngine::select
, instead of applying the templates on resolved solutions, we directly apply them on the resolved variables. This allow a needed fine control on the resolution, and brings better performances by removing a mapping layer.Summary by CodeRabbit