Conversation
WalkthroughThe updates across various modules in the OKP4 Cognitarium contracts focus on enhancing blank node handling, refining triple pattern structures, and improving query and storage mechanisms. These changes aim to ensure unique blank node identification per insertion and remove the ability to directly reference blank nodes through message interfaces, aligning with the objectives to prevent blank node conflicts and remove blank node filtering from messages. Changes
Assessment against linked issues
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
size-limit report 📦
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #499 +/- ##
==========================================
- Coverage 96.29% 96.22% -0.08%
==========================================
Files 55 55
Lines 12693 12774 +81
==========================================
+ Hits 12223 12292 +69
- Misses 470 482 +12 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Review Status
Actionable comments generated: 9
Configuration used: CodeRabbit UI
Files selected for processing (17)
- contracts/okp4-cognitarium/src/contract.rs (41 hunks)
- contracts/okp4-cognitarium/src/msg.rs (7 hunks)
- contracts/okp4-cognitarium/src/querier/engine.rs (28 hunks)
- contracts/okp4-cognitarium/src/querier/mapper.rs (1 hunks)
- contracts/okp4-cognitarium/src/querier/mod.rs (1 hunks)
- contracts/okp4-cognitarium/src/querier/plan.rs (3 hunks)
- contracts/okp4-cognitarium/src/querier/plan_builder.rs (21 hunks)
- contracts/okp4-cognitarium/src/querier/variable.rs (9 hunks)
- contracts/okp4-cognitarium/src/rdf/mapper.rs (1 hunks)
- contracts/okp4-cognitarium/src/state/blank_nodes.rs (1 hunks)
- contracts/okp4-cognitarium/src/state/mod.rs (1 hunks)
- contracts/okp4-cognitarium/src/state/triples.rs (5 hunks)
- contracts/okp4-cognitarium/src/storer/engine.rs (8 hunks)
- contracts/okp4-dataverse/src/contract.rs (2 hunks)
- contracts/okp4-dataverse/src/registrar/registry.rs (2 hunks)
- docs/okp4-cognitarium.md (5 hunks)
- packages/okp4-rdf/src/normalize.rs (9 hunks)
Files not summarized due to errors (1)
- contracts/okp4-cognitarium/src/contract.rs: Error: Message exceeds token limit
Additional comments: 62
contracts/okp4-cognitarium/src/state/mod.rs (1)
- 1-1: The addition of the
blank_nodesmodule and its public re-exportation aligns with the project's structure and naming conventions.contracts/okp4-cognitarium/src/querier/mod.rs (1)
- 9-9: Exporting
ResolvedVariablesfrom thevariablemodule enhances the module's usability by other components within the project.contracts/okp4-cognitarium/src/state/blank_nodes.rs (1)
- 3-4: Introducing
BLANK_NODE_IDENTIFIER_COUNTERas au128counter for blank node identifiers aligns with the PR's objectives to ensure unique identification, leveraging theItemstorage mechanism effectively.contracts/okp4-cognitarium/src/querier/mapper.rs (1)
- 1-4: The simplification of
mapper.rsby removing specific functions and dependencies onPredicate,Subject, and related logic aligns with the PR's objectives to streamline the querying process and overhaul blank node handling.contracts/okp4-dataverse/src/registrar/registry.rs (1)
- 7-8: Renaming
VarOrNodetoVarOrNamedNodeclarifies the distinction between blank and named nodes, aligning with the PR's objectives to improve the handling and querying of blank nodes.contracts/okp4-cognitarium/src/rdf/mapper.rs (1)
- 23-31: Updating the
TryFromimplementation forPropertyto usemsg::IRIinstead ofmsg::Nodeenhances the clarity and correctness of the conversion logic, aligning with the PR's objectives to refine the handling of IRIs and properties.contracts/okp4-cognitarium/src/querier/plan.rs (2)
- 25-34: The updates to
get_var_indexand the addition ofget_bnode_indexenhance the querying logic's efficiency and clarity, aligning with the PR's objectives to improve the handling of blank nodes.- 156-163: Updating
PatternValuevariants forsubjectandobjectin theQueryNodestruct aligns with the PR's objectives to refine the querying process and the handling of blank nodes, contributing to the querying logic's efficiency and clarity.contracts/okp4-cognitarium/src/state/triples.rs (4)
- 80-80: Changing the conversion method for
Subject::Blankfromas_bytes()toto_be_bytes()aligns with the PR's objectives to ensure unique identification and improve performance.- 111-111: Updating the hashing method for
Object::Blankto includeto_be_bytes().as_slice()aligns with the PR's objectives to ensure unique identification and improve performance.- 134-135: Changing the type of
BlankNodefromStringtou128and introducing theBLANK_NODE_SIZEconstant aligns with the PR's objectives to ensure unique identification and improve performance.- 195-195: Updating test cases to use
u128values forObject::Blankensures consistency with the changes made to the handling of blank nodes.contracts/okp4-cognitarium/src/storer/engine.rs (10)
- 3-4: The addition of
BLANK_NODE_IDENTIFIER_COUNTERandBLANK_NODE_SIZEto the imports is aligned with the PR objectives to handle blank nodes uniquely. Ensure these constants are properly defined elsewhere in the codebase and their values make sense in the context of blank node management.- 20-20: The introduction of
blank_node_id_issuerinStoreEngineis a crucial part of the new blank node handling mechanism. It's important to ensure that theIdentifierIssueris used consistently throughout the codebase for managing blank node identifiers.- 28-34: Initializing
blank_node_id_issuerin thenewmethod with a blank prefix and the loadedblank_node_id_counteris a good practice. However, ensure that the counter is persisted correctly across transactions to maintain uniqueness of identifiers.- 61-69: The update to
rio_to_tripleto include anid_issuerparameter and its usage within the function is aligned with the objective of managing blank nodes effectively. Ensure that all calls to this function are updated to pass theIdentifierIssuerinstance.- 171-171: Saving the
blank_node_id_issuer.counterin thesavemethod is essential for persisting the state of the identifier issuer. Verify that this operation is atomic and consistent, especially in concurrent environments, to avoid identifier conflicts.- 187-198: The modifications to
rio_to_triple,rio_to_subject, andrio_to_objectto utilize theIdentifierIssuerfor blank nodes are well-implemented. These changes are central to achieving the PR's objectives regarding blank node management. Ensure that the error handling for unsupported RDF star syntax is consistent with the application's overall error handling strategy.- 212-214: The handling of blank nodes in
rio_to_subjectusing theIdentifierIssuerto generate unique identifiers is a key improvement. This approach ensures that blank nodes are uniquely identifiable, aligning with the PR's goals.- 230-241: Similarly, the handling of blank nodes in
rio_to_objectis consistent with the changes inrio_to_subject. This consistency is important for maintaining a uniform approach to blank node management across the system.- 278-278: The use of
BLANK_NODE_SIZEfor calculating the size of blank subjects is a good practice. Ensure thatBLANK_NODE_SIZEis set to a reasonable default that accurately reflects the storage requirements for blank nodes.- 296-296: Similarly, using
BLANK_NODE_SIZEfor calculating the size of blank objects ensures consistency in how the size of different components is calculated. This consistency is crucial for accurate storage and performance metrics.contracts/okp4-cognitarium/src/querier/variable.rs (5)
- 4-4: The addition of
use okp4_rdf::normalize::IdentifierIssueris necessary for the new handling of blank nodes. This import should be used wherever blank node identifiers need to be generated or managed.- 20-20: Changing the handling of
Object::Blankto dereference the node is a logical step given the new approach to managing blank nodes. Ensure that this change is consistently applied whereverObject::Blankis used.- 45-45: Similarly, the handling of
Subject::Blankhas been updated to dereference the node. This change is consistent with the handling ofObject::Blankand aligns with the new approach to blank node management.- 59-65: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [52-75]
The modification of the
as_valuemethod to accept anIdentifierIssuerparameter is crucial for generating values for blank nodes. This change ensures that blank nodes can be uniquely identified in queries, aligning with the PR's objectives. Ensure that all calls toas_valueare updated to pass theIdentifierIssuerinstance.
- 159-161: The adjustments to test cases to use numeric values for
Subject::BlankandObject::Blankand to reflect changes in value generation usingIdentifierIssuerare appropriate. These changes ensure that the tests accurately represent the new handling of blank nodes. Verify that all relevant test cases have been updated accordingly.Also applies to: 230-232, 254-256
contracts/okp4-dataverse/src/contract.rs (1)
- 134-135: The change from
VarOrNodetoVarOrNamedNodefor thepredicatein theTriplePatternstructure within the test module aligns with the PR objectives to refine the handling of blank nodes. This modification ensures that predicates are explicitly named nodes, which supports the overall goal of improving the management and querying of blank nodes in the graph database context. This change is consistent and appears correctly implemented in the context of the provided code. However, it's important to verify that all references and usages ofTriplePatternthroughout the test suite and potentially in the production code have been updated to reflect this change, ensuring no broken functionality or unintended behavior.contracts/okp4-cognitarium/src/msg.rs (5)
- 54-54: The predicate structure in
ExecuteMsg::DeleteDatahas been updated to use a named node with a prefixed IRI. This change aligns with the PR's objective to modify the handling of blank nodes and predicates. Ensure that all references and usages of this structure are updated accordingly to prevent any runtime errors.- 70-72: The introduction of
TripleDeleteTemplatefor specifying triple templates to delete is a significant change. This aligns with the PR's objective to improve the handling of blank nodes and streamline deletion operations. Ensure that the implementation ofTripleDeleteTemplatecorrectly handles all possible variations of subjects, predicates, and objects, including blank nodes, named nodes, and literals.- 487-497: The
TripleDeleteTemplatestruct has been correctly defined to include subjects, predicates, and objects asVarOrNamedNodeandVarOrNamedNodeOrLiteral. This definition is crucial for the new handling of blank nodes and ensures flexibility in specifying triples for deletion. Review the implementation to ensure that it correctly handles the serialization and deserialization of these fields, especially considering the introduction of theVarOrNamedNodeOrLiteralenum.- 499-509: The
TripleConstructTemplatestruct is introduced to specify triples for construct queries. This change is consistent with the PR's objectives and provides a clear structure for constructing triples. Ensure that the implementation ofTripleConstructTemplatecorrectly handles the serialization and deserialization of subjects, predicates, and objects, considering the use ofVarOrNodeandVarOrNodeOrLiteral. This is crucial for the correct functioning of construct queries.- 563-577: The addition of the
VarOrNamedNodeOrLiteralenum is a significant improvement, providing flexibility in specifying subjects, predicates, and objects in triples. This change is crucial for the new handling of blank nodes and enhances the expressiveness of the system. Ensure that the implementation correctly handles all cases of this enum, especially in the context of serialization and deserialization.packages/okp4-rdf/src/normalize.rs (9)
- 41-44: The initialization of
canonical_issuerwith au128counter is a significant change aligning with the PR's objective to handle larger ranges of identifiers. This change is crucial for ensuring the uniqueness of blank node identifiers across different insertions.- 75-75: Reinitializing
canonical_issuerwith au128counter during theresetmethod ensures consistency in handling blank node identifiers throughout the normalization process. This consistency is vital for maintaining the integrity of blank node identifiers across operations.- 130-130: The use of
get_or_issuewithinlabel_unique_nodesmethod to ensure that each unique node receives a canonical identifier is aligned with the PR's objectives. This approach effectively prevents conflicts and ensures that blank nodes are exposed with unique, ephemeral identifiers.- 152-153: Creating a new
IdentifierIssuerwith au128counter for scoped issuers withincompute_n_degree_hashesmethod is a thoughtful approach. It ensures that temporary identifiers within the scope of n-degree hash computation are handled consistently with the overall changes.- 163-163: The use of
get_or_issuefor nodes within thecompute_n_degree_hashesmethod ensures that all nodes, whether they are part of the n-degree hash computation or not, receive a canonical identifier. This consistency is crucial for the integrity of the normalization process.- 218-218: Initializing a new
IdentifierIssuerfor choosing the best path incompute_n_degree_hashmethod aligns with the PR's objectives. This approach ensures that the selection of the best path for n-degree hashes is handled with the same consistency and uniqueness as other parts of the normalization process.- 236-236: The introduction of
get_str_or_issuemethod within thecompute_n_degree_hashmethod to handle identifier issuance is a significant improvement. It ensures that identifiers are issued in a consistent manner, aligning with the new handling of blank nodes.- 248-248: Reusing
get_str_or_issuefor recursion within thecompute_n_degree_hashmethod demonstrates a consistent approach to identifier issuance. This reuse is crucial for maintaining the integrity and uniqueness of identifiers throughout the normalization process.- 333-370: The changes to the
IdentifierIssuerstruct, including the transition of thecounterfield fromu64tou128and the introduction of new methodsget_n_or_issueandget_str_or_issue, are well-aligned with the PR's objectives. These changes are crucial for ensuring the uniqueness of blank node identifiers and avoiding conflicts across different insertions. The implementation is clear, and the methods are well-documented, which enhances maintainability.contracts/okp4-cognitarium/src/querier/plan_builder.rs (3)
- 2-3: The introduction of
VarOrNamedNodealongsideVarOrNodeandVarOrNodeOrLiteralindicates a refinement in type distinctions for handling different kinds of nodes. This change aligns with the PR's objective to improve the handling of blank nodes by making type distinctions clearer. However, ensure that all usages of these types across the codebase are updated to reflect these changes, especially in contexts where the distinction between named nodes and blank nodes is critical.- 92-100: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [76-97]
The
build_from_bgpfunction's logic for determining the type of join (ForLoopJoin vs. CartesianProductJoin) based on the intersection of bound variables is a critical part of query optimization. Ensure that the logic correctly identifies cases where a ForLoopJoin is more efficient than a CartesianProductJoin and vice versa. This is crucial for performance, especially in complex queries.
- 609-625: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [483-668]
The
build_plantest cases demonstrate the system's ability to construct query plans from various triple patterns, including those with blank nodes and named nodes. It's important to ensure comprehensive coverage of all new logic introduced in this PR, including the handling of blank nodes in different contexts (e.g., as subjects, predicates, and objects). Additionally, verify that the performance improvements mentioned in the PR objectives are reflected in these test cases, possibly through benchmark tests.docs/okp4-cognitarium.md (1)
- 747-780: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [539-769]
The introduction of
TripleConstructTemplateandTripleDeleteTemplateentities is a significant change that directly supports the PR's objectives to enhance the handling of RDF triples, especially in the context of construct and delete operations. These entities allow for more precise specification of triples to be constructed or deleted, which is crucial for the improved handling of blank nodes and overall system performance.It's essential to ensure that the documentation for these entities is clear and comprehensive. The descriptions provided for the properties of these entities (
object,predicate,subject) are concise and accurately reflect their roles within the triple patterns. This clarity helps users understand how to use these new entities effectively in their queries.The documentation for
TripleConstructTemplateandTripleDeleteTemplateentities is well-written and aligns with the PR's objectives. It provides clear guidance on how to specify triples for construct and delete operations, which is crucial for leveraging the system's enhanced capabilities.contracts/okp4-cognitarium/src/querier/engine.rs (5)
- 78-84: The
construct_triplesfunction simplifies the creation of triples from a query plan. This change appears straightforward and correctly utilizes theResolvedTripleIterator. Ensure that the templates provided are correctly formatted and validated before being passed to this function.- 997-1005: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1000-1039]
The test setup includes saving a blank node identifier counter to the storage. This is a critical part of ensuring that blank nodes are handled correctly in the tests. Verify that the counter is correctly initialized and used in subsequent operations to maintain unique identifiers for blank nodes. Additionally, ensure that the test data and the store setup are correctly initialized to reflect the changes made in the main code.
- 1378-1407: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1381-1433]
The
ForLoopJoinIteratortest cases are designed to verify the correct behavior of the iterator in various scenarios. It's important to ensure that the test cases cover all relevant situations, including empty inputs and combinations of left and right inputs. Additionally, verify that the merging of variables is done correctly and that the expected results match the actual behavior of the iterator.
- 1442-1476: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1445-1507]
The
CartesianProductJoinIteratortest cases aim to validate the iterator's behavior in handling Cartesian products of inputs. Ensure that the test cases are comprehensive and cover different combinations of left and right inputs, including empty cases. The merging of variables and the construction of expected results should be carefully checked to ensure accuracy.
- 1516-1527: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1519-1603]
The
triple_pattern_iter_compute_iotest function checks the computation of input and output for triple patterns. This is crucial for ensuring that the triple pattern iterator works correctly with various subject, predicate, and object configurations. Pay special attention to the handling of blank variables and ensure that the expected outcomes are correctly defined for each test case.contracts/okp4-cognitarium/src/contract.rs (11)
- 27-27: Initialization of
BLANK_NODE_IDENTIFIER_COUNTERwith0u128ensures that blank node identifiers start from a known state. This is crucial for the new handling mechanism of blank nodes to function correctly.- 53-60: The expanded import list, including
TripleDeleteTemplateandTripleConstructTemplate, reflects the PR's objectives to introduce new types for delete and construct operations. This change is expected to support the differentiation between operations and enhance the system's ability to handle blank nodes effectively.- 94-145: The
deletefunction's logic to handle both empty and non-emptydeletevectors usingEither(from theeithercrate) is a clever way to unify the handling of direct deletion templates and derived deletions from thewhereclause. This approach simplifies the code and makes it more maintainable. However, ensure that the downstream handling ofLeftandRightvariants ofEitheris correctly implemented to avoid runtime errors.- 131-141: The handling of deletions without a
whereclause by directly resolving templates against an empty set of variables is a significant performance optimization. It bypasses unnecessary query execution for straightforward deletion cases. This change aligns with the PR's objectives to improve performance.- 176-182: Introduction of
IdentifierIssuerin the query module suggests an enhancement in handling blank nodes, specifically in the context of thedescribeoperation. This aligns with the PR's goal to ensure unique identification of blank nodes, preventing conflicts and ensuring consistency across operations.- 250-259: The logic in the
describefunction to dynamically construct triples based on the query's resource type (variable or named node) demonstrates flexibility in handling different query scenarios. This adaptability is crucial for a graph database context, especially when dealing with complex RDF data structures.- 295-313: The use of
IdentifierIssuerin theconstructfunction to ensure unique identifiers for blank nodes is a critical improvement. It directly addresses the PR's objective to manage blank nodes more effectively. This change enhances the system's ability to handle RDF data consistently and without conflicts.- 336-345: The utility functions, particularly
map_select_solutionsandconstruct_atoms, show a deep integration of the new blank node handling mechanism (IdentifierIssuer) into the system. This integration is essential for achieving the PR's objectives related to blank node management and performance improvements.- 495-498: The test
proper_initializationverifying the initial state ofBLANK_NODE_IDENTIFIER_COUNTERis a good practice. It ensures that the system's state aligns with expectations right from the start, which is crucial for the reliable operation of the blank node handling mechanism.- 620-652: The test
proper_insert_blank_nodeseffectively validates the increment ofBLANK_NODE_IDENTIFIER_COUNTERupon inserting data containing blank nodes. This test is essential for ensuring the robustness of the blank node identification system introduced in this PR.- 2192-2326: The
proper_constructtest cases demonstrate the system's ability to handle complex RDF data, including blank nodes, using the new constructs introduced in this PR. These tests are crucial for validating the functionality and performance improvements related to blank node handling and query execution.
|
|
||
| #[cfg(test)] | ||
| mod tests { | ||
| use crate::msg::Literal::Simple; | ||
| use crate::msg::Node::{BlankNode, NamedNode}; | ||
| use crate::msg::IRI::{Full, Prefixed}; | ||
| use crate::msg::{ | ||
| InstantiateMsg, StoreLimitsInput, TriplePattern, VarOrNode, VarOrNodeOrLiteral, | ||
| }; | ||
| use crate::msg::{InstantiateMsg, StoreLimitsInput}; | ||
| use cosmwasm_std::Uint128; | ||
| use schemars::_serde_json; | ||
|
|
There was a problem hiding this comment.
📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [618-658]
The tests for default deserialization of StoreLimitsInput and InstantiateMsg are crucial for ensuring that the system's limitations and instantiation parameters are correctly handled. These tests validate the default values for various store limits, which is important for the system's robustness and reliability. Ensure that additional tests are added to cover the new changes introduced in this PR, especially regarding the handling of blank nodes and the new structs and enums.
Would you like assistance in adding more comprehensive tests to cover the new changes introduced in this PR?
Closes #434
Closes #494
Primarily intended to implement changes discussed in the issues mentioned just above: use internal identifiers for blank to avoid conflicts across insertions, and expose blank nodes in queries as ephemeral and generated identifiers. The impacts being structural the changes present in this PR ended up to be huge, I'll try to explain them.
Blank nodes identifiers
Internally the blank nodes are persisted as
u128, unicity being ensured by incrementing a counter, theIdentifierIssuerpresent in theokp4-rdfpackage has slightly been modified to ease this purpose.When querying triples, to map back the blank nodes to an ephemeral identifier the
IdentifierIssueris also used, generating identifiers prefixed byb(e.g._:b0,_:b1, etc...).To avoid conflict with provided blank nodes in a
Construct, those are renamed using the same strategy but with a different prefix:a(e.g._:a0,_:a1, etc...).API changes
TriplePatternThe predicate part of
TriplePatternin where clauses can not longer take a blank node as value, being impossible by spec, and in the state.DeleteDataThe
deletefield containing triple patterns to be deleted has now its own type (i.e.TripleDeleteTemplateinstead ofTriplePattern), as in this selection we must not be able to select blank nodes.ConstructFor consistency purposes, and because "template" means construction conversely of "pattern" which denotes a filter, the
constructfield has now its own type (i.e.TripleConstructTemplate), this brings no breaking change as it is identical thanTriplePatternin its content.Behaviour changes
Regarding the
DeleteDataexecution message, to allow the deletion of provided triples the where clause is not mandatory as soon as the delete selection contains not variables.Performances
The operations of
DeleteData,ConstructandDescribedoesn't rely anymore on theQueryEngine::select, instead of applying the templates on resolved solutions, we directly apply them on the resolved variables. This allow a needed fine control on the resolution, and brings better performances by removing a mapping layer.Summary by CodeRabbit