Fix/blank node query #497
Conversation
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (4)
- contracts/okp4-cognitarium/src/querier/engine.rs (27 hunks)
- contracts/okp4-cognitarium/src/querier/plan.rs (3 hunks)
- contracts/okp4-cognitarium/src/querier/plan_builder.rs (18 hunks)
- contracts/okp4-cognitarium/src/querier/variable.rs (4 hunks)
Additional comments: 26
contracts/okp4-cognitarium/src/querier/plan.rs (5)

- 13-13: The change to store `variables` as a vector of `PlanVariable` enums instead of strings is a significant improvement. It allows for more explicit handling of different variable types, enhancing the clarity and maintainability of the code.
- 16-19: The introduction of the `PlanVariable` enum with `Basic` and `BlankNode` variants is a thoughtful design choice. It clearly distinguishes between regular variables and blank nodes, aligning with the PR's objective to better manage blank nodes in queries.
- 23-36: The `get_var_index` method's adaptation to handle `PlanVariable` variants is well implemented. However, it's important to ensure that all calls to this method have been updated to reflect the change in handling variables, especially considering the introduction of blank node handling.

  Verification successful: The examination of the `get_var_index` usage within `engine.rs` indicates that the method call has been updated to reflect the changes in handling variables, including the introduction of blank node handling. The error handling for cases where a variable is not found aligns with the expectations set by the method's adaptation. Therefore, the review comment concerning the need to ensure all calls to `get_var_index` have been updated accordingly is verified as addressed.

- 117-118: Adding the `BlankVariable` variant to the `PatternValue` enum is a crucial update for supporting blank nodes as special variables. This change directly supports the PR's goal of improving blank node handling in queries.
- 123-123: The modification in the `lookup_bound_variable` method to include `PatternValue::BlankVariable` ensures that blank nodes are correctly accounted for when determining bound variables. This is a necessary update for the new handling of blank nodes.

Scripts Executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Search for usages of get_var_index to ensure they've been updated accordingly.
rg --type rust 'get_var_index'
```

Length of output: 242

```shell
#!/bin/bash
# Examine the usage of get_var_index in engine.rs to ensure it's been updated to reflect changes.
rg --type rust 'get_var_index' contracts/okp4-cognitarium/src/querier/engine.rs -C 5
```

Length of output: 530
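For readers without the diff at hand, here is a minimal, self-contained sketch of the shapes these comments refer to. The names (`PlanVariable`, `PatternValue`, `get_var_index`) mirror the review, but the fields, derives, and lookup logic are simplified assumptions rather than the actual `plan.rs` definitions:

```rust
// Hypothetical, simplified reconstruction of the plan-level types discussed in
// the comments above; the real definitions in plan.rs may differ.
#[derive(Debug, Clone, PartialEq)]
enum PlanVariable {
    /// A regular, user-named query variable (e.g. `?person`).
    Basic(String),
    /// A variable introduced for a blank node in the query; it is internal to
    /// the plan and must never be exposed as a selectable item.
    BlankNode(String),
}

#[derive(Debug, Clone, PartialEq)]
enum PatternValue<T> {
    /// A fixed term taken verbatim from the query.
    Constant(T),
    /// A regular variable, referenced by its index in the plan's variable table.
    Variable(usize),
    /// A blank-node variable: resolved like a variable, but constrained to bind
    /// only to blank nodes in the store.
    BlankVariable(usize),
}

/// Look up a *named* variable; blank-node variables are not addressable by name.
fn get_var_index(variables: &[PlanVariable], name: &str) -> Option<usize> {
    variables
        .iter()
        .position(|v| matches!(v, PlanVariable::Basic(n) if n == name))
}

fn main() {
    let variables = vec![
        PlanVariable::Basic("person".to_string()),
        PlanVariable::BlankNode("b0".to_string()),
    ];
    assert_eq!(get_var_index(&variables, "person"), Some(0));
    // The blank-node variable cannot be reached through its label.
    assert_eq!(get_var_index(&variables, "b0"), None);

    let _subject: PatternValue<String> = PatternValue::Variable(0);
    let _predicate = PatternValue::Constant("core:hasInformations".to_string());
    let _object: PatternValue<String> = PatternValue::BlankVariable(1);
}
```

The point to take away is that blank-node variables live in the same variable table as basic ones, but are not reachable by name from the selection interface.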
contracts/okp4-cognitarium/src/querier/variable.rs (2)

- 134-140: The introduction of the `merge_index` method in the `ResolvedVariables` struct is a significant improvement for managing variable assignments, especially in scenarios where the same variable might be assigned multiple times. This method enhances the robustness of variable handling by preventing unintended overwrites. However, it's crucial to ensure that all instances where variables are merged or set throughout the codebase have been updated to use this new method.
- 300-320: > 📝 NOTE
  This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [303-350]

  The tests provided for the `merge_index` method effectively demonstrate its intended functionality, particularly in handling scenarios where variables might conflict. It's good practice to include such tests to validate new logic, ensuring that the method behaves as expected under various conditions.
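To illustrate the merge semantics the comment describes, here is a rough sketch under the assumption that `merge_index` follows a "set if unset, otherwise require the same value" rule; the `Bindings` type below is a simplified stand-in for `ResolvedVariables`, not the actual implementation:

```rust
// Simplified stand-in for ResolvedVariables: one optional binding per plan
// variable. The real merge_index in variable.rs operates on resolved RDF terms;
// plain strings are used here only to keep the example short.
#[derive(Debug, Clone, PartialEq)]
struct Bindings(Vec<Option<String>>);

impl Bindings {
    fn new(len: usize) -> Self {
        Bindings(vec![None; len])
    }

    /// Bind `value` at `index`. Returns `None` when the slot already holds a
    /// different value, signalling an inconsistent assignment instead of
    /// silently overwriting it.
    fn merge_index(mut self, index: usize, value: String) -> Option<Self> {
        if let Some(existing) = &self.0[index] {
            if *existing != value {
                return None;
            }
        }
        self.0[index] = Some(value);
        Some(self)
    }
}

fn main() {
    let bindings = Bindings::new(2);
    // A fresh slot simply takes the value.
    let bindings = bindings.merge_index(0, "alice".to_string()).expect("fresh binding");
    // Re-merging the same value is accepted.
    let bindings = bindings.merge_index(0, "alice".to_string()).expect("consistent binding");
    // A conflicting value is rejected rather than overwriting the previous one.
    assert!(bindings.merge_index(0, "bob".to_string()).is_none());
}
```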
contracts/okp4-cognitarium/src/querier/plan_builder.rs (7)

- 145-148: The implementation for handling `Node::BlankNode` in `build_subject_pattern` correctly introduces the concept of `BlankVariable` for blank nodes. This aligns with the PR's objective to treat blank nodes as special variables, ensuring they are not assigned a fixed identity during query processing. However, ensure that the `resolve_blank_variable` method is robust and correctly handles the uniqueness of blank variables across different queries.
- 172-177: Similar to the handling of blank nodes in subjects, the handling of blank nodes in objects within `build_object_pattern` is consistent and aligns with the PR's objectives. By treating blank nodes as `BlankVariable`, it ensures the querying engine can process these nodes correctly without assigning them a fixed identity. This is a crucial step in addressing the core problem identified in the PR description.
- 299-299: The test `proper_initialization` correctly asserts the initial state of a `PlanBuilder` instance, including the empty `variables` vector. This is important for ensuring that the builder starts with no predefined variables, which is crucial for the accurate processing of queries. The test cases provided cover various scenarios, which is good for ensuring the robustness of the initialization logic.
- 331-343: The test case handling a triple pattern with both subject and object as blank nodes demonstrates the correct application of `BlankVariable` in a practical scenario. This test case is essential for validating the changes made to support blank nodes as special variables. It's good to see comprehensive test coverage for these scenarios.
- 430-430: The use of `QueryNode::Noop` with bound variables in test cases where the subject, predicate, or object does not exist is a good way to handle patterns that cannot be resolved due to missing namespaces or other issues. This approach ensures that the query processing can gracefully handle errors without causing the entire query to fail.
- 514-517: In the `build_plan` test case, the handling of variables in the context of an error scenario (e.g., "Empty basic graph pattern") demonstrates the system's ability to correctly manage variable states even when the query cannot be executed. This is crucial for ensuring that error handling does not negatively impact the overall query processing logic.
- 653-663: The handling of a complex query scenario involving a `ForLoopJoin` with a `BlankVariable` in the test cases is a good demonstration of the system's capability to process more complex queries involving blank nodes. This test case is important for validating the changes made to support blank nodes as special variables in a realistic query scenario.
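As an illustration of the blank-node path discussed in the first two comments, a hypothetical, stripped-down `build_subject_pattern` is sketched below; the real `plan_builder.rs` resolves named nodes through namespaces and handles more cases, so treat the types and the `resolve_blank_variable` body here as assumptions for the example only:

```rust
// Hypothetical, stripped-down version of the pattern-building step: a blank
// node label is mapped to a dedicated blank-node variable index, reusing the
// same index when the label appears again, so that the relation it expresses
// is preserved without comparing its ephemeral identifier to stored data.
#[derive(Debug, Clone)]
enum Node {
    NamedNode(String),
    BlankNode(String),
}

#[derive(Debug, Clone, PartialEq)]
enum PatternValue {
    Constant(String),
    BlankVariable(usize),
}

#[derive(Default)]
struct PlanBuilder {
    // (is_blank, name) pairs standing in for the PlanVariable table.
    variables: Vec<(bool, String)>,
}

impl PlanBuilder {
    fn resolve_blank_variable(&mut self, label: &str) -> usize {
        if let Some(idx) = self
            .variables
            .iter()
            .position(|(blank, name)| *blank && name == label)
        {
            return idx;
        }
        self.variables.push((true, label.to_string()));
        self.variables.len() - 1
    }

    fn build_subject_pattern(&mut self, node: Node) -> PatternValue {
        match node {
            Node::NamedNode(iri) => PatternValue::Constant(iri),
            // The blank node id is never matched against the store; it only
            // names an internal variable shared by every pattern using it.
            Node::BlankNode(label) => {
                PatternValue::BlankVariable(self.resolve_blank_variable(&label))
            }
        }
    }
}

fn main() {
    let mut builder = PlanBuilder::default();
    let a = builder.build_subject_pattern(Node::BlankNode("b0".into()));
    let b = builder.build_subject_pattern(Node::BlankNode("b0".into()));
    // The same blank node label resolves to the same variable index.
    assert_eq!(a, b);

    let named = builder.build_subject_pattern(Node::NamedNode("core:Dataset".into()));
    assert_eq!(named, PatternValue::Constant("core:Dataset".to_string()));
}
```

Reusing the same variable index for a repeated blank node label is what preserves the relations the blank node expresses, without ever comparing its ephemeral identifier against stored triples.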
contracts/okp4-cognitarium/src/querier/engine.rs (12)

- 222-222: The introduction of `TriplePatternBlankFilters` is a good approach to handling blank node filters in triple patterns. This change aligns with the PR's objective to improve blank node handling in queries.
- 233-239: The conditional return within the `new` method of `TriplePatternIterator` is clear and ensures that the iterator is correctly initialized based on the computed input, output bindings, and filters. This is a good practice for initializing complex structures.
- 255-259: The `post_filter` closure effectively filters triples based on blank node criteria. This approach is efficient and concise, leveraging Rust's functional programming capabilities.
- 351-367: The `compute_iter_io` method's implementation is well structured and clearly handles the computation of filters, blank filters, and bindings for the triple pattern. The use of pattern matching here is idiomatic and makes the code easy to follow.
- 374-386: The `resolve_pattern_part` method's update to handle blank variables is crucial for the feature being implemented. The method's logic is sound, but it's important to ensure comprehensive testing around this functionality to catch any edge cases.
- 391-405: The `map_triple` method is a valuable addition for mapping triples to resolved variables. It's well implemented and follows Rust's idiomatic practices. However, ensure error handling is robust, especially with the `merge_index` method calls.

  Verification successful: The provided script output shows various instances of `StdResult` usage in `engine.rs`, indicating that error handling is indeed part of the codebase. However, it does not directly confirm how errors from `merge_index` are handled within the `map_triple` method. To conclusively verify the error handling for `merge_index` within `map_triple`, a more targeted examination of the `map_triple` function's implementation is necessary.

  The examination of the `map_triple` function's implementation confirms that error handling for `merge_index` calls is properly implemented using the `?` operator for error propagation, in line with Rust's idiomatic practices. This addresses the concern raised in the original review comment regarding ensuring robust error handling for `merge_index` method calls.

- 412-422: The modifications to the `next` method in `TriplePatternIterator` to handle errors and continue iteration are appropriate. This ensures that iteration over triples is resilient to errors, which is crucial for robust query processing.
- 913-913: The introduction of the `PlanVariable` enum in test cases is a good practice for testing variable types. This ensures that the tests cover the new functionality related to handling both basic and blank node types.
- 981-985: The test case setup with `PlanVariable::Basic` for variables is clear and aligns with the changes made in the PR. It's important to ensure that these test cases adequately cover the new logic introduced for handling blank nodes.
- 1446-1448: The setup of variables in the `triple_pattern_iter_compute_io` test function is well done, ensuring that the tests will accurately reflect the handling of different pattern values. This is crucial for validating the changes made to the `compute_iter_io` method.
- 1465-1469: The test case expectations are clearly defined, which is good for ensuring that the `compute_iter_io` method's logic is correctly implemented. It's important to review these test cases to ensure they cover all possible scenarios, including edge cases.
- 1645-1646: The test for the `make_state_iter` method's functionality is crucial for ensuring that the filtering logic based on blank node criteria is correctly implemented. It's good to see comprehensive test coverage for different filter scenarios.

Scripts Executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Check for error handling in the merge_index method calls within map_triple.
rg "merge_index" --type rust --files-with-matches | xargs rg "Result"
```

Length of output: 5090

```shell
#!/bin/bash
# Inspect the implementation of map_triple to verify error handling for merge_index calls
rg "fn map_triple" contracts/okp4-cognitarium/src/querier/engine.rs -A 20
```

Length of output: 796
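To make the blank-filter idea more concrete, here is a minimal sketch of a `(bool, bool)` filter pair applied as a post-filter over candidate triples; the `Term` and `Triple` types and the closure shape are assumptions made for the example, not the actual `engine.rs` code:

```rust
// Simplified illustration of the blank-node post-filtering discussed above.
// TriplePatternBlankFilters is described as a (bool, bool) pair telling whether
// the subject (respectively the object) of a matching triple must be a blank node.
type TriplePatternBlankFilters = (bool, bool);

#[derive(Debug, Clone, PartialEq)]
enum Term {
    Named(String),
    Blank(String),
}

#[derive(Debug, Clone)]
struct Triple {
    subject: Term,
    predicate: Term,
    object: Term,
}

fn is_blank(term: &Term) -> bool {
    matches!(term, Term::Blank(_))
}

/// Keep only the triples whose subject/object are blank nodes when the
/// corresponding filter flag is set.
fn post_filter(filters: TriplePatternBlankFilters) -> impl Fn(&Triple) -> bool {
    move |triple: &Triple| {
        (!filters.0 || is_blank(&triple.subject)) && (!filters.1 || is_blank(&triple.object))
    }
}

fn main() {
    let triples = vec![
        Triple {
            subject: Term::Blank("b0".into()),
            predicate: Term::Named("core:hasInformations".into()),
            object: Term::Named("info".into()),
        },
        Triple {
            subject: Term::Named("alice".into()),
            predicate: Term::Named("core:hasInformations".into()),
            object: Term::Named("info".into()),
        },
    ];

    // The pattern's subject is a blank-node variable, so only triples whose
    // subject is actually a blank node survive the post filter.
    let keep = post_filter((true, false));
    let kept: Vec<Triple> = triples.into_iter().filter(|t| keep(t)).collect();

    assert_eq!(kept.len(), 1);
    assert_eq!(kept[0].subject, Term::Blank("b0".to_string()));
    assert_eq!(kept[0].predicate, Term::Named("core:hasInformations".to_string()));
    assert!(!is_blank(&kept[0].object));
}
```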
size-limit report 📦
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #497      +/-   ##
==========================================
+ Coverage   96.20%   96.29%   +0.08%
==========================================
  Files          55       55
  Lines       12261    12693     +432
==========================================
+ Hits        11796    12223     +427
- Misses        465      470       +5
```

☔ View full report in Codecov by Sentry.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (3)
- contracts/okp4-cognitarium/src/querier/engine.rs (27 hunks)
- contracts/okp4-cognitarium/src/querier/plan.rs (4 hunks)
- contracts/okp4-cognitarium/src/querier/plan_builder.rs (18 hunks)
Files skipped from review as they are similar to previous changes (2)
- contracts/okp4-cognitarium/src/querier/plan.rs
- contracts/okp4-cognitarium/src/querier/plan_builder.rs
Additional comments: 12
contracts/okp4-cognitarium/src/querier/engine.rs (12)

- 222-222: The introduction of `TriplePatternBlankFilters` as a tuple `(bool, bool)` is a concise way to represent the presence of blank filters for subjects and objects. However, consider documenting what each `bool` represents for clarity.
- 233-239: The conditional construction of `TriplePatternIterator` based on the result of `compute_iter_io` is a good practice, ensuring that the iterator is only created if the necessary filters and bindings are successfully computed.
- 255-259: The `post_filter` closure effectively filters triples based on the presence of blank nodes as specified by `blank_filters`. This approach maintains the logic's simplicity while addressing the core issue of blank node handling.
- 351-371: The `compute_iter_io` method's implementation is crucial for determining the filters, blank filters, and output bindings for the iterator. The use of `Option` for the return type and the tuple structure for the filters and bindings is appropriate. However, ensure that the logic correctly handles all possible combinations of pattern values to avoid any unintended behavior.

  Verification successful: Given the need to verify the handling of all pattern value combinations in the `compute_iter_io` method, and the limitations of the initial script, I will generate a new script to search for the `resolve_pattern_part` method's definition and usage. This will help in understanding how it's implemented and its role in the `compute_iter_io` method's logic.

  Based on the provided definition and usage of the `resolve_pattern_part` method, and its integration within the `compute_iter_io` method, it appears that the implementation is designed to handle all possible combinations of pattern values appropriately. The method processes different types of pattern values and returns a tuple containing filters, blank filters, and bindings, which aligns with the method's purpose. The check for `pb_filter` within `compute_iter_io` further suggests that the method includes logic to handle specific query configurations. Therefore, the original concern about ensuring that the logic correctly handles all possible combinations of pattern values seems to have been addressed in the implementation.

- 383-390: The handling of different `PatternValue` variants in `resolve_pattern_part` is well implemented. It correctly distinguishes between constants, blank variables, and regular variables, setting the groundwork for accurate triple pattern resolution.
- 395-409: The `map_triple` method efficiently maps a triple to resolved variables based on the output bindings. This method is essential for transforming query results into a format usable by the rest of the query engine. Ensure that error handling is robust, especially for the `merge_index` calls.

Scripts Executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Manual verification needed to ensure all pattern value combinations are correctly handled.
echo "Manual check required for compute_iter_io method logic."
```

Length of output: 120

```shell
#!/bin/bash
# Search for the definition and usage of resolve_pattern_part method
rg "fn resolve_pattern_part" -A 20
rg "resolve_pattern_part" -C 5
```

Length of output: 3951
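A hypothetical sketch of how one pattern position could be resolved into either a concrete filter or an output binding, plus a "must be blank" flag, in the spirit of the `resolve_pattern_part` discussion above; the tuple shape and the treatment of already-bound variables are assumptions, not the real implementation:

```rust
// Hypothetical sketch only: the real resolve_pattern_part in engine.rs works on
// store object ids rather than strings, and its return shape may differ.
#[derive(Debug, Clone, PartialEq)]
enum PatternValue {
    Constant(String),
    Variable(usize),
    BlankVariable(usize),
}

/// (filter value, must-be-blank flag, output binding index)
type ResolvedPart = (Option<String>, bool, Option<usize>);

fn resolve_pattern_part(part: &PatternValue, bindings: &[Option<String>]) -> ResolvedPart {
    match part {
        // Constants filter the state iterator directly.
        PatternValue::Constant(value) => (Some(value.clone()), false, None),
        // A bound variable behaves like a constant; an unbound one becomes an output binding.
        PatternValue::Variable(idx) => match &bindings[*idx] {
            Some(value) => (Some(value.clone()), false, None),
            None => (None, false, Some(*idx)),
        },
        // Blank variables never filter on the blank node's id; they only flag
        // that the matched term must itself be a blank node.
        PatternValue::BlankVariable(idx) => match &bindings[*idx] {
            Some(value) => (Some(value.clone()), true, None),
            None => (None, true, Some(*idx)),
        },
    }
}

fn main() {
    let bindings = vec![None, Some("alice".to_string())];
    assert_eq!(
        resolve_pattern_part(&PatternValue::BlankVariable(0), &bindings),
        (None, true, Some(0))
    );
    assert_eq!(
        resolve_pattern_part(&PatternValue::Variable(1), &bindings),
        (Some("alice".to_string()), false, None)
    );
    assert_eq!(
        resolve_pattern_part(&PatternValue::Constant("core:Dataset".to_string()), &bindings),
        (Some("core:Dataset".to_string()), false, None)
    );
}
```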
Verification successful: Given the need to verify the robustness of error handling in the `map_triple` method, specifically regarding the `merge_index` calls, we will proceed to inspect the implementation details of `merge_index` and how `map_triple` manages potential errors from these calls.

The `map_triple` method employs Rust's idiomatic error handling mechanisms effectively by using the `?` operator with the `merge_index` calls. This approach ensures that any potential inconsistencies or errors detected by `merge_index` are appropriately propagated, halting the mapping process. The method's design aligns with robust error handling practices within Rust's option and error handling patterns, confirming the adequacy of the error handling in `map_triple`.

- 416-426: The implementation of the `next` method in `TriplePatternIterator` correctly iterates over triples, applying the `map_triple` method. The recursive call to `self.next()` upon encountering `None` ensures that the iterator continues until a valid result is found or the underlying iterator is exhausted.
- 917-917: The introduction of the `PlanVariable` enum in test cases is a good practice, as it allows for more expressive and flexible test setups, accommodating both basic and blank node types.
- 985-989: The test case setup with `PlanVariable::Basic` variants is clear and straightforward. However, ensure that test coverage is comprehensive, including scenarios with `PlanVariable::Blank` to fully validate the handling of blank nodes.
- 1333-1339: The use of `merge_index` within the `ForLoopJoinIterator` test cases demonstrates the method's utility in merging variables based on index. This is a critical part of ensuring that joined results are correctly assembled. Consider adding assertions to verify the correctness of the merged results.
- 1407-1413: The setup for `CartesianProductJoinIterator` test cases effectively demonstrates the use of `merge_index` for assembling Cartesian product results. Similar to the `ForLoopJoinIterator` cases, ensure that the correctness of the merged results is thoroughly verified.
- 1450-1452: The setup for testing `TriplePatternIterator::compute_iter_io` with pre-merged variables is well done, providing a solid basis for evaluating the method's logic. Ensure that the test cases cover all relevant scenarios, including those with and without blank variables.

Scripts Executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Manual verification needed to ensure robust error handling in map_triple method.
echo "Manual check required for error handling in map_triple method."
```

Length of output: 134

```shell
#!/bin/bash
# Inspect the implementation of merge_index and its error handling within map_triple
rg "fn merge_index" -A 10
```

Length of output: 980

```shell
#!/bin/bash
# Check for error handling mechanisms in map_triple related to merge_index calls
rg "map_triple" -A 20 -B 5
```

Length of output: 3856
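The `?`-based propagation described in the verification can be illustrated with a small, self-contained sketch; `Triple`, the bindings representation, and the local `merge_index` below are simplified assumptions (the same idea as the earlier `Bindings` sketch), not the actual `engine.rs` code:

```rust
// Simplified illustration of propagating merge failures with `?`. The actual
// map_triple in engine.rs works on store triples and ResolvedVariables; here a
// triple is three strings and bindings are a Vec<Option<String>>.
#[derive(Clone, Debug)]
struct Triple {
    subject: String,
    predicate: String,
    object: String,
}

fn merge_index(
    mut bindings: Vec<Option<String>>,
    index: usize,
    value: String,
) -> Option<Vec<Option<String>>> {
    if let Some(existing) = &bindings[index] {
        if *existing != value {
            return None; // conflicting value for an already-bound variable
        }
    }
    bindings[index] = Some(value);
    Some(bindings)
}

/// Map a triple onto the current bindings, following the output bindings
/// computed for the pattern; any inconsistent assignment aborts the mapping.
fn map_triple(
    triple: Triple,
    output_bindings: (Option<usize>, Option<usize>, Option<usize>),
    mut bindings: Vec<Option<String>>,
) -> Option<Vec<Option<String>>> {
    if let Some(idx) = output_bindings.0 {
        bindings = merge_index(bindings, idx, triple.subject)?;
    }
    if let Some(idx) = output_bindings.1 {
        bindings = merge_index(bindings, idx, triple.predicate)?;
    }
    if let Some(idx) = output_bindings.2 {
        bindings = merge_index(bindings, idx, triple.object)?;
    }
    Some(bindings)
}

fn main() {
    let triple = Triple {
        subject: "_:b0".to_string(),
        predicate: "core:hasInformations".to_string(),
        object: "info".to_string(),
    };
    // Subject and object are bound to variables 0 and 1; the predicate is constant.
    let mapped = map_triple(triple, (Some(0), None, Some(1)), vec![None, None]);
    assert_eq!(
        mapped,
        Some(vec![Some("_:b0".to_string()), Some("info".to_string())])
    );
}
```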
Just added a bunch of tests covering blank node query cases at the contract level.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (3)
- contracts/okp4-cognitarium/src/contract.rs (22 hunks)
- contracts/okp4-cognitarium/testdata/blank-nodes.ttl (1 hunks)
- contracts/okp4-dataverse/src/registrar/credential.rs (1 hunks)
Files not summarized due to errors (1)
- contracts/okp4-cognitarium/src/contract.rs: Error: Message exceeds token limit
Additional comments: 14
contracts/okp4-cognitarium/testdata/blank-nodes.ttl (1)

- 31-33: The property `core:hasInformations` uses an unusual plural form. RDF properties typically represent relationships or attributes in singular form. Please verify this aligns with the intended ontology design.

contracts/okp4-dataverse/src/registrar/credential.rs (1)

- 32-32: Replacing `.map` with `.copied()` in the `extract_vc_type` method changes the handling from a reference to a copied value. This is appropriate for types that implement the `Copy` trait, like `&str`. Ensure all uses of this method are updated accordingly to handle the copied value.
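A tiny, self-contained illustration of what this change means in practice, independent of the actual `extract_vc_type` implementation (the credential type strings here are made up): on an `Option` of references to a `Copy` type such as `&str`, `.copied()` expresses the same thing as `.map(|v| *v)` but states the intent directly.

```rust
// Minimal illustration of `.copied()` versus `.map`, with hypothetical data.
fn main() {
    let types = vec!["VerifiableCredential", "ExampleCredential"];

    // Searching a Vec<&str> yields an Option<&&str>.
    let found: Option<&&str> = types.iter().find(|t| **t != "VerifiableCredential");

    // Both forms produce an Option<&str>; `.copied()` is the idiomatic one.
    let via_map: Option<&str> = found.map(|t| *t);
    let via_copied: Option<&str> = found.copied();

    assert_eq!(via_map, via_copied);
    assert_eq!(via_copied, Some("ExampleCredential"));
}
```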
contracts/okp4-cognitarium/src/contract.rs (12)

- 529-563: > 📝 NOTE
  This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [381-545]

  The tests added for `proper_insert`, `insert_existing_triples`, `insert_unauthorized`, and `insert_limits` are comprehensive and cover various scenarios including successful insertions, handling of existing triples, unauthorized access, and limit checks. This ensures robustness in data insertion functionality.

- 606-606: The test `insert_existing_triples` correctly asserts that re-inserting the same data does not increase the triple count, which is crucial for avoiding duplicate data. This behavior is essential for maintaining data integrity.
- 966-966: The `proper_delete` test case demonstrates the contract's ability to handle data deletion accurately, including cases with specific conditions and prefixes. It's good to see that the test verifies the expected decrease in triple count and namespace count, ensuring the delete functionality works as intended.
- 1139-1139: The `proper_store` test validates the store query functionality, ensuring that the contract state is correctly reported. This is crucial for clients to understand the current state of the contract, including ownership and limits.
- 1356-1574: The addition of tests for querying with blank nodes (`proper_select_blank_nodes`) is particularly relevant to the PR's objectives. These tests ensure that the enhanced handling of blank nodes in queries is functioning as expected. It's good to see various scenarios being tested, including different prefixes and variable usage.
- 1575-1575: The `invalid_select` test cases are crucial for ensuring the system's resilience against invalid queries. Testing for exceeded variable counts, limits, missing prefixes, and unselected variables helps guarantee that the system behaves predictably and securely in the face of erroneous input.
- 1728-1756: > 📝 NOTE
  This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1681-1794]

  The `formats_describe` tests cover the `Describe` functionality across different data formats (Turtle, RDF/XML, NTriples, NQuads), ensuring the contract can correctly serialize RDF data in various formats. This is essential for interoperability and flexibility in how data can be consumed.

- 1834-1857: > 📝 NOTE
  This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1837-1864]

  The `prefixes_describe` test validates the handling of prefixed names in describe queries, ensuring that the contract correctly resolves prefixes to full URIs. This is important for user convenience and readability in queries.

- 1907-1929: The `variable_describe` test demonstrates the contract's ability to handle variable resources in describe queries, ensuring dynamic query capabilities are robust. This allows for more flexible and powerful queries.
- 1972-1994: The `variable_mutiple_resources_describe` test checks the describe functionality with variables that match multiple resources, ensuring the contract can handle complex query patterns and return comprehensive results.
- 2037-2063: The `blanknode_describe` test ensures that describe queries can correctly handle blank nodes, which is central to the PR's objectives. This test is crucial for validating the enhanced blank node handling in the querying interface.
- 2107-2158: The `proper_construct` tests validate the construct query functionality, ensuring that the contract can correctly build RDF graphs based on query patterns. This includes handling of prefixes and variable substitution, which are key for dynamic graph construction.
That's great! Good job. 👍
Following the conversation on #494, blank nodes will be kept as part of the querying interface; this PR addresses an issue pointed out in a comment regarding the management of those blank nodes.
Details
When providing a blank node in a triple pattern, its value (i.e. the provided id for the blank node) was checked against the triples in the state, which is nonsense, as blank node identifiers are ephemeral.
In such a case, the query engine shall only ensure the relations expressed through blank nodes in triple patterns.
Proposed solution
The proposed solution considers blank node inputs in triple patterns as variables: a special type of variable linked to a blank node, so that the querying engine can enforce that the matched values are effectively blank nodes. Those variables are treated specially to prevent them from being used as `SelectItem`.

Misc

Another fix has been made here: when the same variable was used multiple times in a single triple pattern, it was not guaranteed to be bound to the same value.
Summary by CodeRabbit
New Features
- Added `core:hasInformations` to entities in multiple languages for enhanced dataset information.
- Replaced the `map` function with `copied` in the `DataverseCredential` implementation for improved data processing.

Refactor

- Updated the `TriplePatternIterator` struct to include `blank_filters` for advanced filtering in queries.
- Updated `QueryPlan` to use `PlanVariable` enums for better variable handling and resolution.
- Updated `PlanBuilder` methods for improved variable management and pattern construction in query planning.