-
Notifications
You must be signed in to change notification settings - Fork 105
Query paginated metrics efficiently #804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors paginated metric querying from traditional LIMIT + OFFSET pagination to keyset pagination (seek method) for improved performance. The change eliminates the inefficiency where OFFSET doesn't provide benefits when DISTINCT is applied, as the database still needs to process all offset elements for deduplication.
- Replaces
pageNumber_
parameters withpreviousId_
parameters in paginated query methods - Implements new
pageAstNodeMetrics
andpageFileMetrics
functions using keyset pagination - Removes the generic template-based
pageMetrics
function in favor of specific implementations
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
cppmetricsservice.cpp | Implements keyset pagination logic and replaces template-based pagination with specific methods |
cppmetricsservice.h | Updates method signatures and removes template-based pagination declaration |
cxxmetrics.thrift | Updates service interface to use previousId instead of pageNumber parameters |
std::vector<model::CppAstNodeId> paged_nodes = pageMetrics<model::CppAstNodeId, model::CppAstNodeMetricsDistinctView>( | ||
path_, pageSize_, pageNumber_); | ||
std::vector<model::CppAstNodeId> paged_nodes = pageAstNodeMetrics( | ||
path_, pageSize_, previousId_.empty() ? 0 : std::stoull(previousId_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The std::stoull function can throw std::invalid_argument or std::out_of_range exceptions when the string cannot be converted to an unsigned long long. Consider adding proper error handling or validation for the previousId_ parameter.
Copilot uses AI. Check for mistakes.
std::vector<model::CppAstNodeId> paged_nodes = pageMetrics<model::CppAstNodeId, model::CppAstNodeMetricsDistinctView>( | ||
path_, pageSize_, pageNumber_); | ||
std::vector<model::CppAstNodeId> paged_nodes = pageAstNodeMetrics( | ||
path_, pageSize_, previousId_.empty() ? 0 : std::stoull(previousId_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The std::stoull function can throw std::invalid_argument or std::out_of_range exceptions when the string cannot be converted to an unsigned long long. Consider adding proper error handling or validation for the previousId_ parameter.
Copilot uses AI. Check for mistakes.
std::vector<model::FileId> paged_files = pageFileMetrics( | ||
path_, pageSize_, previousId_.empty() ? 0 : std::stoull(previousId_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The std::stoull function can throw std::invalid_argument or std::out_of_range exceptions when the string cannot be converted to an unsigned long long. Consider adding proper error handling or validation for the previousId_ parameter.
std::vector<model::FileId> paged_files = pageFileMetrics( | |
path_, pageSize_, previousId_.empty() ? 0 : std::stoull(previousId_)); | |
std::uint64_t previousIdValue = 0; | |
if (!previousId_.empty()) { | |
try { | |
previousIdValue = std::stoull(previousId_); | |
} catch (const std::invalid_argument& e) { | |
core::InvalidInput ex; | |
ex.__set_msg("Invalid previousId_: " + previousId_ + ". Must be a numeric value."); | |
throw ex; | |
} catch (const std::out_of_range& e) { | |
core::InvalidInput ex; | |
ex.__set_msg("Invalid previousId_: " + previousId_ + ". Value out of range."); | |
throw ex; | |
} | |
} | |
std::vector<model::FileId> paged_files = pageFileMetrics(path_, pageSize_, previousIdValue); |
Copilot uses AI. Check for mistakes.
|
||
const std::int32_t offset = (pageNumber_ - 1) * pageSize_; | ||
return " LIMIT " + std::to_string(pageSize_) + " OFFSET " + std::to_string(offset); | ||
std::vector<model::CppAstNodeId> paged_ids(paged_nodes.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-sizing the vector with paged_nodes.size() may not be accurate since the result set size might differ from the allocated size. Consider using reserve() instead of sizing the constructor, or use emplace_back() in the transform operation.
Copilot uses AI. Check for mistakes.
std::vector<model::FileId> paged_ids(paged_nodes.size()); | ||
std::transform(paged_nodes.begin(), paged_nodes.end(), paged_ids.begin(), | ||
[](const model::CppModuleMetricsDistinctView& e){ | ||
return e.fileId; | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-sizing the vector with paged_nodes.size() may not be accurate since the result set size might differ from the allocated size. Consider using reserve() instead of sizing the constructor, or use emplace_back() in the transform operation.
std::vector<model::FileId> paged_ids(paged_nodes.size()); | |
std::transform(paged_nodes.begin(), paged_nodes.end(), paged_ids.begin(), | |
[](const model::CppModuleMetricsDistinctView& e){ | |
return e.fileId; | |
}); | |
std::vector<model::FileId> paged_ids; | |
paged_ids.reserve(paged_nodes.size()); | |
for (const auto& e : paged_nodes) { | |
paged_ids.emplace_back(e.fileId); | |
} |
Copilot uses AI. Check for mistakes.
CI only failed for SQLite due to #803. |
Paginated metric querying introduced in #784 uses
LIMIT
+OFFSET
SQL closures to select the appropriate AST Node IDs or File IDs.However
DISTINCT
is also applied on the projected IDs, therefore theOFFSET
does not provide any performance benefit: the database engine still needs to process (deduplicate) all offseted elements.This PR introduces a new approach: Keyset Pagination (a.k.a. Seek Method); where instead of using
OFFSET
, we use a last / previous ID to start querying records "after" that.