mcserep
Collaborator

@mcserep mcserep commented Jul 25, 2025

Paginated metric querying introduced in #784 uses LIMIT + OFFSET SQL clauses to select the appropriate AST Node IDs or File IDs.
However, DISTINCT is also applied to the projected IDs, so OFFSET provides no performance benefit: the database engine still has to process (deduplicate) all of the skipped rows.

This PR introduces a new approach, Keyset Pagination (a.k.a. the Seek Method): instead of using OFFSET, we pass the last ID of the previous page and query only the records that come "after" it.
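
To illustrate the difference, here is a minimal in-memory sketch; it is not the query code from this PR. The names `Id`, `pageByOffset` and `pageByKeyset` are hypothetical, a sorted `std::set` stands in for an ordered index on the ID column, and `0` is assumed to mean "no previous page" (which only works if real IDs are greater than 0).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical ID type; the real model types are not reproduced here.
using Id = std::uint64_t;

// OFFSET-style paging: all rows are deduplicated first, and only then
// are the rows of the earlier pages skipped one by one.
std::vector<Id> pageByOffset(const std::vector<Id>& rows,
                             std::size_t pageSize, std::size_t pageNumber)
{
  assert(pageNumber >= 1);
  std::set<Id> distinct(rows.begin(), rows.end()); // sorted and deduplicated
  const std::size_t offset = (pageNumber - 1) * pageSize;

  auto it = distinct.begin();
  for (std::size_t i = 0; i < offset && it != distinct.end(); ++i)
    ++it; // walk over every skipped row

  std::vector<Id> page;
  for (; it != distinct.end() && page.size() < pageSize; ++it)
    page.push_back(*it);
  return page;
}

// Keyset paging: seek directly to the first ID greater than previousId.
// Assumption: previousId == 0 requests the first page.
std::vector<Id> pageByKeyset(const std::set<Id>& index,
                             std::size_t pageSize, Id previousId)
{
  std::vector<Id> page;
  for (auto it = index.upper_bound(previousId);
       it != index.end() && page.size() < pageSize; ++it)
    page.push_back(*it);
  return page;
}
```

With keyset paging the caller passes the last ID of the page it already has, so the work per page depends on the page size rather than on how deep into the result set the page lies.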

@mcserep mcserep added the Kind: Refactor 🔃, Plugin: C++ and Plugin: Metrics labels Jul 25, 2025
@mcserep mcserep self-assigned this Jul 25, 2025
@mcserep mcserep requested a review from Copilot July 25, 2025 09:51
@mcserep mcserep added this to Roadmap Jul 25, 2025
@mcserep mcserep added this to the Upcoming Release milestone Jul 25, 2025
@github-project-automation github-project-automation bot moved this to In progress in Roadmap Jul 25, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR refactors paginated metric querying from traditional LIMIT + OFFSET pagination to keyset pagination (seek method) for improved performance. The change eliminates the inefficiency where OFFSET doesn't provide benefits when DISTINCT is applied, as the database still needs to process all offset elements for deduplication.

  • Replaces pageNumber_ parameters with previousId_ parameters in paginated query methods
  • Implements new pageAstNodeMetrics and pageFileMetrics functions using keyset pagination
  • Removes the generic template-based pageMetrics function in favor of specific implementations

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File                   Description
cppmetricsservice.cpp  Implements keyset pagination logic and replaces template-based pagination with specific methods
cppmetricsservice.h    Updates method signatures and removes template-based pagination declaration
cxxmetrics.thrift      Updates service interface to use previousId instead of pageNumber parameters

- std::vector<model::CppAstNodeId> paged_nodes = pageMetrics<model::CppAstNodeId, model::CppAstNodeMetricsDistinctView>(
-   path_, pageSize_, pageNumber_);
+ std::vector<model::CppAstNodeId> paged_nodes = pageAstNodeMetrics(
+   path_, pageSize_, previousId_.empty() ? 0 : std::stoull(previousId_));

Copilot AI Jul 25, 2025

The std::stoull function can throw std::invalid_argument or std::out_of_range exceptions when the string cannot be converted to an unsigned long long. Consider adding proper error handling or validation for the previousId_ parameter.
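
One possible shape for such validation is a non-throwing parse with `std::from_chars` (C++17). This is only a sketch: `parsePreviousId` is a hypothetical helper that does not appear in this PR, and treating an empty string as `0` mirrors the `previousId_.empty() ? 0 : ...` expression above.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: returns the parsed ID, or std::nullopt when the
// text is not a plain, in-range, unsigned decimal number.
std::optional<std::uint64_t> parsePreviousId(const std::string& text)
{
  if (text.empty())
    return std::uint64_t{0}; // assumption: empty means "first page"

  std::uint64_t value = 0;
  const char* first = text.data();
  const char* last = first + text.size();
  const std::from_chars_result res = std::from_chars(first, last, value);

  // Reject conversion errors and any trailing, unconsumed characters.
  if (res.ec != std::errc() || res.ptr != last)
    return std::nullopt;
  return value;
}

// Usage example.
void exampleUsage()
{
  assert(parsePreviousId("").value() == 0);
  assert(parsePreviousId("613").value() == 613);
  assert(!parsePreviousId("12abc").has_value()); // trailing characters
  assert(!parsePreviousId("-1").has_value());    // sign is not accepted
}
```

Unlike `std::stoull`, which parses the leading `12` of `"12abc"` and wraps `"-1"` around to a very large value, this check rejects both inputs. The caller would still need to turn `std::nullopt` into whatever error the service reports.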


Comment on lines +295 to +296
std::vector<model::FileId> paged_files = pageFileMetrics(
path_, pageSize_, previousId_.empty() ? 0 : std::stoull(previousId_));

Copilot AI Jul 25, 2025

The std::stoull function can throw std::invalid_argument or std::out_of_range exceptions when the string cannot be converted to an unsigned long long. Consider adding proper error handling or validation for the previousId_ parameter.

Suggested change
- std::vector<model::FileId> paged_files = pageFileMetrics(
-   path_, pageSize_, previousId_.empty() ? 0 : std::stoull(previousId_));
+ std::uint64_t previousIdValue = 0;
+ if (!previousId_.empty()) {
+   try {
+     previousIdValue = std::stoull(previousId_);
+   } catch (const std::invalid_argument& e) {
+     core::InvalidInput ex;
+     ex.__set_msg("Invalid previousId_: " + previousId_ + ". Must be a numeric value.");
+     throw ex;
+   } catch (const std::out_of_range& e) {
+     core::InvalidInput ex;
+     ex.__set_msg("Invalid previousId_: " + previousId_ + ". Value out of range.");
+     throw ex;
+   }
+ }
+ std::vector<model::FileId> paged_files = pageFileMetrics(path_, pageSize_, previousIdValue);



- const std::int32_t offset = (pageNumber_ - 1) * pageSize_;
- return " LIMIT " + std::to_string(pageSize_) + " OFFSET " + std::to_string(offset);
+ std::vector<model::CppAstNodeId> paged_ids(paged_nodes.size());

Copilot AI Jul 25, 2025

Pre-sizing the vector with paged_nodes.size() may not be accurate since the result set size might differ from the allocated size. Consider using reserve() instead of sizing the constructor, or use emplace_back() in the transform operation.
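
For comparison, the two forms can be sketched side by side. `Row`, `idsPreSized` and `idsReserved` are hypothetical stand-ins, not code from this PR. When the destination is sized from the same source range, as here, both produce the same elements; the pre-sized form additionally value-initializes every element before overwriting it and requires the element type to be default-constructible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a view row that carries an ID.
struct Row
{
  std::uint64_t fileId;
};

// Form 1: size the destination up front, then overwrite in place.
std::vector<std::uint64_t> idsPreSized(const std::vector<Row>& rows)
{
  std::vector<std::uint64_t> ids(rows.size()); // elements start as 0
  std::transform(rows.begin(), rows.end(), ids.begin(),
                 [](const Row& r) { return r.fileId; });
  return ids;
}

// Form 2: reserve capacity only, then append through back_inserter.
std::vector<std::uint64_t> idsReserved(const std::vector<Row>& rows)
{
  std::vector<std::uint64_t> ids;
  ids.reserve(rows.size()); // capacity only, size stays 0
  std::transform(rows.begin(), rows.end(), std::back_inserter(ids),
                 [](const Row& r) { return r.fileId; });
  assert(ids.size() == rows.size());
  return ids;
}
```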


Comment on lines +362 to +366
std::vector<model::FileId> paged_ids(paged_nodes.size());
std::transform(paged_nodes.begin(), paged_nodes.end(), paged_ids.begin(),
[](const model::CppModuleMetricsDistinctView& e){
return e.fileId;
});

Copilot AI Jul 25, 2025

Pre-sizing the vector with paged_nodes.size() may not be accurate since the result set size might differ from the allocated size. Consider using reserve() instead of sizing the constructor, or use emplace_back() in the transform operation.

Suggested change
- std::vector<model::FileId> paged_ids(paged_nodes.size());
- std::transform(paged_nodes.begin(), paged_nodes.end(), paged_ids.begin(),
-   [](const model::CppModuleMetricsDistinctView& e){
-     return e.fileId;
-   });
+ std::vector<model::FileId> paged_ids;
+ paged_ids.reserve(paged_nodes.size());
+ for (const auto& e : paged_nodes) {
+   paged_ids.emplace_back(e.fileId);
+ }


@mcserep
Collaborator Author

mcserep commented Jul 25, 2025

CI only failed for SQLite due to #803.

@mcserep mcserep merged commit 613f50c into Ericsson:master Jul 25, 2025
7 of 8 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Done in Roadmap Jul 25, 2025
@mcserep mcserep deleted the efficient-pagination branch July 25, 2025 10:25