Skip to content

Conversation

@AllureCurtain
Copy link
Contributor

link #1653

  • 单一历史遍历内核traverse_commit_history_for_last_modification
  • 统一 refs 解析resolve_start_commit 被所有 API 复用
  • 代码复用率 100%:所有 API 通过 history 模块统一调用

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the commit history API to use a unified, consistent implementation across all code paths. The refactoring introduces a single traversal kernel for finding last modifications, consolidates refs resolution logic, and eliminates code duplication.

Key Changes

  • Unified traversal kernel: All history lookups now use traverse_commit_history_for_last_modification which implements BFS with TREESAME pruning to match GitHub's behavior
  • Consolidated refs resolution: New resolve_start_commit function handles tags, commit SHAs, and HEAD consistently
  • Simplified API: Merged get_latest_commit and get_latest_commit_with_refs into a single method with an optional refs parameter
  • Enhanced error handling: Added [code:xxx] prefix parsing in error handling to automatically set appropriate HTTP status codes

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
mono/src/api/router/preview_router.rs Simplified get_latest_commit to use unified API with optional refs parameter
mono/src/api/error.rs Added automatic HTTP status code parsing from error message prefixes like [code:404]
ceres/src/api_service/mono_api_service.rs Removed duplicate methods and DB-based logic, delegating to unified history module
ceres/src/api_service/mod.rs Added resolve_start_commit for unified ref resolution and updated trait signature
ceres/src/api_service/history.rs Refactored traversal algorithm to use BFS with proper merge commit handling
ceres/src/api_service/commit_ops.rs Updated to use unified traversal logic for both directories and files

else {
// Item doesn't exist at start commit, return start commit
// Item does not exist at start_commit; nothing to trace backwards
return Ok((*start_commit).clone());
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Potential logic issue: When the item does not exist at start_commit (line 274), the function returns the start_commit itself. Similar to the path check above, this could be misleading. If the item doesn't exist at the starting point, it may be more appropriate to return an error indicating the item was not found, rather than implying the start_commit is the last modification.

Suggested change
return Ok((*start_commit).clone());
return Err(GitError::NotFound(format!(
"Item '{}' does not exist at path '{}' in commit {}",
search_item.name,
path.display(),
start_commit.id
)));

Copilot uses AI. Check for mistakes.
curr_blob = p_blob;
}

Ok((*start_commit).clone())
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Potential infinite loop or correctness issue: The BFS traversal only adds parents to the queue if they haven't been visited yet (line 333), but there's a scenario where all queued commits could be exhausted without finding a modification. In such cases, the function returns start_commit at line 343. However, if the traversal logic is correct and all parents have matching hashes all the way back to the root, this fallback might never be reached in practice. Consider adding a tracing log or comment explaining when this fallback case would occur, to clarify the expected behavior.

Copilot uses AI. Check for mistakes.
let err_str = anyhow_err.to_string();

// Parse [code:xxx] format and map to appropriate HTTP status code
if let Some(code_end) = err_str.find(']').filter(|_| err_str.starts_with("[code:")) {
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential index out of bounds: The code extracts the status code by slicing &err_str[6..code_end] (line 73), assuming the string starts with "[code:" (6 characters). However, the filter condition only checks that the string starts with "[code:" and that a ] exists, but doesn't verify that code_end is actually greater than 6. If code_end is less than 6 (e.g., malformed string like "[code]"), this would cause a panic. While unlikely in practice, it's safer to add a check: code_end > 6 before slicing, or use safer extraction methods.

Suggested change
if let Some(code_end) = err_str.find(']').filter(|_| err_str.starts_with("[code:")) {
if let Some(code_end) = err_str.find(']').filter(|&idx| err_str.starts_with("[code:") && idx >= 6) {

Copilot uses AI. Check for mistakes.

let body = Json(CommonResult::<()>::common_failed());
// Remove [code:xxx] prefix from error message for cleaner display
if let Some(code_end) = err_msg.find(']').filter(|_| err_msg.starts_with("[code:")) {
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar potential index out of bounds: The code extracts the status code by slicing &err_str[6..code_end] (line 47), but only checks that err_msg.starts_with("[code:") and that ] exists. If code_end is less than 6 (e.g., malformed "[code]"), this would panic. Add validation that code_end > 6 before slicing to prevent potential runtime errors.

Suggested change
if let Some(code_end) = err_msg.find(']').filter(|_| err_msg.starts_with("[code:")) {
if let Some(code_end) = err_msg.find(']').filter(|&idx| err_msg.starts_with("[code:") && idx >= 6) {

Copilot uses AI. Check for mistakes.
Comment on lines +89 to 97
/// Get the latest commit that modified a file or directory.
/// Unified method that handles both tag-based and commit-based browsing.
async fn get_latest_commit(
&self,
path: PathBuf,
refs: Option<&str>,
) -> Result<LatestCommitInfo, GitError> {
commit_ops::get_latest_commit(self, path, refs).await
}
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation update needed: The comment on line 195 in the original trait definition mentioned "Default implementation: fallback to the version without refs", but this default implementation has been removed. The unified get_latest_commit method now requires the refs parameter. Any documentation or comments that refer to the old two-method pattern (get_latest_commit and get_latest_commit_with_refs) should be updated to reflect this consolidation.

Copilot uses AI. Check for mistakes.
Comment on lines 264 to 265
// Path does not exist at start_commit; return start_commit as a safe default
return Ok((*start_commit).clone());
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Potential logic issue: When the path does not exist at start_commit (line 264), the function returns the start_commit itself as the "last modification" commit. This might be misleading, as it suggests that this commit modified the non-existent path. Consider returning an error with a descriptive message like "[code:404] Path not found" instead, to provide clearer feedback that the path doesn't exist at the given reference.

Suggested change
// Path does not exist at start_commit; return start_commit as a safe default
return Ok((*start_commit).clone());
// Path does not exist at start_commit; return an error
return Err(GitError::NotFound(format!(
"[code:404] Path not found: {:?} at commit {}",
path, start_commit.id
)));

Copilot uses AI. Check for mistakes.
Comment on lines 95 to 97
Err(_) => Err(GitError::CustomError(
"[code:404] File not found".to_string(),
)),
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handling inconsistency: The error pattern matching changed from Some(commit) and None (old code) to Ok(commit) and Err(_) (new code). However, the Err(_) case discards all error information with the wildcard pattern. This means different errors from resolve_last_modification_by_path (e.g., "Path has no parent directory", "Item not found in directory", etc.) all get replaced with the generic "[code:404] File not found" message. Consider preserving the specific error message or at least logging it before replacing it with a generic 404, so that debugging and troubleshooting are easier.

Copilot uses AI. Check for mistakes.
Comment on lines 232 to 240
// Try to resolve as commit SHA (must be 40-character hex)
if ref_str.len() == 40 && ref_str.chars().all(|c| c.is_ascii_hexdigit()) {
let commit = handler.get_commit_by_hash(ref_str).await;
return Ok(Arc::new(commit));
}

// Failed to resolve: return descriptive error
Err(GitError::CustomError(format!(
"Invalid reference '{}': not a valid tag name or commit SHA",
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Missing validation: The function accepts refs as Option<&str> and passes it to resolve_start_commit, which validates commit SHAs by checking if they are exactly 40 characters and all hexadecimal. However, shorter commit SHA prefixes (7-39 characters) are commonly used in Git and may be valid in your system. If your system supports short SHAs, this validation will incorrectly reject them. Consider either supporting short SHAs or documenting that only full 40-character SHAs are accepted.

Suggested change
// Try to resolve as commit SHA (must be 40-character hex)
if ref_str.len() == 40 && ref_str.chars().all(|c| c.is_ascii_hexdigit()) {
let commit = handler.get_commit_by_hash(ref_str).await;
return Ok(Arc::new(commit));
}
// Failed to resolve: return descriptive error
Err(GitError::CustomError(format!(
"Invalid reference '{}': not a valid tag name or commit SHA",
// Try to resolve as commit SHA (allow short SHA: 7-40 hex digits)
if (7..=40).contains(&ref_str.len()) && ref_str.chars().all(|c| c.is_ascii_hexdigit()) {
// Try to resolve the commit by (possibly short) hash
let commit = handler.get_commit_by_hash(ref_str).await;
// If get_commit_by_hash returns a sentinel/empty commit for not found, check for that here
// (Assume get_commit_by_hash returns a valid commit or panics/errors otherwise)
// If ambiguous or not found, get_commit_by_hash should return an error or a special value
// If not found, return a descriptive error
// For now, we assume get_commit_by_hash returns a valid commit or panics/errors otherwise
return Ok(Arc::new(commit));
}
// Failed to resolve: return descriptive error
Err(GitError::CustomError(format!(
"Invalid reference '{}': not a valid tag name or commit SHA (must be a tag name or a 7-40 digit hex commit hash)",

Copilot uses AI. Check for mistakes.
Comment on lines 34 to 44
// Treat directory as a TreeItem and find its last modification
let dir_name = path
.file_name()
.and_then(|n| n.to_str())
.ok_or_else(|| GitError::CustomError("Invalid directory path".to_string()))?
.to_string();

let parent = path
.parent()
.ok_or_else(|| GitError::CustomError("Directory has no parent".to_string()))?;

Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edge case: Root directory handling may fail. If the path is the root directory (e.g., "/" or "."), then path.parent() will return Some("") or None, and path.file_name() might not return a meaningful directory name. The code at lines 35-43 extracts the parent and directory name without checking if this is the root directory. This could lead to incorrect behavior or errors when querying the latest commit for the repository root. Consider adding special handling for the root directory case.

Suggested change
// Treat directory as a TreeItem and find its last modification
let dir_name = path
.file_name()
.and_then(|n| n.to_str())
.ok_or_else(|| GitError::CustomError("Invalid directory path".to_string()))?
.to_string();
let parent = path
.parent()
.ok_or_else(|| GitError::CustomError("Directory has no parent".to_string()))?;
// Special handling for root directory
let (dir_name, parent): (String, &std::path::Path) = if path.as_os_str().is_empty()
|| path == std::path::Path::new(".")
|| path == std::path::Path::new("/")
{
// For root, use "" as name and parent as root itself
(String::from(""), std::path::Path::new(""))
} else {
let dir_name = path
.file_name()
.and_then(|n| n.to_str())
.ok_or_else(|| GitError::CustomError("Invalid directory path".to_string()))?
.to_string();
let parent = path
.parent()
.ok_or_else(|| GitError::CustomError("Directory has no parent".to_string()))?;
(dir_name, parent)
};

Copilot uses AI. Check for mistakes.
/// commit where an item exists. This function walks the parent chain starting
/// from `start_commit` and returns the first commit where the item's hash at
/// `path` differs from its direct parent's hash (or where the item did not
/// exist in the parent). This matches the "last modification" semantics used
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation: The documentation comment for traverse_commit_history_for_last_modification ends abruptly on line 243 without completing the sentence "This matches the 'last modification' semantics used". This incomplete sentence should be completed or removed to ensure clear documentation.

Suggested change
/// exist in the parent). This matches the "last modification" semantics used
/// exist in the parent). This matches the "last modification" semantics used by tools like `git log -L` or `git blame`, which identify the most recent commit where a file or directory was changed.

Copilot uses AI. Check for mistakes.
@genedna genedna requested a review from Copilot November 23, 2025 16:08
Copilot finished reviewing on behalf of genedna November 23, 2025 16:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Comment on lines +106 to +117
// Preserve the original error message for better debugging
tracing::debug!("File not found or error during traversal: {:?}", e);
// If it's already a CustomError with [code:404], preserve it
if let GitError::CustomError(msg) = &e
&& msg.starts_with("[code:404]")
{
return Err(e);
}
// Otherwise wrap with 404 code
Err(GitError::CustomError(
"[code:404] File not found".to_string(),
))
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Inconsistent error handling pattern: The error handling pattern here is convoluted with nested pattern matching. The code checks if the error is CustomError with [code:404] prefix, preserves it, and otherwise wraps it with a generic 404 message. However, this loses the original error context when wrapping.

Consider simplifying this to:

Err(e) => {
    tracing::debug!("File not found or error during traversal: {:?}", e);
    match e {
        GitError::CustomError(ref msg) if msg.starts_with("[code:404]") => Err(e),
        _ => Err(GitError::CustomError("[code:404] File not found".to_string()))
    }
}

This avoids the let-if pattern and makes the logic clearer.

Suggested change
// Preserve the original error message for better debugging
tracing::debug!("File not found or error during traversal: {:?}", e);
// If it's already a CustomError with [code:404], preserve it
if let GitError::CustomError(msg) = &e
&& msg.starts_with("[code:404]")
{
return Err(e);
}
// Otherwise wrap with 404 code
Err(GitError::CustomError(
"[code:404] File not found".to_string(),
))
tracing::debug!("File not found or error during traversal: {:?}", e);
match e {
GitError::CustomError(ref msg) if msg.starts_with("[code:404]") => Err(e),
_ => Err(GitError::CustomError("[code:404] File not found".to_string())),
}

Copilot uses AI. Check for mistakes.
Comment on lines +55 to +58
let body = Json(CommonResult::<()> {
req_result: false,
data: None,
err_message: err_msg,
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential security issue: Unbounded error message exposure: The error handling code extracts and returns the full error message from anyhow::Error directly to the API response. If internal errors contain sensitive information (e.g., file paths, database connection strings, internal system details), this could leak sensitive information to API consumers.

Consider sanitizing error messages before exposing them to API responses, especially for internal server errors. Only expose safe, user-friendly messages to the client.

Suggested change
let body = Json(CommonResult::<()> {
req_result: false,
data: None,
err_message: err_msg,
// Only expose detailed error messages for 4xx (client) errors.
let client_error = self.status.is_client_error();
let response_msg = if client_error {
err_msg.clone()
} else {
"Internal server error".to_string()
};
let body = Json(CommonResult::<()> {
req_result: false,
data: None,
err_message: response_msg,

Copilot uses AI. Check for mistakes.
Comment on lines +35 to +54
let (dir_name, parent): (String, &std::path::Path) = if path.as_os_str().is_empty()
|| path == std::path::Path::new(".")
|| path == std::path::Path::new("/")
{
// For root directory, treat it as the tree itself with empty parent
// This commit represents the root tree's last modification
(String::from(""), std::path::Path::new(""))
} else {
// For non-root directories, extract name and parent normally
let dir_name = path
.file_name()
.and_then(|n| n.to_str())
.ok_or_else(|| GitError::CustomError("Invalid directory path".to_string()))?
.to_string();
let parent = path
.parent()
.ok_or_else(|| GitError::CustomError("Directory has no parent".to_string()))?;
(dir_name, parent)
};

Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential bug with root directory handling: When handling the root directory (/, ., or empty path), the code creates a TreeItem with an empty string as the name (String::from("")). This item is then used in traverse_commit_history_for_last_modification, which searches for an item with that empty name in the tree items.

However, tree items typically don't have empty names - the root tree itself isn't contained as an item within its parent. This logic appears incorrect and may cause the traversal to fail with a 404 error when querying the root directory's last commit.

Consider either:

  1. Special-casing root directory to return the start_commit directly without traversal
  2. Using the root tree's hash instead of trying to find it as an item in a non-existent parent
Suggested change
let (dir_name, parent): (String, &std::path::Path) = if path.as_os_str().is_empty()
|| path == std::path::Path::new(".")
|| path == std::path::Path::new("/")
{
// For root directory, treat it as the tree itself with empty parent
// This commit represents the root tree's last modification
(String::from(""), std::path::Path::new(""))
} else {
// For non-root directories, extract name and parent normally
let dir_name = path
.file_name()
.and_then(|n| n.to_str())
.ok_or_else(|| GitError::CustomError("Invalid directory path".to_string()))?
.to_string();
let parent = path
.parent()
.ok_or_else(|| GitError::CustomError("Directory has no parent".to_string()))?;
(dir_name, parent)
};
if path.as_os_str().is_empty()
|| path == std::path::Path::new(".")
|| path == std::path::Path::new("/")
{
// For root directory, the commit that created the tree is the last modification.
let mut commit_info: LatestCommitInfo = start_commit.clone().into();
// If commit has a username binding, prefer showing that username
if let Ok(Some(binding)) = handler
.build_commit_binding_info(&start_commit.id.to_string())
.await
&& !binding.is_anonymous
&& binding.matched_username.is_some()
{
let username = binding.matched_username.unwrap();
commit_info.author = username.clone();
commit_info.committer = username;
}
return Ok(commit_info);
}
// For non-root directories, extract name and parent normally
let dir_name = path
.file_name()
.and_then(|n| n.to_str())
.ok_or_else(|| GitError::CustomError("Invalid directory path".to_string()))?
.to_string();
let parent = path
.parent()
.ok_or_else(|| GitError::CustomError("Directory has no parent".to_string()))?;

Copilot uses AI. Check for mistakes.
.find(']')
.filter(|&idx| err_msg.starts_with("[code:") && idx >= 6)
{
err_msg = err_msg[code_end + 1..].trim().to_string();
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential string index panic: The code extracts a substring using err_msg[code_end + 1..] without checking if code_end + 1 is within bounds. If the error message ends with ] (e.g., "[code:404]"), then code_end + 1 would be equal to the string length, which is valid for slicing. However, if there are Unicode characters or the string is malformed, this could still panic.

Consider using .get(code_end + 1..) with proper error handling instead of direct indexing to avoid potential panics.

Suggested change
err_msg = err_msg[code_end + 1..].trim().to_string();
err_msg = err_msg.get(code_end + 1..)
.map(|s| s.trim().to_string())
.unwrap_or_else(|| err_msg.clone());

Copilot uses AI. Check for mistakes.
Comment on lines +234 to +235
let commit = handler.get_commit_by_hash(ref_str).await;
return Ok(Arc::new(commit));
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing error handling for get_commit_by_hash: When resolving a commit SHA, the code calls handler.get_commit_by_hash(ref_str).await without checking if the commit exists. If the SHA is valid format but doesn't exist in the repository, this could lead to unclear error messages or panics depending on the implementation of get_commit_by_hash.

Consider wrapping this call with proper error handling to provide a clear error message when the commit SHA doesn't exist.

Suggested change
let commit = handler.get_commit_by_hash(ref_str).await;
return Ok(Arc::new(commit));
match handler.get_commit_by_hash(ref_str).await {
Ok(commit) => return Ok(Arc::new(commit)),
Err(_) => {
return Err(GitError::CustomError(format!(
"Commit with SHA '{}' does not exist in the repository",
ref_str
)));
}
}

Copilot uses AI. Check for mistakes.
Comment on lines +46 to +89
if let Some(code_end) = err_msg
.find(']')
.filter(|&idx| err_msg.starts_with("[code:") && idx >= 6)
{
err_msg = err_msg[code_end + 1..].trim().to_string();
}

tracing::error!("Application error: {}", err_msg);

let body = Json(CommonResult::<()> {
req_result: false,
data: None,
err_message: err_msg,
});
(self.status, body).into_response()
}
}

// Backwards-compatible From: map generic errors to ApiError::internal
// Parse [code:xxx] format to set proper HTTP status code
impl<E> From<E> for ApiError
where
E: Into<anyhow::Error>,
{
fn from(err: E) -> Self {
ApiError::internal(err)
let anyhow_err = err.into();
let err_str = anyhow_err.to_string();

// Parse [code:xxx] format and map to appropriate HTTP status code
if let Some(code_end) = err_str
.find(']')
.filter(|&idx| err_str.starts_with("[code:") && idx >= 6)
{
let code = &err_str[6..code_end];
return match code {
"400" => ApiError::bad_request(anyhow_err),
"401" => ApiError::with_status(StatusCode::UNAUTHORIZED, anyhow_err),
"403" => ApiError::with_status(StatusCode::FORBIDDEN, anyhow_err),
"404" => ApiError::not_found(anyhow_err),
"409" => ApiError::with_status(StatusCode::CONFLICT, anyhow_err),
"500" => ApiError::internal(anyhow_err),
_ => ApiError::internal(anyhow_err),
};
}
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code duplication: Error code parsing logic is repeated: The logic to parse [code:xxx] format from error messages is duplicated between the into_response method (lines 46-51) and the From<E> implementation (lines 75-89). This creates a maintenance burden and increases the risk of inconsistencies.

Consider extracting this logic into a private helper function like:

fn parse_error_code(err_str: &str) -> Option<(&str, &str)> {
    if let Some(code_end) = err_str.find(']')
        .filter(|&idx| err_str.starts_with("[code:") && idx >= 6)
    {
        let code = &err_str[6..code_end];
        let msg = err_str.get(code_end + 1..)?.trim();
        Some((code, msg))
    } else {
        None
    }
}

Copilot uses AI. Check for mistakes.
Comment on lines 292 to 351
let mut visited = HashSet::new();
let mut commit_queue = VecDeque::new();
let mut last_modification_commit: Option<Arc<Commit>> = None;
let mut commit_queue: VecDeque<(Arc<Commit>, git_internal::hash::SHA1)> = VecDeque::new();

visited.insert(start_commit.id);
commit_queue.push_back(start_commit.clone());
commit_queue.push_back((start_commit.clone(), current_hash));

while let Some(commit) = commit_queue.pop_front() {
// Check parent commits
for &parent_id in &commit.parent_commit_ids {
if visited.contains(&parent_id) {
continue;
}
visited.insert(parent_id);
while let Some((commit, commit_hash)) = commit_queue.pop_front() {
// No parents: this commit introduced the item or is the earliest
// reference we can see. Treat it as the last modification.
if commit.parent_commit_ids.is_empty() {
return Ok((*commit).clone());
}

// Collect information about all parents first to properly handle merge commits
let mut parents_with_matching_hash = Vec::new();

for &parent_id in &commit.parent_commit_ids {
let parent_commit = get_commit_from_cache(handler, parent_id, &cache).await?;
let parent_tree = get_tree_from_cache(handler, parent_commit.tree_id, &cache).await?;

// Navigate to the target directory in parent
let Some(parent_target_tree) =
navigate_to_tree(handler, parent_tree, path, &cache).await?
else {
// Path doesn't exist in parent, this commit introduced it
// Track this as a potential modification, but continue to find the most recent one
if let Some(ref last_mod) = last_modification_commit {
if commit.committer.timestamp > last_mod.committer.timestamp {
last_modification_commit = Some(commit.clone());
}
} else {
last_modification_commit = Some(commit.clone());
}
commit_queue.push_back(parent_commit);
continue;
// Navigate to the target directory in the parent commit
let parent_target_tree_opt =
navigate_to_tree(handler, parent_tree, path, &cache).await?;

// Check if parent has the same item with the same hash
let parent_item_hash = if let Some(parent_target_tree) = parent_target_tree_opt {
parent_target_tree
.tree_items
.iter()
.find(|x| x.name == search_item.name && x.mode == search_item.mode)
.map(|x| x.id)
} else {
None
};

// Check if item exists in parent and compare hash
if let Some(parent_item) = parent_target_tree
.tree_items
.iter()
.find(|x| x.name == search_item.name && x.mode == search_item.mode)
{
if parent_item.id != current_hash {
// Hash changed in this commit
// Track this as a potential modification, but continue to find the most recent one
if let Some(ref last_mod) = last_modification_commit {
if commit.committer.timestamp > last_mod.committer.timestamp {
last_modification_commit = Some(commit.clone());
}
} else {
last_modification_commit = Some(commit.clone());
}
}
// Continue traversing to find all modifications
commit_queue.push_back(parent_commit);
} else {
// Item doesn't exist in parent, this commit added it
// Track this as a potential modification, but continue to find the most recent one
if let Some(ref last_mod) = last_modification_commit {
if commit.committer.timestamp > last_mod.committer.timestamp {
last_modification_commit = Some(commit.clone());
}
} else {
last_modification_commit = Some(commit.clone());
}
commit_queue.push_back(parent_commit);
// if parent has identical content, it's a candidate for traversal
if parent_item_hash == Some(commit_hash) {
parents_with_matching_hash.push((parent_commit, commit_hash));
}
}
}

// Return the most recent modification commit, or start_commit if no modification found
Ok(match last_modification_commit {
Some(commit) => (*commit).clone(),
None => (*start_commit).clone(),
})
}

/// Precise algorithm: walk commit history from HEAD and return the newest commit
/// where the file blob at `path` differs from its parent (or was added).
/// Returns Ok(Some(commit)) on success, Ok(None) if file not found at HEAD.
pub async fn resolve_latest_commit_for_file_path<T: ApiHandler + ?Sized>(
handler: &T,
path: &Path,
) -> Result<Option<Commit>, GitError> {
// Ensure file exists at HEAD and capture its blob id
let head_tree = handler.get_root_tree(None).await?;
let head_commit =
commit_ops::get_tree_relate_commit(handler, head_tree.id, PathBuf::from("/")).await?;

let current_blob =
blob_ops::get_file_blob_id(handler, path, Some(&head_commit.id.to_string())).await?;
let Some(mut curr_blob) = current_blob else {
return Ok(None);
};

let mut curr_commit = head_commit.clone();
// Safety guard to avoid pathological loops on very deep histories
let mut steps: u32 = 0;
const MAX_STEPS: u32 = 10_000;

loop {
steps += 1;
if steps > MAX_STEPS {
// Fallback: give up and return HEAD commit to avoid timeouts
tracing::warn!(
"resolve_latest_commit_for_file_path hit MAX_STEPS for path: {:?}",
path
);
return Ok(Some(curr_commit));
}

// Single-parent traversal (our commits are linear fast-forward in Mono)
let parent_id_opt = curr_commit.parent_commit_ids.first().cloned();
let Some(parent_id) = parent_id_opt else {
// Reached root of history; current commit introduced the file or first reference
return Ok(Some(curr_commit));
};

let parent_commit = handler.get_commit_by_hash(&parent_id.to_string()).await;
let parent_blob =
blob_ops::get_file_blob_id(handler, path, Some(&parent_commit.id.to_string())).await?;

if parent_blob.is_none() {
// File did not exist in parent, so current commit added it
return Ok(Some(curr_commit));
}
let p_blob = parent_blob.unwrap();
if p_blob != curr_blob {
// Blob changed between parent and current -> current touched the path
return Ok(Some(curr_commit));
// Decision logic:
// 1. If any parent has matching hash, continue traversing only those parents
// (the commit inherited content from them, not a real modification)
// 2. If NO parent has matching hash (all different or missing), then this
// commit is the last modification
if !parents_with_matching_hash.is_empty() {
// Commit inherited content from at least one parent
// Continue traversing only the parents with matching hash
for (parent_commit, p_hash) in parents_with_matching_hash {
// Only queue if not already visited
if !visited.contains(&parent_commit.id) {
visited.insert(parent_commit.id);
commit_queue.push_back((parent_commit, p_hash));
}
}
} else {
return Ok((*commit).clone());
}
// Otherwise continue walking back
curr_commit = parent_commit;
curr_blob = p_blob;
}
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential infinite loop in BFS traversal with cycles: While the visited set prevents revisiting the same commit by ID, the queue can grow unbounded if there are many commits in the history. In a repository with complex branching and merge patterns, this could lead to memory exhaustion. Consider adding a maximum depth limit or iteration count to prevent potential DoS scenarios where an attacker could craft a repository with a very deep or complex history.

For example, add a safety limit similar to the old implementation's MAX_STEPS (which was 10,000):

const MAX_ITERATIONS: usize = 10_000;
let mut iterations = 0;
while let Some((commit, commit_hash)) = commit_queue.pop_front() {
    iterations += 1;
    if iterations > MAX_ITERATIONS {
        tracing::warn!("Exceeded maximum iterations in traverse_commit_history_for_last_modification");
        return Ok((*start_commit).clone());
    }
    // ... rest of logic
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant