[ENH] Tool to inspect the contents of the log. #4757

Merged 2 commits on Jun 10, 2025
57 changes: 57 additions & 0 deletions rust/log-service/src/bin/chroma-inspect-log-contents.rs
@@ -0,0 +1,57 @@
use tonic::transport::Channel;

use chroma_types::chroma_proto::log_service_client::LogServiceClient;
use chroma_types::chroma_proto::{PullLogsRequest, ScoutLogsRequest};

#[tokio::main]
async fn main() {
    let args = std::env::args().skip(1).collect::<Vec<_>>();
    if args.len() != 2 {
        eprintln!("USAGE: chroma-inspect-log-state [HOST] [COLLECTION_UUID]");
Contributor

[BestPractice]

The error message and command line usage instruction don't match the actual binary name. The error message refers to chroma-inspect-log-state but the binary is named chroma-inspect-log-contents.
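One way to address this comment is to build the usage line from a single constant, so the message cannot drift from the binary name. The sketch below is illustrative only: `BIN_NAME`, `usage`, and `parse_args` are hypothetical names that do not appear in the PR, and `main` exercises sample argument lists instead of exiting with code 13 as the original does.

```rust
/// Binary name, taken from the file name `chroma-inspect-log-contents.rs`.
/// (Hypothetical constant; the PR hard-codes the name in the message.)
const BIN_NAME: &str = "chroma-inspect-log-contents";

/// Builds the usage line from the constant above.
fn usage() -> String {
    format!("USAGE: {BIN_NAME} [HOST] [COLLECTION_UUID]")
}

/// Returns `(host, collection_id)` when exactly two arguments are given,
/// otherwise the usage message as the error.
fn parse_args(args: &[String]) -> Result<(&str, &str), String> {
    match args {
        [host, collection_id] => Ok((host.as_str(), collection_id.as_str())),
        _ => Err(usage()),
    }
}

fn main() {
    // Error path: no arguments at all.
    let none: Vec<String> = Vec::new();
    if let Err(msg) = parse_args(&none) {
        eprintln!("{msg}");
    }

    // Happy path: placeholder values, not a real endpoint or collection.
    let some = vec!["http://localhost:50051".to_string(), "some-uuid".to_string()];
    if let Ok((host, collection_id)) = parse_args(&some) {
        println!("host={host} collection={collection_id}");
    }
}
```

In the real binary the error branch would keep the existing `std::process::exit(13)` call after printing the message.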

        std::process::exit(13);
    }
    let logservice = Channel::from_shared(args[0].clone())
        .expect("could not create channel")
        .connect()
        .await
        .expect("could not connect to log service");
    let mut client = LogServiceClient::new(logservice);
    let scouted = client
        .scout_logs(ScoutLogsRequest {
            collection_id: args[1].clone(),
        })
        .await
        .expect("could not inspect log state");
    let scouted = scouted.into_inner();
    println!("Scouted {scouted:?}");
    for i in (scouted.first_uncompacted_record_offset..=scouted.first_uninserted_record_offset)
        .step_by(100)
    {
        let batch_size: i32 =
            (std::cmp::min(i + 100, scouted.first_uninserted_record_offset) - i) as i32;
        println!("Fetching [{i}:{})", i + batch_size as i64);
        let pulled = client
            .pull_logs(PullLogsRequest {
                collection_id: args[1].clone(),
                start_from_offset: i,
                batch_size,
                end_timestamp: i64::MAX,
            })
            .await
            .expect("could not pull logs");
        let pulled = pulled.into_inner();
        for (j, record) in pulled.records.into_iter().enumerate() {
            println!(
                "{} {} {} {}",
                i as usize + j,
                record.log_offset,
                record.record.as_ref().map(|r| r.operation).unwrap_or(4),
Contributor

[BestPractice]

Magic number 4 is used for the default operation. Consider using a named constant or referencing the appropriate enum value to make the code more maintainable and understandable.
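A minimal sketch of what a named constant could look like follows. It is self-contained and does not use `chroma_types`. The names `MISSING_RECORD_OPERATION`, `operation_or_sentinel`, and `operation_label` are hypothetical, and the mapping of 0 through 3 to ADD, UPDATE, UPSERT, and DELETE is an assumption about the proto `Operation` enum that should be checked against the generated code.

```rust
/// Value printed when a log record carries no operation record.
/// The PR uses the literal 4; treating it as "outside the valid enum
/// range" is an assumption of this sketch.
const MISSING_RECORD_OPERATION: i32 = 4;

/// Replaces `.unwrap_or(4)` with a named default.
fn operation_or_sentinel(operation: Option<i32>) -> i32 {
    operation.unwrap_or(MISSING_RECORD_OPERATION)
}

/// Human-readable label for an operation code.
/// ASSUMPTION: 0..=3 map to ADD/UPDATE/UPSERT/DELETE; verify against the
/// generated `Operation` enum before relying on these names.
fn operation_label(code: i32) -> &'static str {
    match code {
        0 => "ADD",
        1 => "UPDATE",
        2 => "UPSERT",
        3 => "DELETE",
        MISSING_RECORD_OPERATION => "<NONE>",
        _ => "<UNKNOWN>",
    }
}

fn main() {
    // A record with an operation, and one without.
    for op in [Some(0), None] {
        let code = operation_or_sentinel(op);
        println!("{code} {}", operation_label(code));
    }
}
```

Printing a label alongside the raw code would also make the tool's output easier to scan, at the cost of coupling the tool to the enum's current variants.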

                record
                    .record
                    .as_ref()
                    .map(|r| r.id.as_str())
                    .unwrap_or("<NONE>")
            );
        }
    }
}
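A note on the batching loop in the diff above: because the range is inclusive (`..=`) and stepped by 100, the final iteration computes a `batch_size` of 0 whenever the distance between the two offsets is a multiple of 100, including the case where they are equal. Whether the log service tolerates a zero-sized pull is not stated in the PR. The sketch below shows one possible alternative that yields only non-empty half-open windows; `batch_windows` is a hypothetical helper, not part of the PR.

```rust
/// Splits the half-open interval [start, end) into consecutive windows of
/// at most `step` records each. Never yields an empty window.
fn batch_windows(start: i64, end: i64, step: i64) -> Vec<(i64, i64)> {
    assert!(step > 0, "step must be positive");
    let mut windows = Vec::new();
    let mut lo = start;
    while lo < end {
        // Clamp the upper bound so the last window stops at `end`.
        let hi = std::cmp::min(lo.saturating_add(step), end);
        windows.push((lo, hi));
        lo = hi;
    }
    windows
}

fn main() {
    // Sample offsets standing in for first_uncompacted_record_offset
    // and first_uninserted_record_offset.
    for (lo, hi) in batch_windows(0, 250, 100) {
        let batch_size = (hi - lo) as i32;
        println!("Fetching [{lo}:{hi}) batch_size={batch_size}");
    }
}
```

Each `(lo, hi)` pair would map onto `start_from_offset: lo` and `batch_size: (hi - lo) as i32` in the `PullLogsRequest`.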