fix: Improve error handling within Quay Importer #1893

Open: 7 commits to merge into base: main

Conversation

jcrossley3
Contributor

@jcrossley3 jcrossley3 commented Jul 22, 2025

Fixes: #1892

Invalid sources will show a proper error.

Disabling importer runs will show a "canceled" message.

Errors occurring after the list of SBOMs to be fetched is created, e.g. an expired tag, could stop an importer run. This has been fixed, and any such errors are now included in the report after the run completes.

Summary by Sourcery

Improve error handling in the Quay importer by capturing non-fatal errors in the report, propagating cancellation and invalid source errors, and ensuring the run completes even when individual SBOM fetches or uploads fail.

Bug Fixes:

  • Invalid import sources now return an immediate error.
  • Importer cancellation returns a canceled error instead of silently breaking the loop.
  • SBOM retrieval and upload failures are logged and included in the report without aborting the entire run.

Enhancements:

  • Extract fetch and store logic into dedicated methods that handle and report errors gracefully.
  • Refactor SBOM enumeration to use fallible streams with size validation and robust error propagation.
  • Maintain a single OCI client instance in QuayWalker and introduce an SBOM struct to encapsulate reference and size.
  • Remove unnecessary Default derives from Repository and Batch for stricter deserialization.

Tests:

  • Add a test to verify error handling for invalid sources.
  • Update existing size limit tests to align with new validation logic.
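
The "record the error and keep going" behavior summarized above can be sketched roughly as follows. This is a simplified, synchronous illustration using only the standard library, not the code from this PR: `Report`, `fetch`, `store`, and `run` are hypothetical stand-ins, and the failure conditions are invented for the example.

```rust
/// Hypothetical, simplified report: collects per-item errors instead of aborting.
#[derive(Debug, Default)]
pub struct Report {
    pub ingested: Vec<String>,
    pub errors: Vec<String>,
}

/// Stand-in for fetching an SBOM; fails for references containing "expired".
fn fetch(reference: &str, report: &mut Report) -> Option<Vec<u8>> {
    if reference.contains("expired") {
        // Record the failure and signal "skip this item" rather than returning Err.
        report
            .errors
            .push(format!("failed to fetch {reference}: tag expired"));
        return None;
    }
    Some(reference.as_bytes().to_vec())
}

/// Stand-in for ingestion; rejects empty payloads and records the error.
fn store(reference: &str, data: &[u8], report: &mut Report) {
    if data.is_empty() {
        report
            .errors
            .push(format!("failed to store {reference}: empty document"));
    } else {
        report.ingested.push(reference.to_string());
    }
}

/// Processes every reference; a failing item never stops the run.
pub fn run(references: &[&str]) -> Report {
    let mut report = Report::default();
    for reference in references {
        if let Some(data) = fetch(reference, &mut report) {
            store(reference, &data, &mut report);
        }
    }
    report
}
```

Calling `run` with a mix of good and bad references returns a report that lists the ingested items alongside the errors, which mirrors the intent of returning `Option` from the fetch step instead of propagating with `?`.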


sourcery-ai bot commented Jul 22, 2025

Reviewer's Guide

Refactors the Quay importer walker to improve error handling by initializing a shared OCI client, restructuring fetch/store to log and collect errors per SBOM without aborting runs, converting SBOM and repository listing to fallible async streams with cancellation support, refining SBOM validation and data structures, and adding an integration test for invalid sources.

File-Level Changes

Change: Initialize and reuse OCI client across walker execution
Details:
  • Added oci field to QuayWalker struct
  • Moved oci::Client::new() into constructor
  • Removed local OCI client instantiation from run()
Files: modules/importer/src/runner/quay/walker.rs

Change: Separate fetch and store with per-item error reporting
Details:
  • Created fetch() returning Option, logging warnings and adding retrieval errors to report
  • Updated store() to catch ingestion errors, log warnings, extend warnings or add errors to report
  • Replaced early ? propagation to ensure run continues on individual failures
Files: modules/importer/src/runner/quay/walker.rs

Change: Refactor SBOM and repository listing to fallible async streams
Details:
  • Changed sboms() to return Result<Vec, Error> with try_filter, try_flatten, try_collect
  • Updated repositories() to use try_unfold and try_flatten, propagate HTTP and JSON errors, check for cancellation
  • Adjusted repository() and associated stream transforms for consistent error handling
Files: modules/importer/src/runner/quay/walker.rs

Change: Refine SBOM validation and data models
Details:
  • Replaced too_big() with valid(), inverting behavior to filter valid SBOMs
  • Modified Repository.sboms() to return Vec and introduced SBOM struct
  • Removed unnecessary Default derives from Repository and Batch
Files: modules/importer/src/runner/quay/walker.rs

Change: Add integration test for invalid importer source
Details:
  • Added invalid_source test to verify new error path on bad sources
  • Updated existing test size_limit to use adjusted limit value
Files: modules/importer/src/runner/quay/walker.rs
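
As a rough illustration of the "Refine SBOM validation" change above, the sketch below pairs a reference with its size and keeps only items that pass a `valid` check. The `Sbom` fields, the `size_limit` parameter, and the `<=` comparison are assumptions made for this example; the PR's actual struct and limit semantics may differ.

```rust
/// Hypothetical SBOM descriptor pairing a reference with its size in bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct Sbom {
    pub reference: String,
    pub size: u64,
}

/// Returns true when the SBOM is within the optional size limit.
/// With no limit configured, every SBOM is considered valid.
pub fn valid(sbom: &Sbom, size_limit: Option<u64>) -> bool {
    size_limit.map_or(true, |limit| sbom.size <= limit)
}

/// Keeps only the references of valid SBOMs, using the same
/// `then_some` filtering pattern as the reviewed code.
pub fn valid_references(sboms: Vec<Sbom>, size_limit: Option<u64>) -> Vec<String> {
    sboms
        .into_iter()
        .filter_map(|sbom| valid(&sbom, size_limit).then_some(sbom.reference))
        .collect()
}
```

Expressing the check as `valid()` rather than `too_big()` lets the filter read positively, at the cost of silently dropping oversized items unless they are also logged or reported.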


@sourcery-ai sourcery-ai bot left a comment

Hey @jcrossley3 - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `modules/importer/src/runner/quay/walker.rs:169` </location>
<code_context>
-    async fn repositories(&self, page: Option<String>) -> impl Stream<Item = Repository> {
-        stream::unfold(page, async |state| match state {
-            None => None,
+    async fn repositories(&self, page: Option<String>) -> Result<Vec<Repository>, Error> {
+        stream::try_unfold(page, async |state| match state {
             Some(page) => {
</code_context>

<issue_to_address>
repositories() now returns a Result and collects all repositories eagerly.

Collecting all repositories into a Vec may lead to high memory usage with large datasets. Consider if a streaming approach is more scalable for your use case.

Suggested implementation:

```rust
    // Sketch only: assumes `self.client` is a `reqwest::Client`, `Error: From<reqwest::Error>`,
    // an `Error::Canceled` variant, and `futures::TryStreamExt` in scope for `try_flatten`.
    fn repositories(&self, page: Option<String>) -> impl Stream<Item = Result<Repository, Error>> + '_ {
        stream::try_unfold(page, move |state| async move {
            // `None` means the previous page had no successor: end the stream.
            let Some(page) = state else {
                return Ok(None);
            };
            // Cancellation point before each remote call.
            if self.context.is_canceled().await {
                return Err(Error::Canceled);
            }
            let batch: Batch = self
                .client
                .get(self.importer.repositories_url(&page))
                .send()
                .await?
                .json()
                .await?;
            // Yield this page's repositories as an inner stream; the next
            // state is the token of the following page, if any.
            let repos = stream::iter(batch.repositories.into_iter().map(Ok::<_, Error>));
            Ok(Some((repos, batch.next_page)))
        })
        // Flatten the per-page streams into one stream of repositories.
        .try_flatten()
    }

```

- You may need to adjust the logic for handling batches and pagination, depending on the actual structure of your `Batch` and how pagination is implemented.
- Ensure that `self` is `Clone` or otherwise accessible in the async closure.
- You may need to import `futures::stream::{self, Stream, StreamExt}` and other necessary traits.
- The above code assumes that `Batch` has `repositories: Vec<Repository>` and `next_page: Option<String>`.
- Adjust error handling and streaming logic as needed for your actual use case.
</issue_to_address>


codecov bot commented Jul 22, 2025

Codecov Report

❌ Patch coverage is 17.52577% with 80 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.91%. Comparing base (ecc14e8) to head (726c415).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| modules/importer/src/runner/quay/walker.rs | 16.66% | 76 Missing and 4 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1893      +/-   ##
==========================================
- Coverage   68.06%   67.91%   -0.16%     
==========================================
  Files         365      365              
  Lines       23065    23152      +87     
  Branches    23065    23152      +87     
==========================================
+ Hits        15699    15723      +24     
- Misses       6486     6548      +62     
- Partials      880      881       +1     

@jcrossley3
Contributor Author

@sourcery-ai review

@sourcery-ai sourcery-ai bot left a comment

Hey @jcrossley3 - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `modules/importer/src/runner/quay/walker.rs:143` </location>
<code_context>
-    async fn sboms(&self) -> Vec<Reference> {
-        let tags: Vec<(Reference, u64)> = self
+    async fn sboms(&self) -> Result<Vec<Reference>, Error> {
+        let repositories = self
             .repositories(Some(String::new()))
-            .await
-            .filter(|repo| future::ready(repo.is_public && self.modified_since(repo.last_modified)))
-            .map(|repo| self.repository(repo.namespace, repo.name))
+            .try_fold(vec![], |mut acc, repo| async move {
+                if repo.is_public && self.modified_since(repo.last_modified) {
+                    acc.push(self.repository(repo.namespace, repo.name));
+                }
+                Ok(acc)
+            })
+            .await?;
+        let tags: Vec<(Reference, u64)> = stream::iter(repositories)
             .buffer_unordered(32) // TODO: make configurable
</code_context>

<issue_to_address>
The repositories stream now returns Results, but unwrap_or_default() is still used later.

Using unwrap_or_default() here may hide errors from repository results. Please handle errors explicitly to prevent silent failures.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
        let tags: Vec<(Reference, u64)> = stream::iter(repositories)
            .buffer_unordered(32) // TODO: make configurable
            .filter_map(|repo| future::ready(repo.unwrap_or_default().sboms(&self.importer.source)))
            .map(stream::iter)
            .flatten()
            .collect()
            .await;
=======
        let tags: Vec<(Reference, u64)> = stream::iter(repositories)
            .buffer_unordered(32) // TODO: make configurable
            .filter_map(|repo_result| {
                match repo_result {
                    Ok(repo) => future::ready(Some(repo.sboms(&self.importer.source))),
                    Err(e) => {
                        log::warn!("Error retrieving repository: {e}");
                        future::ready(None)
                    }
                }
            })
            .map(stream::iter)
            .flatten()
            .collect()
            .await;
>>>>>>> REPLACE

</suggested_fix>

@jcrossley3
Contributor Author

@sourcery-ai review

@sourcery-ai sourcery-ai bot left a comment

Hey @jcrossley3 - I've reviewed your changes - here's some feedback:

  • In sboms(), the try_fold on the repositories stream will abort the entire run on the first API error—consider catching and reporting each repositories() error instead of propagating Err so that listing failures become skipped entries rather than early exit.
  • SBOM references filtered out by too_big are silently dropped—consider logging or adding a report entry for size‐skipped items so they’re visible in the final report.
  • You may want to check for cancellation earlier in sboms() (or during repository traversal) to allow long listing operations to be canceled promptly rather than only inside the import loop.
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `modules/importer/src/runner/quay/walker.rs:181` </location>
<code_context>
+    fn repositories(&self, page: Option<String>) -> impl Stream<Item = Result<Repository, Error>> {
</code_context>

<issue_to_address>
Switching to try_unfold/try_flatten changes error propagation for repository streaming.

This change causes the stream to stop at the first error instead of continuing. Please confirm if this behavior aligns with your requirements for handling partial results.
</issue_to_address>
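
The difference between stopping at the first error and keeping partial results can be illustrated with plain iterators, which behave analogously to `try_collect` on a fallible stream. This is a standard-library sketch with hypothetical function names, not the importer's code.

```rust
/// Short-circuiting: the first `Err` ends processing and earlier
/// successes are discarded, much like `try_collect` on a stream.
pub fn collect_all_or_fail(items: Vec<Result<u32, String>>) -> Result<Vec<u32>, String> {
    items.into_iter().collect()
}

/// Partial results: keep every success and every error separately,
/// so that errors can be reported without losing the good items.
pub fn collect_partial(items: Vec<Result<u32, String>>) -> (Vec<u32>, Vec<String>) {
    let mut ok = Vec::new();
    let mut errors = Vec::new();
    for item in items {
        match item {
            Ok(value) => ok.push(value),
            Err(error) => errors.push(error),
        }
    }
    (ok, errors)
}
```

Which variant fits depends on whether a listing failure should invalidate the whole run or only the affected entries.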

@jcrossley3 jcrossley3 requested review from dejanb, ctron and mrizzi July 23, 2025 21:30
@jcrossley3
Contributor Author

@sourcery-ai review

@sourcery-ai sourcery-ai bot left a comment

Hey @jcrossley3 - I've reviewed your changes - here's some feedback:

  • Extract the hardcoded 32 concurrency limit in your stream buffering into a named constant or configuration to avoid magic numbers.
  • Add a context.is_canceled() check inside the sboms pagination stream (right after fetching each page) so cancellation halts the importer immediately.
  • Consider batching report updates for fetch and store operations instead of locking the mutex separately in each method to reduce contention and improve clarity.
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `modules/importer/src/runner/quay/walker.rs:143` </location>
<code_context>
-                } else {
-                    Some(reference)
-                }
+    async fn sboms(&self) -> Result<Vec<Reference>, Error> {
+        self.repositories(Some(String::new()))
+            .try_filter(|v| future::ready(v.is_public && self.modified_since(v.last_modified)))
+            .map_ok(|v| self.repository(v.namespace, v.name))
+            .try_buffer_unordered(32)
+            .map_ok(|repo| {
+                stream::iter(
+                    repo.sboms(&self.importer.source)
+                        .into_iter()
+                        .map(Ok::<_, Error>), // try_flatten expects Results
+                )
             })
-            .collect()
+            .try_flatten()
+            .try_filter_map(|sbom| future::ready(Ok(self.valid(&sbom).then_some(sbom.reference))))
+            .try_collect()
+            .await
     }

</code_context>

<issue_to_address>
sboms() now returns a Result and uses try_stream combinators for error handling.

Note that with this approach, the process stops at the first error. If you need to return partial results, consider gathering successful results and handling errors separately.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    async fn sboms(&self) -> Result<Vec<Reference>, Error> {
        self.repositories(Some(String::new()))
            .try_filter(|v| future::ready(v.is_public && self.modified_since(v.last_modified)))
            .map_ok(|v| self.repository(v.namespace, v.name))
            .try_buffer_unordered(32)
            .map_ok(|repo| {
                stream::iter(
                    repo.sboms(&self.importer.source)
                        .into_iter()
                        .map(Ok::<_, Error>), // try_flatten expects Results
                )
            })
            .try_flatten()
            .try_filter_map(|sbom| future::ready(Ok(self.valid(&sbom).then_some(sbom.reference))))
            .try_collect()
            .await
    }
=======
    /// Returns a tuple of (successful References, errors encountered)
    async fn sboms(&self) -> Result<(Vec<Reference>, Vec<Error>), Error> {
        use futures::stream::{StreamExt, TryStreamExt};
        let mut successes = Vec::new();
        let mut errors = Vec::new();

        // Pin the stream on the stack so that `next()` can poll it.
        let mut stream = std::pin::pin!(self
            .repositories(Some(String::new()))
            .try_filter(|v| future::ready(v.is_public && self.modified_since(v.last_modified)))
            .map_ok(|v| self.repository(v.namespace, v.name))
            .try_buffer_unordered(32)
            .map_ok(|repo| {
                stream::iter(
                    repo.sboms(&self.importer.source)
                        .into_iter()
                        .map(Ok::<_, Error>),
                )
            })
            .try_flatten());

        while let Some(result) = stream.next().await {
            match result {
                Ok(sbom) => {
                    if self.valid(&sbom) {
                        successes.push(sbom.reference);
                    }
                }
                Err(e) => {
                    errors.push(e);
                }
            }
        }

        Ok((successes, errors))
    }
>>>>>>> REPLACE

</suggested_fix>

@jcrossley3
Contributor Author

@sourcery-ai dismiss

@ctron
Contributor

ctron commented Jul 25, 2025

* Extract the hardcoded `32` concurrency limit in your stream buffering into a named constant or configuration to avoid magic numbers.

I think there's a point in making this configurable. Maybe hitting a rate limit and not being able to opt out of it might be a severe issue.

* Add a `context.is_canceled()` check inside the `sboms` pagination stream (right after fetching each page) so cancellation halts the importer immediately.

Also this one. To my understanding, these are remote calls, and there can be quite a lot of them. Having a cancellation point here seems to make sense.
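
A cancellation point between paginated remote calls could look roughly like the sketch below. It is a synchronous, standard-library illustration: `WalkError`, `Page`, and `list_repositories` are hypothetical names, an `AtomicBool` stands in for the importer's cancellation context, and the `fetch_page` closure stands in for the HTTP request.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
pub enum WalkError {
    Canceled,
}

/// Hypothetical page: some repository names plus an optional next-page token.
pub struct Page {
    pub repositories: Vec<String>,
    pub next: Option<usize>,
}

/// Walks all pages via `fetch_page`, checking `canceled` before each
/// (potentially slow, remote) call so that a long listing stops promptly.
pub fn list_repositories(
    canceled: &AtomicBool,
    mut fetch_page: impl FnMut(usize) -> Page,
) -> Result<Vec<String>, WalkError> {
    let mut all = Vec::new();
    let mut next = Some(0);
    while let Some(token) = next {
        // Cancellation point: bail out before issuing the next request.
        if canceled.load(Ordering::Relaxed) {
            return Err(WalkError::Canceled);
        }
        let page = fetch_page(token);
        all.extend(page.repositories);
        next = page.next;
    }
    Ok(all)
}
```

Returning a dedicated `Canceled` error, rather than quietly ending the loop, lets the caller distinguish a canceled run from one that simply found nothing.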

Also accounting for bad data in junk repos that can now cause us to
error before our list of SBOMs to ingest is created.
@jcrossley3
Contributor Author

I think there's a point in making this configurable.

Done

Having a cancellation point here seems to make sense.

Done

@jcrossley3 jcrossley3 requested a review from ctron July 25, 2025 14:36
@jcrossley3 jcrossley3 enabled auto-merge July 25, 2025 14:42
Development

Successfully merging this pull request may close these issues.

Improve error messaging in Quay importer
2 participants