Skip to content

Conversation

@cosmir17
Copy link
Contributor

Closes #785

Summary

This PR adds SARIF (Static Analysis Results Interchange Format) v2.1.0 output support to cargo-deny, enabling integration with GitHub code scanning/security and other tools that consume SARIF.

Changes

  • Added --format sarif option alongside existing human and json formats
  • Created SARIF v2.1.0 structures and serialization
  • Implemented diagnostic collection and conversion to SARIF format
  • Output SARIF to stdout when using the new format

Usage

# Output SARIF for advisories
cargo deny --format sarif check advisories > advisories.sarif

# Output SARIF for licenses
cargo deny --format sarif check licenses > licenses.sarif

# Output SARIF for all checks
cargo deny --format sarif check > all-checks.sarif

# Upload to GitHub Security tab
cargo deny --format sarif check | \
  gh api /repos/OWNER/REPO/code-scanning/sarifs \
    --method POST \
    --field commit_sha=$(git rev-parse HEAD) \
    --field ref=$(git rev-parse --abbrev-ref HEAD) \
    --field sarif=@-

Implementation Details

  • No new dependencies: Uses only existing serde/serde_json (as requested)
  • Collects diagnostics during checks and converts them to SARIF at the end
  • Maps cargo-deny severity levels to SARIF levels (error, warning, note)
  • Each diagnostic type (advisories, licenses, bans, sources) gets appropriate tags
  • Compatible with GitHub Security tab and other SARIF consumers

Testing

Tested with real projects and test data:

  • Advisories check produces valid SARIF (44 results with test data)
  • Licenses check produces valid SARIF with proper results (319 results in test)
  • Output validates against SARIF v2.1.0 schema
  • Successfully parsed by jq and other JSON tools

Example Output

{
  "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
  "version": "2.1.0",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "cargo-deny",
          "version": "0.18.4",
          "semanticVersion": "0.18.4",
          "rules": [...]
        }
      },
      "results": [
        {
          "ruleId": "Advisory(Vulnerability)",
          "message": {
            "text": "Uncontrolled recursion leads to abort in HTML serialization"
          },
          "level": "error",
          "locations": [...]
        }
      ]
    }
  ]
}

@cosmir17
Copy link
Contributor Author

Hi @gilescope, can you review this PR when you have time before I un-draft it?

Copy link

@gilescope gilescope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would expect a happy path test at least.

@cosmir17 cosmir17 force-pushed the feat/sarif-output-support branch from af86fbb to fda367b Compare August 19, 2025 11:49
@cosmir17
Copy link
Contributor Author

Would expect a happy path test at least.

Added comprehensive test suite in latest commit - created tests/sarif.rs with 5 tests. All test passing!

The PR should now be ready for re-review @gilescope

Copy link

@gilescope gilescope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cosmir17 cosmir17 marked this pull request as ready for review August 19, 2025 14:31
@cosmir17 cosmir17 requested a review from Jake-Shadle as a code owner August 19, 2025 14:31
@Jake-Shadle Jake-Shadle changed the title feat: add SARIF output format support for GitHub code scanning integration Add SARIF output format support for GitHub code scanning integration Aug 20, 2025
@cosmir17 cosmir17 requested a review from Jake-Shadle August 20, 2025 10:13
Copy link
Member

@Jake-Shadle Jake-Shadle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a case where the code is unimportant in itself other than how it affects the sarif output, so I'll be reviewing the JSON output of this PR with some examples.

{
  "ruleId": "License(Accepted)",
  "message": {
    "text": "license requirements satisfied"
  },
  "level": "note",
  "locations": [
    {
      "physicalLocation": {
        "artifactLocation": {
          "uri": "Cargo.toml"
        },
        "region": {
          "startLine": 35
        }
      }
    }
  ],
  "partialFingerprints": {
    "cargo-deny/fingerprint": "License(Accepted):Cargo.toml:35"
  }
},

ruleId: According to https://github.com/microsoft/sarif-tutorials/blob/main/docs/2-Basics.md#rule-id, it's encouraged to use opaque ids here, which cargo-deny doesn't have. I think it's fine to use the current identifiers rather than add opaque ids as well, but regardless, the current scheme of () is...kind of ugly? We should either do the work of adding opaque ids for every code, or at least do something like l:accepted or s:git-source-underspecified.
locations: I'm not sure the rationale for using only the filename for this. This loses so much context that it makes it impossible to tell which package this pertains to. Additionally, the region object in sarif includes many more fields that could be filled out, particularly the snippet field.
partialFingerprints: Though this is better than it was, this still seems like an insufficient way to calculate a unique fingerprint. I think a much better way would be to use the package identifier, combined with the diagnostic code, combined with the machine agnostic path + byte offsets of the label(s).

Beyond that, there would seem to be an argument for doing more aggressive filtering for what actually ends up in the sarif report. For example

note[skipped-by-root]: skipping crate 'windows_x86_64_msvc = 0.52.6' due to root skip
   ┌─ /home/jake/code/cargo-deny/deny.toml:35:16
   │
35 │     { crate = "[email protected]", reason = "a foundational crate for many that bumps far too frequently to ever have a shared version" },
   │                ━━━━━━━━━━━━━━━━━━             ───────────────────────────────────────────────────────────────────────────────────────── reason
   │                │                               
   │                matched skip root
   │
   ├ windows_x86_64_msvc v0.52.6 (*)

gets output as

{
  "ruleId": "Bans(SkippedByRoot)",
  "message": {
    "text": "skipping crate 'windows-targets = 0.52.6' due to root skip"
  },
  "level": "note",
  "locations": [
    {
      "physicalLocation": {
        "artifactLocation": {
          "uri": "deny.toml"
        },
        "region": {
          "startLine": 35
        }
      }
    }
  ],
  "partialFingerprints": {
    "cargo-deny/fingerprint": "Bans(SkippedByRoot):deny.toml:35"
  }
},

This doesn't seem worthwhile to output in sarif? But I could be wrong, it just seems like there should be a distinction made between warnins and errors, and some of the informational diagnostics that cargo-deny outputs when people are debugging or otherwise checking their configurations.

@cosmir17 cosmir17 force-pushed the feat/sarif-output-support branch from c8a30dc to ee13acd Compare August 20, 2025 20:40
@cosmir17
Copy link
Contributor Author

@Jake-Shadle Thanks for the detailed review. I've addressed several points:

Fixed:

  • Rule IDs now use the format you suggested (l:accepted instead of License(Accepted))
  • Note/Help severity items are filtered out - only Warning/Error remain in SARIF
  • Fingerprints now include package identifiers extracted from messages (e.g., core-foundation-0.9.4:b:skipped:deny.toml:30)

Still Missing:

  • Package context in locations - still shows just "deny.toml" without package info
  • Region snippet field - only have line numbers, not the actual code snippet
  • Byte offsets for machine-agnostic fingerprints - using line numbers instead

The core issue is that SarifCollector only receives (code, severity, message, file_path, line) - no structured package metadata. I can extract package names from message strings for fingerprints, but for proper location context we'd need to refactor how diagnostics flow through the system.

Question: Should I:

  1. Continue with message parsing to extract more context (hacky but functional)
  2. Accept current limitations for initial SARIF support and iterate later
  3. Attempt deeper refactoring to preserve package metadata through the pipeline

The current implementation provides value for CI/CD integration even with these limitations. What would you prefer?

cc'ing @gilescope

cosmir17 and others added 12 commits August 22, 2025 13:15
- Add Sarif variant to Format enum
- Create SARIF module with v2.1.0 format structures
- Update output handling to support SARIF format
- Update logger to suppress logs for SARIF output

This provides the foundation for SARIF output. Full implementation
of diagnostic collection and SARIF generation to follow.
- Add SARIF collector to accumulate diagnostics
- Convert cargo-deny diagnostics to SARIF format
- Output complete SARIF v2.1.0 document to stdout
- Map diagnostic codes and severity levels appropriately
- Successfully generates SARIF for all check types (advisories, licenses, bans, sources)

Tested with licenses check - produces valid SARIF with 319 results.
Output is compatible with GitHub Security tab and Checkmarx BYOR.
- Change String to &'static str for better performance
- Use HashMap entry API to avoid double lookup
- Simplify no-op SARIF match arms to one-liners
- Add comprehensive test suite with 5 tests including edge cases
- All review comments addressed, all tests passing
- Parse real diagnostic codes from the string using FromStr trait instead of hardcoding generic ones
- Extract actual file paths and line numbers from diagnostic labels instead of hardcoding "Cargo.lock:1"
- Use proper EnumString parsing with fallback heuristics only when exact parsing fails
- Fixes issue where all diagnostics had identical codes/locations/fingerprints

Addresses review feedback about incorrect diagnostic information in SARIF output
- Use shorter, cleaner rule ID format (e.g. 'l:accepted' instead of 'License(Accepted)')
- Filter out Note/Help severity diagnostics to focus on actionable issues
- Extract package identifiers from diagnostic messages for better context
- Include package info in fingerprints (e.g. 'openssl-0.10.64:l:rejected:Cargo.toml:15')
- Add tests for SARIF output quality following TDD approach

Addresses Jake's feedback about losing package context and ugly Debug format for rule IDs.
While package info is still missing from locations due to architectural constraints,
fingerprints now include package identifiers parsed from messages.
@Jake-Shadle Jake-Shadle force-pushed the feat/sarif-output-support branch from ee13acd to 63fc69f Compare August 22, 2025 14:58
@cosmir17
Copy link
Contributor Author

cosmir17 commented Sep 2, 2025

Hi @Jake-Shadle, thanks for your changes. Can I ask any updates from your side?

@Jake-Shadle Jake-Shadle changed the title Add SARIF output format support for GitHub code scanning integration Add SARIF output format Sep 19, 2025
@Jake-Shadle Jake-Shadle merged commit 2a86a72 into EmbarkStudios:main Sep 19, 2025
12 checks passed
@cosmir17
Copy link
Contributor Author

Thank you @Jake-Shadle for merging this PR! The SARIF output support is exactly what we need at Midnight Network for integrating cargo-deny results into our security scanning pipeline.

We're excited to use this feature - would it be possible to publish a new release to crates.io when convenient? We'd love to start using it via cargo install cargo-deny in our CI workflows.

Thanks again for this great addition to cargo-deny! 🙏

@cosmir17
Copy link
Contributor Author

@BrewTestBot BrewTestBot mentioned this pull request 18 hours ago
Homebrew/homebrew-core#245269

Thank you ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SARIF output support to upload to github

3 participants