Add SARIF output format #790

cosmir17 · 2025-08-18T15:33:09Z

Closes #785

Summary

This PR adds SARIF (Static Analysis Results Interchange Format) v2.1.0 output support to cargo-deny, enabling integration with GitHub code scanning/security and other tools that consume SARIF.

Changes

Added --format sarif option alongside existing human and json formats
Created SARIF v2.1.0 structures and serialization
Implemented diagnostic collection and conversion to SARIF format
Output SARIF to stdout when using the new format
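
As a rough illustration of the shape of the SARIF structures, here is a minimal sketch that renders a single-run SARIF 2.1.0 log. The PR itself serializes with serde/serde_json; this sketch writes the JSON by hand only so that it compiles with the standard library alone. `SarifResult`, `escape_json`, and `render_sarif` are hypothetical names, not the PR's actual types.

```rust
/// One SARIF result, reduced to the fields shown in the example output.
/// Hypothetical type for illustration only.
pub struct SarifResult {
    pub rule_id: String,
    pub message: String,
    pub level: &'static str,
}

/// Escapes a string so it can be embedded in a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders a minimal SARIF 2.1.0 log with one run and no rules or locations.
pub fn render_sarif(tool_version: &str, results: &[SarifResult]) -> String {
    let results_json: Vec<String> = results
        .iter()
        .map(|r| {
            format!(
                r#"{{"ruleId":"{}","message":{{"text":"{}"}},"level":"{}"}}"#,
                escape_json(&r.rule_id),
                escape_json(&r.message),
                r.level
            )
        })
        .collect();
    format!(
        r#"{{"$schema":"https://json.schemastore.org/sarif-2.1.0.json","version":"2.1.0","runs":[{{"tool":{{"driver":{{"name":"cargo-deny","version":"{}"}}}},"results":[{}]}}]}}"#,
        escape_json(tool_version),
        results_json.join(",")
    )
}
```

In a real implementation the same shape would more naturally be a set of `#[derive(Serialize)]` structs, which is what "no new dependencies" allows since serde is already in the tree.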

Usage

# Output SARIF for advisories
cargo deny --format sarif check advisories > advisories.sarif

# Output SARIF for licenses
cargo deny --format sarif check licenses > licenses.sarif

# Output SARIF for all checks
cargo deny --format sarif check > all-checks.sarif

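# Upload to GitHub Security tab
# (the endpoint expects gzip-compressed, Base64-encoded SARIF and a full ref such as refs/heads/main)
cargo deny --format sarif check | gzip -c | base64 | tr -d '\n' | \
  gh api /repos/OWNER/REPO/code-scanning/sarifs \
    --method POST \
    --field commit_sha="$(git rev-parse HEAD)" \
    --field ref="$(git symbolic-ref HEAD)" \
    --field sarif=@-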

Implementation Details

No new dependencies: Uses only existing serde/serde_json (as requested)
Collects diagnostics during checks and converts them to SARIF at the end
Maps cargo-deny severity levels to SARIF levels (error, warning, note)
Each diagnostic type (advisories, licenses, bans, sources) gets appropriate tags
Compatible with GitHub Security tab and other SARIF consumers
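
The severity mapping described above could look roughly like the sketch below. `Severity` is a hypothetical stand-in for cargo-deny's diagnostic severity, and folding `Help` into `note` is an assumption; the strings are the SARIF `level` values named in the list.

```rust
/// Hypothetical stand-in for the severity attached to a cargo-deny diagnostic.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Error,
    Warning,
    Note,
    Help,
}

/// Maps a diagnostic severity to a SARIF `level` string.
pub fn sarif_level(severity: Severity) -> &'static str {
    match severity {
        Severity::Error => "error",
        Severity::Warning => "warning",
        // SARIF has no "help" level, so it is treated as a note here (assumption).
        Severity::Note | Severity::Help => "note",
    }
}
```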

Testing

Tested with real projects and test data:

Advisories check produces valid SARIF (44 results with test data)
Licenses check produces valid SARIF with proper results (319 results in test)
Output validates against SARIF v2.1.0 schema
Successfully parsed by jq and other JSON tools

Example Output

{
  "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
  "version": "2.1.0",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "cargo-deny",
          "version": "0.18.4",
          "semanticVersion": "0.18.4",
          "rules": [...]
        }
      },
      "results": [
        {
          "ruleId": "Advisory(Vulnerability)",
          "message": {
            "text": "Uncontrolled recursion leads to abort in HTML serialization"
          },
          "level": "error",
          "locations": [...]
        }
      ]
    }
  ]
}

cosmir17 · 2025-08-18T15:34:18Z

Hi @gilescope, can you review this PR when you have time before I un-draft it?

src/sarif_collector.rs

src/cargo-deny/common.rs

gilescope

Would expect a happy path test at least.

cosmir17 · 2025-08-19T11:52:02Z

Would expect a happy path test at least.

Added a comprehensive test suite in the latest commit: created tests/sarif.rs with 5 tests. All tests pass!

The PR should now be ready for re-review @gilescope

gilescope

LGTM

src/cargo-deny/check.rs

Jake-Shadle

This is a case where the code is unimportant in itself other than how it affects the sarif output, so I'll be reviewing the JSON output of this PR with some examples.

{
  "ruleId": "License(Accepted)",
  "message": {
    "text": "license requirements satisfied"
  },
  "level": "note",
  "locations": [
    {
      "physicalLocation": {
        "artifactLocation": {
          "uri": "Cargo.toml"
        },
        "region": {
          "startLine": 35
        }
      }
    }
  ],
  "partialFingerprints": {
    "cargo-deny/fingerprint": "License(Accepted):Cargo.toml:35"
  }
},

ruleId: According to https://github.com/microsoft/sarif-tutorials/blob/main/docs/2-Basics.md#rule-id, it's encouraged to use opaque ids here, which cargo-deny doesn't have. I think it's fine to use the current identifiers rather than add opaque ids as well, but regardless, the current scheme of Check(Code), as in License(Accepted), is...kind of ugly? We should either do the work of adding opaque ids for every code, or at least do something like l:accepted or s:git-source-underspecified.
locations: I'm not sure of the rationale for using only the filename for this. This loses so much context that it makes it impossible to tell which package this pertains to. Additionally, the region object in sarif includes many more fields that could be filled out, particularly the snippet field.
partialFingerprints: Though this is better than it was, this still seems like an insufficient way to calculate a unique fingerprint. I think a much better way would be to use the package identifier, combined with the diagnostic code, combined with the machine agnostic path + byte offsets of the label(s).
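
To make the locations point concrete, here is a minimal sketch of deriving a fuller region, including the snippet, from a byte span into the source text. `Region`, `line_col`, and `region_for_span` are hypothetical names; the fields mirror SARIF's startLine, startColumn, endLine, endColumn, byteOffset, byteLength, and snippet. Columns are counted in Unicode code points, which assumes the run declares that column kind.

```rust
use std::ops::Range;

/// Subset of the SARIF `region` object (hypothetical Rust-side representation).
#[derive(Debug, PartialEq, Eq)]
pub struct Region {
    pub start_line: usize,   // 1-based
    pub start_column: usize, // 1-based
    pub end_line: usize,     // 1-based
    pub end_column: usize,   // 1-based, column just past the last character
    pub byte_offset: usize,
    pub byte_length: usize,
    pub snippet: String,
}

/// Returns the 1-based (line, column) of a byte offset, or `None` if the
/// offset is out of range or not on a character boundary.
fn line_col(source: &str, offset: usize) -> Option<(usize, usize)> {
    let before = source.get(..offset)?;
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    Some((line, column))
}

/// Builds a region for `span`, which is a byte range into `source`.
pub fn region_for_span(source: &str, span: Range<usize>) -> Option<Region> {
    let snippet = source.get(span.clone())?;
    let (start_line, start_column) = line_col(source, span.start)?;
    let (end_line, end_column) = line_col(source, span.end)?;
    Some(Region {
        start_line,
        start_column,
        end_line,
        end_column,
        byte_offset: span.start,
        byte_length: span.end - span.start,
        snippet: snippet.to_owned(),
    })
}
```

Since cargo-deny's diagnostics already carry labels with byte spans for the terminal output, something along these lines would only need the span and the file contents, not the rendered message.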

Beyond that, there would seem to be an argument for doing more aggressive filtering for what actually ends up in the sarif report. For example

note[skipped-by-root]: skipping crate 'windows_x86_64_msvc = 0.52.6' due to root skip
   ┌─ /home/jake/code/cargo-deny/deny.toml:35:16
   │
35 │     { crate = "[email protected]", reason = "a foundational crate for many that bumps far too frequently to ever have a shared version" },
   │                ━━━━━━━━━━━━━━━━━━             ───────────────────────────────────────────────────────────────────────────────────────── reason
   │                │                               
   │                matched skip root
   │
   ├ windows_x86_64_msvc v0.52.6 (*)

gets output as

{
  "ruleId": "Bans(SkippedByRoot)",
  "message": {
    "text": "skipping crate 'windows-targets = 0.52.6' due to root skip"
  },
  "level": "note",
  "locations": [
    {
      "physicalLocation": {
        "artifactLocation": {
          "uri": "deny.toml"
        },
        "region": {
          "startLine": 35
        }
      }
    }
  ],
  "partialFingerprints": {
    "cargo-deny/fingerprint": "Bans(SkippedByRoot):deny.toml:35"
  }
},

This doesn't seem worthwhile to output in sarif? But I could be wrong; it just seems like there should be a distinction made between warnings and errors, and some of the informational diagnostics that cargo-deny outputs when people are debugging or otherwise checking their configurations.
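
One simple way to draw that line is a severity threshold applied before conversion. The sketch below is only an illustration: `Severity`, `Diagnostic`, and `reportable` are hypothetical names, and the cutoff at `Warning` is an assumption.

```rust
/// Hypothetical severity, declared from least to most severe so that the
/// derived ordering can be used as a threshold.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Help,
    Note,
    Warning,
    Error,
}

/// Hypothetical, reduced diagnostic.
pub struct Diagnostic {
    pub code: &'static str,
    pub severity: Severity,
    pub message: String,
}

/// Keeps only diagnostics that are warnings or errors.
pub fn reportable(diagnostics: Vec<Diagnostic>) -> Vec<Diagnostic> {
    diagnostics
        .into_iter()
        .filter(|d| d.severity >= Severity::Warning)
        .collect()
}
```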

cosmir17 · 2025-08-20T20:48:48Z

@Jake-Shadle Thanks for the detailed review. I've addressed several points:

Fixed:

Rule IDs now use the format you suggested (l:accepted instead of License(Accepted))
Note/Help severity items are filtered out - only Warning/Error remain in SARIF
Fingerprints now include package identifiers extracted from messages (e.g., core-foundation-0.9.4:b:skipped:deny.toml:30)
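
For reference, a conversion from the old Debug-style names to the short form could be sketched as follows. The check names and prefixes other than `l` for licenses, `b` for bans, and `s` for sources are assumptions, and `kebab_case` and `rule_id` are hypothetical helpers rather than functions from this PR.

```rust
/// Converts PascalCase to kebab-case,
/// e.g. "GitSourceUnderspecified" becomes "git-source-underspecified".
pub fn kebab_case(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for (i, c) in name.chars().enumerate() {
        if c.is_ascii_uppercase() {
            if i > 0 {
                out.push('-');
            }
            out.push(c.to_ascii_lowercase());
        } else {
            out.push(c);
        }
    }
    out
}

/// Builds a short rule id from a check name and a diagnostic code name.
/// Returns `None` for a check name this sketch does not know.
pub fn rule_id(check: &str, code: &str) -> Option<String> {
    let prefix = match check {
        "Advisory" => 'a', // assumed prefix
        "Bans" => 'b',
        "License" => 'l',
        "Sources" => 's', // assumed check name
        _ => return None,
    };
    Some(format!("{}:{}", prefix, kebab_case(code)))
}
```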

Still Missing:

Package context in locations - still shows just "deny.toml" without package info
Region snippet field - only have line numbers, not the actual code snippet
Byte offsets for machine-agnostic fingerprints - using line numbers instead
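
For the byte-offset item, a fingerprint along the lines suggested in the review (package identifier, diagnostic code, machine-agnostic path, and label byte offsets) might be sketched like this. FNV-1a is swapped in here only because its output is stable across platforms and compiler versions; the function names and the input format are hypothetical.

```rust
use std::ops::Range;

/// Feeds `bytes` into a running 64-bit FNV-1a hash.
fn fnv1a(bytes: &[u8], mut hash: u64) -> u64 {
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
    }
    hash
}

/// Combines package id, diagnostic code, a relative path, and label byte
/// spans into a hex fingerprint. `relative_path` is assumed to be relative
/// to the workspace root so the value does not depend on the machine.
pub fn fingerprint(
    package_id: &str,
    code: &str,
    relative_path: &str,
    spans: &[Range<usize>],
) -> String {
    // Normalize Windows separators so both platforms agree.
    let path = relative_path.replace('\\', "/");
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for part in [package_id, code, path.as_str()] {
        hash = fnv1a(part.as_bytes(), hash);
        // Separator byte so ("ab", "c") and ("a", "bc") hash differently.
        hash = fnv1a(&[0], hash);
    }
    for span in spans {
        hash = fnv1a(&(span.start as u64).to_le_bytes(), hash);
        hash = fnv1a(&(span.end as u64).to_le_bytes(), hash);
    }
    format!("{:016x}", hash)
}
```

Unlike a line-number-based string, this changes only when the labelled bytes move, and two packages hitting the same config line get distinct values.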

The core issue is that SarifCollector only receives (code, severity, message, file_path, line) - no structured package metadata. I can extract package names from message strings for fingerprints, but for proper location context we'd need to refactor how diagnostics flow through the system.
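
To show what the message-parsing route amounts to, here is a minimal sketch that pulls a package name and version out of a message of the form quoted earlier in this thread ('name = version' in single quotes). `package_from_message` is a hypothetical helper; it silently returns nothing for any message that does not follow that pattern, which is exactly why this approach is fragile.

```rust
/// Extracts `(name, version)` from the first single-quoted `name = version`
/// fragment in a diagnostic message, if there is one.
pub fn package_from_message(message: &str) -> Option<(&str, &str)> {
    let start = message.find('\'')? + 1;
    let len = message[start..].find('\'')?;
    let quoted = &message[start..start + len];
    let (name, version) = quoted.split_once(" = ")?;
    if name.is_empty() || version.is_empty() {
        return None;
    }
    Some((name, version))
}
```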

Question: Should I:

Continue with message parsing to extract more context (hacky but functional)
Accept current limitations for initial SARIF support and iterate later
Attempt deeper refactoring to preserve package metadata through the pipeline

The current implementation provides value for CI/CD integration even with these limitations. What would you prefer?

cc'ing @gilescope

- Add Sarif variant to Format enum
- Create SARIF module with v2.1.0 format structures
- Update output handling to support SARIF format
- Update logger to suppress logs for SARIF output

This provides the foundation for SARIF output. Full implementation of diagnostic collection and SARIF generation to follow.

- Add SARIF collector to accumulate diagnostics
- Convert cargo-deny diagnostics to SARIF format
- Output complete SARIF v2.1.0 document to stdout
- Map diagnostic codes and severity levels appropriately
- Successfully generates SARIF for all check types (advisories, licenses, bans, sources)

Tested with licenses check: produces valid SARIF with 319 results. Output is compatible with GitHub Security tab and Checkmarx BYOR.

- Change String to &'static str for better performance
- Use HashMap entry API to avoid double lookup
- Simplify no-op SARIF match arms to one-liners
- Add comprehensive test suite with 5 tests including edge cases
- All review comments addressed, all tests passing
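
The entry API change mentioned above can be illustrated with a small sketch. `Rule` and `RuleTable` are hypothetical types; the point is that `entry(..).or_insert_with(..)` performs one hash lookup where a `contains_key` check followed by `insert` would perform two.

```rust
use std::collections::HashMap;

/// Hypothetical rule record keyed by its id.
pub struct Rule {
    pub id: &'static str,
    pub description: &'static str,
    pub hits: usize,
}

/// Hypothetical table of rules seen so far.
#[derive(Default)]
pub struct RuleTable {
    rules: HashMap<&'static str, Rule>,
}

impl RuleTable {
    /// Registers the rule on first sight and counts every occurrence,
    /// using a single lookup through the entry API.
    pub fn record(&mut self, id: &'static str, description: &'static str) -> &mut Rule {
        let rule = self.rules.entry(id).or_insert_with(|| Rule {
            id,
            description,
            hits: 0,
        });
        rule.hits += 1;
        rule
    }

    /// Number of distinct rules recorded.
    pub fn len(&self) -> usize {
        self.rules.len()
    }
}
```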

- Parse real diagnostic codes from the string using FromStr trait instead of hardcoding generic ones
- Extract actual file paths and line numbers from diagnostic labels instead of hardcoding "Cargo.lock:1"
- Use proper EnumString parsing with fallback heuristics only when exact parsing fails
- Fixes issue where all diagnostics had identical codes/locations/fingerprints

Addresses review feedback about incorrect diagnostic information in SARIF output
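
A minimal sketch of "exact parse first, heuristics only as a fallback" is shown below. The commit relies on a derived EnumString implementation; this sketch writes `FromStr` by hand so it needs no extra crates. `LicenseCode` is a hypothetical, reduced enum and the substring hints are assumptions.

```rust
use std::str::FromStr;

/// Hypothetical subset of license diagnostic codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LicenseCode {
    Accepted,
    Rejected,
    Unlicensed,
}

impl FromStr for LicenseCode {
    type Err = ();

    /// Exact, case-sensitive parse of the code name.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "accepted" => Ok(Self::Accepted),
            "rejected" => Ok(Self::Rejected),
            "unlicensed" => Ok(Self::Unlicensed),
            _ => Err(()),
        }
    }
}

/// Tries the exact parse first and falls back to substring hints.
pub fn parse_code(raw: &str) -> Option<LicenseCode> {
    if let Ok(code) = raw.parse::<LicenseCode>() {
        return Some(code);
    }
    // Fallback heuristics (assumed), checked in order.
    const HINTS: [(&str, LicenseCode); 2] = [
        ("reject", LicenseCode::Rejected),
        ("unlicensed", LicenseCode::Unlicensed),
    ];
    let lower = raw.to_ascii_lowercase();
    for (needle, code) in HINTS {
        if lower.contains(needle) {
            return Some(code);
        }
    }
    None
}
```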

- Use shorter, cleaner rule ID format (e.g. 'l:accepted' instead of 'License(Accepted)')
- Filter out Note/Help severity diagnostics to focus on actionable issues
- Extract package identifiers from diagnostic messages for better context
- Include package info in fingerprints (e.g. 'openssl-0.10.64:l:rejected:Cargo.toml:15')
- Add tests for SARIF output quality following TDD approach

Addresses Jake's feedback about losing package context and ugly Debug format for rule IDs. While package info is still missing from locations due to architectural constraints, fingerprints now include package identifiers parsed from messages.

…ate version

cosmir17 · 2025-09-02T18:44:42Z

Hi @Jake-Shadle, thanks for your changes. Are there any updates on your side?

cosmir17 · 2025-09-21T21:31:55Z

Thank you @Jake-Shadle for merging this PR! The SARIF output support is exactly what we need at Midnight Network for integrating cargo-deny results into our security scanning pipeline.

We're excited to use this feature - would it be possible to publish a new release to crates.io when convenient? We'd love to start using it via cargo install cargo-deny in our CI workflows.

Thanks again for this great addition to cargo-deny! 🙏

cosmir17 · 2025-09-23T12:05:43Z

@BrewTestBot BrewTestBot mentioned this pull request 18 hours ago
Homebrew/homebrew-core#245269

Thank you ❤️