-
Notifications
You must be signed in to change notification settings - Fork 109
Add SARIF output format #790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SARIF output format #790
Conversation
|
Hi @gilescope, can you review this PR when you have time before I un-draft it? |
gilescope
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would expect a happy path test at least.
af86fbb to
fda367b
Compare
Added comprehensive test suite in latest commit - created tests/sarif.rs with 5 tests. All test passing! The PR should now be ready for re-review @gilescope |
gilescope
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Jake-Shadle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a case where the code is unimportant in itself other than how it affects the sarif output, so I'll be reviewing the JSON output of this PR with some examples.
{
"ruleId": "License(Accepted)",
"message": {
"text": "license requirements satisfied"
},
"level": "note",
"locations": [
{
"physicalLocation": {
"artifactLocation": {
"uri": "Cargo.toml"
},
"region": {
"startLine": 35
}
}
}
],
"partialFingerprints": {
"cargo-deny/fingerprint": "License(Accepted):Cargo.toml:35"
}
},ruleId: According to https://github.com/microsoft/sarif-tutorials/blob/main/docs/2-Basics.md#rule-id, it's encouraged to use opaque ids here, which cargo-deny doesn't have. I think it's fine to use the current identifiers rather than add opaque ids as well, but regardless, the current scheme of () is...kind of ugly? We should either do the work of adding opaque ids for every code, or at least do something like l:accepted or s:git-source-underspecified.
locations: I'm not sure the rationale for using only the filename for this. This loses so much context that it makes it impossible to tell which package this pertains to. Additionally, the region object in sarif includes many more fields that could be filled out, particularly the snippet field.
partialFingerprints: Though this is better than it was, this still seems like an insufficient way to calculate a unique fingerprint. I think a much better way would be to use the package identifier, combined with the diagnostic code, combined with the machine agnostic path + byte offsets of the label(s).
Beyond that, there would seem to be an argument for doing more aggressive filtering for what actually ends up in the sarif report. For example
note[skipped-by-root]: skipping crate 'windows_x86_64_msvc = 0.52.6' due to root skip
┌─ /home/jake/code/cargo-deny/deny.toml:35:16
│
35 │ { crate = "[email protected]", reason = "a foundational crate for many that bumps far too frequently to ever have a shared version" },
│ ━━━━━━━━━━━━━━━━━━ ───────────────────────────────────────────────────────────────────────────────────────── reason
│ │
│ matched skip root
│
├ windows_x86_64_msvc v0.52.6 (*)
gets output as
{
"ruleId": "Bans(SkippedByRoot)",
"message": {
"text": "skipping crate 'windows-targets = 0.52.6' due to root skip"
},
"level": "note",
"locations": [
{
"physicalLocation": {
"artifactLocation": {
"uri": "deny.toml"
},
"region": {
"startLine": 35
}
}
}
],
"partialFingerprints": {
"cargo-deny/fingerprint": "Bans(SkippedByRoot):deny.toml:35"
}
},This doesn't seem worthwhile to output in sarif? But I could be wrong, it just seems like there should be a distinction made between warnins and errors, and some of the informational diagnostics that cargo-deny outputs when people are debugging or otherwise checking their configurations.
c8a30dc to
ee13acd
Compare
|
@Jake-Shadle Thanks for the detailed review. I've addressed several points: Fixed:
Still Missing:
The core issue is that SarifCollector only receives (code, severity, message, file_path, line) - no structured package metadata. I can extract package names from message strings for fingerprints, but for proper location context we'd need to refactor how diagnostics flow through the system. Question: Should I:
The current implementation provides value for CI/CD integration even with these limitations. What would you prefer? cc'ing @gilescope |
- Add Sarif variant to Format enum - Create SARIF module with v2.1.0 format structures - Update output handling to support SARIF format - Update logger to suppress logs for SARIF output This provides the foundation for SARIF output. Full implementation of diagnostic collection and SARIF generation to follow.
- Add SARIF collector to accumulate diagnostics - Convert cargo-deny diagnostics to SARIF format - Output complete SARIF v2.1.0 document to stdout - Map diagnostic codes and severity levels appropriately - Successfully generates SARIF for all check types (advisories, licenses, bans, sources) Tested with licenses check - produces valid SARIF with 319 results. Output is compatible with GitHub Security tab and Checkmarx BYOR.
- Change String to &'static str for better performance - Use HashMap entry API to avoid double lookup - Simplify no-op SARIF match arms to one-liners - Add comprehensive test suite with 5 tests including edge cases - All review comments addressed, all tests passing
- Parse real diagnostic codes from the string using FromStr trait instead of hardcoding generic ones - Extract actual file paths and line numbers from diagnostic labels instead of hardcoding "Cargo.lock:1" - Use proper EnumString parsing with fallback heuristics only when exact parsing fails - Fixes issue where all diagnostics had identical codes/locations/fingerprints Addresses review feedback about incorrect diagnostic information in SARIF output
- Use shorter, cleaner rule ID format (e.g. 'l:accepted' instead of 'License(Accepted)') - Filter out Note/Help severity diagnostics to focus on actionable issues - Extract package identifiers from diagnostic messages for better context - Include package info in fingerprints (e.g. 'openssl-0.10.64:l:rejected:Cargo.toml:15') - Add tests for SARIF output quality following TDD approach Addresses Jake's feedback about losing package context and ugly Debug format for rule IDs. While package info is still missing from locations due to architectural constraints, fingerprints now include package identifiers parsed from messages.
ee13acd to
63fc69f
Compare
|
Hi @Jake-Shadle, thanks for your changes. Can I ask any updates from your side? |
|
Thank you @Jake-Shadle for merging this PR! The SARIF output support is exactly what we need at Midnight Network for integrating cargo-deny results into our security scanning pipeline. We're excited to use this feature - would it be possible to publish a new release to crates.io when convenient? We'd love to start using it via Thanks again for this great addition to cargo-deny! 🙏 |
Thank you ❤️ |
Closes #785
Summary
This PR adds SARIF (Static Analysis Results Interchange Format) v2.1.0 output support to cargo-deny, enabling integration with GitHub code scanning/security and other tools that consume SARIF.
Changes
--format sarifoption alongside existinghumanandjsonformatsUsage
Implementation Details
Testing
Tested with real projects and test data:
Example Output
{ "$schema": "https://json.schemastore.org/sarif-2.1.0.json", "version": "2.1.0", "runs": [ { "tool": { "driver": { "name": "cargo-deny", "version": "0.18.4", "semanticVersion": "0.18.4", "rules": [...] } }, "results": [ { "ruleId": "Advisory(Vulnerability)", "message": { "text": "Uncontrolled recursion leads to abort in HTML serialization" }, "level": "error", "locations": [...] } ] } ] }