-
Notifications
You must be signed in to change notification settings - Fork 11
New sca_match_kind and CallTransitiveReachable RPC v0 #342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
test plan: see related PR in semgrep
Backwards compatibility summary:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This spurred a bunch of questions in my head which I then talked to @mmcqd about. We sometimes have dependencies that are used both transitively and directly. Currently, we just treat these as direct dependencies and declare findings to be unreachable if we cannot find a match in first-party code, but we should stop doing so since we want to be more rigorous about our handling of transitive risk: even if a vulnerable is unreachable directly, it may still be reachable via some third-party dependency that also uses the vulnerable dependency.
I saw two ways of handling this:
- We can try to encompass both direct and transitive cases in a single finding. This is nice in some cases (e.g. if it is neither directly nor transitively reachable, it is nice to have a single unreachable finding) but it is much harder to make the analysis that we performed clear (we would need to build complicated UX to allow customers to see that, for example, a finding was directly unreachable but transitively reachable). This is close to what we currently do, but some customers have complained that the behavior of Semgrep is not clear when a dependency is both direct and transitive.
- We can create two findings when a dependency is both direct and transitive, one for the transitive usage and one for the direct usage. We would then apply only one type of analysis to each finding. The direct finding would be one of
DirectReachable
,DirectUnreachable
, orLockfileOnlyMatch Direct
, and the transitive finding would be one ofTransitiveReachable
, TransitiveUnreachable,
TransitiveUnknown, or
LockfileOnlyMatch Transitive`.
In either case, I think we will need to add another value to the Transitivity sum type in FoundDependency
that allows a dependency to be TransitiveAndDirect
.
Matthew and I thought that it made more sense to go with approach number 2, which splits findings in two when a dependency is both direct and transitive. This way, each sca_match_kind
corresponds to one type of analysis. In option one, the sca_match_kind
would need to encompass multiple types of analysis (direct and transitive) in a way that would likely get awkward, especially when there are multiple direct code matches that produce multiple findings. I would love to hear what you think of this, though, and am happy to chat about it in the morning.
semgrep_output_v1.atd
Outdated
* (reachable as originally defined by Semgrep Inc.) | ||
* the match location will be in some target code. | ||
*) | ||
| Reachable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need an Unreachable
option as well, right? Unreachable feels like fundamentally a different type of match because we ran the pattern, but didn't find any results. Like TransitiveUnreachable
, direct Unreachable
is also a positive finding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if we didn't find any results in first party code, there will be no finding.
Or actually we might just generate at first a LockfileOnlyMatch Transitive, that hopefully we can then
transform in TransitiveUnreachable.
But let's keep this Unreachable, we'll see.
I committed some changes that might make sense if we go with option 2 above |
I like option 2. |
test plan:
see related PR in semgrep
make setup && make
to update the generated code after editing a.atd
file (TODO: have a CI check)For example, the Semgrep backend need to still be able to consume data
generated by Semgrep 1.50.0.
See https://atd.readthedocs.io/en/latest/atdgen-tutorial.html#smooth-protocol-upgrades
Note that the types related to the semgrep-core JSON output or the
semgrep-core RPC do not need to be backward compatible!