-
Notifications
You must be signed in to change notification settings - Fork 1.3k
dashboard/app: sort bugs by impact #6072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ffc3b35
to
aba13b4
Compare
The impact score is deducted from the title.
d9b86ff
to
e7160e6
Compare
// The last regexp ".*" is a catch-all for unknown bugs. | ||
var impactOrder = []string{ | ||
// Memory safety. | ||
`^KASAN:.*Write`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally bug types go here:
https://github.com/google/syzkaller/blob/master/pkg/report/crash/types.go
These types are kept up-to-date with parsing code and are tested:
https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/73#L3
The list here may become incomplete, won't be updated when kernel output changes, untested, and duplicates logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Classification/prioritization also ideally moved to pkg/report, then it becomes possible to use it in syz-manager and elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw your recommendation in the graph
code. Thanks for all these prefixes.
Let's dive a bit. The first example are KASAN reads and writes.
Do you want to have 2 types, KASAN-Read and KASAN-Write instead of the current KASAN?
Or do you mean to first map the title to the Types and do submatching by the specialized code here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to have unique types for everything we care to differentiate anywhere is the codebase (i.e. have multiple types instead of the current KASAN).
If there is code that only cases about a type being a KASAN report, we can add additional predicates, e.g.:
func (t Type) IsKASAN() bool {
return t == KASANRead || t == KASANWrite || t == KASANDoubleFree
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks more clear. I'll check whether it could be merged with the graph drawing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I probably had the same line of reasoning when worked on the graph drawing mode.
If the bug type analysis is moved into pkg/report, then it will require re-parsing all crash logs. That's probably why I did not do it.
Can we at least have title parsing for types unified?
Also wonder if it's possible to re-use our extensive and maintained testcase set in pkg/report for this parsing.
// Concurrency. | ||
`^KCSAN:`, | ||
// Locking. | ||
`^BUG: sleeping function`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking if this should be above BUG/NULL derefs...
cc @melver @ramosian-glider @FlorentRevest thoughts on exact ordering?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear there's a definitive ordering.
I would probably:
- get all recent kernel CVEs (~5 years)
- those that reference syzbot report (Reported-by), retrieve syzbot report
- get type of report per these regexp rules
- sort by count
If there are report types left that didn't have CVE need either a) some manual sorting like here, or b) classify based on similarity (e.g. UBSAN index out of bounds ~~ KASAN out of bounds) to bug types that appeared in CVE until there are no unclassified types left.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, problem with recent CVEs is that they hardly distinguish severity anymore. Older CVEs likely were a stronger indicator of severity, but there also were fewer of those.
@@ -1938,6 +1939,7 @@ func createUIBug(c context.Context, bug *Bug, state *ReportingState, managers [] | |||
uiBug := &uiBug{ | |||
Namespace: bug.Namespace, | |||
Title: bug.displayTitle(), | |||
ImpactScore: TitleToImpact(bug.Title), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we at least should look at AltTitles.
What do you think about duplicates? If I duped KASAN write to a WARNING, should it have WARNING score?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the overall idea. Thanks!
Duping KASAN to WARNING is a signal to UP the Rank for both bugs to MAX(KASAN, WARNING).
var impactOrder = []string{ | ||
// Memory safety. | ||
`^KASAN:.*Write`, | ||
`^KASAN:.*Read`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about invalid-free/double-free? These look as bad as writes to me. Maybe worse.
// Memory safety. | ||
`^KASAN:.*Write`, | ||
`^KASAN:.*Read`, | ||
`^WARNING: refcount bug`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also "KASAN: null-ptr-deref", they shouldn't go this high. They should be clustered with DoS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've personally been considering null ptr deref and general protection faults as still somewhat high prio based on https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-in-linux.html and the broad feeling that they are some sort of memory corruption. I agree it's less obvious than a UAF/OOB and should be lower but maybe not necessarily a DoS either
`^KASAN:.*Write`, | ||
`^KASAN:.*Read`, | ||
`^WARNING: refcount bug`, | ||
`^UBSAN: array-index`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also KFENCE:
.
Ok, I think we should run this on all pkg/report testcases and (1) ensure that no non-corrupted report is unclassified (falls into .*
), (2) we have at least one test for every regexp.
It should significantly help to make things correct now, and avoid regressions in future.
`^Internal error in`, | ||
`^kernel panic:`, | ||
`^general protection fault`, | ||
`.*`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it somehow special, e.g. 0 or -1. We need to see them on the dashboard. It's now "low", it's we failed to classify it.
I love this... :) FWIW, I've additionally been prioritizing bugs based on 1- whether they obviously don't need privileges (do reproducers trivially work in the setuid(nobody) instance ?) 2- do they use some sort of fault injection/hwpoison ? 3- do they use a corrupted file system (based on the fsck result) 4- whether they happen orders of magnitude more often than the rest (because it impacts syzbot's ability to find other bugs) It may be hard to throw all of this into a score but I wanted to put it out there for discussion |
@@ -168,6 +168,7 @@ <h1><a href="/{{$.Namespace}}">syzbot</a></h1> | |||
<th><a onclick="return sortTable(this, 'Kernel', textSort)" href="#">Kernel</a></th> | |||
{{end}} | |||
<th><a onclick="return sortTable(this, 'Title', textSort)" href="#">Title</a></th> | |||
<th><a onclick="return sortTable(this, 'ImpactScore', numSort)" href="#">ImpactScore</a></th> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel there should be a tooltip here broadly explaining what this is about. As far as I can tell this impact is focused on security (maybe RT folks care about locking issues more than about security issues ?) which would be good to clarify. Additionally, it should be made clear that the heuristic is based on what syzbot can deduce in an obvious way. For example: if the bug is reported as a KASAN write UAF then it's obvious that this is as bad as it can get, on the other hand, a bug with a random title could actually share the same root cause with a KASAN write UAF so it may also be bad. I am convinced that given the size of the backlog these prioritization criteria are very useful but it should still be made very transparent to the user that this is totally opportunistic and heuristic based and in no way a definitive authoritative exploitability score.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to simplify this step if possible.
The proposal is to call it "Rank" to lower the expectations.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually we will complicate the algorithm and include other signals (not only titles).
At that moment it could be called "ImpactScore" or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah all of this is fine. My point was mainly that users who want to understand what this means should have a way to figure out what the heuristic is about. If not through a tooltip in the UI, maybe a "(help?)" link to the documentation like in the assets column of bug pages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it makes sense to summarize our rationale and in case of any questions/change requests review the overall rationale.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added 2 "TODO"s to this page's header.
The impact score is deducted from the title.
Closes #6071.
TODO: