
Can I directly tell which payload caused each crash without debugging every payload? #231

Open
liujf628995 opened this issue Sep 13, 2023 · 2 comments

Comments

@liujf628995

For example, here are four crash logs:

    ls logs/crash_*
    logs/crash_42b50f.log logs/crash_a5870a.log logs/crash_5acc73.log logs/crash_fa0e83.log

And here are four payloads:

    ls corpus/crash/
    payload_00208 payload_00244 payload_00254 payload_00262

Can I directly tell which payload caused each crash without debugging every payload?

@Wenzel
Contributor

Wenzel commented Sep 13, 2023

If I understand your issue correctly, you are not able to associate a given payload with its corresponding crash log?

I looked at the function responsible for storing the crash log in kafl.fuzzer:kafl_fuzzer/worker/qemu.py:store_crashlogs

    def store_crashlogs(self, label, stamp):
        # Collect current/accumulated logs
        # We don't have a payload ID yet and in fact manager may refuse to store
        if self.hprintf_log and os.path.exists(self.hprintf_logfile):
            if os.path.getsize(self.hprintf_logfile) > 0:
                shutil.copy(self.hprintf_logfile, "%s/logs/%s_%s.log" % (
                    self.config.workdir, label[:5], stamp[:6]))
                os.truncate(self.hprintf_logfile, 0)

According to the comments, we don't have the payload ID yet at that point.
I understand this is inconvenient.

@il-steffen, is there any particular reason why we can't insert the payload ID into the crash log filename?
Is there a possible workaround, in your opinion?

@il-steffen
Collaborator

This code is in the individual workers. Potential findings are sent to the manager, which checks them for uniqueness again before assigning a payload ID and storing them in the corpus.

Especially for crashes, the workers may find a lot of crashing inputs that all have the same coverage/bitmap. They are redundant and can take up a lot of storage very quickly, so we only want to store unique-looking items. The manager does that by checking the bitmap and assigning a unique ID. The workers do not know this ID, so I'm using a subset of the bitmap hash as a unique identifier in the log filename.

Those logs are meant to get a live view of potential findings. They should usually be reproducible by replaying the payloads saved to $WORKDIR/corpus/{crash,kasan,timeout}. You can also find the corresponding saved corpus payload by searching the metadata/node_* info for the truncated hash. If there is no such item or the item does not reproduce the saved crash log, it means the execution is not deterministic for some reason.
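
The metadata search described above can be sketched roughly as follows. This is a minimal illustration, not part of kAFL: the helper names `stamp_from_log` and `find_nodes_for_logs` are hypothetical, and the sketch assumes that the truncated hash appears as hex text somewhere inside the raw `metadata/node_*` files, so a grep-like byte search is enough. If the metadata stores the hash in another encoding, the files would need to be decoded first. It also assumes, without confirmation from the thread, that a `node_NNNNN` file corresponds to the payload with the same number.

```python
import re
from pathlib import Path


def stamp_from_log(log_path):
    """Return the truncated hash from a name like 'crash_42b50f.log', else None."""
    match = re.fullmatch(r"[^_]+_([0-9a-f]+)\.log", Path(log_path).name)
    return match.group(1) if match else None


def find_nodes_for_logs(workdir):
    """Map each log file name to the metadata files that contain its stamp.

    Assumption: the stamp occurs as ASCII hex text inside the metadata file.
    A six-character stamp is short, so several candidates (or false
    positives) are possible; all matches are returned.
    """
    workdir = Path(workdir)
    nodes = sorted((workdir / "metadata").glob("node_*"))
    result = {}
    for log in sorted((workdir / "logs").glob("*_*.log")):
        stamp = stamp_from_log(log)
        if stamp is None:
            continue
        needle = stamp.encode("ascii")
        result[log.name] = [n.name for n in nodes if needle in n.read_bytes()]
    return result


if __name__ == "__main__":
    import sys

    # Usage: python find_nodes.py $WORKDIR
    for log_name, candidates in find_nodes_for_logs(sys.argv[1]).items():
        print(log_name, "->", ", ".join(candidates) or "no match")
```

An empty candidate list for a log would be consistent with the non-deterministic case mentioned above, or with the manager having rejected the finding as a duplicate.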

A possible fix for this inconvenience may be to send the crash logs, or the crash log name, to the manager and let the manager rename the log once the payload ID is known. I think we're already doing this with live collected coverage (binary pt dumps).
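
To make the idea concrete, here is a rough sketch of what the manager-side rename could look like. Everything in it is an assumption for illustration: `rename_crashlog` is a hypothetical helper, not an existing kAFL function, and the new filename pattern is just one possible choice. The only part taken from the thread is the current naming scheme from `store_crashlogs`, i.e. `label[:5]` and `stamp[:6]` joined by an underscore.

```python
import os


def rename_crashlog(workdir, label, stamp, payload_id):
    """Rename a worker-written crash log so its name includes the payload ID.

    Hypothetical helper. The old name mirrors store_crashlogs();
    the new name pattern is an assumption made for this sketch.
    Returns the new path, or None if the worker never stored a log.
    """
    logs_dir = os.path.join(workdir, "logs")
    old_path = os.path.join(logs_dir, "%s_%s.log" % (label[:5], stamp[:6]))
    new_path = os.path.join(
        logs_dir, "%s_%s_payload_%05d.log" % (label[:5], stamp[:6], payload_id))
    if not os.path.exists(old_path):
        return None
    # os.replace overwrites an existing target, so a re-run stays idempotent.
    os.replace(old_path, new_path)
    return new_path
```

Keeping the truncated hash in the new name means logs for findings that the manager rejects as duplicates remain recognizable under the old scheme.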


3 participants