-
Notifications
You must be signed in to change notification settings - Fork 3
Generate metadata in distributed rule #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| if __name__ == "__main__": | ||
| output_file = sys.argv[1] | ||
| input_files = sys.argv[2:] | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the first entries in the metadata.json file is version, which is currently on 2. We could check the version and throw an warning (or maybe an error) if its not something we can handle; for instance, if CodeChecker starts emitting new versions of this, we should know about it.
As a fun fact, there is a version 1 to version 2 converter in CodeChecker:
https://github.com/Ericsson/codechecker/blob/1d5c4ffc6cc45f4f6eb61756faee92b8699ff39a/web/client/codechecker_client/metadata.py#L18
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a good use of the common lib mentioned in #81 (fail, warn)
src/metadata_merge.py
Outdated
| # We append info from json2 to json1 from here on out | ||
| json1_root["result_source_files"].update(json2_root["result_source_files"]) | ||
| json1_root["skipped"] = json1_root["skipped"] + json2_root["skipped"] | ||
| # Merge time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Merge time | |
| # Merge time; we assume here both json files describe jobs in | |
| # the same analysis invocation, implying that the analysis start | |
| # time is the lowest timestamp, and the end is the highest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a wrong assumption; every cached action will contain its original times.
Should we leave this time part out of the file? (storing the time is not really hermetic or reproducible)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh an excellent observation!! We should totally check how incremental analysis works in CodeChecker, lets not be too smart ourselves.
| json1_root = json1["tools"][0] | ||
| json2_root = json2["tools"][0] | ||
| # We append info from json2 to json1 from here on out | ||
| json1_root["result_source_files"].update(json2_root["result_source_files"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should comment here that if the runs came from the same bazel invocation then (and lets make sure that this is actually true) the following fields must be exactly teh same:
- name
- version (of codechecker, not the version of the metadata.json file)
- working_directory
- output_path
On the other hand, how about these fields? Are we sure these are the same for every job?:
- command
Also, action_num should be aggregated up to actually reflect the amount of jobs, shouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried my best to cover everything here.
There are a couple of misses...
The analyzer tools, for example, have their own version number, but we theoretically could tolerate different versions of them running (?).
Why:
We currently do not produce a
metadata.jsonwith our distributed rule.What:
Added metadata output for all steps, then added a final step merging all metadata files into one.
Addresses:
Fixes: #45