@furtib
Contributor

@furtib furtib commented Sep 12, 2025

Why:
We currently do not produce a metadata.json with our distributed rule.

What:
Added metadata output for all steps, then added a final step merging all metadata files into one.

Addresses:
Fixes: #45
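The final merge step described above could be driven roughly as follows. This is a minimal sketch, not the PR's code: `merge_two`, `merge_documents`, and `main` are hypothetical names, and the pairwise merge only covers the `result_source_files` and `skipped` fields that appear in the review excerpts of this thread.

```python
import json
from functools import reduce


def merge_two(json1: dict, json2: dict) -> dict:
    """Hypothetical pairwise merge: folds json2's first tool entry into json1's."""
    root1 = json1["tools"][0]
    root2 = json2["tools"][0]
    root1["result_source_files"].update(root2["result_source_files"])
    root1["skipped"] = root1["skipped"] + root2["skipped"]
    return json1


def merge_documents(documents: list) -> dict:
    """Fold any number of per-step metadata documents into one (left to right)."""
    if not documents:
        raise ValueError("at least one metadata.json is required")
    return reduce(merge_two, documents)


def main(output_file: str, input_files: list) -> None:
    """Load every input file, merge them, and write the combined metadata."""
    documents = []
    for path in input_files:
        with open(path, encoding="utf-8") as f:
            documents.append(json.load(f))
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(merge_documents(documents), f, indent=2)
```

Calling `main(sys.argv[1], sys.argv[2:])` would match the entry point's argument order (output file first, then the inputs). Note that `reduce` mutates the first document in place, in the same spirit as the `.update(...)` calls in the reviewed code.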

@furtib furtib requested a review from Szelethus September 12, 2025 12:54
@furtib furtib self-assigned this Sep 12, 2025
@furtib furtib added the enhancement New feature or request label Sep 12, 2025
@Szelethus Szelethus requested review from dkrupp and nettle September 15, 2025 13:29
if __name__ == "__main__":
    output_file = sys.argv[1]
    input_files = sys.argv[2:]

Contributor

@Szelethus Szelethus Oct 6, 2025

One of the first entries in the metadata.json file is version, which is currently 2. We could check the version and issue a warning (or maybe an error) if it's not something we can handle; for instance, if CodeChecker starts emitting new versions of this format, we should know about it.

As a fun fact, there is a version 1 to version 2 converter in CodeChecker:
https://github.com/Ericsson/codechecker/blob/1d5c4ffc6cc45f4f6eb61756faee92b8699ff39a/web/client/codechecker_client/metadata.py#L18
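A version gate along these lines could run on each input before merging. This is a minimal sketch: `SUPPORTED_METADATA_VERSION`, `check_metadata_version`, and the `strict` flag are hypothetical, and Python's standard `warnings` module stands in for whatever warn/fail helpers the project settles on.

```python
import warnings

# Hypothetical constant: the only metadata.json format version the merger handles.
SUPPORTED_METADATA_VERSION = 2


def check_metadata_version(metadata: dict, path: str, strict: bool = False) -> None:
    """Warn about (or, with strict=True, reject) an unexpected format version."""
    version = metadata.get("version")
    if version == SUPPORTED_METADATA_VERSION:
        return
    message = (
        f"{path}: metadata.json version {version!r} is not supported "
        f"(expected {SUPPORTED_METADATA_VERSION})"
    )
    if strict:
        raise ValueError(message)
    warnings.warn(message)
```

Starting with a warning and switching to `strict=True` later keeps the rule usable while making a format change visible.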

Contributor Author

@furtib furtib Oct 6, 2025

This could be a good use of the common lib mentioned in #81 (fail, warn)

# We append info from json2 to json1 from here on out
json1_root["result_source_files"].update(json2_root["result_source_files"])
json1_root["skipped"] = json1_root["skipped"] + json2_root["skipped"]
# Merge time
Contributor

Suggested change
- # Merge time
+ # Merge time; we assume here both json files describe jobs in
+ # the same analysis invocation, implying that the analysis start
+ # time is the lowest timestamp, and the end is the highest.
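Under the assumption in the suggested comment, merging the times reduces to taking the earliest begin and the latest end. A minimal sketch, assuming (not confirmed in this thread) that each tool entry stores its times as a `timestamps` object with numeric `begin` and `end` keys; `merge_timestamps` is a hypothetical name.

```python
def merge_timestamps(ts1: dict, ts2: dict) -> dict:
    """Combine two jobs' timestamps, assuming both belong to one invocation.

    Assumed shape of each argument: {"begin": <epoch seconds>, "end": <epoch seconds>}.
    """
    return {
        "begin": min(ts1["begin"], ts2["begin"]),  # earliest start wins
        "end": max(ts1["end"], ts2["end"]),  # latest finish wins
    }
```

This is only valid if every input really was produced by the same run.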

Contributor Author

This is a wrong assumption; every cached action keeps its original timestamps.
Should we leave the timing information out of the file entirely? (Storing timestamps is neither hermetic nor reproducible.)
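If the timing data is dropped, removing it before serialization makes the merged file depend only on the analysis results. A minimal sketch, again assuming per-tool `timestamps` keys; `strip_timestamps` and `canonical_dump` are hypothetical helpers, and `sort_keys=True` ensures that dictionary insertion order cannot change the output bytes.

```python
import copy
import json


def strip_timestamps(metadata: dict) -> dict:
    """Return a deep copy of the metadata without per-tool "timestamps" entries."""
    result = copy.deepcopy(metadata)
    for tool in result.get("tools", []):
        tool.pop("timestamps", None)  # no-op if the key is absent
    return result


def canonical_dump(metadata: dict) -> str:
    """Serialize deterministically so equal content yields identical text."""
    return json.dumps(strip_timestamps(metadata), sort_keys=True, indent=2)
```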

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh, an excellent observation!! We should definitely check how incremental analysis works in CodeChecker; let's not try to be too smart ourselves.

json1_root = json1["tools"][0]
json2_root = json2["tools"][0]
# We append info from json2 to json1 from here on out
json1_root["result_source_files"].update(json2_root["result_source_files"])
Contributor

We should add a comment here stating that if the runs came from the same Bazel invocation (and let's make sure this is actually true), then the following fields must be exactly the same:

  • name
  • version (of codechecker, not the version of the metadata.json file)
  • working_directory
  • output_path
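Such a comment could be backed by a check that fails loudly when two inputs disagree. A minimal sketch: `INVARIANT_FIELDS` and `check_invariant_fields` are hypothetical, and whether these four fields really are shared across one Bazel invocation is exactly what still needs to be verified.

```python
# Fields expected (not yet verified) to be identical for every job of one
# Bazel invocation.
INVARIANT_FIELDS = ("name", "version", "working_directory", "output_path")


def check_invariant_fields(root1: dict, root2: dict) -> None:
    """Raise if two tool entries disagree on a field that should be shared."""
    mismatched = [
        field for field in INVARIANT_FIELDS if root1.get(field) != root2.get(field)
    ]
    if mismatched:
        details = ", ".join(
            f"{field}: {root1.get(field)!r} != {root2.get(field)!r}"
            for field in mismatched
        )
        raise ValueError(f"metadata entries come from different runs ({details})")
```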

On the other hand, what about the following fields? Are we sure they are the same for every job?

  • command

Also, shouldn't action_num be summed up so that it actually reflects the number of jobs?
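Aggregating the counts could look like the following. A minimal sketch, assuming `skipped` and `action_num` are integer counts; `merge_tool_entries` is a hypothetical name, and unlike the reviewed code it returns a new dict instead of updating `json1_root` in place.

```python
def merge_tool_entries(root1: dict, root2: dict) -> dict:
    """Return a new tool entry combining two jobs without mutating either input.

    Assumes "skipped" and "action_num" are integer counts and that
    "result_source_files" maps report files to source files.
    """
    merged = dict(root1)  # shallow copy: shared fields are taken from root1
    merged["result_source_files"] = {
        **root1["result_source_files"],
        **root2["result_source_files"],
    }
    merged["skipped"] = root1["skipped"] + root2["skipped"]
    # Sum the action counts so the result reflects every job, not just the first.
    merged["action_num"] = root1["action_num"] + root2["action_num"]
    return merged
```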

Contributor Author

@furtib furtib Oct 6, 2025

I tried my best to cover everything here.

There are a couple of gaps, though.
The analyzer tools, for example, have their own version numbers, but could we, in theory, tolerate different versions of them running?
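If mixed analyzer versions are tolerated, they could at least be reported. A minimal sketch, assuming the v2 layout stores each analyzer's version under `analyzers.<name>.analyzer_statistics.version` (not confirmed in this thread); both function names are hypothetical.

```python
import warnings


def collect_analyzer_versions(tool_entries: list) -> dict:
    """Map each analyzer name to the set of versions reported across jobs."""
    versions = {}
    for entry in tool_entries:
        for analyzer, data in entry.get("analyzers", {}).items():
            version = data.get("analyzer_statistics", {}).get("version")
            versions.setdefault(analyzer, set()).add(version)
    return versions


def warn_on_mixed_analyzer_versions(tool_entries: list) -> None:
    """Tolerate, but report, analyzers that ran with more than one version."""
    for analyzer, versions in collect_analyzer_versions(tool_entries).items():
        if len(versions) > 1:
            warnings.warn(
                f"analyzer {analyzer!r} ran with multiple versions: "
                + ", ".join(sorted(map(str, versions)))
            )
```

Warning rather than failing matches the open question: the merged file stays usable, but the mismatch is not silently hidden.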


Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Generate metadata.json in file-by-file rule

2 participants