w.r.t. correctness, I think there are also some niche internal methods that would need to be overhauled. e.g. instead of report.correctness_by_topic() we could do something like metrics_by_topic("correctness")
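
Rough sketch of what I mean, just to make it concrete. Everything here is an assumption for illustration: the Report class is a stand-in, the row layout (dicts with a "topic" key plus one key per metric) is made up, and averaging per topic is a guess at what the current method does.

```python
from collections import defaultdict
from statistics import mean


class Report:
    """Hypothetical stand-in for the real report class."""

    def __init__(self, rows):
        # Assumed row shape: {"topic": "math", "correctness": 1.0, ...}
        self.rows = list(rows)

    def metrics_by_topic(self, metric):
        """Return {topic: mean value of `metric`} over rows that carry it."""
        grouped = defaultdict(list)
        for row in self.rows:
            if metric in row:
                grouped[row["topic"]].append(row[metric])
        if not grouped:
            # No row has this metric, so treat it as unknown.
            raise KeyError(f"unknown metric: {metric!r}")
        return {topic: mean(values) for topic, values in grouped.items()}

    def correctness_by_topic(self):
        # Thin wrapper so existing callers keep working during the migration.
        return self.metrics_by_topic("correctness")


report = Report([
    {"topic": "math", "correctness": 1.0},
    {"topic": "math", "correctness": 0.0},
    {"topic": "history", "correctness": 1.0},
])
by_topic = report.metrics_by_topic("correctness")
```

Keeping correctness_by_topic() around as a wrapper would let us migrate callers gradually and drop it later.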