-
Notifications
You must be signed in to change notification settings - Fork 23
Support CI Warning for Newly Introduced Phrases #302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
In the message about a newly-introduced unknown phrase, it says that type checking the body of the AO will not be performed after that line. But this may be misleading if there's another line preceding it that already would cause the type checking to stop. Should we only warn when the newly-introduced unknown phrase does not already follow an existing unknown phrase? I will ask the other editors what they think. |
@michaelficarra Thank you for bringing this up. Since there hasn’t been much feedback yet, I’d really appreciate hearing what the other editors think as well. From our side, we’d also be happy to adjust the wording to a more neutral tone if that would be preferable. |
|
@stonechoe After discussing with @syg at editor call, we'd like to make sure this won't be too noisy before integrating it. Can you show how many warnings would have been generated in the last 50-or-so commits? Something kind of like this: #!/bin/bash
mkdir -p added_lines
commits=$(git rev-list --reverse -n 50 HEAD)
for SHA in $commits; do
short_sha=$(git rev-parse --short=10 $SHA)
msg=$(git log -1 --pretty=format:'%s' $SHA)
author=$(git log -1 --pretty=format:'%an' $SHA)
date=$(git log -1 --pretty=format:'%ad' $SHA)
echo "- $short_sha $msg $author $date"
added_lines_file="added_lines/${short_sha}.json"
if [ ! -f "$added_lines_file" ]; then
added_lines=$(git diff -U0 --no-color "${SHA}^" "${SHA}" -- spec.html |
# keep only the hunk headers (the @@ lines)
awk '
# Example header: @@ -158,0 +159,2 @@
/^@@/ {
# Extract “+<start>[,<count>]” part
match($0, /\+([0-9]+)(,([0-9]+))?/, a)
start = a[1]
count = (a[3] == "" ? 1 : a[3])
# Print every line number in the added range
for (i = 0; i < count; i++) print start + i
}
')
# Join line numbers with comma or space (whichever format you need)
added_joined=$(echo "$added_lines" | paste -sd "," -)
added_lines_json="[$added_joined]"
# Set it as an output value
echo "added_lines=$added_lines_json" > "$added_lines_file"
fi
git checkout $SHA
esmeta ... "$added_lines_file"
echo
done |
|
Oh, also, on the question of whether to do the "smart" thing and avoid "dead" steps, we decided that we would rather do the "dumb" thing and warn about any new or changed steps that are not understood, regardless of context. |
a9b543c to
0a27906
Compare
|
@michaelficarra I ran an experiment on the latest 50 and 100 commits on the main branch of ecma262. The results are as follows:
It seems that commits associated with certain large editorial PRs tend to generate more warnings. These are the top 5 commits that generated the most warnings:
The attached CSV contains the detailed results, sorted from oldest to newest commit. CSV FileCheck Script
$ python3 check.py spec.html --limit 104 --esmeta-home "$ESMETA_HOME" \
--debug \
--save-raw-dir out/esmeta#!/usr/bin/env python3
import argparse, csv, json, os, re, subprocess, sys, time
HUNK_RE = re.compile(r"\@\@.*\+(\d+)(?:,(\d+))?\s\@\@")
DEFAULT_WARN_RE = re.compile(r"(?i)\bwarn(ing)?\b|\bnew(?:ly)?\b|\bphrase\b") # adjust to your esmeta output
# --------------------------- git helpers ---------------------------
def sh_capture(*args, allow_fail=False):
try:
out = subprocess.check_output(args, stderr=subprocess.STDOUT)
return out.decode("utf-8", "replace")
except subprocess.CalledProcessError as e:
if allow_fail:
return e.output.decode("utf-8", "replace")
raise
def list_commits(limit: int):
out = sh_capture("git","rev-list","--first-parent","--reverse","-n",str(limit),"HEAD")
return [l for l in out.splitlines() if l.strip()]
def commit_meta(sha, fmt):
return sh_capture("git","log","-1",f"--pretty=format:{fmt}",sha).strip()
def first_parent(sha):
parts = sh_capture("git","rev-list","--parents","-n","1",sha).split()
return parts[1] if len(parts) > 1 else None
def added_lines_for_commit(parent, sha, target):
if not parent:
content = sh_capture("git","show",f"{sha}:{target}", allow_fail=True)
if not content:
return []
n = content.count("\n") + (0 if content.endswith("\n") else 1)
return list(range(1, n+1))
diff = sh_capture("git","diff","-U0","--no-color","--find-renames",parent,sha,"--",target, allow_fail=True)
if not diff:
return []
out = []
for m in HUNK_RE.finditer(diff):
start = int(m.group(1)); cnt = int(m.group(2) or "1")
out.extend(range(start, start+cnt))
return out
# --------------------------- esmeta via shell ---------------------------
def run_esmeta_via_shell(cmd_str, added_lines_json, sha,
shell_path="/bin/zsh", debug=False, extra_env=None):
"""
Execute esmeta through a shell so scripts without shebang still work.
sha: commit SHA, passed as -extract:target=<sha>
"""
env = os.environ.copy()
if extra_env:
env.update(extra_env)
full_cmd = f"{cmd_str} -extract:target={sha}"
t0 = time.perf_counter()
proc = subprocess.run(
full_cmd,
input=added_lines_json, text=True,
stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
shell=True, executable=shell_path, env=env, check=False
)
dt = time.perf_counter() - t0
if debug:
print(f"[DEBUG] shell: {shell_path}", file=sys.stderr)
print(f"[DEBUG] cmd: {full_cmd}", file=sys.stderr)
print(f"[DEBUG] elapsed: {dt:.3f}s", file=sys.stderr)
return proc.stdout or ""
# --------------------------- main ---------------------------
def main():
ap = argparse.ArgumentParser()
ap.add_argument("target", nargs="?", default="spec.html", help="file to inspect (default: spec.html)")
ap.add_argument("--limit", type=int, default=50, help="number of commits to scan (default: 50)")
ap.add_argument("--outdir", default="added_lines", help="dir to cache per-commit added line json (default: added_lines)")
ap.add_argument("--csv", default="warnings_summary.csv", help="output CSV path (default: warnings_summary.csv)")
ap.add_argument("--warn-re", default=None, help="Python regex to match warning lines in esmeta output")
ap.add_argument("--esmeta-home", default=os.environ.get("ESMETA_HOME"),
help="path to esmeta (for $ESMETA_HOME in command env)")
ap.add_argument("--esmeta-cmd", default=None,
help="full shell command line for esmeta; if omitted, uses '\"$ESMETA_HOME/bin/esmeta\" extract -status -extract:warn-action'")
ap.add_argument("--esmeta-arg", action="append", default=[],
help="extra args appended (space-joined) to esmeta-cmd; repeatable")
ap.add_argument("--shell", default="/bin/zsh", help="shell to execute command (default: /bin/zsh; use /bin/sh if preferred)")
ap.add_argument("--no-esmeta", action="store_true", help="skip esmeta invocation (structure only)")
ap.add_argument("--debug", action="store_true", help="print command, sizes, timing to stderr")
ap.add_argument("--save-raw-dir", default=None,
help="directory to save per-commit raw esmeta stdout (.out) and stdin json (.in.json)")
args = ap.parse_args()
os.makedirs(args.outdir, exist_ok=True)
if args.save_raw_dir:
os.makedirs(args.save_raw_dir, exist_ok=True)
warn_re = re.compile(args.warn_re) if args.warn_re else DEFAULT_WARN_RE
# Default esmeta command if not provided
if not args.no_esmeta:
if not args.esmeta_cmd:
if not args.esmeta_home:
print("ERROR: provide --esmeta-home or --esmeta-cmd", file=sys.stderr)
sys.exit(2)
args.esmeta_cmd = f'"{os.path.join(args.esmeta_home, "bin", "esmeta")}" extract -status -extract:warn-action'
# Append extra args if any
if args.esmeta_arg:
args.esmeta_cmd = args.esmeta_cmd + " " + " ".join(args.esmeta_arg)
total_warns = 0
with open(args.csv, "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f)
writer.writerow(["short_sha","author","date_iso","warn_count","one_example","subject"])
for sha in list_commits(args.limit):
ssha = sh_capture("git","rev-parse","--short=10",sha).strip()
subject = commit_meta(sha, "%s").replace(",", " ")
author = commit_meta(sha, "%an").replace(",", " ")
dateiso = commit_meta(sha, "%cI")
# cache added lines json per commit
cache_json = os.path.join(args.outdir, f"{ssha}.json")
if not os.path.exists(cache_json):
parent = first_parent(sha)
nums = added_lines_for_commit(parent, sha, args.target)
with open(cache_json, "w", encoding="utf-8") as jf:
json.dump(nums, jf, ensure_ascii=False)
warn_count, example = 0, "-"
if not args.no_esmeta:
added_json = open(cache_json, "r", encoding="utf-8").read()
output = run_esmeta_via_shell(
cmd_str=args.esmeta_cmd,
added_lines_json=added_json,
sha=sha, # <--- 여기서 sha 전달
shell_path=args.shell,
debug=args.debug,
extra_env={"ESMETA_HOME": args.esmeta_home} if args.esmeta_home else None,
)
# optional raw save
if args.save_raw_dir:
with open(os.path.join(args.save_raw_dir, f"{ssha}.out"), "w", encoding="utf-8") as fo:
fo.write(output)
with open(os.path.join(args.save_raw_dir, f"{ssha}.in.json"), "w", encoding="utf-8") as fi:
fi.write(added_json)
# summarize warnings
lines = [l for l in output.splitlines() if l.startswith("::warning")]
warncount = len(lines)
total_warns += warn_count
example = lines[0] if lines else "-"
print(f"- {ssha} {author} {dateiso} warnings={warn_count}")
writer.writerow([ssha, author, dateiso, warn_count, example, subject])
print(f"\nTOTAL warnings across last {args.limit} commits: {total_warns}")
print(f"Summary written to: {args.csv}")
if __name__ == "__main__":
main() |
|
@michaelficarra We’re considering merging this PR and then opening a corresponding PR in tc39/ecma262. If editors have already decided not to proceed with this feature, please let me know. Otherwise, I think opening one on ecma262 would help move things along more quickly. |
|
@stonechoe I'm sorry I forgot to get back to you earlier! We talked about the results you posted in our editor call. The other editors felt that it was a bit too noisy at the moment to integrate, but we believe there is a good amount of low-hanging fruit that pretty much just needs parser support. Take a look at some samples of not-yet-understood steps and types I took from an ECMA-262 CI run: https://gist.github.com/michaelficarra/8ed4c5d1243dc1465f14e90623e1c8c9. We think if there's parser support added for the steps which are effectively just aliases of other currently-understood steps, the signal-to-noise ratio would be acceptable enough to integrate into our CI. So feel free to open the PR whenever you like, but we'll need to work to reduce the noise a little before we can merge it. |
|
@michaelficarra Thanks for sharing your concerns. We recently refactored our metalanguage parser (#316). After this revision, the recent results are as follows for the recent commits in ecma262:
You can see a preview of our warning messages by clicking each |
|
@jhnaldo We talked about this at the last editor call and we are willing to give it a try now with the improved parser. Thanks for doing that work! |
|
@michaelficarra Thanks for the update! That's excellent news. I will create a PR on the ecma262 repository soon. |
This pull request depends on #298.
It introduces the
-extract:warn-action option, which resolves issue #289 by flagging any newly added “Yet” phrases (for steps and types) in the GitHub Actions workflow of tc39/ecma262.The option accepts a JSON array of line numbers through standard input and emits a warning for each matching phrase on those lines. (We plan to write documentation explaining the consequences and impact of newly introduced “Yet” phrases on
esmeta tycheck.)Example — Check for lines 1, 2, and 3:
To enable automatic warnings in a pull-request check, pass the lines added in the diff.
See the YAML snippet below for an example configuration. To see how the warning messages appear, check out this example PR.