pre-commit hook + ruff usage #2013

cpelley · 2024-07-12T10:29:54Z

ruff usage (setup to be equivalent to flake8 + isort + black), as per dagruner, improver_suite etc.
- Performance improvements from its use.
- Standardisation across repositories.
copyright and init checks also turned into pre-commit checks.
- No longer any need improver_tests/test_source_code.py (removed in this PR).
- __init__.py file checks handled via init_check script. Auto-fix supported (creates missing __init__.py files).
- Copyright check will add missing copyright headers.
pre-commit checking against changed files only in actions.
Updating ci.yml formatting while I was at it to increase readability (line spacing between steps).
CI workflow running the pre-commit (formally flake8 and black) now only look at files touched by the PR.

Note that you can run pre-commit manually (you needn't wait for a commit):

All files:

pre-commit run --all-files

Specific files:

pre-commit run --files <some-file> <some-other-file>

Issues

https://github.com/MetOffice/improver_suite/issues/2214

Why ruff?

This PR switches our use from black, isort, and flake8 to ruff for code formatting and linting. Ruff is gaining popularity due to its performance, efficiency and flexibility. It combines the functionality of black, isort, and flake8 into a single tool (and more if we enable additional rulesets), reducing the complexity of our tooling setup. Ruff offers excellent performance, faster execution times, and the ability to auto-fix issues, ensuring a consistent codebase. Adopting ruff aligns with industry trends and provides a unified approach to code quality with our other repositories.

Note

Some differences between black and ruff:
https://docs.astral.sh/ruff/formatter/#black-compatibility
(CI now only runs pre-commit on files changed so changes will be made to files as people touch them)

cpelley · 2024-07-12T11:16:32Z

May be of interest to you @s-boardman - towards consistency between repositories.

cpelley · 2024-07-16T14:20:42Z

@gavinevans - you happy for us to proceed with this? I can chase-up a second review from the technical team. I just wasn't sure whether science would be happy with leaving this as a purely technical team concern change.
cheers

cpelley · 2024-07-23T07:26:33Z

I have updated the code style guide to provide more context.

pyproject.toml

SamGriffithsMO

A few more points, I have been testing improver/nbhood/nbhood.py;

There appears to be an issue with the copyright_check, it is failing when I don't think it should. Can you add a fix option to copyright_check so users don't have to figure out what the correct header is? (this will also tell me if it is working correctly)
bin/improver_tests still contains references to old tools-black/isort/flake8 (is this used/ can it be removed? Or does it need updating)
This change appears to update the line-length from 100 down to 88 (not total clear to me why) - note, I am okay with it going to 88. Best guess for cause is the flake8 configuration in setup.cfg (which is still there and presumably could be removed?)
There is a formatting change that I think are actually quite bad (for readability), can we do anything about this. Specifically spacing in explicit arrays, e.g.

-                    [[[ 0.75,  0.75,  0.5 ,  0.5 ,  0.5 ,  0.75,  0.75],
-                      [ 0.75,  0.55,  0.55,  0.5 ,  0.55,  0.55,  0.55],
-                      [ 0.55,  0.55,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ],
-                      [ 0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ],
-                      [ 0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.55,  0.55],
-                      [ 0.55,  0.55,  0.55,  0.5 ,  0.55,  0.55,  0.75],
-                      [ 0.75,  0.75,  0.5 ,  0.5 ,  0.5 ,  0.75,  0.75]]],
+                    (
+                        [
+                            [
+                                [0.75, 0.75, 0.5, 0.5, 0.5, 0.75, 0.75],
+                                [0.75, 0.55, 0.55, 0.5, 0.55, 0.55, 0.55],
+                                [0.55, 0.55, 0.5, 0.5, 0.5, 0.5, 0.5],
+                                [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
+                                [0.5, 0.5, 0.5, 0.5, 0.5, 0.55, 0.55],
+                                [0.55, 0.55, 0.55, 0.5, 0.55, 0.55, 0.75],
+                                [0.75, 0.75, 0.5, 0.5, 0.5, 0.75, 0.75],
+                            ]
+                        ],
+                    )

cpelley · 2024-07-23T11:00:38Z

There appears to be an issue with the copyright_check, it is failing when I don't think it should.

Can you give an example of this case that failed when you think it shouldn't have?

Can you add a fix option to copyright_check so users don't have to figure out what the correct header is?

The copyright check will already add a copyright header automatically if it is missing. When it appears to be present but isn't correct, it's very difficult to automatically fix or present a diff in the general case. Essentially, checking whether it 'appears' to have a copyright defined is achieved by searching for the term 'copyright' within a comment line ('#'). Beyond that, comments can include arbitrary information so there is no sure way to know where this incorrect copyright finishes, only that is doesn't match.

EDIT: I suppose we could derive a diff based on the number of lines of the correct copyright notice we expect to be there. We could do this, but then wouldn't it be simpler to delete the copyright header and have it automatically populated for you?? (rather than modifying them I mean)

EDIT2: Since the improver_tests/test_source_code.py:test_py_licence that this copyright check replaces doesn't provide this diff, I'm going to consider it out of scope for this PR. Perhaps useful thing to consider in a future mind, though I'm not sure how much so, considering the time people have been managing with their existing solution without such differencing capability 🤷‍♂️

bin/improver_tests still contains references to old tools-black/isort/flake8 (is this used/ can it be removed? Or does it need updating)

I'll grep for references to flake8 and black 👍

This change appears to update the line-length from 100 down to 88 (not total clear to me why) - note, I am okay with it going to 88. Best guess for cause is the flake8 configuration in setup.cfg (which is still there and presumably could be removed?)

Good spotting, yes I didn't think to look at setup.cfg for the flake8 config.
I'll remove and update the ruff limit to 100 (for now at least).

There is a formatting change that I think are actually quite bad (for readability), can we do anything about this. Specifically spacing in explicit arrays, e.g.

Yes, I'm not sure. Will look into it. I think ideally this would be done with an in-line rule exclusion for such things.

cpelley · 2024-07-23T11:17:15Z

Pulled back into draft to address some points raised by @SamGriffithsMO (thanks for taking a close look at this).

SamGriffithsMO · 2024-07-23T11:45:36Z

Can you give an example of this case that failed when you think it shouldn't have?

I just ran it on nbhood/nbhood.py, it already has a header that, to me, looks like it should pass. Let me know if you can't reproduce it

(improver_production) [sgriffit@vld456:/net/home/h03/sgriffit/repos/improver]$ git status
HEAD detached at origin/PRE_COMMIT
nothing to commit, working tree clean
(improver_production) [sgriffit@vld456:/net/home/h03/sgriffit/repos/improver]$ pre-commit run -v --files improver/nbhood/nbhood.py
ruff.....................................................................Failed
- hook id: ruff
- duration: 0.12s
- exit code: 1

improver/nbhood/nbhood.py:320:89: E501 Line too long (99 > 88)
improver/nbhood/nbhood.py:329:89: E501 Line too long (89 > 88)
improver/nbhood/nbhood.py:331:89: E501 Line too long (93 > 88)
improver/nbhood/nbhood.py:332:89: E501 Line too long (96 > 88)
improver/nbhood/nbhood.py:336:89: E501 Line too long (91 > 88)
improver/nbhood/nbhood.py:338:89: E501 Line too long (89 > 88)
improver/nbhood/nbhood.py:347:89: E501 Line too long (95 > 88)
improver/nbhood/nbhood.py:349:89: E501 Line too long (93 > 88)
improver/nbhood/nbhood.py:352:89: E501 Line too long (92 > 88)
improver/nbhood/nbhood.py:353:89: E501 Line too long (92 > 88)
improver/nbhood/nbhood.py:381:89: E501 Line too long (99 > 88)
improver/nbhood/nbhood.py:382:89: E501 Line too long (96 > 88)
improver/nbhood/nbhood.py:388:89: E501 Line too long (96 > 88)
improver/nbhood/nbhood.py:642:89: E501 Line too long (92 > 88)
Found 14 errors.

ruff-format..............................................................Failed
- hook id: ruff-format
- duration: 0.05s
- files were modified by this hook

1 file reformatted

Check copyright header...................................................Failed
- hook id: copyright_check
- duration: 0.11s
- exit code: 1

Incorrect Copyright header in 'improver/nbhood/nbhood.py'

Check for missing __init__.py files......................................Passed
- hook id: init_check
- duration: 0.02s

Since the improver_tests/test_source_code.py:test_py_licence that this copyright check replaces doesn't provide this diff, I'm going to consider it out of scope for this PR.

👍

mo-robert-purvis · 2024-07-23T12:59:36Z

Copyright check fails on that nbhood.py because first line has Crown copyright not Crown Copyright (case on second word) and second line has released under a BSD 3-Clause license rather than released under the BSD 3-Clause license - a vs the.

copyright_check

github-actions · 2024-09-22T01:44:29Z

In order to maintain a backlog of relevant PRs, we automatically label them as stale after 60 days of inactivity.

If this PR is still important to you, then please comment on this PR and the stale label will be removed.

Otherwise this PR will be automatically closed in 30 days time.

cpelley · 2024-10-24T06:49:38Z

pyproject.toml

-line_length = 88
+[tool.ruff.lint]
+extend-select = ["E", "F", "W", "I"]  # add C90 later
+ignore = ["E203", "E731", "E501", "E741"]  # remove "E501", "E741" later


Wanted to draw attention to this: Ideally we should work towards removing any exclusions. These exist here now so as not to attempt to cause too many changes (raising standards) whilst first adopting ruff here.

SamGriffithsMO

👍

mo-robert-purvis

That was a lot of files!

bayliffe

Will you remake the improver_production environment to exclude the now-redundant packages (black etc), or does the environment need to be left unchanged until post-PS46 to leave things genuinely unchanged?

One tiny typo highlighted. I've not looked at every file, but I get the gist of it (thanks to Sam for being more rigorous). Happy with this change, thanks.

As an aside, I note that the copyright check is by far the slowest of the tests to run, so we might optimise that in future (not required here) to save ourselves whole seconds.

README.md

Co-authored-by: bayliffe <[email protected]>

cpelley · 2024-11-11T07:38:58Z

Will you remake the improver_production environment to exclude the now-redundant packages (black etc), or does the environment need to be left unchanged until post-PS46 to leave things genuinely unchanged?

Keep our existing environments unchanged 👍
Whilst there are now some redundant packages within, I don't think worth the time/effort/testing changing our pre-existing environments.

One tiny typo highlighted. I've not looked at every file, but I get the gist of it (thanks to Sam for being more rigorous). Happy with this change, thanks.

Done. Thanks.

As an aside, I note that the copyright check is by far the slowest of the tests to run, so we might optimise that in future (not required here) to save ourselves whole seconds.

Whilst the Copyright header check will be order of magnitudes slower than ruff, I presume it will be faster than the python copyright header checker that this replaces. Also, whilst checking the entire repository might take seconds (this branch), the pre-commit only runs against files that have change compared to master. Under normal circumstances, a development branch will be changing tens of files at most (normally), so I think we are OK??

~~At 20 files, we are about just about at the 1 second threshold and 600 files reaching 10 seconds.~~

I have switched from parsing with Python to using grep within the copyright header checking to achieve an order of magnitude improvement in speed (with minor caveats to change in pattern matching - not as strict).

Crude benchmark:

xychart-beta
    title "copyright header check benchmark (1 sample only)"
    x-axis "Number of files" [1, 2, 5, 10, 20, 30, 50, 100, 200, 300, 600]
    y-axis "Elapsed time (s)" 0.5 --> 1.3
    line [0.60, 0.62, 0.69, 0.66, 0.61, 0.67, 0.72, 0.79, 0.82, 1.01, 1.23]

cpelley · 2024-11-11T10:23:53Z

@nivnac, you may be interested to look at this.

...

@nivnac, you happy for us to merge this?

bayliffe

Thanks @cpelley the copyright check is noticeably much faster.

mo-robert-purvis · 2024-11-11T14:48:29Z

README.md

+Ensure that you have python available on the path, then install the pre-commit hook by running `pre-commit install` from within your working copy.
+pre-commit checks will run against modified files when you commit from then on.
+
+These pre-commit hooks will run as part of continuous integration to maintain standards in the project.


Which standards? Should this say "code quality standards"?

🤷‍♂️ I don't see what might be ambiguous myself.
Since I'm making the below suggested change, I'll include this one here too anyway.

copyright_check

mo-robert-purvis

Some small cosmetic changes suggested

cpelley marked this pull request as draft July 12, 2024 10:29

cpelley force-pushed the PRE_COMMIT branch 2 times, most recently from b890199 to 9aba4cf Compare July 12, 2024 10:57

pre-commit hook

f13d784

cpelley force-pushed the PRE_COMMIT branch from 9aba4cf to f13d784 Compare July 12, 2024 11:05

cpelley marked this pull request as ready for review July 12, 2024 11:09

cpelley self-assigned this Jul 12, 2024

cpelley requested review from mo-robert-purvis and SamGriffithsMO July 12, 2024 11:09

cpelley changed the title ~~pre-commit hook usage~~ pre-commit hook + ruff usage Jul 12, 2024

mo-robert-purvis previously approved these changes Jul 15, 2024

View reviewed changes

cpelley added blue_team test labels Jul 16, 2024

Update Code-Style-Guide.rst

4ce4b71

cpelley dismissed mo-robert-purvis’s stale review via 4ce4b71 July 23, 2024 07:24

SamGriffithsMO reviewed Jul 23, 2024

View reviewed changes

pyproject.toml Outdated Show resolved Hide resolved

SamGriffithsMO previously requested changes Jul 23, 2024

View reviewed changes

cpelley marked this pull request as draft July 23, 2024 11:00

mo-robert-purvis reviewed Jul 23, 2024

View reviewed changes

copyright_check Outdated Show resolved Hide resolved

improved pre-commit config

37ddc2b

mo-robert-purvis mentioned this pull request Jul 25, 2024

Plugin discovery #2009

Merged

github-actions bot added the Stale label Sep 22, 2024

cpelley commented Oct 24, 2024

View reviewed changes

SamGriffithsMO previously approved these changes Oct 24, 2024

View reviewed changes

mo-robert-purvis previously approved these changes Oct 25, 2024

View reviewed changes

bayliffe reviewed Nov 8, 2024

View reviewed changes

README.md Outdated Show resolved Hide resolved

Update README.md

09ac849

Co-authored-by: bayliffe <[email protected]>

cpelley dismissed stale reviews from mo-robert-purvis and SamGriffithsMO via 09ac849 November 11, 2024 07:31

cpelley added 2 commits November 11, 2024 08:31

Merge remote-tracking branch 'origin/master' into PRE_COMMIT

3f5e093

no python within copyright check

f08ad24

cpelley force-pushed the PRE_COMMIT branch from 0f6a463 to f08ad24 Compare November 11, 2024 09:36

fixed grep for different versions

5b25614

cpelley requested review from bayliffe, SamGriffithsMO and mo-robert-purvis November 11, 2024 10:19

bayliffe reviewed Nov 11, 2024

View reviewed changes

bayliffe previously approved these changes Nov 11, 2024

View reviewed changes

mo-robert-purvis reviewed Nov 11, 2024

View reviewed changes

copyright_check Outdated Show resolved Hide resolved

mo-robert-purvis previously approved these changes Nov 11, 2024

View reviewed changes

suggested review changes

bb07197

cpelley dismissed stale reviews from mo-robert-purvis and bayliffe via bb07197 November 11, 2024 15:59

cpelley requested review from mo-robert-purvis and bayliffe November 11, 2024 15:59

mo-robert-purvis approved these changes Nov 11, 2024

View reviewed changes

Merge remote-tracking branch 'origin/master' into PRE_COMMIT

d8d4a1f

SamGriffithsMO approved these changes Nov 12, 2024

View reviewed changes

nivnac approved these changes Nov 13, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pre-commit hook + ruff usage #2013

pre-commit hook + ruff usage #2013

cpelley commented Jul 12, 2024 •

edited

Loading

cpelley commented Jul 12, 2024

cpelley commented Jul 16, 2024

cpelley commented Jul 23, 2024

SamGriffithsMO left a comment

cpelley commented Jul 23, 2024 •

edited

Loading

cpelley commented Jul 23, 2024

SamGriffithsMO commented Jul 23, 2024

mo-robert-purvis commented Jul 23, 2024

github-actions bot commented Sep 22, 2024

cpelley Oct 24, 2024 •

edited

Loading

SamGriffithsMO left a comment

mo-robert-purvis left a comment

bayliffe left a comment

cpelley commented Nov 11, 2024 •

edited

Loading

cpelley commented Nov 11, 2024

bayliffe left a comment

mo-robert-purvis Nov 11, 2024

cpelley Nov 11, 2024 •

edited

Loading

mo-robert-purvis left a comment

pre-commit hook + ruff usage #2013

Are you sure you want to change the base?

pre-commit hook + ruff usage #2013

Conversation

cpelley commented Jul 12, 2024 • edited Loading

Issues

Why ruff?

Note

cpelley commented Jul 12, 2024

cpelley commented Jul 16, 2024

cpelley commented Jul 23, 2024

SamGriffithsMO left a comment

Choose a reason for hiding this comment

cpelley commented Jul 23, 2024 • edited Loading

cpelley commented Jul 23, 2024

SamGriffithsMO commented Jul 23, 2024

mo-robert-purvis commented Jul 23, 2024

github-actions bot commented Sep 22, 2024

cpelley Oct 24, 2024 • edited Loading

Choose a reason for hiding this comment

SamGriffithsMO left a comment

Choose a reason for hiding this comment

mo-robert-purvis left a comment

Choose a reason for hiding this comment

bayliffe left a comment

Choose a reason for hiding this comment

cpelley commented Nov 11, 2024 • edited Loading

cpelley commented Nov 11, 2024

bayliffe left a comment

Choose a reason for hiding this comment

mo-robert-purvis Nov 11, 2024

Choose a reason for hiding this comment

cpelley Nov 11, 2024 • edited Loading

Choose a reason for hiding this comment

mo-robert-purvis left a comment

Choose a reason for hiding this comment

cpelley commented Jul 12, 2024 •

edited

Loading

cpelley commented Jul 23, 2024 •

edited

Loading

cpelley Oct 24, 2024 •

edited

Loading

cpelley commented Nov 11, 2024 •

edited

Loading

cpelley Nov 11, 2024 •

edited

Loading