Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pre-commit hook + ruff usage #2013

Open
wants to merge 18 commits into
base: master
Choose a base branch
from
Open

pre-commit hook + ruff usage #2013

wants to merge 18 commits into from

Conversation

cpelley
Copy link
Contributor

@cpelley cpelley commented Jul 12, 2024

  • ruff usage (setup to be equivalent to flake8 + isort + black), as per dagruner, improver_suite etc.
    • Performance improvements from its use.
    • Standardisation across repositories.
  • copyright and init checks also turned into pre-commit checks.
    • No longer any need improver_tests/test_source_code.py (removed in this PR).
    • __init__.py file checks handled via init_check script. Auto-fix supported (creates missing __init__.py files).
    • Copyright check will add missing copyright headers.
  • pre-commit checking against changed files only in actions.
  • Updating ci.yml formatting while I was at it to increase readability (line spacing between steps).
  • CI workflow running the pre-commit (formally flake8 and black) now only look at files touched by the PR.

Note that you can run pre-commit manually (you needn't wait for a commit):

All files:

pre-commit run --all-files

Specific files:

pre-commit run --files <some-file> <some-other-file>

Issues

Why ruff?

This PR switches our use from black, isort, and flake8 to ruff for code formatting and linting. Ruff is gaining popularity due to its performance, efficiency and flexibility. It combines the functionality of black, isort, and flake8 into a single tool (and more if we enable additional rulesets), reducing the complexity of our tooling setup. Ruff offers excellent performance, faster execution times, and the ability to auto-fix issues, ensuring a consistent codebase. Adopting ruff aligns with industry trends and provides a unified approach to code quality with our other repositories.

Note

Some differences between black and ruff:
https://docs.astral.sh/ruff/formatter/#black-compatibility
(CI now only runs pre-commit on files changed so changes will be made to files as people touch them)

@cpelley cpelley marked this pull request as draft July 12, 2024 10:29
@cpelley cpelley force-pushed the PRE_COMMIT branch 2 times, most recently from b890199 to 9aba4cf Compare July 12, 2024 10:57
@cpelley cpelley marked this pull request as ready for review July 12, 2024 11:09
@cpelley cpelley self-assigned this Jul 12, 2024
@cpelley cpelley changed the title pre-commit hook usage pre-commit hook + ruff usage Jul 12, 2024
@cpelley
Copy link
Contributor Author

cpelley commented Jul 12, 2024

May be of interest to you @s-boardman - towards consistency between repositories.

@cpelley
Copy link
Contributor Author

cpelley commented Jul 16, 2024

@gavinevans - you happy for us to proceed with this? I can chase-up a second review from the technical team. I just wasn't sure whether science would be happy with leaving this as a purely technical team concern change.
cheers

@cpelley
Copy link
Contributor Author

cpelley commented Jul 23, 2024

I have updated the code style guide to provide more context.

pyproject.toml Outdated Show resolved Hide resolved
Copy link
Contributor

@SamGriffithsMO SamGriffithsMO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more points, I have been testing improver/nbhood/nbhood.py;

  1. There appears to be an issue with the copyright_check, it is failing when I don't think it should. Can you add a fix option to copyright_check so users don't have to figure out what the correct header is? (this will also tell me if it is working correctly)
  2. bin/improver_tests still contains references to old tools-black/isort/flake8 (is this used/ can it be removed? Or does it need updating)
  3. This change appears to update the line-length from 100 down to 88 (not total clear to me why) - note, I am okay with it going to 88. Best guess for cause is the flake8 configuration in setup.cfg (which is still there and presumably could be removed?)
  4. There is a formatting change that I think are actually quite bad (for readability), can we do anything about this. Specifically spacing in explicit arrays, e.g.
-                    [[[ 0.75,  0.75,  0.5 ,  0.5 ,  0.5 ,  0.75,  0.75],
-                      [ 0.75,  0.55,  0.55,  0.5 ,  0.55,  0.55,  0.55],
-                      [ 0.55,  0.55,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ],
-                      [ 0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ],
-                      [ 0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.55,  0.55],
-                      [ 0.55,  0.55,  0.55,  0.5 ,  0.55,  0.55,  0.75],
-                      [ 0.75,  0.75,  0.5 ,  0.5 ,  0.5 ,  0.75,  0.75]]],
+                    (
+                        [
+                            [
+                                [0.75, 0.75, 0.5, 0.5, 0.5, 0.75, 0.75],
+                                [0.75, 0.55, 0.55, 0.5, 0.55, 0.55, 0.55],
+                                [0.55, 0.55, 0.5, 0.5, 0.5, 0.5, 0.5],
+                                [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
+                                [0.5, 0.5, 0.5, 0.5, 0.5, 0.55, 0.55],
+                                [0.55, 0.55, 0.55, 0.5, 0.55, 0.55, 0.75],
+                                [0.75, 0.75, 0.5, 0.5, 0.5, 0.75, 0.75],
+                            ]
+                        ],
+                    )

@cpelley
Copy link
Contributor Author

cpelley commented Jul 23, 2024

There appears to be an issue with the copyright_check, it is failing when I don't think it should.

Can you give an example of this case that failed when you think it shouldn't have?

Can you add a fix option to copyright_check so users don't have to figure out what the correct header is?

The copyright check will already add a copyright header automatically if it is missing. When it appears to be present but isn't correct, it's very difficult to automatically fix or present a diff in the general case. Essentially, checking whether it 'appears' to have a copyright defined is achieved by searching for the term 'copyright' within a comment line ('#'). Beyond that, comments can include arbitrary information so there is no sure way to know where this incorrect copyright finishes, only that is doesn't match.

EDIT: I suppose we could derive a diff based on the number of lines of the correct copyright notice we expect to be there. We could do this, but then wouldn't it be simpler to delete the copyright header and have it automatically populated for you?? (rather than modifying them I mean)

EDIT2: Since the improver_tests/test_source_code.py:test_py_licence that this copyright check replaces doesn't provide this diff, I'm going to consider it out of scope for this PR. Perhaps useful thing to consider in a future mind, though I'm not sure how much so, considering the time people have been managing with their existing solution without such differencing capability 🤷‍♂️

bin/improver_tests still contains references to old tools-black/isort/flake8 (is this used/ can it be removed? Or does it need updating)

I'll grep for references to flake8 and black 👍

This change appears to update the line-length from 100 down to 88 (not total clear to me why) - note, I am okay with it going to 88. Best guess for cause is the flake8 configuration in setup.cfg (which is still there and presumably could be removed?)

Good spotting, yes I didn't think to look at setup.cfg for the flake8 config.
I'll remove and update the ruff limit to 100 (for now at least).

There is a formatting change that I think are actually quite bad (for readability), can we do anything about this. Specifically spacing in explicit arrays, e.g.

Yes, I'm not sure. Will look into it. I think ideally this would be done with an in-line rule exclusion for such things.

@cpelley cpelley marked this pull request as draft July 23, 2024 11:00
@cpelley
Copy link
Contributor Author

cpelley commented Jul 23, 2024

Pulled back into draft to address some points raised by @SamGriffithsMO (thanks for taking a close look at this).

@SamGriffithsMO
Copy link
Contributor

Can you give an example of this case that failed when you think it shouldn't have?

I just ran it on nbhood/nbhood.py, it already has a header that, to me, looks like it should pass. Let me know if you can't reproduce it

(improver_production) [sgriffit@vld456:/net/home/h03/sgriffit/repos/improver]$ git status
HEAD detached at origin/PRE_COMMIT
nothing to commit, working tree clean
(improver_production) [sgriffit@vld456:/net/home/h03/sgriffit/repos/improver]$ pre-commit run -v --files improver/nbhood/nbhood.py
ruff.....................................................................Failed
- hook id: ruff
- duration: 0.12s
- exit code: 1

improver/nbhood/nbhood.py:320:89: E501 Line too long (99 > 88)
improver/nbhood/nbhood.py:329:89: E501 Line too long (89 > 88)
improver/nbhood/nbhood.py:331:89: E501 Line too long (93 > 88)
improver/nbhood/nbhood.py:332:89: E501 Line too long (96 > 88)
improver/nbhood/nbhood.py:336:89: E501 Line too long (91 > 88)
improver/nbhood/nbhood.py:338:89: E501 Line too long (89 > 88)
improver/nbhood/nbhood.py:347:89: E501 Line too long (95 > 88)
improver/nbhood/nbhood.py:349:89: E501 Line too long (93 > 88)
improver/nbhood/nbhood.py:352:89: E501 Line too long (92 > 88)
improver/nbhood/nbhood.py:353:89: E501 Line too long (92 > 88)
improver/nbhood/nbhood.py:381:89: E501 Line too long (99 > 88)
improver/nbhood/nbhood.py:382:89: E501 Line too long (96 > 88)
improver/nbhood/nbhood.py:388:89: E501 Line too long (96 > 88)
improver/nbhood/nbhood.py:642:89: E501 Line too long (92 > 88)
Found 14 errors.

ruff-format..............................................................Failed
- hook id: ruff-format
- duration: 0.05s
- files were modified by this hook

1 file reformatted

Check copyright header...................................................Failed
- hook id: copyright_check
- duration: 0.11s
- exit code: 1

Incorrect Copyright header in 'improver/nbhood/nbhood.py'

Check for missing __init__.py files......................................Passed
- hook id: init_check
- duration: 0.02s

Since the improver_tests/test_source_code.py:test_py_licence that this copyright check replaces doesn't provide this diff, I'm going to consider it out of scope for this PR.

👍

@mo-robert-purvis
Copy link

Copyright check fails on that nbhood.py because first line has Crown copyright not Crown Copyright (case on second word) and second line has released under a BSD 3-Clause license rather than released under the BSD 3-Clause license - a vs the.

copyright_check Outdated Show resolved Hide resolved
Copy link

In order to maintain a backlog of relevant PRs, we automatically label them as stale after 60 days of inactivity.

If this PR is still important to you, then please comment on this PR and the stale label will be removed.

Otherwise this PR will be automatically closed in 30 days time.

@github-actions github-actions bot added the Stale label Sep 22, 2024
line_length = 88
[tool.ruff.lint]
extend-select = ["E", "F", "W", "I"] # add C90 later
ignore = ["E203", "E731", "E501", "E741"] # remove "E501", "E741" later
Copy link
Contributor Author

@cpelley cpelley Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to draw attention to this: Ideally we should work towards removing any exclusions. These exist here now so as not to attempt to cause too many changes (raising standards) whilst first adopting ruff here.

SamGriffithsMO
SamGriffithsMO previously approved these changes Oct 24, 2024
Copy link
Contributor

@SamGriffithsMO SamGriffithsMO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link

@mo-robert-purvis mo-robert-purvis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a lot of files!

Copy link
Contributor

@bayliffe bayliffe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you remake the improver_production environment to exclude the now-redundant packages (black etc), or does the environment need to be left unchanged until post-PS46 to leave things genuinely unchanged?

One tiny typo highlighted. I've not looked at every file, but I get the gist of it (thanks to Sam for being more rigorous). Happy with this change, thanks.

As an aside, I note that the copyright check is by far the slowest of the tests to run, so we might optimise that in future (not required here) to save ourselves whole seconds.

README.md Outdated Show resolved Hide resolved
Co-authored-by: bayliffe <[email protected]>
@cpelley
Copy link
Contributor Author

cpelley commented Nov 11, 2024

Will you remake the improver_production environment to exclude the now-redundant packages (black etc), or does the environment need to be left unchanged until post-PS46 to leave things genuinely unchanged?

Keep our existing environments unchanged 👍
Whilst there are now some redundant packages within, I don't think worth the time/effort/testing changing our pre-existing environments.

One tiny typo highlighted. I've not looked at every file, but I get the gist of it (thanks to Sam for being more rigorous). Happy with this change, thanks.

Done. Thanks.

As an aside, I note that the copyright check is by far the slowest of the tests to run, so we might optimise that in future (not required here) to save ourselves whole seconds.

Whilst the Copyright header check will be order of magnitudes slower than ruff, I presume it will be faster than the python copyright header checker that this replaces. Also, whilst checking the entire repository might take seconds (this branch), the pre-commit only runs against files that have change compared to master. Under normal circumstances, a development branch will be changing tens of files at most (normally), so I think we are OK??

At 20 files, we are about just about at the 1 second threshold and 600 files reaching 10 seconds.

I have switched from parsing with Python to using grep within the copyright header checking to achieve an order of magnitude improvement in speed (with minor caveats to change in pattern matching - not as strict).

Crude benchmark:

xychart-beta
    title "copyright header check benchmark (1 sample only)"
    x-axis "Number of files" [1, 2, 5, 10, 20, 30, 50, 100, 200, 300, 600]
    y-axis "Elapsed time (s)" 0.5 --> 1.3
    line [0.60, 0.62, 0.69, 0.66, 0.61, 0.67, 0.72, 0.79, 0.82, 1.01, 1.23]
Loading

@cpelley
Copy link
Contributor Author

cpelley commented Nov 11, 2024

@nivnac, you may be interested to look at this.

...

@nivnac, you happy for us to merge this?

Copy link
Contributor

@bayliffe bayliffe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cpelley the copyright check is noticeably much faster.

bayliffe
bayliffe previously approved these changes Nov 11, 2024
README.md Outdated
Ensure that you have python available on the path, then install the pre-commit hook by running `pre-commit install` from within your working copy.
pre-commit checks will run against modified files when you commit from then on.

These pre-commit hooks will run as part of continuous integration to maintain standards in the project.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which standards? Should this say "code quality standards"?

Copy link
Contributor Author

@cpelley cpelley Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷‍♂️ I don't see what might be ambiguous myself.
Since I'm making the below suggested change, I'll include this one here too anyway.

copyright_check Outdated Show resolved Hide resolved
Copy link

@mo-robert-purvis mo-robert-purvis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small cosmetic changes suggested

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blue_team BoM review required PRs opened by non-BoM developers that require a BoM review test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants