Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci(pre-commit): Prevent cache time-of-use/time-of-check problems #1314

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Holzhaus
Copy link

@Holzhaus Holzhaus commented Sep 3, 2022

Interally, proselint uses caching to speed up analysis, and the
corresponding code is written in a way that makes it obvious that it is
not intended to run multiple instances of proselint simulatenously.

For example, the memoize function in tools.py suffers from
time-of-use vs. time-of-check problems:

if not os.path.isdir(cache_dirname):
    # ...
    # While this is running, another process may create the
    # `cache_dirname` directory.
    # ...
    os.makedirs(cache_dirname)  # <-- Crashes if `cache_dirname` exists

As the code between the check and the directory creating is quite short
and does not take a lot of time to execute, it's unlikely to encounter
this issue on a reasonably modern desktop computer, but I ran into this
issue multiple times when running proselint via pre-commit on a
relatively slow low-powered ARM device in a fresh docker container.

It would be preferable to rewrite the caching code with multiprocessing
in mind and by incorporating concepts like "Easier To Ask for Forgiveness, not
Permission" [1], but for now, it should suffice to disable
parallel execution in pre-commit.

Interally, proselint uses caching to speed up analysis, and the
corresponding code is written in a way that makes it obvious that it is
not intended to run multiple instances of proselint simulatenously.

For example, the `memoize` function in `tools.py` suffers from
time-of-use vs. time-of-check problems:

    if not os.path.isdir(cache_dirname):
        # ...
        # While this is running, another process may create the
        # `cache_dirname` directory.
        # ...
        os.makedirs(cache_dirname)  # <-- Crashes if `cache_dirname` exists

As the code between the check and the directory creating is quite short
and does not take a lot of time to execute, it's unlikely to encounter
this issue on a reasonably modern desktop computer, but I ran into this
issue multiple times when running proselint via pre-commit on a
relatively slow low-powered ARM device in a fresh docker container.

It would be preferable to rewrite the caching code with multiprocessing
in mind and by incorporating concepts like "Easier To Ask for Forgiveness, not
Permission" [1], but for now, it should suffice to disable
parallel execution in pre-commit.
@Nytelife26
Copy link
Member

Sorry, I've only just seen this. Thank you for your contribution!
I'm not so sure about the idea of outright disabling parallelism in pre-commit hooks. I think I'd rather refactor to fix the issue, perhaps by adding a try-except.

@Holzhaus
Copy link
Author

Holzhaus commented Nov 4, 2022

I'm not so sure about the idea of outright disabling parallelism in pre-commit hooks.

It's certainly not ideal, but currently proselint is broken on slow computers and will fail intermittently.The parallelism doesn't really make a noticeable performance difference so I thought this was a reasonable workaround until someone finds the time to fix the root cause.

I think I'd rather refactor to fix the issue, perhaps by adding a try-except.

These time-of-check/time-of-use issues are found in many locations in the code base, not just that one code snippet it posted above. And even that one is simplified (I stripped out the legacy cache migration). If you think this is straightforward to fix, than it makes sense to do that. But I wasn't even sure that writing to and reading from the cache is thread-safe when the code that checks if the cache exists isn't. And I currently don't have enough spare time to rewrite proselint's caching logic in a thread-safe manner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants