Performance suggestion: do not run unselected plugins/checks #751

In GitLab by @hugovk on Jun 5, 2020, 01:45

Please read this brief portion of documentation before going any further: http://flake8.pycqa.org/en/latest/internal/contributing.html#filing-a-bug

Please describe how you installed Flake8

$ pip install -U flake8
$ brew install flake8
# etc.

Please provide the exact, unmodified output of flake8 --bug-report

{
  "dependencies": [],
  "platform": {
    "python_implementation": "CPython",
    "python_version": "3.8.3",
    "system": "Darwin"
  },
  "plugins": [
    {
      "is_local": false,
      "plugin": "flake8_2020",
      "version": "1.6.0"
    },
    {
      "is_local": false,
      "plugin": "mccabe",
      "version": "0.6.1"
    },
    {
      "is_local": false,
      "plugin": "pycodestyle",
      "version": "2.6.0"
    },
    {
      "is_local": false,
      "plugin": "pyflakes",
      "version": "2.2.0"
    }
  ],
  "version": "3.8.2"
}

Please describe the problem or feature

I noticed that Flake8 takes the same time to run with --select as without. As shown using -vv verbosity, it runs all the plugins and checks regardless of --select, and only reports the selected ones afterwards.

Flake8 can sometimes take a long time to run on large codebases, and if it were possible to run only the selected checks, that would save a lot of time, CPU and power.

Would it be possible to run only the selected checks/plugins, rather than running them all anyway and discarding that work when reporting?
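
A rough sketch of the requested behaviour, to make the idea concrete. This is not Flake8's actual code: the `plugins` list, the `run` callables and both helper functions are hypothetical, and it assumes every plugin is registered under a name that is also its error-code prefix (e.g. `YTT`, `C90`, `F`), which may not hold for every plugin.

```python
def may_emit_selected(plugin_name, selected):
    """Return True if a plugin named after a code prefix could emit a selected code.

    Matches in both directions: "F4" should still run plugin "F",
    and "Y" should still run plugin "YTT".
    """
    return any(
        plugin_name.startswith(code) or code.startswith(plugin_name)
        for code in selected
    )


def run_selected(plugins, selected):
    """Run only plugins that could report a selected code.

    Returns (reported_codes, names_of_plugins_actually_run).
    """
    reported = []
    ran = []
    for plugin in plugins:
        if not may_emit_selected(plugin["name"], selected):
            continue  # skip the work entirely instead of discarding it later
        ran.append(plugin["name"])
        reported.extend(plugin["run"]())
    # A plugin may emit more codes than were selected, so still filter the output.
    reported = [r for r in reported if any(r.startswith(c) for c in selected)]
    return reported, ran


# Hypothetical stand-ins for real plugins (assumption: name == code prefix).
plugins = [
    {"name": "YTT", "run": lambda: ["YTT101", "YTT202"]},
    {"name": "C90", "run": lambda: ["C901"]},
    {"name": "F", "run": lambda: ["F401"]},
]

reported, ran = run_selected(plugins, ["YTT"])
print(ran)       # only the YTT plugin was called
print(reported)
```

The final filtering step stays in place, so the reported output would be the same as today; only the unselected plugins are never called.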


Docs

For reference, my emphasis.

flake8 --help says --select specifies which errors and warnings to enable:

  --select errors       Comma-separated list of errors and warnings to enable. For example, ``--select=E4,E51,W234``.
                        (Default: ['E', 'F', 'W', 'C90'])

The docs are a bit more explicit:

Specify the list of error codes you wish Flake8 to report.

https://flake8.pycqa.org/en/latest/user/options.html#cmdoption-flake8-select


Example

An example running on the TensorFlow codebase:

$ time flake8
...
flake8  323.91s user 4.31s system 98% cpu 5:32.78 total
$ time flake8 --select YTT
...
flake8 --select YTT  318.62s user 3.80s system 99% cpu 5:25.51 total

Both take about the same time, around 5m20s of user CPU time.

With an ugly hack (I know this mixes plugin names with error codes, but it's just to get a rough idea, and there are other places to skip too):

diff --git a/src/flake8/checker.py b/src/flake8/checker.py
index d993cb9..9ed986d 100644
--- a/src/flake8/checker.py
+++ b/src/flake8/checker.py
@@ -486,6 +486,8 @@ class FileChecker(object):
             return

         for plugin in self.checks["ast_plugins"]:
+            if plugin["name"] != "YTT":
+                continue
             checker = self.run_check(plugin, tree=ast)
             # If the plugin uses a class, call the run method of it, otherwise
             # the call should return something iterable itself

$ time flake8 --select YTT
flake8 --select YTT  276.90s user 3.17s system 98% cpu 4:43.00 total

About 4m40s, roughly 40 seconds (~13%) faster.
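
For anyone checking the arithmetic, the saving can be recomputed from the user CPU times reported by `time` above. This is only a small sketch; the variable names are made up for illustration.

```python
# User CPU seconds copied from the `time` output above.
baseline_user = 318.62  # flake8 --select YTT, unpatched
patched_user = 276.90   # flake8 --select YTT, with the hack applied

saved = baseline_user - patched_user
percent = saved / baseline_user * 100

print(f"{saved:.2f}s saved, {percent:.1f}% less user CPU time")
# prints: 41.72s saved, 13.1% less user CPU time
```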
