Improve extract performance via ignoring directory early during os.walk #694

Open
wants to merge 2 commits into
base: master

Conversation

stkao05

@stkao05 stkao05 commented Feb 11, 2020

Currently, the extraction code uses os.walk to perform a deep file search. This traversal can be very slow when directories are deeply nested and contain many files. Even if you have specified directories to ignore in the mapping file, os.walk still explores them.

This PR improves extraction performance by skipping ignored directories early during os.walk, so they are never descended into.
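
For readers unfamiliar with the technique, here is a minimal sketch of pruning directories during os.walk. It is not the code in this PR: `walk_pruned` and `_is_ignored` are hypothetical names, and matching uses the standard library's `fnmatch` rather than Babel's own path matcher, so pattern semantics may differ.

```python
import fnmatch
import os


def _is_ignored(path, root, patterns):
    # Hypothetical helper: match the path relative to root, with '/' separators.
    rel = os.path.relpath(path, root).replace(os.sep, '/')
    return any(fnmatch.fnmatch(rel, pattern) for pattern in patterns)


def walk_pruned(root, ignore_patterns):
    """Yield file paths under root without descending into ignored directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Slice assignment mutates the list that os.walk holds, so removed
        # directories are never visited. This relies on topdown=True (the default).
        dirnames[:] = sorted(
            d for d in dirnames
            if not _is_ignored(os.path.join(dirpath, d), root, ignore_patterns)
        )
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

With patterns such as `['node_modules', '*/node_modules']`, the walk skips a `node_modules` tree entirely instead of listing every file in it and filtering afterwards.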

@stkao05
Author

stkao05 commented Feb 11, 2020

A real-life scenario I have experienced

When you work on a front-end project, you typically have a node_modules directory in your codebase that contains the source code of all third-party libraries (similar to Python's /site-packages/), and this directory is usually very large.

@codecov-io

codecov-io commented Feb 11, 2020

Codecov Report

Merging #694 into master will decrease coverage by 0.03%.
The diff coverage is 77.77%.


@@            Coverage Diff             @@
##           master     #694      +/-   ##
==========================================
- Coverage   90.97%   90.94%   -0.04%     
==========================================
  Files          24       24              
  Lines        4176     4184       +8     
==========================================
+ Hits         3799     3805       +6     
- Misses        377      379       +2
Impacted Files | Coverage Δ
babel/messages/extract.py | 94.38% <77.77%> (-0.56%) ⬇️

Powered by Codecov. Last update 0cfa69e...6ada6ee. Read the comment docs.

@stkao05 stkao05 requested a review from akx February 11, 2020 10:40
@stkao05
Author

stkao05 commented May 21, 2020

Ping @akx

if dirname.startswith('.') or dirname.startswith('_'):
    return False

absdir = os.path.join(root, dirname).replace(os.sep, '/')
Member


Since the logic for ignoring filenames uses

filename = relpath(filepath, dirpath)

I think this should also use a relative path to the root. Otherwise this might end up ignoring paths that happen to contain an ignored fragment outside the relative root.

That is, if your project lives in /foobars/myproject/, and you've ignored *foobar* (as it has a special meaning within the myproject directory), and you invoke Babel from within /foobars/myproject/, this would ignore all files.
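
The difference can be illustrated with a small sketch. It assumes the standard library's `fnmatch` as a stand-in for Babel's matcher and a hypothetical `src` subdirectory inside the project:

```python
import fnmatch
import posixpath

pattern = '*foobar*'
root = '/foobars/myproject'

# Hypothetical subdirectory used only for illustration.
absdir = posixpath.join(root, 'src')        # '/foobars/myproject/src'
reldir = posixpath.relpath(absdir, root)    # 'src'

# The absolute path matches because the parent directory is named 'foobars'.
matches_abs = fnmatch.fnmatchcase(absdir, pattern)
# The path relative to the root does not contain 'foobar', so it is kept.
matches_rel = fnmatch.fnmatchcase(reldir, pattern)

print(matches_abs, matches_rel)
```

Matching against the absolute path would ignore every directory under the project, while matching against the relative path only ignores directories whose own path inside the project contains the fragment.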

Author


Thanks for the feedback. I just applied your suggestion in 252323a.

@akx
Member

akx commented Jan 28, 2022

Hi @stkao05, #832 landed today, so this would need to be rebased :)


3 participants