Even though all our repos should have a `.gitignore` file, large files can still be committed by accident. Sometimes the author notices and deletes them in a later commit, so they no longer show up in the PR's final diff. During code review, the only way to find them is to inspect every single commit manually. Once the PR is merged, these files stay in the repo history forever.
So I propose adding a GitHub Action that performs this check automatically. I used the command below to list every blob (file version) reachable from any ref in the repo, sorted by size:
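```shell
# List every object reachable from any ref, look up its type and size,
# keep only blobs, sort by size (ascending), and print human-readable sizes.
# gnumfmt is the name of GNU numfmt when coreutils is installed via Homebrew on macOS.
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| sed -n 's/^blob //p' \
| sort --numeric-sort --key=2 \
| $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
```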
Example output (last 10 lines, i.e. the 10 largest blobs):
```
1a98da8ccb1a1330e717215b970b3fd5601a40fe 12KiB src/cptac_luad/processing/create_metapipeline_dna_input_from_dataset_registry_yaml.py
d283a8677377c57d0247ba29360f9fc8b5b4b02c 13KiB src/cptac_luad/processing/create_metapipeline_dna_input_from_dataset_registry_yaml.py
8e392c90d8731ae09d23f7c292ba7f352cf88911 13KiB src/cptac_luad/processing/create_metapipeline_dna_input_from_dataset_registry_yaml.py
64bdb3bc4a54840130341e05bf20c407545c731f 14KiB src/cptac_luad/processing/create_metapipeline_dna_input_from_dataset_registry_yaml.py
5ca6502471ffae73e76942392973d813599aaad7 14KiB src/cptac_luad/processing/create_metapipeline_dna_input_from_dataset_registry_yaml.py
3365f0b7c3efedfc22555e1ac9351db7762dfb92 14KiB src/cptac_luad/processing/create_metapipeline_dna_input_from_dataset_registry_yaml.py
2e4d1fe710899f1941251178ea67f2fd9a476634 17KiB .pylintrc
59436aec7b1cbe9c208edd5f290c625ebc3792e8 17KiB .pylintrc
d159169d1050894d3ea3b98e1c965c4058208fe1 18KiB LICENSE.md
8e8e72a88324670c5abd321803477f3e419d36a6 18KiB .pylintrc
```
Running a check like this in CI would let us catch these "hidden" large or unwanted files anywhere in the PR branch's commit history, even when a later commit has already deleted them.
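As a rough sketch of what the action's check step could look like (assuming bash and that the PR's base branch has been fetched locally), the function below scans only the objects a branch adds, using `BASE..HEAD` instead of `--all`, and returns non-zero if any blob exceeds a size limit. `check_large_blobs`, its argument order, and the 1 MiB default limit are hypothetical choices for illustration, not an existing tool.

```shell
#!/usr/bin/env bash
# Sketch of a CI check: fail if a branch adds any blob larger than a limit.
# check_large_blobs, its argument order, and the 1 MiB default are
# hypothetical choices for illustration, not an existing tool.

check_large_blobs() {
  local base_ref="$1" head_ref="$2" limit="${3:-1048576}"  # limit in bytes
  local offenders

  # BASE..HEAD lists only objects reachable from HEAD but not from BASE, i.e.
  # everything the branch introduces, including blobs that a later commit
  # in the branch deleted again. pipefail makes a bad ref fail the check.
  if ! offenders=$(
    set -o pipefail
    git rev-list --objects "$base_ref..$head_ref" \
      | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
      | awk -v limit="$limit" '$1 == "blob" && $3 > limit { sub(/^blob /, ""); print }' \
      | sort -k2,2nr
  ); then
    echo "error: could not list objects for $base_ref..$head_ref" >&2
    return 2
  fi

  if [ -n "$offenders" ]; then
    echo "Blobs larger than $limit bytes added in $base_ref..$head_ref:" >&2
    echo "$offenders" >&2   # columns: object name, size in bytes, path
    return 1
  fi
  echo "No blobs larger than $limit bytes added in $base_ref..$head_ref."
  return 0
}
```

In a `pull_request` workflow this might be called as `check_large_blobs "origin/$GITHUB_BASE_REF" HEAD 5242880` (a 5 MiB limit, again just a placeholder). Because `actions/checkout` fetches only a single commit by default, the job would likely need `fetch-depth: 0` so that the base branch and the full PR history are available.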