Large files & untracking #103

Open
IAlibay opened this issue Oct 28, 2020 · 3 comments

Comments

@IAlibay
Collaborator

IAlibay commented Oct 28, 2020

Tagging @RMeli who has more experience with this here.

We are at the point where we are starting to track quite substantial amounts of data (e.g. xtc files in the MD tutorials and ~100 PDB files of about 1 MB each in the homology modelling tutorial).

We really need a policy in place to make sure we stop tracking files that get removed, and to limit the addition of new large files (e.g. by encouraging tarballs, zip archives, etc.).
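As a rough illustration of the "limit new large files" half of such a policy, something like the sketch below could be run before committing or in CI. The 5 MB threshold and the function name `find_large_files` are assumptions made for the example, not something agreed in this thread.

```python
from pathlib import Path

# Hypothetical threshold; no number has been agreed on in this issue.
MAX_BYTES = 5 * 1024 * 1024


def find_large_files(root, max_bytes=MAX_BYTES):
    """Return (relative_path, size_in_bytes) for files under `root`
    that are larger than `max_bytes`, largest first."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Skip git's own object store; only working-tree files matter here.
        if ".git" in relative.parts:
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > max_bytes:
                hits.append((relative.as_posix(), size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Calling `find_large_files(".")` from the repository root returns the offending files, which a hook or CI step could print and use to fail the check.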

@RMeli
Member

RMeli commented Oct 28, 2020

Large files are actually a solved problem with Git LFS, but unfortunately I haven't seen an easy way to set it up locally (on GitHub you have to pay for additional space; the free 1 GB of storage runs out very quickly).

If we can keep every file under 100 MB we are OK. Otherwise, a good option would be to store all files on Zenodo and have a script to download them locally.
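A download script along these lines might look like the sketch below. It is only an outline under stated assumptions: the `DATA_FILES` manifest, its placeholder entry, and the function names are hypothetical, and the real URLs and checksums would come from whatever Zenodo record the data ends up in.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path

# Hypothetical manifest: local file name -> (download URL, expected MD5).
# Left empty on purpose; the entry below is a placeholder, not a real record.
DATA_FILES = {
    # "md_tutorial.xtc": ("<URL of the file on Zenodo>", "<md5 checksum>"),
}


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(url, dest, expected_md5=None):
    """Download `url` to `dest` unless a valid copy is already there."""
    dest = Path(dest)
    if dest.exists() and (expected_md5 is None or md5sum(dest) == expected_md5):
        return dest  # already downloaded, nothing to do
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    if expected_md5 is not None and md5sum(dest) != expected_md5:
        dest.unlink()  # do not leave a corrupt file behind
        raise ValueError(f"checksum mismatch for {url}")
    return dest


def fetch_all(manifest=DATA_FILES, target_dir="data"):
    """Fetch every file in the manifest into `target_dir`."""
    return [
        fetch(url, Path(target_dir) / name, checksum)
        for name, (url, checksum) in manifest.items()
    ]
```

With the data directory listed in `.gitignore`, the repository would then only track the manifest and the script, not the files themselves.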

@IAlibay
Collaborator Author

IAlibay commented Oct 28, 2020

Maybe at the very least we can apply: https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/removing-sensitive-data-from-a-repository when we go ahead and remove large files? (we can try it out on the xtc in the MD tutorials)

I'm not sure if there's an easy way to see how many large historical files GitHub is storing. Any ideas beyond trawling through git logs?
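One possible way to do this from a local clone, sketched below, is to combine `git rev-list --objects --all` with `git cat-file --batch-check` and sort the blobs by size. The function names are made up for the example, and the sizes reported are uncompressed object sizes, so they will not match exactly what GitHub stores on disk.

```python
import subprocess


def parse_batch_check(text):
    """Parse 'type sha size path' lines and keep blobs only,
    returned as (size, sha, path) tuples, largest first."""
    blobs = []
    for line in text.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # commits, trees and tags are not file contents
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    return sorted(blobs, reverse=True)


def largest_blobs(repo=".", top=20):
    """List the `top` largest blobs reachable from any ref in `repo`,
    including files that were deleted in later commits."""
    # Every object reachable from any ref, as 'sha path' lines.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for the type and size of each object; %(rest) echoes the path.
    sizes = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sizes)[:top]
```

Comparing the paths returned by `largest_blobs()` against the files in the current checkout would show which large files only survive in history.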

@RMeli
Member

RMeli commented Oct 28, 2020

I thought you were talking about future files, not current files. I'm not sure how to check for large files that are no longer in the current state of the repository.
