Large files & untracking #103

IAlibay · 2020-10-28T14:48:53Z

Tagging @RMeli who has more experience with this here.

We are at the point where we are starting to track quite substantial amounts of data (e.g. xtc files in the MD tutorials and ~100 × 1 MB PDB files in the homology modelling tutorial).

We really need a policy in place to make sure we stop tracking files once they are removed, and to limit the addition of new large files (e.g. by encouraging tarballs, zip archives, etc.).
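One way to enforce a limit on new large files is a check that runs before each commit. The following is only a minimal sketch: the 5 MiB threshold and the function names are hypothetical, not something agreed in this thread, and it looks at the size of the working-tree copy of each staged file.

```python
import os
import subprocess
import sys

# Hypothetical threshold (5 MiB); not a value agreed in this thread.
LIMIT_BYTES = 5 * 1024 * 1024


def staged_files():
    """Return paths that are added or modified in the git index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=AM", "-z"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [path for path in out.split("\0") if path]


def oversized(paths, limit_bytes=LIMIT_BYTES):
    """Return (path, size) pairs for existing files larger than limit_bytes."""
    hits = []
    for path in paths:
        if os.path.isfile(path):
            size = os.path.getsize(path)  # size of the working-tree copy
            if size > limit_bytes:
                hits.append((path, size))
    return hits


def main():
    """Report oversized staged files; return 1 if any were found, else 0."""
    too_big = oversized(staged_files())
    for path, size in too_big:
        print(f"{path}: {size / 1e6:.1f} MB exceeds the limit", file=sys.stderr)
    return 1 if too_big else 0
```

Calling `sys.exit(main())` from a pre-commit hook script would block the commit whenever a staged file is over the limit.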

RMeli · 2020-10-28T15:17:35Z

The issue of large files is actually a solved problem with Git LFS, but unfortunately I haven't seen easy ways to set it up locally (on GitHub you have to pay for additional space; the free 1 GB of storage runs out very quickly).

If we can get all files under 100 MB we are OK. Otherwise, a good option would be to store all files on Zenodo and have a script to download them locally.
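A download script along these lines could look roughly like the sketch below. It assumes each data file has a direct download URL and a known SHA-256 checksum; the manifest entry is a placeholder (the real URL would be copied from the Zenodo record page), and the helper names are hypothetical.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path

# Hypothetical manifest: destination path -> (download URL, expected SHA-256).
# Both values are placeholders and must be replaced with real ones.
DATA_FILES = {
    "data/example.xtc": ("<DOWNLOAD_URL>", "<SHA256>"),
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(url, dest, expected_sha256):
    """Download url to dest unless an intact copy exists; verify the checksum."""
    dest = Path(dest)
    if dest.exists() and sha256_of(dest) == expected_sha256:
        return dest  # already downloaded and intact
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    actual = sha256_of(dest)
    if actual != expected_sha256:
        dest.unlink()  # do not keep a corrupted or unexpected file
        raise ValueError(f"checksum mismatch for {dest}: got {actual}")
    return dest


def fetch_all(manifest=DATA_FILES):
    """Fetch every file listed in the manifest."""
    return [fetch(url, dest, sha) for dest, (url, sha) in manifest.items()]
```

Keeping the checksums in the repository means a tutorial can detect a truncated or changed download without tracking the data itself.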

IAlibay · 2020-10-28T18:11:56Z

Maybe at the very least we can apply: https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/removing-sensitive-data-from-a-repository when we go ahead and remove large files? (we can try it out on the xtc in the MD tutorials)

I'm not sure if there's an easy way to see how many large historical files GitHub is storing. Any ideas beyond trawling through git logs?

RMeli · 2020-10-28T21:41:05Z

I thought you were talking about future files, not current files. I'm not sure how to check for large files that are no longer in the current state of the repository.
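One possible way to check, sketched here rather than tested on this repository, is to list every object reachable from any ref with `git rev-list --objects --all` and ask `git cat-file --batch-check` for its type and size. Blobs that are still in history show up even if they were deleted from the current tree. The function names are hypothetical, and the sizes reported are uncompressed object sizes, not the space used on GitHub.

```python
import subprocess


def parse_batch_check(output):
    """Parse '<type> <sha> <size> <path>' lines into (size, sha, path) for blobs."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) == 4 and parts[0] == "blob":
            _, sha, size, path = parts
            blobs.append((int(size), sha, path))
    return blobs


def largest_blobs(repo=".", top=20):
    """Return the `top` largest blobs reachable from any ref, largest first."""
    # Every object in history, one per line as '<sha> <path>'.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for the type and size of each object; %(rest) echoes the path.
    details = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return sorted(parse_batch_check(details), reverse=True)[:top]
```

Running `largest_blobs()` from the repository root and printing the result should show which paths account for most of the historical data, which helps decide what is worth rewriting out of history.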

IAlibay added the maintenance label Oct 28, 2020
