Add data retention policy #188

Merged: 9 commits merged on Feb 9, 2025

@asmacdo (Member) commented Aug 6, 2024

Here's a sketch of a possible data retention policy. Let's iron out what we want here prior to implementation.

Fixes: #182

Based on Yarik's initial thoughts: #177 (comment)

@asmacdo asmacdo requested a review from yarikoptic August 9, 2024 15:46
Remove unnecessary (and unclosed paren
@yarikoptic (Member) left a comment

I think it is a good starting point. Once it is implemented and deployed, we will see how it could be improved.

- `nwb_cache`
- Yarn Cache
- `__pycache__`
- pip cache

In case the user is still active, I think it would be useful to notify long-running users once any of those folders exceeds some threshold (e.g. 50 MB), asking them to clean it up.
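The threshold check suggested above could look roughly like the following minimal Python sketch. It assumes hypothetical cache locations under a user's home directory (`.cache/pip`, `.cache/yarn`, `nwb_cache`); the actual paths on DANDI Hub may differ. The 50 MB figure is the example threshold from the comment, not a settled value, and the function names are illustrative only.

```python
import os
from pathlib import Path

# Example threshold from the discussion (50 MB); not a settled policy value.
THRESHOLD_BYTES = 50 * 1024 * 1024

# Hypothetical cache locations relative to a user's home directory.
CACHE_DIRS = {
    "pip cache": ".cache/pip",
    "Yarn cache": ".cache/yarn",
    "nwb_cache": "nwb_cache",
}


def dir_size(path: Path) -> int:
    """Total size in bytes of regular files under `path` (symlinks are skipped).

    A missing directory simply yields 0, because os.walk produces nothing for it.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def oversized_caches(home: Path, threshold: int = THRESHOLD_BYTES) -> dict[str, int]:
    """Return {cache label: size in bytes} for every cache above `threshold`."""
    sizes = {label: dir_size(home / rel) for label, rel in CACHE_DIRS.items()}
    # __pycache__ directories can appear anywhere in the home tree
    # (they may overlap with the caches above).
    sizes["__pycache__"] = sum(
        dir_size(p) for p in home.rglob("__pycache__") if p.is_dir()
    )
    return {label: size for label, size in sizes.items() if size > threshold}
```

A reporting job could call `oversized_caches` for each active user and include the non-empty results in the notice.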

@kabilar (Member) Sep 17, 2024

Hi @asmacdo, should we add a separate point here about monitoring and reporting the quotas of cache directories for active users?

- large file list
- summarized data retention policy
- notice number
- request to clean up

Meanwhile, it might be worth creating a simple data record schema to store those records as well, so they could be reused by tools to assemble higher-level stats, etc.
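One possible shape for such a record, sketched as a Python dataclass serialized to JSON. The class and field names (`NoticeRecord`, `notice_number`, `large_files`, etc.) are hypothetical and only mirror the notice contents listed above; they are not an agreed schema. The small `notices_per_user` helper illustrates how stored records could feed higher-level stats.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class NoticeRecord:
    """Hypothetical record of one cleanup notice sent to a DANDI Hub user."""
    username: str
    notice_number: int          # 1 for the first notice, 2 for the second, ...
    sent_on: date
    total_bytes: int            # usage measured when the notice was sent
    large_files: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        data = asdict(self)
        data["sent_on"] = self.sent_on.isoformat()  # dates are not JSON-native
        return json.dumps(data, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "NoticeRecord":
        data = json.loads(text)
        data["sent_on"] = date.fromisoformat(data["sent_on"])
        return cls(**data)


def notices_per_user(records: list[NoticeRecord]) -> dict[str, int]:
    """Count how many notices each user has received."""
    counts: dict[str, int] = {}
    for record in records:
        counts[record.username] = counts.get(record.username, 0) + 1
    return counts
```

Storing one JSON object per line would keep the records easy to append to and to load with other tools.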

Co-authored-by: Yaroslav Halchenko <[email protected]>
@dandi locked and limited conversation to collaborators Sep 17, 2024
@kabilar changed the title from "Initial commit for data retention policy discussion" to "Add data retention policy" Sep 17, 2024
@kabilar (Member) left a comment

Thank you, @asmacdo. This is great. A few suggestions are listed above.

@kabilar (Member) commented Oct 14, 2024

Hi @asmacdo, please let me know when this is ready for review. And then we can update the DANDI Terms and Policies as needed.

@kabilar (Member) commented Jan 23, 2025

@asmacdo Continuing the discussion from Slack.

As we move to ephemeral environments, and given our current strategy of notifying users monthly, perhaps we should simply have a policy that users with data totaling more than 10 GB receive an email notice?
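A minimal sketch of selecting users for the monthly notice, assuming per-user home-directory totals have already been measured (for example with `du`) and collected into a dictionary. The discussion does not say whether 10 GB means decimal gigabytes or GiB; the sketch assumes 10 × 10^9 bytes, and the function name is hypothetical.

```python
# Threshold proposed in the discussion; decimal GB is an assumption (could be GiB).
NOTICE_THRESHOLD_BYTES = 10 * 10**9


def users_to_notify(usage_bytes: dict[str, int],
                    threshold: int = NOTICE_THRESHOLD_BYTES) -> list[str]:
    """Return usernames whose total usage is strictly above `threshold`, largest first."""
    over = {user: size for user, size in usage_bytes.items() if size > threshold}
    return sorted(over, key=over.get, reverse=True)
```

Using a strict `>` matches the "more than 10 GB" wording, so a user at exactly 10 GB is not notified.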

Proposed updated email template:

Hi <github/dandi username>,

The DANDI team is working to reduce our DANDI Hub costs. A large portion of our costs comes from data stored on [DANDI Hub](https://hub.dandiarchive.org/) (not DANDI Archive).

There is currently about X GB stored under your user directory on DANDI Hub.

The data storage available on DANDI Hub is meant for environment management and should not exceed 10 GB. Data files should be uploaded to DANDI Archive. Please email [email protected] if you need to store more than 10 GB. We will review each request individually and work with you to find a solution for your compute requirements.

Can you please review your files stored on DANDI Hub, upload any relevant files to your respective Dandisets on DANDI Archive, and delete any unused files on DANDI Hub?

Thank you.

DANDI Team
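If the notices are sent by a script, the template could be filled in per user. This is a sketch using Python's `string.Template`, with `$username` and `$usage_gb` as hypothetical placeholder names standing in for `<github/dandi username>` and `X`; the email body is abbreviated, and usage is assumed to be reported in decimal GB.

```python
from string import Template

# Abbreviated version of the proposed email; placeholder names are hypothetical.
NOTICE_TEMPLATE = Template(
    "Hi $username,\n\n"
    "There is currently about $usage_gb GB stored under your user directory on DANDI Hub.\n\n"
    "The data storage available on DANDI Hub is meant for environment management "
    "and should not exceed 10 GB.\n\n"
    "Thank you.\n\n"
    "DANDI Team\n"
)


def render_notice(username: str, usage_bytes: int) -> str:
    """Fill in the notice for one user, reporting usage in GB to one decimal place."""
    usage_gb = f"{usage_bytes / 10**9:.1f}"
    # substitute() raises KeyError if any placeholder is left unfilled.
    return NOTICE_TEMPLATE.substitute(username=username, usage_gb=usage_gb)
```

`substitute` is used rather than `safe_substitute` so that a missing value fails loudly instead of sending an email with a literal `$username` in it.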

@kabilar (Member) commented Jan 30, 2025

I will provide some suggestions based on the recent standup meeting, where we decided to reset home directories after 45 days of not logging in to JupyterHub.
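A sketch of the 45-day rule, assuming a timezone-aware last-login time is available for each user. JupyterHub's user model exposes a `last_activity` timestamp, but whether that is the right signal, and how the reset job would obtain it, is not specified here. The function only selects home directories; it does not delete anything, and its name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Inactivity window from the standup decision (45 days without logging in).
INACTIVITY_LIMIT = timedelta(days=45)


def homes_to_reset(last_login: dict[str, datetime],
                   now: Optional[datetime] = None,
                   limit: timedelta = INACTIVITY_LIMIT) -> list[str]:
    """Return users whose last login is more than `limit` before `now`.

    All datetimes must be timezone-aware; a user at exactly `limit` is kept.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(user for user, seen in last_login.items() if now - seen > limit)
```

Passing `now` explicitly makes the rule easy to test and lets a dry run report who would be affected before any reset happens.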

@kabilar (Member) left a comment

Hi @asmacdo, I have simplified the policy based on our team meeting this week. Please review the suggestions; once you are ready, we will update the DANDI Terms and Policies as needed.

asmacdo and others added 2 commits February 4, 2025 10:49
@kabilar (Member) commented Feb 5, 2025

This policy is in line with the General Policies v1.1.0 listed here for DANDI Hub, so we will not need to update the DANDI Terms and Policies.

@kabilar (Member) left a comment

Hi @asmacdo, just one more suggestion and then we can merge this document.

cc @satra @yarikoptic @bendichter

@bendichter (Member)

lgtm

@asmacdo merged commit f918b93 into main Feb 9, 2025
Successfully merging this pull request may close these issues:

- Create data retention policy

4 participants