[juwels] Implement Unified Directory Structure for Public and Private Datasets #435

@wael-mika

Description

Is your feature request related to a problem? Please describe.

As the project grows, both the number of users and the variety of datasets required for training are increasing significantly. These datasets are currently scattered across directories that belong to different projects, which makes data access inefficient.

To address this, we propose implementing a unified data access system. It would require every developer to be a member of the hclimrep group, the weatherai group, or both, giving streamlined access to both private and public datasets.

  • Public datasets: These datasets are intended for eventual public release. At a later stage of the project they should become openly accessible, so that external users can experiment with and evaluate our model.

  • Private datasets: These datasets contain restricted data that we have no license or permission to distribute publicly. They are used strictly internally for training and model development.
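The membership rule described above can be sketched as a small check. This is only an illustration: the function names `has_dataset_access` and `current_user_groups` are hypothetical and not part of any existing tooling, and the group names are the two mentioned in this issue.

```python
import os

# Group names taken from this issue; membership in either one is enough.
REQUIRED_GROUPS = frozenset({"hclimrep", "weatherai"})


def has_dataset_access(user_groups, required=REQUIRED_GROUPS):
    """Return True if the user belongs to at least one required group."""
    return not required.isdisjoint(user_groups)


def current_user_groups():
    """Return the names of the current process's Unix groups (POSIX only)."""
    import grp  # imported here because the module exists only on Unix

    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # Some group IDs have no name entry; skip them.
            continue
    return names
```

On a login node, `has_dataset_access(current_user_groups())` would then tell a developer whether they already satisfy the proposed requirement.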

Describe the solution you'd like

Proposed Data Directory Structure

To improve dataset management and accessibility, we propose creating two new data directories, public and private, under the shared data root:

$WEATHERGEN_DATA/shared/{public,private}
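As a minimal sketch of the proposed layout, the two directories could be created as follows. The helper names `shared_dataset_dirs` and `create_shared_dirs` are hypothetical, and the sketch deliberately leaves out group ownership and permissions, which this issue does not specify.

```python
from pathlib import Path


def shared_dataset_dirs(root):
    """Map 'public' and 'private' to their paths under <root>/shared."""
    shared = Path(root) / "shared"
    return {name: shared / name for name in ("public", "private")}


def create_shared_dirs(root):
    """Create <root>/shared/{public,private} if missing and return the paths."""
    dirs = shared_dataset_dirs(root)
    for path in dirs.values():
        # parents=True also creates the intermediate 'shared' directory;
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

In practice `root` would be the value of the `WEATHERGEN_DATA` environment variable, for example `create_shared_dirs(os.environ["WEATHERGEN_DATA"])` after `import os`.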

Describe alternatives you've considered

No response

Additional context

No response

Organisation

No response

Metadata

Assignees

Labels

data (Anything related to the datasets used in the project), infra (Issues related to infrastructure), needs-design

Type

No type

Projects

Status

In Progress

Relationships

None yet

Development

No branches or pull requests