
[Discussion] Dataset Abstraction #123

@GabrielKP

When working with multiple documents, it would be useful to have an abstraction for a collection of documents that together form a Dataset.

The Dataset would then allow:

  • Caching Datasets after applying transformations to them (which I currently need), e.g., in a pipelined workflow.
  • Saving and loading Datasets efficiently.
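A minimal sketch of the two requested features, assuming a hypothetical `Document` dataclass and `DocumentDataset` class (neither is pytorch-ie's actual API). Each `map` call derives a new fingerprint from the parent's fingerprint and a caller-supplied transformation `name`, then reuses a JSON file in `cache_dir` if one with that fingerprint already exists; `save`/`load` handle persistence.

```python
import hashlib
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for a pytorch-ie document (not the real class)."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class DocumentDataset:
    """Hypothetical collection of documents with on-disk caching of transformations."""

    def __init__(self, documents: List[Document], cache_dir: str,
                 fingerprint: Optional[str] = None):
        self.documents = documents
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
        # A root dataset is fingerprinted by hashing its serialized contents.
        if fingerprint is None:
            payload = json.dumps([asdict(d) for d in documents], sort_keys=True)
            fingerprint = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self.fingerprint = fingerprint

    def map(self, fn: Callable[[Document], Document], name: str) -> "DocumentDataset":
        # Derived fingerprint = hash(parent fingerprint + transformation name).
        new_fp = hashlib.sha256(f"{self.fingerprint}:{name}".encode("utf-8")).hexdigest()
        path = os.path.join(self.cache_dir, f"{new_fp}.json")
        if os.path.exists(path):
            # Cache hit: skip recomputation and load the stored result.
            return DocumentDataset.load(path, self.cache_dir)
        result = DocumentDataset([fn(d) for d in self.documents], self.cache_dir, new_fp)
        result.save(path)
        return result

    def save(self, path: str) -> None:
        # Persist the fingerprint together with the documents.
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"fingerprint": self.fingerprint,
                       "documents": [asdict(d) for d in self.documents]}, f)

    @classmethod
    def load(cls, path: str, cache_dir: str) -> "DocumentDataset":
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        docs = [Document(**d) for d in data["documents"]]
        return cls(docs, cache_dir, data["fingerprint"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ds = DocumentDataset([Document("Hello World")], tmp)
        lowered = ds.map(lambda d: Document(d.text.lower(), d.metadata), name="lowercase")
        print([d.text for d in lowered.documents])  # ['hello world']
```

Chaining fingerprints this way is loosely modeled on how HuggingFace Datasets caches `map` results, although HuggingFace hashes the transformation function itself rather than relying on a name. With the name-based key used here, editing a transformation without renaming it would silently return stale cached results.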

@ArneBinder and I were discussing using HuggingFace Datasets as a base class, which would bring the full power of HuggingFace Datasets to pytorch-ie.
