[datasets] Mocking small versions of datasets for unittests #680

Closed
@fg-mindee

As docTR grows, the number of supported datasets will increase. We cannot afford to add several minutes of CI time for every new dataset, so I suggest the following:

  • add a pytest fixture in tests/conftest.py that creates the data files in a temporary folder and returns the path to it
  • use this fixture in the dataset unittests instead of downloading the subsamples or the full dataset
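
To make the proposal concrete, here is a rough sketch of what such a fixture could look like. The folder layout (`images/` plus `labels.json`), the label schema, and the names `write_blank_png`, `make_mock_dataset` and `mock_dataset_path` are all hypothetical and not taken from any existing docTR dataset; each real dataset would need its own mock that mirrors its actual file structure.

```python
# tests/conftest.py (sketch; layout and names are illustrative assumptions)
import json
import struct
import zlib
from pathlib import Path

import pytest


def _png_chunk(tag: bytes, data: bytes) -> bytes:
    # A PNG chunk is: length, type, data, CRC32 over type + data
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def write_blank_png(path: Path, width: int = 32, height: int = 32) -> None:
    # Write a tiny white 8-bit grayscale PNG using only the standard library
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with filter byte 0, followed by `width` white pixels
    rows = b"".join(b"\x00" + b"\xff" * width for _ in range(height))
    path.write_bytes(
        b"\x89PNG\r\n\x1a\n"
        + _png_chunk(b"IHDR", header)
        + _png_chunk(b"IDAT", zlib.compress(rows))
        + _png_chunk(b"IEND", b"")
    )


def make_mock_dataset(root: Path, num_samples: int = 3) -> Path:
    # Hypothetical layout: <root>/images/*.png and <root>/labels.json
    img_dir = root / "images"
    img_dir.mkdir(parents=True, exist_ok=True)
    labels = {}
    for idx in range(num_samples):
        name = f"sample_{idx}.png"
        write_blank_png(img_dir / name)
        labels[name] = {
            "polygons": [[[1, 1], [10, 1], [10, 8], [1, 8]]],
            "words": [f"word{idx}"],
        }
    (root / "labels.json").write_text(json.dumps(labels))
    return root


@pytest.fixture(scope="session")
def mock_dataset_path(tmp_path_factory):
    # Built once per test session in a temporary folder; no network access
    return make_mock_dataset(tmp_path_factory.mktemp("mock_dataset"))
```

A dataset unittest would then request `mock_dataset_path` as an argument and point the dataset class at that folder instead of triggering a download.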

The only drawback I can see is the implementation time. The advantages are that we won't need internet access to run those unittests anymore, the CI will be considerably faster, and any developer will be able to read the structure of the dataset files directly in the unittest.

If we move forward with this, we'll have to do PRs for the following datasets:
