This repository contains information about everything related to data preparation that is required before onboarding data to Filecoin. This includes tooling, documentation, and performance benchmarks.
The repository is split into 4 main sections:
- Docs: this section includes documentation explaining how data onboarding to filecoin works, best practices and common pitfalls. It also contains links to available tools in the ecosystem.
- Modules: the different data onboarding steps are encoded as modules (written in python and bash) which could be easily imported and used in any data onboarding pipeline.
- Orchestrators: these are example scripts demonstrating how to import and use the modules from the modules section to orchestrate data onboarding.
- Performance benchmarks: these include performance benchmarks for different available tools.
- banyancomputer/dataprep -- this tool handles encryption, compression, deduping and chunking. The output of this tool could then be carred etc and used for deal making.