BlockchainTracer

Python package to trace sensitive information and process flows on the blockchain.

Leverages the blockchain's inherent properties (immutability, transparency, availability, and traceability) to record and audit sequential steps in any process. Ideal for applications requiring verifiable records of actions or sensitive data trails.

Save sequential steps of anything.
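
The idea of recording sequential steps can be illustrated with a small hash chain. This is a minimal sketch using only the Python standard library, not the package's API: the function names `step_digest` and `chain_steps` and the example step payloads are hypothetical. Each digest depends on the previous one, so changing an earlier step changes every later digest.

```python
import hashlib
import json


def step_digest(previous_digest: str, payload: dict) -> str:
    """Hash a step together with the digest of the step before it."""
    # sort_keys and fixed separators give a repeatable encoding of the payload
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_digest + body).encode("utf-8")).hexdigest()


def chain_steps(steps: list[dict]) -> list[str]:
    """Return one digest per step; each digest depends on all earlier steps."""
    digests = []
    previous = "0" * 64  # placeholder digest used before the first step
    for payload in steps:
        previous = step_digest(previous, payload)
        digests.append(previous)
    return digests


# Hypothetical process with three steps
steps = [
    {"step": "collect", "rows": 1000},
    {"step": "clean", "rows": 980},
    {"step": "train", "model": "logistic_regression"},
]
digests = chain_steps(steps)

# Altering the second step changes its digest and every digest after it
tampered = [dict(s) for s in steps]
tampered[1]["rows"] = 981
tampered_digests = chain_steps(tampered)
```

In a setup like this, only the digests would need to be written to the blockchain; the step payloads themselves could stay off-chain.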

Multipurpose

Improve reproducibility of Machine Learning models. There is a 'reproducibility crisis' in ML, and reproducibility and traceability of ML models is the main focus of this work.
Upload hashes of big data files.
Trace NGO donations.
Improve supply chain traceability.
Save important data of scientific studies.
Proof of authorship. Trace results with an address and a timestamp.
Record plain text.
User-defined applications.
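
For the "hashes of big data files" use case above, one common approach is to hash the file in chunks so that it never has to fit in memory. The following is a sketch under that assumption, using only the standard library; `file_sha256` is a hypothetical helper name and the temporary file stands in for a real dataset.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 of a file by reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo with a temporary file standing in for a large dataset
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example dataset contents\n" * 1000)
    demo_path = tmp.name

demo_hash = file_sha256(demo_path, chunk_size=4096)
os.remove(demo_path)
```

The resulting hex string is short enough to store on-chain, while the file itself stays wherever it already lives.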

Installation Guide

1. Clone the Repository

git clone https://github.com/francocerino/BlockchainTracer.git
cd BlockchainTracer

2. (Recommended) Create and Activate a Virtual Environment

python3 -m venv blockchain_tracer_env
source blockchain_tracer_env/bin/activate

3. Install the Package

pip install .

Frontend

Run this command in your console:

npx shadcn@latest add "https://v0.app/chat/b/b_g1kTbNDXhik?token=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIn0..LPIK7itf1p9wLa7I.F6HOGYSmZvQniRTrCUZMAWm8yRrZP-Yg2F7XY82pPVOJOM3thdiHDJsjuh4.FnnualxaLhk_c6dTlGTWuQ"

Machine Learning traceability and reproducibility

This use case resembles supply chain traceability, applied to a Machine Learning pipeline. It also aims to improve reproducibility by combining standards developed for ML with the transparency, persistence, and immutability that blockchain provides.

Roadmap:

Stage 1

Review the collected and related literature to clarify what ML reproducibility requires.
- A Survey of Data Provenance in e-Science
- Ensuring Trustworthy Neural Network Training via Blockchain
- Towards Enabling Trusted Artificial Intelligence via Blockchain
- BlockFlow: Trust in Scientific Provenance Data
- ProML: A Decentralised Platform for Provenance Management of Machine Learning Software Systems
- Blockchain Based Provenance Sharing of Scientific Workflows
- Improving Reproducibility in Machine Learning Research (2021)
- Reproducibility in Machine Learning-Driven Research (2023)
- Leakage and the reproducibility crisis in machine learning-based science (2023)
- reforms: Reporting Standards for Machine Learning Based Science (2023)
- Traceability for Trustworthy AI: A Review of Models and Tools (2021). Compares some existing frameworks for ML reproducibility.
- Reproducibility in PyTorch
- Advancing Research Reproducibility in Machine Learning through Blockchain Technology (2024). Reviews work on ML reproducibility with blockchain.
- Promoting Distributed Trust in Machine Learning and Computational Simulation via a Blockchain Network
- Blockchain analytics and Artificial Intelligence
- Automatically Tracking Metadata and Provenance of Machine Learning Experiments. Describes an approach for scikit-learn Pipelines and other libraries.
- Reproducibility in Machine Learning-based Research: Overview, Barriers and Drivers (2024)
- Model Cards for Model Reporting; Model Cards applied to known models. Each model card could be accompanied by Datasheets, Nutrition Labels, Data Statements, or Factsheets describing the datasets the model was trained and evaluated on.
- ML Reproducibility Tools and Best Practices
Specify the differentiators of this work. A solution that simultaneously offers:
- Traceability of ML models on EVM blockchains with a Python API. Python is the most used language in ML, and the EVM is the most used platform for smart contracts.
- Open source code.
- Adherence to standards from previous studies on ML reproducibility. Open question: would a stronger focus on narrative help reproducibility?
- Ability to trace other processes in general, while focusing on ML reproducibility.
- Tracing of the computing environment where the ML model was trained.
- Use of Arweave or IPFS for large data, storing its hash on the EVM blockchain.
Fine-tune the requirements for good reproducibility.
- The NeurIPS 2019 ML reproducibility checklist from Improving Reproducibility in Machine Learning Research.
- JSON data structure with every configuration of the ML pipeline (hardware, environment, preprocessing steps, hyperparameters, seeds, metrics, package versions, etc.).
- Model info sheet from Leakage and the reproducibility crisis in machine learning-based science (2023).
- Standardized environment (Leakage and the reproducibility crisis in machine learning-based science, 2023).
- Checklist of reforms: Reporting Standards for Machine Learning Based Science (2023).
- Minimal Description Profile: Traceability for Trustworthy AI: A Review of Models and Tools (2021).
- Model Cards for Model Reporting
- MLflow for data logging. It has a UI to compare logged models and works with well-known model libraries such as sklearn, XGBoost, etc.
Give users what they need to reproduce models.
Ensure the code is easy to use and works well.
- Python code aimed at technical users who are not necessarily familiar with blockchain.
- Integration with EVM blockchains (the most used and highly decentralized).
- The code must handle the private key securely.
- Test the code.
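
The "JSON data structure with every configuration of the ML pipeline" mentioned in this stage could look roughly like the sketch below. It is an assumption about one possible layout, not the package's actual schema: the field names, the helper functions `package_versions`, `build_run_record`, and `record_digest`, and all values passed in are hypothetical placeholders.

```python
import hashlib
import json
import platform
from importlib import metadata


def package_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_run_record(hyperparameters, seed, metrics, packages):
    """Collect environment details and run settings into one JSON-ready dict."""
    return {
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
            "machine": platform.machine(),
        },
        "packages": package_versions(packages),
        "hyperparameters": hyperparameters,
        "seed": seed,
        "metrics": metrics,
    }


def record_digest(record):
    """SHA-256 of a canonical JSON encoding, suitable for storing on-chain."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder values for illustration only, not real results
record = build_run_record(
    hyperparameters={"learning_rate": 0.01, "max_depth": 4},
    seed=42,
    metrics={"accuracy": 0.93},
    packages=["pip", "a-package-that-is-not-installed"],
)
print(json.dumps(record, indent=2, sort_keys=True))
print(record_digest(record))
```

Sorting keys before hashing means the same record always yields the same digest, regardless of the order in which the fields were assembled.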

Stage 2

Decide how to handle code and binaries.
Integration with IPFS or Arweave for large data.
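
The pattern behind this integration (large data off-chain, its hash on-chain) can be sketched without any storage client. The example below is an illustration under stated assumptions: a plain dict stands in for IPFS or Arweave, the location string is made up, and `make_anchor` and `verify` are hypothetical helpers, not part of this package or of either storage network's API.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


def make_anchor(location: str, data: bytes) -> dict:
    """Small record meant for the chain: where the data lives, its hash, its size."""
    return {"location": location, "sha256": content_hash(data), "size": len(data)}


def verify(anchor: dict, retrieved: bytes) -> bool:
    """Check that bytes fetched from off-chain storage match the anchored hash."""
    return (
        len(retrieved) == anchor["size"]
        and content_hash(retrieved) == anchor["sha256"]
    )


# A dict stands in for the off-chain store (assumption, no real network calls)
off_chain_store = {}
payload = b"model weights or dataset bytes"
location = "example-store://model-v1"  # hypothetical identifier
off_chain_store[location] = payload

anchor = make_anchor(location, payload)
ok = verify(anchor, off_chain_store[anchor["location"]])
bad = verify(anchor, payload + b"x")  # modified data fails verification
```

Only the small `anchor` record would go on the EVM chain; anyone who later retrieves the data can recompute the hash and compare.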

Stage 3

Frontend for scalability (usable by non-technical users).
Smart contract to decentralize the code used.
Extend to other public blockchains.
Extend to private blockchains.
Display option to trace data with a new address.
Expand to more RPCs (besides Infura).
Automate model info sheet completion.

