francocerino/BlockchainTracer

BlockchainTracer

Python package to trace sensitive information and process flows on the blockchain.

Leverages the blockchain’s inherent properties (immutability, transparency, availability, and traceability) to record and audit sequential steps in any process. Ideal for applications that require verifiable records of actions or sensitive data trails.

Save sequential steps of anything.
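As an illustration of what recording sequential steps can mean, here is a minimal sketch that uses only the Python standard library. It is not the BlockchainTracer API: the names `step_digest` and `build_chain` and the all-zero starting value are assumptions made for this example. Each step is hashed together with the digest of the previous step, so editing an early step changes every digest after it.

```python
import hashlib
import json


def step_digest(previous_digest, payload):
    """Hash a step together with the digest of the step before it."""
    body = json.dumps({"prev": previous_digest, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(steps):
    """Return one digest per step; each digest depends on all earlier steps."""
    digests = []
    previous = "0" * 64  # hypothetical starting value for the first step
    for payload in steps:
        previous = step_digest(previous, payload)
        digests.append(previous)
    return digests


steps = [
    {"step": "collect", "note": "raw data gathered"},
    {"step": "clean", "note": "duplicates removed"},
    {"step": "train", "note": "model fitted"},
]
chain = build_chain(steps)
for payload, digest in zip(steps, chain):
    print(payload["step"], digest)
```

In a setup like this, only the digests would need to be written to a blockchain, while the step payloads could stay wherever they are normally stored.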

Multipurpose

  • Improve the reproducibility of Machine Learning models, in response to the 'reproducibility crisis'. Reproducibility and traceability of ML models is the main focus of this work.
  • Upload hashes of big data files.
  • Trace NGO donations.
  • Improve supply chain traceability.
  • Save important data of scientific studies.
  • Proof of authorship. Trace results with an address and a timestamp.
  • Record arbitrary text.
  • User-defined applications.
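For the "hashes of big data files" use case, a digest can be computed without loading the whole file into memory. The sketch below uses only the standard library. The function name `file_sha256` and the chunk size are assumptions for illustration, not part of this package.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload" * 1000)
    demo_path = tmp.name

demo_digest = file_sha256(demo_path, chunk_size=4096)
os.remove(demo_path)
print(demo_digest)
```

The resulting 64-character hex string is small enough to record on-chain, and anyone who holds the file can recompute it to check that the file is unchanged.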

Installation Guide

1. Clone the Repository

git clone https://github.com/francocerino/BlockchainTracer.git
cd BlockchainTracer

2. (Recommended) Create and Activate a Virtual Environment

python3 -m venv blockchain_tracer_env
source blockchain_tracer_env/bin/activate

3. Install the Package

pip install .

Frontend

Run this command in your console:

npx shadcn@latest add "https://v0.app/chat/b/b_g1kTbNDXhik?token=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIn0..LPIK7itf1p9wLa7I.F6HOGYSmZvQniRTrCUZMAWm8yRrZP-Yg2F7XY82pPVOJOM3thdiHDJsjuh4.FnnualxaLhk_c6dTlGTWuQ"

Machine Learning traceability and reproducibility

This use case is similar to supply chain traceability, applied to a Machine Learning pipeline. It also aims to improve reproducibility by combining standards developed for ML with the transparency, persistence, and immutability that blockchain provides.
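One way to picture a traceable ML run is a small manifest whose digest could be recorded on-chain. The sketch below is hypothetical and standard-library only: the `RunManifest` class, its fields, and the sample values are assumptions chosen for illustration. They are not taken from this package or from any specific reproducibility standard.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class RunManifest:
    """Hypothetical record of what is needed to repeat a training run."""
    dataset_sha256: str
    model_name: str
    random_seed: int
    hyperparameters: dict = field(default_factory=dict)

    def digest(self):
        # Sorted keys and fixed separators give a canonical JSON string,
        # so equal manifests always produce the same digest.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


manifest = RunManifest(
    dataset_sha256=hashlib.sha256(b"toy dataset bytes").hexdigest(),
    model_name="logistic_regression",
    random_seed=42,
    hyperparameters={"C": 1.0, "max_iter": 200},
)
print(manifest.digest())
```

Because the JSON is canonicalized before hashing, two people who describe the same run get the same digest, and any change to the seed, data hash, or hyperparameters produces a different one.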

Roadmap:

Stage 1

  1. Review the saved and related literature to clarify what ML reproducibility requires.

  2. Specify the differentiators of this work. A solution that simultaneously offers:

    • Traceability of ML models on EVM blockchains through a Python API. Python is the most widely used language in ML, and the EVM is the most widely used platform for smart contracts.
    • Open source code.
    • Adherence to standards from previous studies on ML reproducibility. Open question: is it a good idea to focus more on narrative for reproducibility?
    • Ability to trace other processes in general, while staying focused on ML reproducibility.
    • Tracing of the computing environment where the ML model was trained.
    • Use of Arweave or IPFS for large data, with its hash stored on the EVM blockchain.
  3. Fine-tune the requirements for good reproducibility.

  4. Give users what they need to reproduce models.

  5. Ensure the code is easy to use and works well.

    • Python code that is accessible to technical users who are not necessarily blockchain experts.
    • Integration with EVM blockchains (the most widely used, and highly decentralized).
    • The code must handle the private key securely.
    • Test the code.
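The Stage 1 goal of tracing the computing environment where a model was trained could start from something like the sketch below. It is only an assumption about what such a snapshot might contain. `environment_snapshot` is a hypothetical helper built on the standard library, not a function of this package.

```python
import platform
from importlib import metadata


def environment_snapshot(packages=()):
    """Collect interpreter, OS, and selected package versions as a dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
        "packages": versions,
    }


# The package names here are examples; list whatever the training code imports.
snapshot = environment_snapshot(["numpy", "scikit-learn"])
print(snapshot)
```

A snapshot like this can be serialized and hashed alongside the model information, so a later reader can tell whether they are reproducing the run under comparable conditions.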

Stage 2

  1. Decide how to handle code and binaries.
  2. Integration with IPFS or Arweave for large data.

Stage 3

  1. Frontend for scalability (usable by non-technical users).
  2. Smart contract to decentralize the code used.
  3. Extend to other public blockchains.
  4. Extend to private blockchains.
  5. Display option to trace data with a new address.
  6. Expand to more RPCs (besides Infura).
  7. Automate model info sheet completion.
