The entry point for learning about the Blockchain Data Standards working group, its manifesto, and relevant links.
- Telegram Group: http://t.me/blockchain_data_standards
This manifesto is in draft mode: the initial ideas are being put in place, and comments and suggestions are welcome, as nothing is committed yet.
- Ozgur (github/ozgrakkurt) - Steel Cake
- Storm (github/sslivkoff) - Paradigm
- Mikko Ohtamaa (x/moo9000) - Trading Protocol
- Swanny (x/swanny14) - Swanware
- Rick Dudley (github/afdudley) - Vulcanize
- Jason Smythe (x/JasoonSmythe) - Envio
- Aram (github/aramalipoor) - eRPC
- Kasra (github/kasrakhosravi) - eRPC
- Parithosh Jayanthi (github/parithosh) - Ethereum Foundation
- Matt Stam (github/mattstam) - Succinct Labs
- Yule Andrade (github/yulesa)
- Shoham (github/shohamc1) - Erigon
- fucory.eth (x/roninjin10) - Tevm
- Sam Bacha (github/sambacha) - Manifold
- Adam Fuller (azfuller.com) - The Graph
- Karel Balogh (x/karelxfi) - SQD
- Wesley Blake (github/wesleycharlesblake) - Chronicles
- Keri (github/kclowes) - Ethereum / web3.py
- Preston Van Loon (github/prestonvanloon) - Prysm
- ...
If you contribute to the conversations, manifesto, catalog, or libraries, feel free to create a PR to add your name and personal bio link.
This working group aims to unify blockchain data efforts by creating common standards (e.g. schemas) and tools.
At the moment, the focus is:
- 3️⃣ Schema / Structure / Semantics is the main focus of the group: creating a common taxonomy for blockchain "raw data" so that providers and consumers of data have lower barriers to entry.
- 4️⃣ Querying / Filtering efforts are focused on common use-cases and access patterns based on real-world production applications. The BDS working group mainly provides best-practice conventions and specs but will NOT get into implementation details.
- 5️⃣ Transport is how the data is technically delivered to consumers. The BDS working group will only offer recommendations and potentially useful libraries and tools (e.g. a translation tool between a gRPC data provider and a Subgraph which requires JSON-RPC)
- BDS - Short for Blockchain Data Standards
- Data Provider - A company or open-source tool responsible for extracting raw blockchain data, then enriching, massaging, and post-processing it to comply with the BDS specs.
- Raw Data Indexing - Process of extracting raw blockchain data (e.g. from the underlying node's LevelDB, or via JSON-RPC) and normalizing it according to the BDS specs.
- Data Consumer - A consumer is interested in normalized blockchain data according to the BDS specs, with the purpose of building products on top of it.
- Business Data Indexing - Reading and processing already normalized, BDS-compatible data to create high-level product-specific models (e.g. Subgraph or Ponder indexers, your custom data pipeline, etc).
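
As a rough illustration of the Raw Data Indexing step defined above, the sketch below normalizes one log object, in the shape returned by the standard `eth_getLogs` JSON-RPC method, into a typed record. The `EventRecord` class, its field names, and the `normalize_log` helper are hypothetical and are not part of any agreed BDS schema.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class EventRecord:
    """Hypothetical normalized event; field names are NOT an agreed BDS schema."""
    block_number: int
    log_index: int
    transaction_hash: str
    address: str
    topics: tuple[str, ...]
    data: bytes


def normalize_log(raw: dict[str, Any]) -> EventRecord:
    """Convert a JSON-RPC log object (hex-encoded strings) into typed values."""
    return EventRecord(
        block_number=int(raw["blockNumber"], 16),      # "0x10" -> 16
        log_index=int(raw["logIndex"], 16),
        transaction_hash=raw["transactionHash"].lower(),
        address=raw["address"].lower(),
        topics=tuple(t.lower() for t in raw["topics"]),
        data=bytes.fromhex(raw["data"][2:]),           # strip the "0x" prefix
    )


# Made-up sample values, shaped like an eth_getLogs result entry.
raw_log = {
    "blockNumber": "0x10",
    "logIndex": "0x1",
    "transactionHash": "0x" + "AB" * 32,
    "address": "0x" + "11" * 20,
    "topics": ["0x" + "22" * 32],
    "data": "0x00ff",
}
record = normalize_log(raw_log)
print(record.block_number, record.log_index, record.data.hex())
```

A Data Consumer working from records like this would not need to repeat hex decoding or case normalization for each provider, which is the kind of duplicated effort a shared schema is meant to remove.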
The current state of blockchain data is fragmented and siloed on one hand (with many data providers, each with their own schema and semantics), and on the other hand, solutions like EVM JSON-RPC do not satisfy current needs, especially at higher production scale.
Here are some of the challenges and problems that the BDS working group aims to address:
- JSON-RPC is plain-text and usually served over HTTP, which means:
  - High bandwidth usage for providers and consumers (vs a compact binary solution)
  - High resource consumption (memory/CPU) for serializing and deserializing the data
- Standard JSON-RPC lacks advanced querying or filtering capabilities (e.g. all tokens of a wallet, transfer history, etc.).
- High-throughput use-cases, such as indexing full blockchain history, are very costly and time-consuming via RPC nodes (vs streaming columnar binary data solutions such as Apache Arrow, or simple Parquet files).
- Every 3rd-party provider or open-source tool has its own special flavor of data schema and transport solution, which fragments the ecosystem. The only commonly adopted protocol is JSON-RPC, which has a lot of limitations, as described above.
- Nodes and clients are better suited to focus on the core logic and consensus of the chain, and to generate and deliver raw data as fast as possible to a "Data Provider", as opposed to the current JSON-RPC solution, which is not scalable.
- Commonly agreed-upon standards (e.g. schemas, semantics) will create a robust and resilient ecosystem, give data consumers optionality between providers and across many different chains, and give providers a lower barrier to entry in the market.
- The BDS working group will not focus on underlying database or storage technologies.
- The BDS working group will not have an opinion on querying languages or engines; these will be left to the data providers to decide.
(TODO discuss and modify the above in the community call)
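
To make the bandwidth and serialization point from the challenges above more concrete, here is a minimal sketch that encodes the same log fields once as JSON text and once as a fixed-width binary record. The field selection and the binary layout are assumptions chosen for illustration only; columnar formats such as Apache Arrow or Parquet use their own encodings, and nothing here is a BDS definition.

```python
import json
import struct

# Hypothetical log record; the field selection is an illustration, not a BDS schema.
log = {
    "blockNumber": hex(19_000_000),
    "logIndex": hex(42),
    "address": "0x" + "11" * 20,
    "topic0": "0x" + "22" * 32,
}

# Text encoding, as it would appear inside a JSON-RPC response body.
as_json = json.dumps(log, separators=(",", ":")).encode("utf-8")

# Fixed-width binary encoding: u64 block number, u32 log index,
# 20-byte address, 32-byte topic (big-endian, no padding).
LAYOUT = ">QI20s32s"
as_binary = struct.pack(
    LAYOUT,
    int(log["blockNumber"], 16),
    int(log["logIndex"], 16),
    bytes.fromhex(log["address"][2:]),
    bytes.fromhex(log["topic0"][2:]),
)

print(len(as_json), "bytes as JSON")
print(len(as_binary), "bytes as packed binary")
```

The packed record is 64 bytes. The JSON form is larger because every byte is written as two hex characters and the field names are repeated in each record, and a consumer must also parse the text before it can use the values.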
NOTE: This is a hypothetical example. The actual content is to be determined during the BDS working group's efforts and discussions.
- EVM - Ethereum Virtual Machine based blockchains, whether L1 or L2s.
  - Schemas - Common structure for Blocks, Transactions, Events & Logs, etc.
    - Blocks
    - Transactions
    - Events
    - ...TODO add actual schemas
  - Open-source - Tools to work with EVM data that comply with BDS v1.0 standards.
    - cryo - easiest way to extract blockchain data to parquet, csv, json, or python dataframes
    - erpc - RPC proxy and load-balancer compatible with BDS v1.0 data schemas
    - evm-jsonrpc-2-bds - (hypothetical example) translate JSON-RPC to BDS v1.0
    - bds-2-subgraph - (hypothetical example) exposes BDS v1.0 compatible data sources to Subgraphs
    - dune-2-bds - (hypothetical example) translate Dune API to BDS v1.0
    - bds-arrow-go - (hypothetical example) Golang Apache Arrow definitions for BDS v1.0
    - bds-arrow-typescript - (hypothetical example) TypeScript Apache Arrow definitions for BDS v1.0
    - ...TODO add actual tools
  - 3rd party - Proprietary providers of EVM data that comply with BDS v1.0 standards.
    - Subsquid
    - HyperSync
    - Goldsky
    - ...TODO add actual providers
- Schemas - Common structure for Blocks, Transactions, Events & Logs, etc.
TODO schemas to be added to the catalog based on existing work of top data providers
^ How the blockchain data ecosystem will look with BDS adoption.
Here is a list of relevant or discussed providers, projects, and tools working on blockchain data. Not all of them contribute to or follow the BDS:
- Dune Data Catalog
- Subsquid Archive Nodes
- Substreams Firehose
- Goldsky Mirror
- Envio Hypersync
- cryo datasets
- Primo Data Directory
- Feel free to add more to this list
Feel free to create a new GitHub issue to discuss ideas and suggestions. Examples of good topics:
- "Add `blockTimestamp` to EVM.Schema.Event"
- "Require `chainId` on the EVM.Query.Transaction"
- "Introduce a new schema for traces under EVM.Schema.Trace"
- "Add Solana architecture to BDS catalog"
- "For each BDS schema and field add which provider/tool offers such data"
- ...
Before creating a new PR, make sure your changes are properly discussed and generally accepted by the community by creating a new issue as described above.
When enough consensus is reached, create a new PR to add or modify definitions and standards.