
address_appearances produces errors for 99% of blocks with Erigon 3 in archive mode. #224

@hut8

Version

Version 0.3.2-37-g559b654. Commit 559b65455d7ef6b03e8e9e96a0e50fd4fe8a9c86 (current main).

Platform

Linux [server name] 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Description

I'm interested in collecting address_appearances. I have a synced Erigon 3 node running with --prune.mode archive. I believe all of the relevant JSON-RPC APIs are enabled: --http.api "web3,eth,erigon,trace,ots,net,debug,txpool"

I ran cryo address_appearances --rpc http://127.0.0.1:8545 --verbose and expected either all of the parquet files that the documentation describes, or an error message.

Instead, cryo prints the following, and only once the whole run has finished:

cryo parameters                                                                                                                                                                                                                             
───────────────                                                                                                                                                                                                                             
- version: 0.3.2-37-g559b654                                                                                                                                                                                                                
- data:                                                                                                                                                                                                                                     
    - datatypes: address_appearances                                                                                                                                                                                                        
    - blocks: n=21,876,375 min=0 max=21,876,374 align=no reorg_buffer=0                                                                                                                                                                     
    - exclude failed items: false                                                                                                                                                                                                           
- source:                                                                                                                                                                                                                                   
    - network: ethereum                                                                                                                                                                                                                     
    - rpc url: http://127.0.0.1:8545                                                                                                                                                                                                        
    - max requests per second: unlimited                                                                                                                                                                                                    
    - max concurrent requests: unlimited                                                                                                                                                                                                    
    - max concurrent chunks: 4                                                                                                                                                                                                              
    - max retries: 5                                                                                                                                                                                                                        
    - initial backoff: 500                                                                                                                                                                                                                  
- output:                                                                                                                                                                                                                                   
    - chunk size: 1,000                                                                                                                                                                                                                     
    - chunks to collect: 21,778 / 21,877                                                                                                                                                                                                    
    - output format: parquet                                                                                                                                                                                                                
    - output dir: /home/liam/data                                                                                                                                                                                                           
    - report file: $OUTPUT_DIR/.cryo/reports/2025-02-20_06-01-13.575391.json                                                                                                                                                                
                                                                                                                                                                                                                                            
     
schema for address_appearances                                                                                                                                                                                                              
──────────────────────────────                                                                                                                                                                                                              
- block_number: uint32                                                                                                                                                                                                                      
- transaction_hash: binary                                                                                                                                                                                                                  
- address: binary                                                                                                                                                                                                                           
- relationship: string                                                                                                                                                                                                                      
- chain_id: uint64                                                                                                                                                                                                                          
                                                                                                                                                                                                                                            
sorting address_appearances by: block_number, transaction_hash, address, relationship                                                                                                                                                       
                                                                                                                                                                                                                                            
other available columns: block_hash                                                                                                                                                                                                         
                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                            
collecting data                                                                                                                                                                                                                             
───────────────                                                                                                                                                                                                                             
started at 2025-02-20 06:01:13.575                                                                                                                                                                                                          
   done at 2025-02-20 06:18:06.368                                                                                                                                                                                                          
                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                            
error summary                                                                                                                                                                                                                               
─────────────                                                                                                                                                                                                                               
(errors in 21778 chunks)                                                                                                                                                                                                                    
- Failed to get block: deserialization error: missing field `creationMethod` at line 1 column 28929 (1x)                                                                                                                                    
- Failed to get block: deserialization error: missing field `creationMethod` at line 1 column 4252 (1x)                                                                                                                                     
- Failed to get block: deserialization error: missing field `creationMethod` at line 1 column 10193 (1x)                                                                                                                                    
- Failed to get block: deserialization error: missing field `creationMethod` at line 1 column 16743 (1x)                                                                                                                                    
- Failed to get block: deserialization error: missing field `creationMethod` at line 1 column 3210 (1x)                                                                                                                                     
- Failed to get block: deserialization error: missing field `creationMethod` at line 1 column 3011 (1x)                                                                                                                                     
- Failed to get block: deserialization error: missing field `creationMethod` at line 1 column 6449 (1x)                                                                                                                                     
- Failed to get block: deserialization error: missing field `creationMethod` at line 1 column 1925 (4x)                                                                                                                                     
- Failed to get block: deserialization error: missing field `creationMethod` at line 1 column 4948 (1x)                                                                                                                                     
- Failed to get block: deserialization error: missing field `creationMethod` at line 1 column 6427 (1x)                                                                                                                                     
...                                                                                                                                                                                                                                         


collection summary                                         
──────────────────                                         
- total duration: 1012.793 seconds                         
- total chunks: 21,877                                                                                                                                                                                                                      
    - chunks errored:   21,778 / 21,877 (99.0%)                                                                                                                                                                                             
    - chunks skipped:       99 / 21,877 (0.0%)                                                                                                                                                                                              
    - chunks collected:      0 / 21,877 (0.0%)                                                                                                                                                                                              
- blocks collected: 0                                                                                                                                                                                                                       
    - blocks per second: 0.0                                                                                                                                                                                                                
    - blocks per minute: 0.0                                                                                                                                                                                                                
    - blocks per hour:   0.0                                                                                                                                                                                                                
    - blocks per day:    0.0                                                                                                                                                                                                                
- rows written: 0

The error Failed to get block: deserialization error: missing field `creationMethod` at line 1 column 6427 suggests that Erigon's responses omit a field that cryo requires. Answers to a few questions would help me narrow this down:

  1. Is cryo tested against Erigon? If not, which JSON-RPC servers is it tested against?
  2. https://github.com/paradigmxyz/cryo?tab=readme-ov-file#json-rpc shows which methods are used, but not for all datasets. Judging from crates/freeze/src/datasets/address_appearances.rs, this dataset appears to use eth_getBlockByNumber (with the second argument set to false to get just the transaction hashes), eth_getLogs, eth_getTransactionByHash, eth_getTransactionReceipt, and trace_transaction. Is that right? I'll try to update the README in a PR.
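
To make question 2 concrete, here is a rough sketch of the JSON-RPC request bodies for the five methods listed above. The method names and parameter shapes are the standard ones. Whether cryo issues exactly these calls for address_appearances is the open question, so treat this as a probe for the node rather than a description of cryo's behavior. The helper names (rpc_payload, candidate_requests), the block number, and the transaction hash are placeholders of mine.

```python
import json


def rpc_payload(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def candidate_requests(block_number, tx_hash):
    """Requests for the methods I *think* address_appearances relies on.

    This list is my reading of address_appearances.rs, not a confirmed fact.
    """
    block_hex = hex(block_number)
    return [
        # False => return transaction hashes only, not full transaction objects.
        rpc_payload("eth_getBlockByNumber", [block_hex, False], 1),
        rpc_payload("eth_getLogs", [{"fromBlock": block_hex, "toBlock": block_hex}], 2),
        rpc_payload("eth_getTransactionByHash", [tx_hash], 3),
        rpc_payload("eth_getTransactionReceipt", [tx_hash], 4),
        rpc_payload("trace_transaction", [tx_hash], 5),
    ]


if __name__ == "__main__":
    placeholder_hash = "0x" + "00" * 32  # hypothetical; substitute a real hash
    for request in candidate_requests(1_000_000, placeholder_hash):
        print(json.dumps(request))
```

Each printed line can be POSTed to the node with Content-Type: application/json to see which responses look complete.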

When I collect the traces dataset, it succeeds only on the first 99 chunks and fails on every one thereafter, so I suspect a compatibility issue in trace_block. I will update this issue shortly with what I find.
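
A minimal way to test the trace_block suspicion is to call the method directly and look for the field cryo complains about. The sketch below assumes that creationMethod is expected inside the action object of create-type traces. That placement is my guess from the error message and is not confirmed. The function names are my own, and context_at treats the column in the error as a 1-based offset into the single-line response.

```python
import json
import urllib.request


def fetch_block_traces(rpc_url, block_number):
    """Call trace_block on the node and return the list of traces."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "trace_block",
        "params": [hex(block_number)],
    }).encode()
    request = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("result") or []


def creates_missing_field(traces, field="creationMethod"):
    """Return (index, transactionHash) for create traces whose action lacks `field`.

    Assumption: the field belongs in trace["action"] for type == "create".
    """
    missing = []
    for index, trace in enumerate(traces):
        if trace.get("type") != "create":
            continue
        action = trace.get("action") or {}
        if field not in action:
            missing.append((index, trace.get("transactionHash")))
    return missing


def context_at(raw, column, width=40):
    """Show the text around a 1-based column of a single-line response."""
    start = max(0, column - 1 - width)
    return raw[start:column - 1 + width]
```

Running creates_missing_field(fetch_block_traces("http://127.0.0.1:8545", n)) for a block n from one of the failing chunks should show whether Erigon omits the field. If the list is non-empty only for blocks that contain contract creations, that would also explain why the early chunks succeed.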

Labels: bug (Something isn't working)
