This repo includes code for spidering and importing claims, and also for processing new claims entered by users.
./run_pipe.py
    runs from the crontab of the backend server
Currently the crontab just updates every 5 minutes; we are working on a microservice that the Node server running trust_claim_backend can call as each new claim is added.
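A minimal sketch of what that microservice endpoint might look like, using only the standard library. The route, port, and `process_claim` placeholder are assumptions for illustration, not the actual API:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def process_claim(claim: dict) -> dict:
    # Placeholder for the pipeline steps (clean, sign, graph) that the
    # real service would run; hypothetical, not the run_pipe.py internals.
    return {"claim_id": claim.get("id"), "status": "processed"}

class ClaimHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The Node server would POST each new claim here instead of
        # waiting for the 5-minute cron cycle.
        if self.path != "/claims":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        claim = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(process_claim(claim)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), ClaimHandler).serve_forever()
```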
Python code to run the separate steps of the pipeline, and later perhaps to orchestrate them:
- spider and save raw data to be turned into claims
- clean and normalize the data into an importable format
- import into signed claims (signed by our spider)
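The three steps above could be orchestrated roughly as follows. All function names and data shapes here are illustrative assumptions, not the repo's actual module API:

```python
def spider() -> list[dict]:
    # Step 1 (assumed shape): fetch raw records to be turned into claims.
    return [{"source": "example.org", "text": " raw record "}]

def normalize(raw: list[dict]) -> list[dict]:
    # Step 2: clean and normalize raw records into an importable format.
    return [{"subject": r["source"], "statement": r["text"].strip()} for r in raw]

def import_claims(records: list[dict], signer: str = "spider-key") -> list[dict]:
    # Step 3: wrap each record as a claim signed by our spider
    # (a real implementation would produce a cryptographic signature).
    return [{**r, "signed_by": signer} for r in records]

def run_import_pipeline() -> list[dict]:
    # Chain the steps: spider -> normalize -> import.
    return import_claims(normalize(spider()))
```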
That covers the import data pipeline. Then:
- dedupe, parse, and decorate claims into nodes and edges
The nodes and edges will be used to feed the front-end views.
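One plausible shape for that graph step: each claim's subject and object become deduplicated nodes, and the claim itself becomes an edge between them. The field names (`subject`, `object`, `claim`, `uri`) are assumptions for illustration:

```python
def claims_to_graph(claims: list[dict]) -> tuple[list[dict], list[dict]]:
    # Turn a list of claims into node and edge lists for the front-end views.
    nodes, edges, seen = [], [], set()
    for c in claims:
        for uri in (c["subject"], c["object"]):
            if uri not in seen:  # dedupe nodes by URI
                seen.add(uri)
                nodes.append({"uri": uri})
        # Each claim decorates one edge between its subject and object.
        edges.append({"from": c["subject"], "to": c["object"], "label": c["claim"]})
    return nodes, edges
```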
To run the publisher manually:
. penv/bin/activate
source .env
python3 ./run_publisher.py