Finnhub Realtime Pipeline

The project builds a data pipeline that streams real-time trading data from Finnhub.io over WebSockets. The data is then processed with Spark for analysis and a real-time dashboard.

Architecture

(Architecture diagram)

All applications run as Docker containers:

  1. Data source layer - Finnhub.io offers real-time trades for US stocks, forex, and crypto over WebSockets.

  2. Data ingestion layer - A containerized Python application named producer connects to the Finnhub.io WebSocket. It encodes retrieved messages into Avro format, as specified in the schemas/trades.avsc file, and publishes them to the Kafka broker (see the producer sketch after this list).

  3. Message broker layer - A Kafka broker, running in the broker container, receives messages from producer and makes them available to downstream consumers.

  4. Stream processing layer - A Spark cluster consisting of one master and two worker nodes is set up. A PySpark application named processor is submitted to the Spark cluster manager, which delegates a worker for it. This application connects to the Kafka broker to retrieve messages, transforms them using Spark Structured Streaming, and loads them into Cassandra tables (see the processor sketch after this list). The first query, which transforms trades into a feasible format, runs continuously, whereas the second query, involving aggregations, fires on a 5-second trigger.

  5. Data serving layer - A Cassandra database stores and persists data from the Spark jobs. On startup, the cassandra-setup.cql script runs to create the keyspace and tables.

  6. Visualization layer - Grafana connects to the Cassandra database using the HadesArchitect-Cassandra-Plugin and serves visualizations to users, as exemplified by the Finnhub Sample BTC Dashboard. The dashboard refreshes every 500 ms.
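
As a rough illustration of the ingestion layer, the sketch below shows how a producer like the one described above could be wired up, assuming the websocket-client, fastavro, and confluent-kafka packages. The encode_avro helper and the assumption that trades.avsc matches the raw Finnhub message shape are illustrative, not taken from the repository's code.

    import io
    import json
    import os

    import websocket
    from confluent_kafka import Producer
    from fastavro import schemaless_writer
    from fastavro.schema import load_schema

    # Avro schema referenced in the architecture description.
    schema = load_schema("schemas/trades.avsc")

    producer = Producer({
        "bootstrap.servers": f"{os.environ['KAFKA_SERVER']}:{os.environ['KAFKA_PORT']}"
    })
    topic = os.environ["KAFKA_TOPIC_NAME"]

    def encode_avro(record: dict) -> bytes:
        # Serialize one message into schemaless Avro binary.
        buf = io.BytesIO()
        schemaless_writer(buf, schema, record)
        return buf.getvalue()

    def on_message(ws, message):
        payload = json.loads(message)
        if payload.get("type") == "trade":  # skip pings
            # Assumes trades.avsc mirrors the raw message shape.
            producer.produce(topic, value=encode_avro(payload))
            producer.poll(0)  # serve delivery callbacks

    def on_open(ws):
        # Subscribe to each ticker configured in the .env file.
        for ticker in json.loads(os.environ["FINNHUB_STOCKS_TICKERS"]):
            ws.send(json.dumps({"type": "subscribe", "symbol": ticker}))

    ws = websocket.WebSocketApp(
        f"wss://ws.finnhub.io?token={os.environ['FINNHUB_API_TOKEN']}",
        on_open=on_open,
        on_message=on_message,
    )
    ws.run_forever()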
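
In the same spirit, here is a minimal sketch of the stream-processing layer, assuming Spark 3.x with the spark-avro, spark-sql-kafka, and spark-cassandra-connector packages on the classpath. The column names (symbol, price, trade_ts) and the keyspace/table names are placeholders for whatever cassandra-setup.cql actually defines.

    import os

    from pyspark.sql import SparkSession
    from pyspark.sql.avro.functions import from_avro
    from pyspark.sql.functions import avg, col, window

    spark = SparkSession.builder.appName("processor").getOrCreate()

    # from_avro expects the schema as a JSON string.
    with open("schemas/trades.avsc") as f:
        trade_schema = f.read()

    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers",
                f"{os.environ['KAFKA_SERVER']}:{os.environ['KAFKA_PORT']}")
        .option("subscribe", os.environ["KAFKA_TOPIC_NAME"])
        .load()
    )

    # Query 1: decode Avro and persist each trade continuously.
    trades = raw.select(from_avro(col("value"), trade_schema).alias("t")).select("t.*")
    (
        trades.writeStream
        .format("org.apache.spark.sql.cassandra")
        .option("checkpointLocation", "/tmp/checkpoints/trades")
        .options(keyspace="market", table="trades")
        .start()
    )

    # Query 2: windowed aggregates on a 5-second trigger.
    aggregates = (
        trades.withWatermark("trade_ts", "15 seconds")
        .groupBy(window(col("trade_ts"), "5 seconds"), col("symbol"))
        .agg(avg("price").alias("avg_price"))
    )
    (
        aggregates.writeStream
        .format("org.apache.spark.sql.cassandra")
        .option("checkpointLocation", "/tmp/checkpoints/aggregates")
        .options(keyspace="market", table="running_averages")
        .outputMode("append")
        .trigger(processingTime="5 seconds")
        .start()
    )

    spark.streams.awaitAnyTermination()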

Dashboard

The live dashboard updates every 500 milliseconds to display current trade prices.

(Dashboard screenshot)

Setup & deployment

  1. Obtain an API Key: Register at Finnhub to generate your API key

  2. Configure Environment Variables:

    • Locate and rename the [sample.env](sample.env) file to .env
    • Update the .env file with the generated API key and any necessary configurations
    FINNHUB_API_TOKEN=abc123xyz
    FINNHUB_STOCKS_TICKERS=["AAPL", "AMZN", "MSFT", "GOOG", "NVDA", "META", "TSLA", "NFLX", "BINANCE:BTCUSDT", "IC MARKETS:1"]
    FINNHUB_VALIDATE_TICKERS=1
    KAFKA_SERVER=broker
    KAFKA_PORT=29092
    KAFKA_TOPIC_NAME=market
    CASSANDRA_HOST=cassandra
    CASSANDRA_PORT=9042
    CASSANDRA_USERNAME=cassandra
    CASSANDRA_PASSWORD=cassandra
    
  3. Start the Service: Launch the services

    make run
    
  4. View Producer Logs: Check the logs of the producer container to monitor activity (a Kafka smoke-test sketch follows this list)

  5. Execute Spark Job: Submit the Spark job (a Cassandra check sketch follows this list)

    make spark-job
    
  6. Access Visualization: Open your browser and go to localhost:3003 to view the visualization. Log in with admin as both the username and password.

  7. Shutdown Containers: Stop all running containers

    make down
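
After step 4, before submitting the Spark job, it can help to confirm that trades are actually landing on the Kafka topic. Below is a minimal smoke-test sketch, assuming confluent-kafka and fastavro are installed on the host and that the broker publishes a listener on localhost:9092 (an assumption about the Compose setup; adjust to whatever your broker advertises).

    import io

    from confluent_kafka import Consumer
    from fastavro import schemaless_reader
    from fastavro.schema import load_schema

    schema = load_schema("schemas/trades.avsc")
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",  # assumed published listener
        "group.id": "smoke-test",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["market"])

    try:
        # Print a handful of decoded trades, then exit.
        for _ in range(5):
            msg = consumer.poll(timeout=10.0)
            if msg is None or msg.error():
                continue
            print(schemaless_reader(io.BytesIO(msg.value()), schema))
    finally:
        consumer.close()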
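Similarly, after step 5, a quick way to confirm the Spark job is writing rows is to query Cassandra directly. A sketch assuming the cassandra-driver package and that port 9042 is published to the host; market.trades is an illustrative placeholder for whatever keyspace and table cassandra-setup.cql defines.

    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    # Credentials match the defaults in the .env file above.
    auth = PlainTextAuthProvider(username="cassandra", password="cassandra")
    cluster = Cluster(["localhost"], port=9042, auth_provider=auth)
    session = cluster.connect()

    # Placeholder keyspace/table; adjust to cassandra-setup.cql.
    for row in session.execute("SELECT * FROM market.trades LIMIT 5"):
        print(row)

    cluster.shutdown()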
    
