
# Apache Kafka - Live Flight Data Streaming

**Technologies used:** Azure (VM, Blob Storage, Azure Functions, Azure DevOps, Azure Data Lake Analytics, Azure Synapse), Kafka, Python

## Abstract

The goal of this project was to learn how Apache Kafka streaming works. For training purposes, I used a free Azure D2s VM and simulated the data stream to avoid memory problems on the instance. Each event in the simulated stream was a sample JSON record drawn from an existing dataset. The streamed data was uploaded to Azure Blob Storage and processed by a Data Factory pipeline, which created a table schema in Azure Data Lake Analytics. That allowed me to run queries in Azure Synapse Analytics.

## Data Flow

*Project architecture diagram*

## Installation

### Prerequisites

- An active Azure subscription

### Kafka server setup

Set up the Kafka server on an Azure VM:

1. Create a VM instance in Azure; I used an Ubuntu Server 24.04 LTS image.
2. Start the instance and check that it is running in the Azure console.
3. Open the Kafka port: create an inbound security rule for port 9092.
4. Connect to the instance over SSH using the private key generated at VM creation.
5. Install and start Docker Engine on the instance:

   ```bash
   make install_docker
   ```

6. Set up Kafka. Start the Kafka server:

   ```bash
   make start_kafka
   ```

   Create the topic for live flight positions (a Python equivalent is sketched after this list):

   ```bash
   make create_topic
   ```

7. Create the storage account and blob storage in the Azure console.
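For reference, here is a minimal Python sketch of what the topic creation could look like, using kafka-python's admin client. The broker address and the topic name `live_flight_position` are assumptions, not values taken from the repo's Makefile:

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Assumed broker address and topic name; adjust to match your setup.
admin = KafkaAdminClient(bootstrap_servers="<vm-public-ip>:9092")
admin.create_topics([
    NewTopic(name="live_flight_position", num_partitions=1, replication_factor=1)
])
admin.close()
```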

### Run the producer

```bash
python src/kafka_producer.py
```
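If you want to write your own producer, here is a minimal sketch assuming the kafka-python package. The dataset path, topic name, and broker address are placeholders, not the repo's actual values:

```python
import csv
import json
import random
import time

from kafka import KafkaProducer

BOOTSTRAP = "<vm-public-ip>:9092"  # assumed broker address
TOPIC = "live_flight_position"     # assumed topic name

# Load the existing dataset once; each simulated event is a sampled record.
with open("data/flights.csv", newline="") as f:  # hypothetical dataset path
    flights = list(csv.DictReader(f))

producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    event = random.choice(flights)
    producer.send(TOPIC, value=event)
    time.sleep(1)  # throttle to keep memory usage low on the free-tier VM
```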

### Run the consumer

```bash
python src/kafka_consumer.py
```
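Similarly, a minimal consumer sketch that uploads each event to Blob Storage, assuming kafka-python and azure-storage-blob. The container name and connection string are placeholders:

```python
import json
import uuid

from azure.storage.blob import BlobServiceClient
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "live_flight_position",                   # assumed topic name
    bootstrap_servers="<vm-public-ip>:9092",  # assumed broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

blob_service = BlobServiceClient.from_connection_string("<your_blob_connection_string>")
container = blob_service.get_container_client("flight-events")  # hypothetical container

for message in consumer:
    # Write each event as its own JSON blob with a unique name.
    container.upload_blob(f"event-{uuid.uuid4()}.json", json.dumps(message.value))
```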

## Azure Functions

The goal is to create an Azure Function that copies the live data (streamed by Kafka into Blob Storage) into Azure Data Lake Gen2 every 10 minutes.

1. Create a new Azure Function App in the Azure portal.
2. Install the Azure Functions extension in VS Code and sign in to your Azure account.
3. Install the Azure CLI in your terminal and sign in. The CLI is needed to add the application environment variables that connect the Function App to Blob Storage and Data Lake Storage:

   ```bash
   make install_azure_cli
   ```

   Verify that the installation worked and that you can sign in to your Azure account:

   ```bash
   az login
   ```

4. Use the Azure CLI to add your Blob Storage and Data Lake connection strings as environment variables for your Function App:

   ```bash
   az functionapp config appsettings set --name <FunctionAppName> --resource-group <ResourceGroupName> --settings "BLOB_CONNECTION_STRING=your_blob_connection_string"
   ```

Replace `<FunctionAppName>` with the name of your Function App, `<ResourceGroupName>` with the name of your resource group, and `your_blob_connection_string` with your actual connection string. Repeat the command for your Data Lake connection string (e.g. under a name like `DATALAKE_CONNECTION_STRING`).
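A minimal sketch of what such a timer-triggered function could look like, using the Azure Functions v2 Python programming model. The container and filesystem names and the `DATALAKE_CONNECTION_STRING` setting are assumptions, not the project's actual code:

```python
import logging
import os

import azure.functions as func
from azure.storage.blob import BlobServiceClient
from azure.storage.filedatalake import DataLakeServiceClient

app = func.FunctionApp()

# NCRONTAB schedule: run every 10 minutes.
@app.schedule(schedule="0 */10 * * * *", arg_name="timer")
def copy_to_datalake(timer: func.TimerRequest) -> None:
    blob_service = BlobServiceClient.from_connection_string(
        os.environ["BLOB_CONNECTION_STRING"]
    )
    lake_service = DataLakeServiceClient.from_connection_string(
        os.environ["DATALAKE_CONNECTION_STRING"]  # assumed setting name
    )

    source = blob_service.get_container_client("flight-events")  # hypothetical container
    target = lake_service.get_file_system_client("flight-data")  # hypothetical filesystem

    # Copy every blob from Blob Storage into Data Lake Gen2.
    for blob in source.list_blobs():
        data = source.download_blob(blob.name).readall()
        target.get_file_client(blob.name).upload_data(data, overwrite=True)

    logging.info("Copied blobs to Data Lake Gen2")
```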