iurii-chernigin/audio-streaming-data-platform

Data Platform for an Audio Streaming service

This data platform is designed to solve the following problems:

  • Business Intelligence: prepared data will allow analysts to assess the state of the business at various levels.
  • [plans] Proactive actions based on user activity: the architecture should allow making in-app offers to users based on real-time data.

Table of Contents

  1. Architecture Design
  2. Data Generation
  3. Data Ingestion
  4. Data Warehouse
  5. Data Visualization (BI)

Architecture Design

[Figure: architecture diagram]

Requirements & SLA:

  • Data must be in a centralized repository;
  • Events should be written to the storage with a delay of no more than one minute;
  • Data model must reflect business processes;
  • Data must be clean and reliable;
  • Data must be accessible 24/7;
  • Data must be documented.
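
The one-minute delay requirement can be expressed as a simple check. The sketch below is illustrative only: the class and method names are hypothetical and not part of this repository, and it assumes the delay is measured from the event's own timestamp to the moment the row is written to storage.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: names are illustrative, not taken from this repository.
final class IngestionSla {
    // The one-minute limit comes from the requirements list above.
    static final Duration MAX_DELAY = Duration.ofMinutes(1);

    // Delay between the event timestamp and the moment it was written to storage
    // (assumption: this is how the delay is measured).
    static Duration delay(Instant eventTime, Instant writtenAt) {
        return Duration.between(eventTime, writtenAt);
    }

    // True when the event reached storage within the allowed delay.
    static boolean withinSla(Instant eventTime, Instant writtenAt) {
        return delay(eventTime, writtenAt).compareTo(MAX_DELAY) <= 0;
    }
}
```

A check like this could run over a sample of rows (event timestamp vs. load timestamp) to monitor whether the requirement holds in practice.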

Data Generation

The project is based on events generated by https://github.com/viirya/eventsim. These events reflect user behaviour on a fictional music website (like Spotify).
Launch instructions are here: data-generation/readme

Data Ingestion

Kafka Deployment

Kafka is used to store events before they are sent to the data warehouse.

There are two options:

#terraform #kubernetes #docker #kafka

Kafka Consumers

Custom Java consumers read events from Kafka topics and write them to Data Warehouse tables.

Link to Java application implementation: audio-streaming-java-consumer
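
One common way for a consumer to write to a warehouse while staying inside a delay budget is micro-batching: buffer events and flush when the batch is full or the oldest event has waited too long. The sketch below shows that pattern with the standard library only. It is not taken from audio-streaming-java-consumer, which may work differently; `EventBatcher`, its parameters, and the `sink` callback (standing in for a warehouse write) are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical micro-batching buffer; not the repository's actual consumer code.
final class EventBatcher {
    private final int maxSize;
    private final Duration maxAge;
    private final Consumer<List<String>> sink; // stands in for a warehouse write
    private final List<String> buffer = new ArrayList<>();
    private Instant oldest; // arrival time of the first buffered event

    EventBatcher(int maxSize, Duration maxAge, Consumer<List<String>> sink) {
        this.maxSize = maxSize;
        this.maxAge = maxAge;
        this.sink = sink;
    }

    // Buffer one event (e.g. a record value) and flush if a threshold is reached.
    void add(String event, Instant now) {
        if (buffer.isEmpty()) {
            oldest = now;
        }
        buffer.add(event);
        flushIfDue(now);
    }

    // Intended to be called on every poll cycle too, so a quiet topic still flushes on time.
    void flushIfDue(Instant now) {
        if (buffer.isEmpty()) {
            return;
        }
        boolean full = buffer.size() >= maxSize;
        boolean old = Duration.between(oldest, now).compareTo(maxAge) >= 0;
        if (full || old) {
            flush();
        }
    }

    // Hand the buffered events to the sink and start a new batch.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // pass a copy so the sink owns its batch
        buffer.clear();
    }
}
```

In a real consumer, `add` would be called for each record returned by a poll, and `maxAge` would be set well below one minute to leave headroom for the write itself.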

#java #kafka-consumers

Data Warehouse

The Data Warehouse is built on BigQuery.

Documentation of the data model (including tables specification & optimizations): audio-streaming-data-platform/data-warehouse

There are three main data layers:

  • Raw - raw data ingested from Kafka;
  • Core - cleaned and normalized data according to Data Vault 2.0;
  • Data Marts - wide tables that are easy to analyze and to build reports & dashboards on. This is the main entry point into the data for data analysts & scientists.

To transform the data, dbt with the dbtvault library is used: audio-streaming-dbt-datavault
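
In Data Vault 2.0, hubs and links in the Core layer are usually keyed by a hash of the normalized business key. The sketch below illustrates that idea in Java; in this project the hashing is done in SQL by dbtvault, whose exact macros and settings may differ. The class name, the `||` delimiter, the trim-and-uppercase normalization, and the choice of MD5 are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical illustration of a Data Vault style hash key; not the repository's code.
final class HashKeys {
    // Normalize each business key part, join with a delimiter, then hash.
    static String hashKey(String... businessKeyParts) {
        String normalized = Arrays.stream(businessKeyParts)
                .map(p -> p == null ? "" : p.trim().toUpperCase(Locale.ROOT))
                .collect(Collectors.joining("||")); // delimiter is an assumption
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02X", b & 0xff)); // two uppercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Because the key is normalized before hashing, `hashKey(" user_42 ")` and `hashKey("USER_42")` yield the same value, so the same entity arriving with different casing or whitespace maps to the same hub row.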

#bigquery #dbt #dbtvault #data-vault

Data Visualization (BI)

Looker Studio is used to create reports & dashboards. The dashboard in the picture below is available at: https://lookerstudio.google.com/s/iWa4oRy9nc4

[Figure: dashboard screenshot]

#looker

Useful links

Plans
