
ETL PIPELINE WITH POSTGRESQL

INTRODUCTION

The purpose of this project is to build an ETL pipeline for analyzing user activity on a music streaming app called Sparkify. Currently, Sparkify has no easy way to query its data, which resides in a directory of JSON logs of user activity on the app and a directory of JSON metadata on the songs in the app. This project creates an ETL pipeline that fetches the data and loads it into a database with a star schema, where it is organized and stored for analysis.

DATABASE SCHEMA

The database uses a star schema, with songplays as the fact table and users, songs, artists, and time as dimension tables. This layout makes querying the data for analysis much simpler.
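For illustration, the songplays fact table definition in sql_queries.py could look roughly like the sketch below. The column names and types here are assumptions based on the schema description above, not the project's exact DDL:

# A minimal sketch of the songplays fact table, in the style of
# sql_queries.py. Column names and types are assumptions, not the
# project's exact schema.
songplay_table_create = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id SERIAL PRIMARY KEY,
    start_time  TIMESTAMP,  -- links to time(start_time)
    user_id     INT,        -- links to users(user_id)
    level       VARCHAR,
    song_id     VARCHAR,    -- links to songs(song_id)
    artist_id   VARCHAR,    -- links to artists(artist_id)
    session_id  INT,
    location    VARCHAR,
    user_agent  VARCHAR
);
"""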

HOW TO RUN THIS PROJECT

Required Python Packages

  • pandas
  • psycopg2

NOTE: Consider using a virtual environment.
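If you go that route, a typical setup looks like this (the environment name venv is just a convention):

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install pandas psycopg2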

FILES

  • create_tables.py : contains code for dropping and creating the database tables.
  • etl.ipynb : notebook for trying out the ETL process step by step.
  • etl.py : contains code to run the entire ETL process (see the sketch after this list).
  • test.ipynb : tests the ETL process to make sure the tables are populated.
  • sql_queries.py : contains the SQL queries needed for this ETL project.
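As a rough sketch of what etl.py does, the core loop reads each JSON file with pandas and inserts rows with psycopg2. The file path, column names, and connection string below are assumptions for illustration, not the project's exact code:

import glob
import pandas as pd
import psycopg2
from sql_queries import song_table_insert  # assumed to be defined in sql_queries.py

def process_song_files(cur, filepath):
    # Walk every JSON song file under the given directory.
    for path in glob.glob(f"{filepath}/**/*.json", recursive=True):
        df = pd.read_json(path, lines=True)
        for _, row in df.iterrows():
            # Insert one record into the songs dimension table.
            cur.execute(song_table_insert,
                        (row.song_id, row.title, row.artist_id,
                         row.year, row.duration))

# Placeholder connection details and data path.
conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
conn.autocommit = True
process_song_files(conn.cursor(), "data/song_data")
conn.close()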

STARTUP

Make sure you have PostgreSQL and all the required packages installed on your machine. Edit create_tables.py to enter your database details, then run the following commands from the project directory:

$ python3 create_tables.py
$ python3 etl.py
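The database details in create_tables.py are typically supplied through a psycopg2 connection string along these lines (the host, database name, and credentials below are placeholders; substitute your own):

import psycopg2

# Placeholder credentials; replace with your own PostgreSQL host,
# database, user, and password before running create_tables.py.
conn = psycopg2.connect("host=127.0.0.1 dbname=studentdb user=student password=student")
cur = conn.cursor()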
