Experience a comprehensive exploration of Spotify's musical landscape seamlessly transitioned from Tableau visualizations to SQL analysis. Dive into track inventory, streaming metrics, and sonic trends via interactive dashboards, while leveraging SQL queries for deeper insights into KPIs and cross-platform rankings.

virajbhutada/spotify-track-analysis-and-recommendation


Spotify Data Analysis and Music Recommendation System 🎵📊

A comprehensive analysis of Spotify data using Python, SQL, and Tableau, combined with a machine learning-based music recommendation system. The project visualizes Spotify's musical landscape in Tableau, analyzes it with SQL, and extends it with a recommendation model.



Overview

This project provides detailed insights into Spotify data through interactive visualizations created using Tableau and analytical queries executed with SQL. By analyzing various metrics and trends, users can gain valuable insights into the musical landscape, track popularity, and streaming behavior.


Project Details

  • Utilized the Onyx Data DataDNA Dataset Challenge - Spotify Most Streamed Songs 2023 Dataset - October 2023 to meticulously analyze and visualize Spotify's musical landscape, providing a deep dive into the most streamed songs on the platform in 2023.

  • Employed a combination of descriptive statistics, exploratory data analysis techniques, and SQL queries to extract meaningful insights from the dataset, uncovering nuanced patterns and trends within Spotify's vast catalog of music.

  • Leveraged Tableau as the primary data visualization tool to create interactive dashboards that offer comprehensive insights into various aspects of Spotify's musical ecosystem, including track popularity, streaming behavior, and audio attributes.

  • Explored diverse dimensions of the dataset, such as artist presence, track inventory, stream metrics, and sonic landscape, to provide a holistic understanding of Spotify's music trends and user preferences.

  • Designed visually engaging dashboards with intuitive navigation features to present findings in a clear and accessible manner, enabling stakeholders to easily interpret complex data and make informed decisions.

  • Demonstrated proficiency in data analysis and visualization by creating compelling visualizations that not only showcase key metrics but also tell a coherent story about the evolving dynamics of the music industry.

  • Provided actionable insights into track popularity dynamics, streaming trends across different time periods and geographical regions, and the impact of audio attributes such as tempo and acoustic profiles on user engagement.

  • Integrated SQL queries to perform additional analysis, including data aggregation, filtering, and joining, augmenting the insights gained from Tableau visualizations with deeper data exploration and manipulation capabilities.

  • Developed a music recommendation system using machine learning techniques to suggest songs based on user input.

  • Contributed to advancing knowledge in the field of music analytics by applying rigorous analytical techniques to a real-world dataset, thereby enabling stakeholders in the music industry to gain valuable insights and make strategic decisions based on data-driven evidence.


Tableau Data Vizzes

  • Holistic Insights: The Holistic Insights dashboard provides a comprehensive overview of musical trends and metrics, including KPIs such as artistic presence, track inventory, and stream metrics.
  • Sonic Overview: The Sonic Overview dashboard offers detailed analysis of the sonic landscape, including minor and major stream modality distribution, cross-platform song rankings, and scatter plot charts.
  • Audio Analytics Showcase: The Audio Analytics Showcase dashboard features bar charts illustrating key stream dominance, BPM elite selection, and acoustic profile selection, along with temporal trends of added tracks.
  • Stream Metrics: The Stream Metrics dashboard provides in-depth analysis of streaming metrics, including total streams, average BPM analysis, peak streamed tracks, leading streamed artists, and playlist curated track stream analysis.

SQL Analysis

Data Exploration

  • Conducted exploratory data analysis using SQL queries to gain deeper insights into the Spotify Dataset.
  • Analyzed track popularity dynamics, streaming trends, and audio attributes through SQL-based data manipulation and aggregation techniques.
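
As an illustration of the kind of aggregation described above, the following sketch runs a grouped query against an in-memory SQLite database from Python. The table name `tracks`, its columns, and the sample rows are assumptions made for this example; they are not the repository's actual schema or data.

```python
import sqlite3

# Hypothetical table and rows for illustration only; the real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks (track_name TEXT, artist_name TEXT, streams INTEGER, bpm INTEGER)"
)
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?, ?)",
    [
        ("Track A", "Artist 1", 1200, 120),
        ("Track B", "Artist 1", 800, 100),
        ("Track C", "Artist 2", 1500, 90),
    ],
)

# Aggregate per artist: track count, total streams, and average BPM,
# ordered so the artist with the most streams comes first.
query = """
    SELECT artist_name,
           COUNT(*)     AS track_count,
           SUM(streams) AS total_streams,
           AVG(bpm)     AS avg_bpm
    FROM tracks
    GROUP BY artist_name
    ORDER BY total_streams DESC
"""
artist_summary = conn.execute(query).fetchall()
for row in artist_summary:
    print(row)
conn.close()
```

The same `GROUP BY` pattern should carry over to other engines; only the connection code is specific to SQLite.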

Schema Diagram

Spotify_Schema

The schema diagram provides a visual representation of the database structure, illustrating the relationships between different entities and attributes within the Spotify dataset.


ML and Python Analysis

Spotify Music Recommendation System

This project presents a Music Recommendation System built using the Spotify dataset. The system leverages advanced data analysis and machine-learning techniques to recommend songs based on user preferences.

Features

  • Data Extraction: Uses Spotipy to fetch song data from the Spotify Web API.
  • Exploratory Data Analysis (EDA): Identifies key features and patterns in the Spotify dataset.
  • Feature Engineering: Selects relevant features to build an accurate recommendation model.
  • Recommendation System: Recommends songs based on user-input songs using cosine similarity.

EDA and Data Visualization

This project involves in-depth exploration and visualization of Spotify's dataset, using Python and statistical methods to derive insights.

  • Objective: Utilize Pandas for data manipulation, NumPy for computations, Matplotlib for detailed visualization, and Seaborn for aesthetic enhancements to uncover insights from Spotify's music catalog.

  • Key Analyses:

    • Popularity Analysis: Identify top and least popular songs, examining user preferences.
    • Correlation Studies: Use heatmaps to explore relationships between audio features like loudness, energy, and acousticness.
    • Regression Analysis: Investigate correlations among specific attributes to understand their impact on song popularity.
  • Temporal Trends: Visualize song distribution since 1992, analyze changes in song duration over time, and track duration variations across different genres.

  • Genre Dynamics: Highlight top genres by popularity, offering insights into global music consumption trends and evolutionary patterns in genre preferences.
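
The correlation study mentioned above can be sketched in a few lines. This example uses NumPy only, and the feature values are made up for illustration; they are not taken from the project's dataset.

```python
import numpy as np

# Hypothetical audio-feature values for five tracks (illustration only).
features = {
    "loudness":     np.array([-12.0, -9.0, -7.5, -5.0, -4.0]),
    "energy":       np.array([0.30, 0.45, 0.60, 0.80, 0.90]),
    "acousticness": np.array([0.85, 0.60, 0.40, 0.15, 0.05]),
}
names = list(features)

# Each row of the stacked array is one feature; np.corrcoef returns the
# Pearson correlation between every pair of rows.
corr = np.corrcoef(np.vstack([features[n] for n in names]))

# Print each distinct pair once (upper triangle of the matrix).
for i, a in enumerate(names):
    for j in range(i + 1, len(names)):
        print(f"{a} vs {names[j]}: {corr[i, j]:+.2f}")
```

A matrix like `corr` is the sort of input a Seaborn heatmap would typically take, for example via `seaborn.heatmap(corr, annot=True, xticklabels=names, yticklabels=names)`.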


Key Insights

  • Top 10 Most Popular Songs: Visualizes the top 10 most popular songs on Spotify based on their popularity score.
  • Top Genres by Popularity: Highlights the most popular genres on Spotify, giving an overview of listener preferences based on genre popularity metrics.
  • Correlation Heatmap: Illustrates the correlation between different audio features and song popularity, aiding in identifying relationships.
  • Loudness vs. Energy: Examines how loudness and energy relate to each other in songs, highlighting their influence on musical characteristics.
  • Popularity vs. Acousticness: Analyzes how a song's popularity correlates with its acousticness, providing insights into listener preferences.
  • Total Songs Since 1992: Visualizes the growth in the number of songs added to Spotify since 1992, showing trends in music production and streaming.
  • Change in Duration: Shows how the average duration of songs has changed over time, reflecting shifts in music consumption and production trends.
  • Duration by Genre: Compares the duration of songs across different genres, offering insights into genre-specific trends and listener preferences.

Steps to Build the Recommendation System

  1. Data Extraction:

    • Fetch detailed song information using Spotify's Web API. This involves accessing attributes such as track name, artist, duration, and audio features.
    • Example:
      sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=os.environ["SPOTIFY_CLIENT_ID"],
                                                                 client_secret=os.environ["SPOTIFY_CLIENT_SECRET"]))
  2. Feature Engineering:

    • Select essential features for recommendation, including danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, duration, and popularity.
    • These features are crucial for understanding the musical characteristics that drive user preferences and recommendations.
  3. Model Training and Recommendation:

    • Utilize machine learning techniques, such as cosine similarity, to compare user-selected songs with the dataset and recommend similar tracks.
    • Example:
      user_selected_songs = [
          {'name': 'Come As You Are', 'year': 1991},
          {'name': 'Smells Like Teen Spirit', 'year': 1991},
          {'name': 'Lithium', 'year': 1992}
      ]
      
      recommended_songs = recommend_songs(user_selected_songs, all_songs_df)
      print("Recommended Songs:")
      for song in recommended_songs:
          print(f"- {song[0]} ({song[1]})")
  4. Model Evaluation:

    • Assess the recommendation system's performance using metrics such as precision, recall, and F1-score.
    • These metrics gauge how accurately the system predicts user preferences and similarity between songs.
  5. Deployment:

    • Deploy the recommendation system as a web application using frameworks like Flask or Django.
    • This step involves integrating the recommendation model into a user-friendly interface accessible via web browsers or mobile apps.
    • Example:
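      from flask import Flask, jsonify, request
      app = Flask(__name__)

      @app.route('/recommend', methods=['POST'])
      def recommend():
          # Body: JSON list of {"name": ..., "year": ...}; recommend_songs and all_songs_df come from step 3
          recommended_songs = recommend_songs(request.get_json(), all_songs_df)
          return jsonify(recommended_songs)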
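
To make the feature engineering, recommendation, and evaluation steps more concrete, here is a minimal sketch of cosine-similarity recommendation with set-based precision, recall, and F1. It is not the repository's `recommend_songs` implementation: the function names `recommend_by_cosine` and `precision_recall_f1`, the four-feature layout, and the sample values are all hypothetical. The README does not say how relevant songs are determined, so the `relevant` list is also an assumption.

```python
import numpy as np

def recommend_by_cosine(seed_idx, features, top_n=2):
    """Return indices of the top_n rows most similar to the mean of the seed rows."""
    # Standardize each column so large-scale features (such as tempo)
    # do not dominate the similarity.
    std = features.std(axis=0)
    std[std == 0] = 1.0
    z = (features - features.mean(axis=0)) / std

    # Represent the user's taste as the mean vector of the selected songs.
    query = z[seed_idx].mean(axis=0)

    # Cosine similarity = dot product divided by the product of the norms.
    norms = np.linalg.norm(z, axis=1) * np.linalg.norm(query)
    norms[norms == 0] = 1.0
    sims = z @ query / norms

    # Exclude the seed songs themselves, then take the highest scores.
    sims[seed_idx] = -np.inf
    return [int(i) for i in np.argsort(sims)[::-1][:top_n]]

def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall, and F1 for one list of recommendations."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical rows: [danceability, energy, valence, tempo]
songs = np.array([
    [0.50, 0.90, 0.40, 120.0],   # 0: seed
    [0.52, 0.88, 0.42, 118.0],   # 1: seed
    [0.51, 0.91, 0.38, 122.0],   # 2: close to the seeds
    [0.90, 0.30, 0.90, 90.0],    # 3: very different from the seeds
    [0.48, 0.85, 0.45, 125.0],   # 4: fairly close to the seeds
])
recommended = recommend_by_cosine([0, 1], songs, top_n=2)
print(recommended, precision_recall_f1(recommended, relevant=[2, 4]))
```

Averaging the seed vectors is one simple way to combine several user-selected songs into a single query; other strategies, such as scoring against each seed separately, are equally plausible.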

Installation

To set up and run the Spotify data analysis and recommendation system on your local machine, follow these steps:

  1. Clone the repository:

    git clone https://github.com/virajbhutada/spotify-track-analysis-and-recommendation.git
    cd spotify-track-analysis-and-recommendation
  2. Install the required libraries:

    pip install -r requirements.txt
  3. Set up Spotify API credentials:

    • Create an app on the Spotify for Developers dashboard.
    • Save your Client ID and Client Secret.
    • Set the environment variables:
      export SPOTIFY_CLIENT_ID='your_client_id'
      export SPOTIFY_CLIENT_SECRET='your_client_secret'
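
Before creating the Spotipy client, it can help to check that both variables are actually set, so a missing credential fails with a clear message instead of a `KeyError`. The helper below is a hypothetical convenience function, not part of the repository or of Spotipy.

```python
import os

def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret), or raise if either variable is missing."""
    names = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]

# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {
    "SPOTIFY_CLIENT_ID": "your_client_id",
    "SPOTIFY_CLIENT_SECRET": "your_client_secret",
}
print(load_spotify_credentials(demo_env))
```

Called with no argument, the function reads from the real environment, and the returned pair can be passed to `SpotifyClientCredentials(client_id=..., client_secret=...)`.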

Usage

To use the recommendation system, follow these steps:

  1. Import necessary libraries:

    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials
    import pandas as pd
    import numpy as np
    from collections import defaultdict
    from sklearn.metrics import euclidean_distances
    from scipy.spatial.distance import cdist
    import difflib
    import os
  2. Authenticate and initialize Spotipy:

    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=os.environ["SPOTIFY_CLIENT_ID"],
                                                               client_secret=os.environ["SPOTIFY_CLIENT_SECRET"]))
  3. Define functions to fetch song data and calculate recommendations (full code provided in the repository).

  4. Get song recommendations:

    recommended_songs = recommend_songs([
        {'name': 'Come As You Are', 'year': 1991},
        {'name': 'Smells Like Teen Spirit', 'year': 1991},
        {'name': 'Lithium', 'year': 1992},
        {'name': 'All Apologies', 'year': 1993},
    ], all_songs_df)  # pass the songs DataFrame, as in the step 3 example above

Data Source

  • The dataset utilized in this project originates from the Onyx DataDNA Spotify Most Streamed Songs 2023 Challenge. This dataset offers invaluable insights into the most streamed songs on Spotify during 2023.

  • Key Dataset Information:

    • The dataset comprises 21 information-rich columns meticulously cleaned and primed for analysis, encompassing track metadata, streaming metrics, and audio attributes. This careful curation ensures accuracy and reliability, facilitating robust analysis and informed decision-making processes.
    • Featuring approximately 181K records, this dataset offers a comprehensive perspective of Spotify's musical landscape. It enables detailed examination of track popularity dynamics, streaming trends, and user preferences, empowering in-depth analysis for actionable insights.
  • Kaggle Spotify Datasets:


Analytical Approach

This project adopts a rigorous analytical approach combining descriptive and exploratory data analysis techniques, alongside SQL queries, to uncover intricate patterns and insights within the Spotify dataset. An array of visualization tools, including bar charts, scatter plots, and tables, are employed to present findings in a coherent and comprehensible manner.

Explore Further

For more visualizations and projects, visit my Tableau Public profile. Discover deeper insights into data analysis and visualization techniques through interactive dashboards and engaging storytelling.


Technologies Used

Python, Statistics

Data Visualization Libraries:

Pandas, NumPy, Matplotlib, Seaborn

Machine Learning Libraries:

Scikit-Learn, TensorFlow, Keras

Other Tools:

Spotipy, Jupyter Notebook


Conclusion

In conclusion, the Spotify Data Insights project has provided a comprehensive exploration of Spotify's musical ecosystem through the combined use of Tableau for visualization and SQL for analysis. Throughout this project, we have:

  • Conducted detailed analysis and visualization of Spotify's most streamed songs in 2023, offering valuable insights into track popularity dynamics, streaming trends, and audio attributes.
  • Leveraged Tableau's interactive dashboards to present complex data in a clear and accessible manner, facilitating easy interpretation and decision-making for stakeholders in the music industry.
  • Employed SQL queries to perform additional data exploration and manipulation, enhancing the depth of analysis and uncovering nuanced patterns within the Spotify dataset.
  • Contributed to advancing knowledge in the field of music analytics by applying rigorous analytical techniques to a real-world dataset, enabling stakeholders to make data-driven decisions and plan strategically.
  • Demonstrated proficiency in data analysis, visualization, and storytelling, showcasing the evolving dynamics of the music industry and highlighting opportunities for innovation and growth.

Recommendations

As we move forward, here are some recommendations for further exploration and public awareness:

  • Continuously update and expand the dataset to capture evolving trends and dynamics within the music industry, enabling stakeholders to stay informed and adapt to changing consumer preferences.
  • Explore collaborative opportunities with other data-driven platforms and industries to leverage synergies and uncover new insights that drive innovation and growth.
  • Engage with the broader community through knowledge sharing and collaboration, fostering a culture of transparency and openness that encourages collective learning and development.
  • Advocate for data-driven decision-making and evidence-based strategies within the music industry, promoting a culture of innovation and experimentation that drives sustainable growth and success.

Connect with Me

LinkedIn Tableau Public Spotify


Repository Navigation

Clone Push Pull Issues


License

This project is licensed under the MIT License. See the LICENSE file for details.
