Unlock the Sounds of the Past: Discover Top Music Charts from the Past Decades with my New Interactive Tool
Photo by Adrian Korte on Unsplash

Growing up, music was always an essential part of my life. From my childhood days to college, every milestone in my life is associated with a specific song. However, with time, those songs became harder to find. That’s when I decided to create a dashboard that would bring back those musical memories from my childhood, school, and college.

With just a few clicks, I can now access top songs from the past, view their popularity, and explore different genres. Whenever I’m feeling nostalgic or want to relive those happy memories with friends, all I have to do is select a specific year or a range of years and indulge in the nostalgia of the good old days. It’s amazing how music can take us back in time and let us relive those special moments.

This dashboard has not only brought back those memories but also helped me discover new music and artists. I can’t wait to share it with my friends and family and see what memories it brings back for them. I hope it does the same for you.

Part One: Using Python and BigQuery to Extract and Store Data from the Spotify API

Part Two: Exploring Music Data through SQL Analysis, Visualization, and Dashboard Guide

Key Considerations Before Diving into the Code: Setting Up Your Music Data Project

  1. Define your project goals clearly.
  2. Decide which data you want to collect to achieve these goals (this can always be adjusted later, but initial requirements will set the direction).
  3. Establish your Python environment and choose the data warehouse to store your music data.

After completing the previous steps, the next crucial step is to register for a Spotify developer account. This gives you access to the API documentation and to the credentials (a client ID and a client secret) needed to authenticate and call the required API endpoints. The registration process is straightforward and well documented. You can refer to this guide for a step-by-step explanation.
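Later snippets call a get_token() helper that the article does not show. Below is a minimal sketch of one. It assumes Spotify's Client Credentials flow, which is enough for search and artist lookups because they touch no user data. It also assumes the credentials are stored in the hypothetical environment variables SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET. The sketch uses only the standard library. build_token_request is a hypothetical helper, split out so the request can be inspected without a network call.

```python
import base64
import json
import os
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build a Client Credentials token request (no network I/O happens here)."""
    # Spotify expects HTTP Basic auth with base64("client_id:client_secret").
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    auth_header = "Basic " + base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": auth_header,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def get_token() -> str:
    """Fetch an access token using credentials from (hypothetical) environment variables."""
    request = build_token_request(
        os.environ["SPOTIFY_CLIENT_ID"], os.environ["SPOTIFY_CLIENT_SECRET"]
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["access_token"]
```

Tokens from this flow expire after an hour, so a long extraction run should request a fresh one rather than reuse a single token indefinitely.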

Identify the data to collect

To ensure we hit the correct API endpoint, it’s important to identify the data we want to collect. For this project, we will be mining data from the past few decades. To better understand and organize the features, I have divided them into three sections:

Track Information: track name, artist, album, duration, and popularity.

Album Information: album name, release date, total tracks, and album artwork.

Artist Information: artist name, artist popularity, genres, and images.

Let’s dive into the code! The search endpoint will be our main focus, but we’ll also use a few other endpoints to fill in missing data such as artist popularity and genre.
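The search endpoint filters by release year through its q parameter. Spotify's search syntax accepts a year: filter, either for a single year (year:2023) or for a range written as year:1990-1999. The two helpers below are hypothetical names. They sketch how such a query and the matching request parameters could be built, with the page size capped at 50, the search endpoint's maximum.

```python
from typing import Optional


def build_year_query(start_year: int, end_year: Optional[int] = None) -> str:
    """Return a Spotify search query that uses the year: field filter."""
    if end_year is None or end_year == start_year:
        return f"year:{start_year}"
    if end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")
    # The year filter also accepts a range written as START-END.
    return f"year:{start_year}-{end_year}"


def build_search_params(query: str, offset: int = 0, limit: int = 50) -> dict:
    """Query parameters for GET https://api.spotify.com/v1/search, tracks only."""
    if not 1 <= limit <= 50:
        raise ValueError("Spotify caps the search page size at 50")
    return {"q": query, "type": "track", "offset": offset, "limit": limit}
```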

Search for tracks on Spotify by year:

import requests

def search_spotify_year(query, token, offset=0, limit=50):
    """
        Args:
            query: A string representing the search query.
            token: A string representing the access token.
            offset: An integer representing the offset.
            limit: An integer representing the limit.

        Returns:
            A list of dictionaries representing the tracks returned by the search query.
    """
    search_url = "https://api.spotify.com/v1/search"

    # Authenticate with the bearer token and ask only for tracks matching the query
    headers = {"Authorization": f"Bearer {token}"}
    params = {"q": query, "type": "track", "offset": offset, "limit": limit}

    response = requests.get(search_url, headers=headers, params=params)
    response_data = response.json()

    # A successful response nests the results under "tracks"; anything else is an error payload
    if 'tracks' in response_data:
        return response_data['tracks']['items']
    else:
        print(f"Error fetching data for offset {offset}: {response_data}")
        return []

One of the most satisfying aspects of working on this project has been improving my exception handling skills, which I applied throughout the code.
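The search function above only prints failed responses. One failure worth handling explicitly during a bulk extraction is rate limiting: Spotify answers with HTTP 429 and a Retry-After header giving the number of seconds to wait. The sketch below shows one way to handle it and is not the author's implementation. get_with_retry is a hypothetical helper. send is any zero-argument callable returning a response-like object, for example lambda: requests.get(url, headers=headers, params=params). sleep is injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Any, Callable


def get_with_retry(send: Callable[[], Any], max_retries: int = 3,
                   sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() and retry while the API responds with HTTP 429."""
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429:
            return response  # success or a non-rate-limit error: the caller decides
        if attempt < max_retries:
            # Retry-After is given in seconds; fall back to 1 second if it is missing.
            sleep(float(response.headers.get("Retry-After", 1)))
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")
```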

Next, we define a function that retrieves the top 200 songs for a given year from the Spotify API by paging through the search results.

def get_top_songs_year(year, token, total=200):
    """
        Args:
            year: An integer representing the year to retrieve the top songs for.
            token: A string representing the access token.
            total: An integer representing the total number of songs to retrieve.

        Returns:
            A list of dictionaries representing the top songs for the given year.
    """
    tracks = []
    query = f"year:{year}"
    limit = 50  # maximum page size for the search endpoint

    # Page through the results 50 at a time (offsets 0, 50, 100, 150 for total=200)
    for offset in range(0, total, limit):
        results = search_spotify_year(query, token, offset, limit)
        if not results:
            break  # stop early on an error or when no more results come back
        tracks.extend(results)

    return tracks[:total]
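get_top_songs_year keeps the first 200 results in the order the search endpoint returns them. That order is Spotify's search ranking, not an explicit sort on the popularity field. If you want each yearly list ranked explicitly by track popularity, a small post-processing step could look like the sketch below. rank_by_popularity is a hypothetical helper, not part of the original code. It also drops repeated track IDs in case the same track shows up on more than one page.

```python
def rank_by_popularity(tracks, top_n=200):
    """Return up to top_n unique tracks, highest Spotify popularity first."""
    seen = set()
    unique = []
    for track in tracks:
        # Keep only the first occurrence of each track ID.
        if track["id"] not in seen:
            seen.add(track["id"])
            unique.append(track)
    # sorted() is stable, so tracks with equal popularity keep their search order.
    return sorted(unique, key=lambda t: t.get("popularity", 0), reverse=True)[:top_n]
```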

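get_top_songs_year returns a list of dictionaries, one per track, for the top 200 songs of the given year. We then take the artist IDs and album IDs from these results and use them as inputs to the other endpoints. Below is a single track from 2023, with the nested album, artist, ID, and URL objects collapsed for brevity: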

{
   "2023":[
      {
         "album":{},
         "artists":[],
         "disc_number":1,
         "duration_ms":131013,
         "explicit":false,
         "external_ids":{},
         "external_urls":{},
         "href":"https://api.spotify.com/v1/tracks/6AQbmUe0Qwf5PZnt4HmTXv",
         "id":"6AQbmUe0Qwf5PZnt4HmTXv",
         "is_local":false,
         "is_playable":true,
         "name":"Boy's a liar Pt. 2",
         "popularity":98,
         "preview_url":"https://p.scdn.co/mp3-preview/543d8d09a5530a1ab94dd0c6f83fc4ee3e0d7f96?cid=27a49912ef9d4cefb4450983fd28627f",
         "track_number":1,
         "type":"track",
         "uri":"spotify:track:6AQbmUe0Qwf5PZnt4HmTXv"
      }
   ]
}
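Many tracks share an artist, so it helps to deduplicate the IDs before calling the artist endpoint. That way each artist is requested only once. The sketch below assumes top_songs_by_year maps each year to the list returned by get_top_songs_year(). collect_ids is a hypothetical name.

```python
def collect_ids(top_songs_by_year):
    """Return sorted lists of the unique primary-artist IDs and album IDs."""
    artist_ids, album_ids = set(), set()
    for tracks in top_songs_by_year.values():
        for track in tracks:
            # Only the first-listed (primary) artist of each track is collected.
            artist_ids.add(track["artists"][0]["id"])
            album_ids.add(track["album"]["id"])
    return sorted(artist_ids), sorted(album_ids)
```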

Given a list of artist IDs, return a dictionary containing the popularity of each artist.

def get_artist_popularity(artist_ids):
    """
        Parameters:
        artist_ids (list of str): A list of artist IDs.

        Returns:
        dict: A dictionary where the keys are artist IDs and the values are the popularity scores for each artist.
    """
    artist_popularities = {}
    for artist_id in artist_ids:
        # Set up the request to the Artist API (get_token() is assumed to return a valid access token)
        artist_url = f"https://api.spotify.com/v1/artists/{artist_id}"
        response = requests.get(artist_url, headers={"Authorization": f"Bearer {get_token()}"})
        if response.status_code == 200:
            artist_data = response.json()
            artist_popularity = artist_data["popularity"]
            artist_popularities[artist_id] = artist_popularity
        else:
            print(f"Error retrieving artist information for artist ID {artist_id}.")
            artist_popularities[artist_id] = None
    return artist_popularities
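get_artist_popularity sends one request per artist. Spotify also offers a Get Several Artists endpoint (GET https://api.spotify.com/v1/artists?ids=...) that accepts up to 50 comma-separated IDs per call, which cuts the number of requests roughly fifty-fold. The sketch below uses hypothetical helper names. It only builds the batched requests and indexes a response body, so it can be checked offline. Each request would then be sent with requests.get(req["url"], params=req["params"], headers={"Authorization": f"Bearer {token}"}).

```python
from typing import Dict, Iterator, List

SEVERAL_ARTISTS_URL = "https://api.spotify.com/v1/artists"
MAX_IDS_PER_CALL = 50  # upper limit on IDs per call for this endpoint


def chunk_ids(ids: List[str], size: int = MAX_IDS_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def several_artists_requests(ids: List[str]) -> List[dict]:
    """Describe one GET /v1/artists?ids=... request per batch of IDs."""
    return [
        {"url": SEVERAL_ARTISTS_URL, "params": {"ids": ",".join(batch)}}
        for batch in chunk_ids(ids)
    ]


def index_artists(payload: dict) -> Dict[str, dict]:
    """Map artist ID to artist object from a {"artists": [...]} response body."""
    # Entries are keyed by their own "id" field; null entries are skipped defensively.
    return {artist["id"]: artist for artist in payload.get("artists", []) if artist}
```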

Given a list of artist IDs, return a dictionary mapping each artist ID to its list of genres from the Spotify API.

def get_artist_genre(artist_ids):
    """
        Args:
            artist_ids (list of str): The unique Spotify identifiers of the artists.

        Returns:
            dict: A dictionary where the keys are the input artist IDs and the values are lists of genres.
    """
    artist_genres = {}
    for artist_id in artist_ids:
        # Set up the request to the Artist API (get_token() is assumed to return a valid access token)
        artist_url = f"https://api.spotify.com/v1/artists/{artist_id}"
        response = requests.get(artist_url, headers={"Authorization": f"Bearer {get_token()}"})
        if response.status_code == 200:
            artist_data = response.json()
            artist_genre = artist_data["genres"]
            artist_genres[artist_id] = artist_genre
        else:
            print(f"Error retrieving artist information for artist ID {artist_id}.")
            # Use an empty list rather than None so the value fits the REPEATED genres column
            artist_genres[artist_id] = []
    return artist_genres
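get_artist_popularity and get_artist_genre request the same artist objects, because popularity and genres both come back from the Artist endpoint. One alternative is to fetch each artist once and derive both dictionaries from the same payloads. The sketch below does this with a hypothetical helper name.

```python
def split_artist_fields(artists):
    """Build {id: popularity} and {id: genres} from artist objects in one pass."""
    popularities, genres = {}, {}
    for artist in artists:
        if artist is None:
            continue  # skip null entries defensively
        popularities[artist["id"]] = artist.get("popularity")
        # An empty list (not None) fits a REPEATED column in BigQuery
        genres[artist["id"]] = artist.get("genres") or []
    return popularities, genres
```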

With the main function and helper functions in place, we can now extract all the required columns at once.

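# Assumes top_songs_by_year maps each year to the list returned by get_top_songs_year(),
# and that artist_popularities and artist_genres were built by passing the primary
# artist IDs from those tracks to get_artist_popularity() and get_artist_genre().
for year, tracks in top_songs_by_year.items():
    rows = []
    for track in tracks:
        artist_id = track["artists"][0]["id"]  # use the first-listed (primary) artist
        images = track["album"]["images"]
        track_data = {
            "track_name": track["name"],
            "year": year,
            "artist_name": track["artists"][0]["name"],
            "album_name": track["album"]["name"],
            "duration_ms": track["duration_ms"],
            "song_popularity": track["popularity"],
            "release_date": track["album"]["release_date"],
            "total_tracks_in_album": track["album"]["total_tracks"],
            # Store the first image's URL so the value matches the STRING album_cover column
            "album_cover": images[0]["url"] if images else None,
            "artist_popularity": artist_popularities.get(artist_id),
            "artist_genres": artist_genres.get(artist_id, []),
        }
        rows.append(track_data)
    top_songs_by_year[year] = rows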
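The table schema declares release_date as a DATE. However, Spotify album objects report release_date at varying precision (a year, a year and month, or a full date), indicated by the release_date_precision field, so a value like "1987" will not load into a DATE column as-is. One option is to pad the value while building each row, for example with "release_date": normalize_release_date(track["album"]["release_date"], track["album"]["release_date_precision"]). This assumes that padding to the first of the month or year is acceptable for your analysis. normalize_release_date is a hypothetical helper.

```python
def normalize_release_date(release_date, precision):
    """Pad a Spotify release_date to YYYY-MM-DD so it fits a BigQuery DATE column."""
    if precision == "year":
        return f"{release_date}-01-01"  # "1987" becomes "1987-01-01"
    if precision == "month":
        return f"{release_date}-01"  # "1987-06" becomes "1987-06-01"
    return release_date  # "day" precision is already YYYY-MM-DD
```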

To load this data into BigQuery, you can use the Google Cloud SDK and the BigQuery Python library. Here are the general steps you can follow:

  1. Go to the Cloud console and create a Google Cloud project for BigQuery; it will hold the dataset and table that store the data.
  2. Authenticate your Google Cloud SDK using a service account key file.
  3. Use the BigQuery Python library to create a client object.
  4. Load the data from a pandas DataFrame into BigQuery using the DataFrame's to_gbq method (provided through the pandas-gbq package).
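The authentication and loading steps can also be done with the BigQuery client alone. The sketch below is an assumption, not the author's code. It swaps in the client's load_table_from_json method in place of pandas' to_gbq and appends the row dictionaries built earlier to an existing table, reusing that table's schema. flatten_rows and upload_rows are hypothetical names. The google-cloud-bigquery import sits inside upload_rows so that flatten_rows runs without the package installed.

```python
def flatten_rows(top_songs_by_year):
    """Turn {year: [row, ...]} into one list of rows, with year stored as an int."""
    rows = []
    for year in sorted(top_songs_by_year, key=int):
        for row in top_songs_by_year[year]:
            # The schema's year column is INTEGER, while the dict keys may be strings.
            rows.append({**row, "year": int(row["year"])})
    return rows


def upload_rows(rows, table_id, key_path):
    """Append rows to an existing BigQuery table and return the number of rows loaded."""
    from google.cloud import bigquery  # requires the google-cloud-bigquery package

    # Authenticate with a service account key file and create a client.
    client = bigquery.Client.from_service_account_json(key_path)
    table = client.get_table(table_id)  # reuse the schema the table was created with
    job_config = bigquery.LoadJobConfig(
        schema=table.schema,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    job = client.load_table_from_json(rows, table_id, job_config=job_config)
    job.result()  # wait for the load job to finish; raises if it fails
    return job.output_rows
```

For example, upload_rows(flatten_rows(top_songs_by_year), "your-project.your_dataset.top_songs", "service-account.json"). Both the table ID and the key file name are placeholders.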

After completing the setup process, create a dataset and table directly from the Python environment, and define the schema for ingesting the data.

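from google.cloud import bigquery

# Placeholder IDs: replace with your own project, dataset, and table names
dataset_id = "your-project.your_dataset"
table_ref = f"{dataset_id}.top_songs"

bq_client = bigquery.Client()  # picks up the credentials configured during setup
bq_client.create_dataset(dataset_id, exists_ok=True)

schema = [
    bigquery.SchemaField("track_name", "STRING"),
    bigquery.SchemaField("year", "INTEGER"),
    bigquery.SchemaField("artist_name", "STRING"),
    bigquery.SchemaField("album_name", "STRING"),
    bigquery.SchemaField("duration_ms", "INTEGER"),
    bigquery.SchemaField("song_popularity", "INTEGER"),
    bigquery.SchemaField("release_date", "DATE"),
    bigquery.SchemaField("total_tracks_in_album", "INTEGER"),
    bigquery.SchemaField("album_cover", "STRING"),
    bigquery.SchemaField("artist_popularity", "INTEGER"),
    bigquery.SchemaField("artist_genres", "STRING", mode="REPEATED"),
]
table = bigquery.Table(table_ref, schema=schema)
table = bq_client.create_table(table, exists_ok=True)  # no error if the table already exists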

In this article, we have discussed the process of mining data from the Spotify API and storing it in BigQuery for later use. We have walked through the different steps involved, including setting up the Python environment, obtaining API keys, defining functions for data extraction, and creating a BigQuery table with the appropriate schema.

Stay tuned for the next part of this series, where we will explore how to analyze and visualize the data we’ve extracted from the Spotify API using BigQuery and Tableau. I’ll also provide a guide to help you navigate through the tool.

If you want to see the complete code, check out my GitHub repository.


More articles by Asha Pondicherry

  • Boxplots Using Matplotlib

    Boxplots are used to visualize the data distribution and compare the distribution amongst various categorical groups…

  • Few Essential Pandas Functions.

    This summer apart from volunteering in research work with Professor Julio, I also worked on a few Kaggle datasets which…
