Unlock the Sounds of the Past: Discover Top Music Charts from the Past Decades with my New Interactive Tool

Asha Pondicherry

Data Analytics and Data Engineering

Published: April 26, 2023

Growing up, music was always an essential part of my life. From my childhood days to college, every milestone in my life is associated with a specific song. However, with time, those songs became harder to find. That’s when I decided to create a dashboard that would bring back those musical memories from my childhood, school, and college.

With just a few clicks, I can now easily access top songs from the past, view their popularity, and even explore different genres. Now, whenever I’m feeling nostalgic or want to relive those happy memories with friends, all I have to do is select a specific year or a range of years and indulge myself in the nostalgia of the good old days. It’s amazing how music has the power to take us back in time and relive those special moments.

This dashboard has not only brought back those memories but also helped me discover new music and artists. I can’t wait to share it with my friends and family and see what memories it brings back for them. I hope it does the same for you.

Part One: Using Python and BigQuery to Extract and Store Data from the Spotify API

Part Two: Exploring Music Data through SQL Analysis, Visualization, and Dashboard Guide

Key Considerations Before Diving into the Code: Setting Up Your Music Data Project

Define your project goals clearly.
Decide which data you want to collect to achieve these goals (this can always be adjusted later, but initial requirements will set the direction).
Establish your Python environment and choose the data warehouse to store your music data.

After completing these steps, register for a Spotify developer account to access the API documentation and obtain the API keys needed to authenticate and call the API endpoints. Registration is straightforward and well documented; you can refer to this guide for a step-by-step explanation.
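The access token that the functions in this article take as `token` has to be requested with those API keys. One way to do that is Spotify's Client Credentials flow, sketched here under a few assumptions: the names `build_token_request` and `fetch_token` are hypothetical (the project's own helper, `get_token()`, is not shown in this article), and the sketch uses the standard library's `urllib` instead of `requests` so that it runs without extra installs.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Build the URL, headers and form body for a Client Credentials token request."""
    # Spotify expects "client_id:client_secret", base64-encoded, as HTTP Basic auth.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return TOKEN_URL, headers, body


def fetch_token(client_id, client_secret):
    """Request an access token (network call; needs real credentials)."""
    url, headers, body = build_token_request(client_id, client_secret)
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return payload["access_token"]
```

Tokens obtained this way expire after a limited time, so a long-running extraction may need to call `fetch_token` again.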

Identify the data to collect

To ensure we hit the correct API endpoint, it’s important to identify the data we want to collect. For this project, we will be mining data from the past few decades. To better understand and organize the features, I have divided them into three sections:

Track Information: track name, artist, album, duration, and popularity.

Album Information: album name, release date, total tracks, and album artwork.

Artist Information: artist name, artist_popularity, genres, images.

Let’s dive into the code! The search endpoint will be our main focus, but we’ll also use a few other endpoints to fill in missing data such as artist popularity and genre.

Search for tracks on Spotify by year:

import requests


def search_spotify_year(query, token, offset=0, limit=50):
    """
        Search Spotify for tracks matching a query.

        Args:
            query: A string representing the search query.
            token: A string representing the access token.
            offset: An integer representing the offset.
            limit: An integer representing the limit.

        Returns:
            A list of dictionaries representing the tracks returned by the search query.
    """
    search_url = "https://api.spotify.com/v1/search"

    # The original listing elided these two values; a bearer token header
    # and a track-type search are assumed here.
    headers = {"Authorization": f"Bearer {token}"}
    params = {"q": query, "type": "track", "offset": offset, "limit": limit}

    response = requests.get(search_url, headers=headers, params=params)
    response_data = response.json()

    # A successful search response carries the results under the 'tracks' key.
    if 'tracks' in response_data:
        return response_data['tracks']['items']
    else:
        print(f"Error fetching data for offset {offset}: {response_data}")
        return []

One of the most satisfying aspects of working on this project has been improving my exception handling skills, which I applied throughout the code.
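As one illustration of the kind of error handling involved, the sketch below retries a request when the API answers with HTTP 429 (rate limited) and waits for the number of seconds given in the `Retry-After` header. The helper name `get_with_retry` and its `fetch` and `sleep` parameters are assumptions for illustration, not code from the project; `fetch` stands for any callable that returns a status code, the response headers, and the parsed body.

```python
import time


def get_with_retry(fetch, max_retries=3, sleep=time.sleep):
    """Call fetch() until it stops returning HTTP 429 or retries run out.

    fetch is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_retries + 1):
        status, headers, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # The wait time in seconds is sent in the Retry-After header.
            wait_seconds = int(headers.get("Retry-After", 1))
            sleep(wait_seconds)
    # Still rate limited after all retries: hand the last response back.
    return status, body
```

With `requests`, `fetch` could be a small function that calls `requests.get(...)` and returns `response.status_code`, `response.headers`, and the parsed body.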

Next, we define a function for retrieving the top 200 songs for a given year from the Spotify API.

def get_top_songs_year(year, token, total=200):
    """
        Retrieve the top songs for a year by paging through the search results.

        Args:
            year: An integer representing the year to retrieve the top songs for.
            token: A string representing the access token.
            total: An integer representing the total number of songs to retrieve.

        Returns:
            A list of dictionaries representing the top songs for the given year.
    """
    tracks = []
    query = f"year:{year}"
    limit = 50  # page size requested from the search endpoint

    # Request one page per offset: 0, 50, 100, ... up to total.
    for offset in range(0, total, limit):
        results = search_spotify_year(query, token, offset, limit)
        tracks.extend(results)

    # The last page may overshoot, so trim to the requested total.
    return tracks[:total]

This function returns a list of dictionaries for the top 200 songs of the given year. We take the artist IDs and album IDs from these results and use them as input to the other endpoints.

{
   "2023":{
      "album":{},
      "artists":[],
      "disc_number":1,
      "duration_ms":131013,
      "explicit":false,
      "external_ids":{},
      "external_urls":{},
      "href":"https://api.spotify.com/v1/tracks/6AQbmUe0Qwf5PZnt4HmTXv",
      "id":"6AQbmUe0Qwf5PZnt4HmTXv",
      "is_local":false,
      "is_playable":true,
      "name":"Boy's a liar Pt. 2",
      "popularity":98,
      "preview_url":"https://p.scdn.co/mp3-preview/543d8d09a5530a1ab94dd0c6f83fc4ee3e0d7f96?cid=27a49912ef9d4cefb4450983fd28627f",
      "track_number":1,
      "type":"track",
      "uri":"spotify:track:6AQbmUe0Qwf5PZnt4HmTXv"
   }
}
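The artist and album IDs mentioned above can be collected with something like the following sketch. The function name `collect_ids` is an assumption, and the sample tracks are made-up placeholders that only mimic the shape of a track object (an `album` object with an `id`, and an `artists` list whose entries each have an `id`).

```python
def collect_ids(tracks):
    """Return the unique artist IDs and album IDs found in a list of track dicts."""
    artist_ids = []
    album_ids = []
    seen_artists = set()
    seen_albums = set()
    for track in tracks:
        for artist in track.get("artists", []):
            if artist["id"] not in seen_artists:
                seen_artists.add(artist["id"])
                artist_ids.append(artist["id"])
        album_id = track.get("album", {}).get("id")
        if album_id and album_id not in seen_albums:
            seen_albums.add(album_id)
            album_ids.append(album_id)
    return artist_ids, album_ids


# Placeholder tracks, not real Spotify data.
sample_tracks = [
    {"name": "Song A", "artists": [{"id": "artist1"}], "album": {"id": "album1"}},
    {"name": "Song B", "artists": [{"id": "artist1"}, {"id": "artist2"}], "album": {"id": "album2"}},
    {"name": "Song C", "artists": [{"id": "artist2"}], "album": {"id": "album2"}},
]
artist_ids, album_ids = collect_ids(sample_tracks)
print(artist_ids)  # ['artist1', 'artist2']
print(album_ids)   # ['album1', 'album2']
```

Deduplicating first keeps the number of follow-up requests down, since popular artists tend to appear many times across the yearly lists.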

Given a list of artist IDs, return a dictionary containing the popularity of each artist.

def get_artist_popularity(artist_ids):
    """
        Parameters:
        artist_ids (list of str): A list of artist IDs.

        Returns:
        dict: A dictionary where the keys are artist IDs and the values are the popularity scores for each artist.
    """
    artist_popularities = {}
    for artist_id in artist_ids:
        # Set up the request to the Artist API.
        # get_token() is the project's token helper (not shown in this article).
        artist_url = f"https://api.spotify.com/v1/artists/{artist_id}"
        response = requests.get(artist_url, headers={"Authorization": f"Bearer {get_token()}"})
        if response.status_code == 200:
            artist_data = response.json()
            artist_popularity = artist_data["popularity"]
            artist_popularities[artist_id] = artist_popularity
        else:
            # Record None so the caller can tell the lookup failed.
            print(f"Error retrieving artist information for artist ID {artist_id}.")
            artist_popularities[artist_id] = None
    return artist_popularities

Given a list of artist IDs, return a dictionary mapping each artist ID to its genres from the Spotify API.

def get_artist_genre(artist_ids):
    """
        Args:
            artist_ids (list of str): The unique identifiers for the artists on Spotify.

        Returns:
            dict: A dictionary where the keys are the input artist IDs and the values are lists of genres.
    """
    artist_genres = {}
    for artist_id in artist_ids:
        # Set up the request to the Artist API.
        # get_token() is the project's token helper (not shown in this article).
        artist_url = f"https://api.spotify.com/v1/artists/{artist_id}"
        response = requests.get(artist_url, headers={"Authorization": f"Bearer {get_token()}"})
        if response.status_code == 200:
            artist_data = response.json()
            artist_genre = artist_data["genres"]
            artist_genres[artist_id] = artist_genre
        else:
            # Record None so the caller can tell the lookup failed.
            print(f"Error retrieving artist information for artist ID {artist_id}.")
            artist_genres[artist_id] = None
    return artist_genres
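Both helpers request one artist at a time, so the same artist endpoint is called twice per artist. As a possible alternative, Spotify's "Get Several Artists" endpoint accepts up to 50 comma-separated IDs per request and returns popularity and genres together. The sketch below only builds the batched URLs and parses a response of that shape; the function names are hypothetical and are not part of the project.

```python
ARTISTS_URL = "https://api.spotify.com/v1/artists"
MAX_IDS_PER_REQUEST = 50  # limit of the "Get Several Artists" endpoint


def build_artist_batch_urls(artist_ids, batch_size=MAX_IDS_PER_REQUEST):
    """Split artist IDs into batches and return one request URL per batch."""
    urls = []
    for start in range(0, len(artist_ids), batch_size):
        batch = artist_ids[start:start + batch_size]
        urls.append(f"{ARTISTS_URL}?ids={','.join(batch)}")
    return urls


def parse_artist_batch(response_data):
    """Map artist ID to popularity and genres from one batch response."""
    artists = {}
    for artist in response_data.get("artists", []):
        if artist is None:  # skip entries the API could not resolve
            continue
        artists[artist["id"]] = {
            "popularity": artist.get("popularity"),
            "genres": artist.get("genres", []),
        }
    return artists
```

Each URL would be requested with the same bearer-token header as above, and the parsed dictionaries merged into one lookup table.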

With the main function and helper functions in place, we can now extract all the required columns at once.

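# top_songs_by_year maps each year to its raw track list; artist_popularities
# and artist_genres are the dictionaries returned by the helper functions.
for year, tracks in top_songs_by_year.items():
    # Replace the raw list with trimmed-down records; `tracks` still
    # refers to the raw list while we iterate over it.
    top_songs_by_year[year] = []
    for track in tracks:
        artist_id = track["artists"][0]["id"]  # first listed artist
        artist_popularity = artist_popularities[artist_id]
        genres = artist_genres[artist_id]
        track_data = {
            "track_name": track["name"],
            "year": year,
            "artist_name": track["artists"][0]["name"],
            "album_name": track["album"]["name"],
            "duration_ms": track["duration_ms"],
            "song_popularity": track["popularity"],
            "release_date": track["album"]["release_date"],
            "total_tracks_in_album": track["album"]["total_tracks"],
            # Each image is an object with url, height and width; keep only
            # the URL so the value fits the STRING column in the schema.
            "album_cover": track["album"]["images"][0]["url"],
            "artist_popularity": artist_popularity,
            "artist_genres": genres
        }

        top_songs_by_year[year].append(track_data)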
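Before loading, the nested `top_songs_by_year` dictionary has to become a flat list of rows. The sketch below does that and also pads partial release dates, since Spotify reports some albums with only year or month precision (for example `1999` or `1999-05`), which a date column would likely reject. The function names and the choice to pad with `01` are assumptions for illustration.

```python
def normalize_release_date(release_date):
    """Pad 'YYYY' or 'YYYY-MM' to 'YYYY-MM-DD'; pass through full dates and None."""
    if not release_date:
        return None
    parts = release_date.split("-")
    # Assumption: missing month/day components default to "01".
    parts += ["01"] * (3 - len(parts))
    return "-".join(parts[:3])


def flatten_rows(top_songs_by_year):
    """Turn {year: [track_data, ...]} into one flat list of row dicts."""
    rows = []
    for year, tracks in top_songs_by_year.items():
        for track_data in tracks:
            row = dict(track_data)  # copy so the original dict is not modified
            row["year"] = int(year)
            row["release_date"] = normalize_release_date(row.get("release_date"))
            rows.append(row)
    return rows
```

The resulting list can be passed to `pandas.DataFrame(rows)` if you prefer to work with a DataFrame.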

To ingest this data into BigQuery, you can use the Google Cloud SDK and the BigQuery Python library. Here are the general steps you can follow:

Go to the Cloud Console and create a project with BigQuery enabled, along with a dataset and a table to store the data.
Authenticate your Google Cloud SDK using a service account key file.
Use the BigQuery Python library to create a client object.
Load the data from the pandas DataFrame into BigQuery using the DataFrame's to_gbq method.

After completing the setup process, create a dataset and table directly from the Python environment, and define the schema for ingesting the data.

from google.cloud import bigquery

# bq_client and table_ref come from the setup steps above
# (the client object and a reference to the target table).
schema = [
    bigquery.SchemaField("track_name", "STRING"),
    bigquery.SchemaField("year", "INTEGER"),
    bigquery.SchemaField("artist_name", "STRING"),
    bigquery.SchemaField("album_name", "STRING"),
    bigquery.SchemaField("duration_ms", "INTEGER"),
    bigquery.SchemaField("song_popularity", "INTEGER"),
    bigquery.SchemaField("release_date", "DATE"),
    bigquery.SchemaField("total_tracks_in_album", "INTEGER"),
    bigquery.SchemaField("album_cover", "STRING"),
    bigquery.SchemaField("artist_popularity", "INTEGER"),
    # REPEATED mode stores the list of genres as an array of strings.
    bigquery.SchemaField("artist_genres", "STRING", mode="REPEATED"),
]
table = bigquery.Table(table_ref, schema=schema)
table = bq_client.create_table(table)
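Once the table exists, rows can be written to it. The steps above mention the pandas to_gbq route; as an alternative that stays within the BigQuery client library, the sketch below uses the client's `insert_rows_json` method, which streams JSON rows and returns a list of per-row errors (empty on success). The helper name `insert_in_batches`, the batch size of 500, and the table ID in the usage comment are assumptions, not values from the project.

```python
def insert_in_batches(client, table_id, rows, batch_size=500):
    """Stream rows into a BigQuery table in batches and collect any row errors.

    client is expected to be a google.cloud.bigquery.Client (or any object
    with a compatible insert_rows_json method).
    """
    all_errors = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # insert_rows_json returns an empty list when every row was accepted.
        errors = client.insert_rows_json(table_id, batch)
        all_errors.extend(errors)
    return all_errors


# Usage (assumed names; requires google-cloud-bigquery and credentials):
# from google.cloud import bigquery
# bq_client = bigquery.Client()
# errors = insert_in_batches(bq_client, "my-project.my_dataset.top_songs", rows)
# if errors:
#     print(f"Some rows were rejected: {errors}")
```

Each row is a plain dictionary whose keys match the schema's column names, with `artist_genres` given as a list of strings.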

In this article, we have discussed the process of mining data from the Spotify API and storing it in BigQuery for later use. We have walked through the different steps involved, including setting up the Python environment, obtaining API keys, defining functions for data extraction, and creating a BigQuery table with the appropriate schema.

Stay tuned for the next part of this series, where we will explore how to analyze and visualize the data we’ve extracted from the Spotify API using BigQuery and Tableau. I’ll also provide a guide to help you navigate through the tool.

If you want to see the complete code, check out my GitHub repository.
