Unlock the Sounds of the Past: Discover Top Music Charts from the Past Decades with my New Interactive Tool
Growing up, music was always an essential part of my life. From my childhood days to college, every milestone in my life is associated with a specific song. However, with time, those songs became harder to find. That’s when I decided to create a dashboard that would bring back those musical memories from my childhood, school, and college.
With just a few clicks, I can now easily access top songs from the past, view their popularity, and even explore different genres. Now, whenever I’m feeling nostalgic or want to relive those happy memories with friends, all I have to do is select a specific year or a range of years and indulge myself in the nostalgia of the good old days. It’s amazing how music has the power to take us back in time and relive those special moments.
This dashboard has not only brought back those memories but also helped me discover new music and artists. I can’t wait to share it with my friends and family and see what memories it brings back for them. I hope it does the same for you.
Part One: Using Python and BigQuery to Extract and Store Data from the Spotify API
Part Two: Exploring Music Data through SQL Analysis, Visualization, and Dashboard Guide
Key Considerations Before Diving into the Code: Setting Up Your Music Data Project
After completing the previous steps, the next crucial step is to register for a Spotify developer account. This gives you access to the API documentation and the API keys needed to authenticate and call the required endpoints. The registration process is straightforward and well documented. You can refer to this guide for a step-by-step explanation.
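The snippets later in this article call a get_token() helper that is not shown here. A minimal sketch of what such a helper might look like, assuming Spotify's Client Credentials flow and assuming the client ID and secret are stored in environment variables named SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET (hypothetical names, as is build_token_request):

```python
import base64
import os

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Return the headers and form body for a Client Credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    headers = {
        "Authorization": f"Basic {encoded}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    data = {"grant_type": "client_credentials"}
    return headers, data


def get_token():
    """Request an app-level access token (no user login involved)."""
    import requests  # third-party; imported here so the helper above has no dependencies

    # Assumption: the credentials live in these environment variables.
    headers, data = build_token_request(
        os.environ["SPOTIFY_CLIENT_ID"], os.environ["SPOTIFY_CLIENT_SECRET"]
    )
    response = requests.post(TOKEN_URL, headers=headers, data=data, timeout=10)
    response.raise_for_status()
    return response.json()["access_token"]
```

Keeping the credentials out of the source file means the script can be shared or committed without leaking the keys.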
Identify the data to collect
To ensure we hit the correct API endpoint, it’s important to identify the data we want to collect. For this project, we will be mining data from the past few decades. To better understand and organize the features, I have divided them into three sections:
Track Information: track name, artist, album, duration, and popularity.
Album Information: album name, release date, total tracks, and album artwork.
Artist Information: artist name, artist popularity, genres, and images.
Let’s dive into the code! The search endpoint will be our main focus, but we’ll also use a few other endpoints to fill in missing data such as artist popularity and genre.
Search for tracks on Spotify by year:
import requests

def search_spotify_year(query, token, offset=0, limit=50):
    """
    Args:
        query: A string representing the search query.
        token: A string representing the access token.
        offset: An integer representing the offset.
        limit: An integer representing the limit.
    Returns:
        A list of dictionaries representing the tracks returned by the search query.
    """
    search_url = "https://api.spotify.com/v1/search"
    # Authenticate with the bearer token and ask for one page of track results.
    headers = {"Authorization": f"Bearer {token}"}
    params = {"q": query, "type": "track", "limit": limit, "offset": offset}
    response = requests.get(search_url, headers=headers, params=params)
    response_data = response.json()
    if 'tracks' in response_data:
        return response_data['tracks']['items']
    else:
        # Error payloads have no 'tracks' key; log them and return an empty page.
        print(f"Error fetching data for offset {offset}: {response_data}")
        return []
One of the most satisfying aspects of working on this project has been improving my exception handling skills, which I applied throughout the code.
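As one illustration of the kind of error handling this involves, here is a sketch (not the code from my repository) of a wrapper that retries a request when the API answers with HTTP 429, the rate-limit status, and waits for the number of seconds given in the Retry-After header. The names get_with_retry, send, max_retries, and sleep are hypothetical and were introduced only for this example:

```python
import time


def get_with_retry(send, url, headers=None, params=None, max_retries=3, sleep=time.sleep):
    """Call send(url, headers=..., params=...) and retry while the status is 429.

    send is expected to behave like requests.get.
    """
    for attempt in range(max_retries + 1):
        response = send(url, headers=headers, params=params)
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break
        # The Retry-After header says how many seconds to wait before retrying.
        wait_seconds = int(response.headers.get("Retry-After", 1))
        sleep(wait_seconds)
    # Retries exhausted: hand the last 429 response back to the caller.
    return response
```

It could be used as get_with_retry(requests.get, search_url, headers=headers, params=params) in place of a direct requests.get call.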
Next we define a function for retrieving the top 200 songs for a given year from the Spotify API.
def get_top_songs_year(year, token, total=200):
    """
    Args:
        year: An integer representing the year to retrieve the top songs for.
        token: A string representing the access token.
        total: An integer representing the total number of songs to retrieve.
    Returns:
        A list of dictionaries representing the top songs for the given year.
    """
    tracks = []
    query = f"year:{year}"
    limit = 50  # page size requested from the search endpoint
    # Page through the results: offsets 0, 50, 100, 150 for total=200.
    for offset in range(0, total, limit):
        results = search_spotify_year(query, token, offset, limit)
        tracks.extend(results)
    return tracks[:total]
This function returns a list of dictionaries, one per track, for the top 200 songs of the given year. We then use the artist IDs and album IDs from these results as input to the other endpoints.
{
    "2023": {
        "album": {},
        "artists": [],
        "disc_number": 1,
        "duration_ms": 131013,
        "explicit": false,
        "external_ids": {},
        "external_urls": {},
        "href": "https://api.spotify.com/v1/tracks/6AQbmUe0Qwf5PZnt4HmTXv",
        "id": "6AQbmUe0Qwf5PZnt4HmTXv",
        "is_local": false,
        "is_playable": true,
        "name": "Boy's a liar Pt. 2",
        "popularity": 98,
        "preview_url": "https://p.scdn.co/mp3-preview/543d8d09a5530a1ab94dd0c6f83fc4ee3e0d7f96?cid=27a49912ef9d4cefb4450983fd28627f",
        "track_number": 1,
        "type": "track",
        "uri": "spotify:track:6AQbmUe0Qwf5PZnt4HmTXv"
    }
}
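The snippets that follow read from a top_songs_by_year dictionary and from a list of artist IDs, and this article does not show how either is built. One way they might be assembled, assuming that only the primary (first-listed) artist of each track matters; collect_top_songs, collect_artist_ids, and the fetch argument are hypothetical names for this sketch:

```python
def collect_top_songs(years, token, fetch):
    """Map each year to its list of track dictionaries.

    fetch is expected to have the signature of get_top_songs_year(year, token).
    """
    return {year: fetch(year, token) for year in years}


def collect_artist_ids(top_songs_by_year):
    """Return the unique primary-artist IDs in first-seen order."""
    seen = {}  # a dict keeps insertion order and drops duplicates
    for tracks in top_songs_by_year.values():
        for track in tracks:
            artists = track.get("artists") or []
            if artists:
                seen.setdefault(artists[0]["id"], None)
    return list(seen)
```

For example, collect_top_songs(range(1990, 2024), token, get_top_songs_year) would gather one list of tracks per year, and collect_artist_ids would then yield each artist only once, so no artist is requested twice.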
Given a list of artist IDs, return a dictionary containing the popularity of each artist.
def get_artist_popularity(artist_ids):
    """
    Parameters:
        artist_ids (list of str): A list of artist IDs.
    Returns:
        dict: A dictionary where the keys are artist IDs and the values are the popularity scores for each artist.
    """
    artist_popularities = {}
    # Fetch the token once instead of once per artist.
    headers = {"Authorization": f"Bearer {get_token()}"}
    for artist_id in artist_ids:
        # Set up the request to the Artist API
        artist_url = f"https://api.spotify.com/v1/artists/{artist_id}"
        response = requests.get(artist_url, headers=headers)
        if response.status_code == 200:
            artist_data = response.json()
            artist_popularity = artist_data["popularity"]
            artist_popularities[artist_id] = artist_popularity
        else:
            # Record the failure as None so the caller can still look up every ID.
            print(f"Error retrieving artist information for artist ID {artist_id}.")
            artist_popularities[artist_id] = None
    return artist_popularities
Given a list of artist IDs, return a dictionary mapping each artist ID to its genres from the Spotify API.
def get_artist_genre(artist_ids):
    """
    Args:
        artist_ids (list of str): The unique identifiers for the artists on Spotify.
    Returns:
        dict: A dictionary where the keys are the input artist IDs and the values are lists of genres.
    """
    artist_genres = {}
    # Fetch the token once instead of once per artist.
    headers = {"Authorization": f"Bearer {get_token()}"}
    for artist_id in artist_ids:
        # Set up the request to the Artist API
        artist_url = f"https://api.spotify.com/v1/artists/{artist_id}"
        response = requests.get(artist_url, headers=headers)
        if response.status_code == 200:
            artist_data = response.json()
            artist_genre = artist_data["genres"]
            artist_genres[artist_id] = artist_genre
        else:
            # Record the failure as None so the caller can still look up every ID.
            print(f"Error retrieving artist information for artist ID {artist_id}.")
            artist_genres[artist_id] = None
    return artist_genres
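Both helpers above make one request per artist. As an alternative that I am not using in this article, Spotify also offers a "Get Several Artists" endpoint that accepts up to 50 comma-separated IDs per call, so popularity and genres can be fetched together in far fewer requests. A rough sketch, where chunked, get_artist_details, and the send argument are hypothetical names:

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def get_artist_details(artist_ids, token, send):
    """Return {artist_id: {"popularity": ..., "genres": [...]}} using batched requests.

    send is expected to behave like requests.get.
    """
    details = {}
    headers = {"Authorization": f"Bearer {token}"}
    for batch in chunked(list(artist_ids), 50):  # the endpoint accepts up to 50 IDs
        response = send(
            "https://api.spotify.com/v1/artists",
            headers=headers,
            params={"ids": ",".join(batch)},
        )
        if response.status_code != 200:
            print(f"Error retrieving a batch of {len(batch)} artists: {response.status_code}")
            details.update({artist_id: None for artist_id in batch})
            continue
        for artist_id, artist in zip(batch, response.json()["artists"]):
            # Guard against empty entries for IDs the API could not resolve.
            details[artist_id] = (
                {"popularity": artist["popularity"], "genres": artist["genres"]}
                if artist
                else None
            )
    return details
```

With 200 songs per year over several decades, batching like this could cut thousands of artist lookups down to a few dozen calls.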
With the main function and helper functions in place, we can now extract all the required columns at once.
for year, tracks in top_songs_by_year.items():
    # Replace the raw API results for this year with trimmed-down records.
    # `tracks` still points at the original list, so the loop below is unaffected.
    top_songs_by_year[year] = []
    for track in tracks:
        # Use the first-listed artist as the track's primary artist.
        artist_id = track["artists"][0]["id"]
        artist_popularity = artist_popularities[artist_id]
        genres = artist_genres[artist_id]
        track_data = {
            "track_name": track["name"],
            "year": year,
            "artist_name": track["artists"][0]["name"],
            "album_name": track["album"]["name"],
            "duration_ms": track["duration_ms"],
            "song_popularity": track["popularity"],
            "release_date": track["album"]["release_date"],
            "total_tracks_in_album": track["album"]["total_tracks"],
            # Each image entry is an object; keep only its URL string.
            "album_cover": track["album"]["images"][0]["url"],
            "artist_popularity": artist_popularity,
            "artist_genres": genres
        }
        top_songs_by_year[year].append(track_data)
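Before loading, the nested dictionary has to be flattened into a plain list of rows, and a few values may need tidying. In particular, Spotify can report a release date with only year or month precision ("1985" or "1985-07"), which would not fit a DATE column. A sketch of one way to handle this, assuming that padding missing parts with "01" is acceptable for this analysis; normalize_release_date and to_rows are hypothetical helper names:

```python
def normalize_release_date(release_date):
    """Pad "YYYY" or "YYYY-MM" to a full "YYYY-MM-DD" string."""
    parts = release_date.split("-")
    parts += ["01"] * (3 - len(parts))  # no-op when the date is already complete
    return "-".join(parts)


def to_rows(top_songs_by_year):
    """Flatten {year: [track_data, ...]} into a single list of row dictionaries."""
    rows = []
    for year, tracks in top_songs_by_year.items():
        for track_data in tracks:
            row = dict(track_data)  # copy so the input is left untouched
            row["year"] = int(year)
            row["release_date"] = normalize_release_date(row["release_date"])
            # Assumption: a failed genre lookup (None) is stored as an empty list.
            row["artist_genres"] = row.get("artist_genres") or []
            rows.append(row)
    return rows
```

Padding changes the meaning of the date slightly (an album known only as "1985" becomes January 1, 1985), so it is worth keeping in mind when analyzing release dates later.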
To ingest this data into BigQuery, you can use the Google Cloud SDK and the BigQuery Python client library. Here are the general steps you can follow:
After completing the setup process, create a dataset and table directly from the Python environment, and define the schema for ingesting the data.
from google.cloud import bigquery

# bq_client (a bigquery.Client) and table_ref are created during the setup process.
schema = [
    bigquery.SchemaField("track_name", "STRING"),
    bigquery.SchemaField("year", "INTEGER"),
    bigquery.SchemaField("artist_name", "STRING"),
    bigquery.SchemaField("album_name", "STRING"),
    bigquery.SchemaField("duration_ms", "INTEGER"),
    bigquery.SchemaField("song_popularity", "INTEGER"),
    bigquery.SchemaField("release_date", "DATE"),
    bigquery.SchemaField("total_tracks_in_album", "INTEGER"),
    bigquery.SchemaField("album_cover", "STRING"),
    bigquery.SchemaField("artist_popularity", "INTEGER"),
    # REPEATED stores the list of genres as an array of strings.
    bigquery.SchemaField("artist_genres", "STRING", mode="REPEATED"),
]
table = bigquery.Table(table_ref, schema=schema)
table = bq_client.create_table(table)
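With the table in place, the rows can be written to it. The sketch below assumes that bq_client is a google.cloud.bigquery.Client, that table is the object returned by create_table, and that rows is a list of dictionaries whose keys match the schema; load_rows and batch_size are hypothetical names. It relies on the client's insert_rows_json method, which returns a list of per-row errors (empty when everything was accepted):

```python
def load_rows(bq_client, table, rows, batch_size=500):
    """Stream rows into table in batches and return any per-row errors."""
    all_errors = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # insert_rows_json returns an empty list when every row was accepted.
        errors = bq_client.insert_rows_json(table, batch)
        all_errors.extend(errors)
    return all_errors
```

A call such as errors = load_rows(bq_client, table, rows) followed by a check that errors is empty gives a quick confirmation that the ingestion succeeded.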
In this article, we have discussed the process of mining data from the Spotify API and storing it in BigQuery for later use. We have walked through the different steps involved, including setting up the Python environment, obtaining API keys, defining functions for data extraction, and creating a BigQuery table with the appropriate schema.
Stay tuned for the next part of this series, where we will explore how to analyze and visualize the data we’ve extracted from the Spotify API using BigQuery and Tableau. I’ll also provide a guide to help you navigate through the tool.
If you want to see the complete code, check out my GitHub repository.