An informed and methodical way to explore new styles of music based on a single track and machine learning algorithms.
TL;DR: We made a web app that introduces listeners to new music genres, ranging from classical to jazz, given a song they already love.
“Classical music is boring.”
Sure, you might agree with that — especially if you didn’t dedicate your childhood to immersing yourself in the world of Mozart and Beethoven.
Hear me out though. Classical music is actually kinda cool.
Believe it or not, a lot of the dance-y, more “pleasurable” songs have their roots in the compositions of a bunch of dead dudes.
This might be hard to believe at first glance. But allow me to show you a common thread between the boppin’ bangers of today and grandpa’s bedtime music.
I’m Nanette, and I think music is pretty neat. Whether it’s today’s hits, tropical house, or classical concertos—you name it, I’m down to give it a listen.
I’m also a backend & data engineering intern at Chartmetric. I currently study computer science & music at MIT, and the fusion of music and technology fascinates me tremendously, particularly the rapid growth of the music streaming industry.
One of the most astounding parts of Chartmetric’s product is that it has so. much. data. 13.6 million songs, 3.9 million albums, 1.7 million artists across Spotify, Apple Music, YouTube, SoundCloud, TikTok, and more 😱
I was initially overwhelmed by the sheer amount of data. Where do I start, what do I do, and how do I make sense of mountains of numbers and statistics?
In the face of so much possibility, I thought back to my interests in streamed popular music and my background in classical music. As I played around with our data, I found commonalities in sonic features of pop and classical songs: tempo, loudness, energy level.
Wait, can this be used to “bridge” these different worlds of music? Can we somehow “identify” a song by its acoustical characteristics and compare them to songs in a different genre?
Genrecommender: Genre(-Specific) Recommending
I teamed up with fellow intern Ethan Houston to build a genre-specific music recommendation website. The project is broken into three parts (the web app, the recommender, and data processing), as shown below:
What does Genrecommender do?
Given any (your favorite!) song, we’ll send back 5 (out of ~750K possible) recommendations from a genre. You can take your pick from classical, country, jazz, R&B, or pop.
How does it work?
All you need to do is input a song name (or Spotify track link) and select a target genre on the frontend. Within seconds, you have five brand new songs you can jam to directly on our website.
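Because the input box accepts either a song name or a Spotify track link, the backend has to tell the two apart before looking anything up. Here is a minimal sketch, assuming standard `https://open.spotify.com/track/<id>` links and `spotify:track:<id>` URIs (Spotify track IDs are 22-character base62 strings). The function name `parse_track_input` is hypothetical, not the project's code:

```python
import re
from urllib.parse import urlparse

# Spotify track IDs are 22-character base62 strings.
_TRACK_ID = re.compile(r"^[0-9A-Za-z]{22}$")


def parse_track_input(text: str) -> tuple[str, str]:
    """Classify user input as ("id", track_id) or ("query", song_name)."""
    text = text.strip()
    # URI form: spotify:track:<id>
    if text.startswith("spotify:track:"):
        candidate = text.split(":")[2]
        if _TRACK_ID.match(candidate):
            return ("id", candidate)
    # Link form: https://open.spotify.com/track/<id>?si=...
    parsed = urlparse(text)
    if parsed.netloc == "open.spotify.com":
        parts = [p for p in parsed.path.split("/") if p]
        # Tolerate extra path segments (e.g. a locale prefix) before /track/.
        if "track" in parts:
            idx = parts.index("track")
            if idx + 1 < len(parts) and _TRACK_ID.match(parts[idx + 1]):
                return ("id", parts[idx + 1])
    # Anything else is treated as a free-text song name to search for.
    return ("query", text)
```

A track ID can be looked up directly, while a plain name needs a search step first.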
What do I get to see?
The songs are displayed in a force graph, so the nodes & edges are interactive. The center node represents your input song, and each recommendation node is linked with an edge: the closer the node, the better the match, and the bigger the node, the more popular it is:
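The two display rules (closer means a better match, bigger means more popular) can be sketched as a payload builder. This hypothetical sketch assumes each recommendation carries its feature-space `distance` and Spotify `popularity` (0 to 100), and that the frontend's force graph accepts a `{nodes, links}` object. The field names and size range are illustrative, not the project's actual format:

```python
def build_graph(input_song: str, recs: list[dict],
                min_size: float = 4.0, max_size: float = 16.0) -> dict:
    """Turn ranked recommendations into a {nodes, links} graph payload.

    Each rec is assumed to look like
    {"name": str, "distance": float, "popularity": int in 0-100}.
    """
    # The input song sits at the center of the graph.
    nodes = [{"id": "input", "label": input_song, "size": max_size}]
    links = []
    for i, rec in enumerate(recs):
        node_id = f"rec{i}"
        # Bigger node = more popular: map popularity 0-100 onto [min_size, max_size].
        size = min_size + (max_size - min_size) * rec["popularity"] / 100
        nodes.append({"id": node_id, "label": rec["name"], "size": size})
        # Closer node = better match: use the feature-space distance as link length.
        links.append({"source": "input", "target": node_id,
                      "length": rec["distance"]})
    return {"nodes": nodes, "links": links}
```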
By using Chartmetric data, we also include hard-to-find analytics for each recommendation, including:
- Spotify Playlist Count: total number of playlists the track is on
- Spotify Reach: total followers those playlists have
- Popularity: Spotify’s algorithmically calculated popularity
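The first two metrics are simple aggregates over the playlists a track appears on. Popularity is computed by Spotify, so it would simply be passed through. This sketch assumes we already have that list of playlists with each one's follower count. The `Playlist` class and `playlist_stats` function are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Playlist:
    name: str
    followers: int


def playlist_stats(playlists: list[Playlist]) -> dict:
    """Compute Playlist Count and Reach for one track."""
    return {
        # Playlist Count: how many playlists the track is on.
        "playlist_count": len(playlists),
        # Reach: total followers across those playlists.
        "reach": sum(p.followers for p in playlists),
    }
```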
But wait, there’s more! There are a couple more links to give you even more info about each match: 1) the Chartmetric page for the recommendation, if you’re a numbers kind of person, or 2) the top Spotify playlist the song is on.
If you’re interested in the technical development of the project, read on. Otherwise, you can play around with the app here!
Step 0) What I Need (to Know): Background
- Context: We had a few weeks to create, design, and implement an original project using Chartmetric data.
- Key Concepts: ingesting data, SQL queries, cleaning/filtering data, APIs, distance metrics (nearest-neighbor search), training pipelines, Flask, React, web apps
Step 1) Data, Data, and More Data: Data Ingestion
To recommend songs, we need to collect songs to work with. By querying Chartmetric’s database, we gathered genre-specific data for ~750K songs: a song’s genre (from iTunes), artist (from Spotify), and acoustical features (from Spotify/Echo Nest).
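Chartmetric's schema isn't described here, so this is a hypothetical sketch of the ingestion query. It uses made-up `tracks` and `audio_features` tables (a SQLite stand-in), joined on track ID and filtered by genre with a parameterized query:

```python
import sqlite3

FEATURES = ("danceability", "energy", "tempo", "valence")

# Hypothetical schema standing in for Chartmetric's real tables.
SCHEMA = """
CREATE TABLE tracks (id INTEGER PRIMARY KEY, name TEXT, artist TEXT, genre TEXT);
CREATE TABLE audio_features (track_id INTEGER REFERENCES tracks(id),
    danceability REAL, energy REAL, tempo REAL, valence REAL);
"""


def fetch_genre_tracks(conn: sqlite3.Connection, genre: str) -> list[dict]:
    """Return every track in `genre` joined with its audio features."""
    # Explicit aliases keep the result column names predictable.
    cols = ", ".join(f"f.{c} AS {c}" for c in FEATURES)
    query = (
        f"SELECT t.id AS id, t.name AS name, t.artist AS artist, "
        f"t.genre AS genre, {cols} "
        "FROM tracks AS t JOIN audio_features AS f ON f.track_id = t.id "
        "WHERE t.genre = ?"  # parameterized, never string-formatted
    )
    cur = conn.execute(query, (genre,))
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur]
```

Running `conn.executescript(SCHEMA)` on `sqlite3.connect(":memory:")` gives a throwaway database to try it against.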
Step 1.5) Gotta Clean it Up: Data Processing
To remove songs that had been taken down or lacked acoustical features, we did the clean-up in a Jupyter notebook, where data visualizations made the filtering process easier. Each genre’s pool of potential matches was stored in a pandas DataFrame, where each song was reformatted, organized, and prepped for further manipulation.
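The filtering logic can be sketched without pandas. This assumes each track is a dict holding the four audio features used for matching, plus a hypothetical `available` flag marking takedowns. Neither field name comes from Chartmetric's actual data:

```python
FEATURES = ("danceability", "energy", "tempo", "valence")


def clean_tracks(rows: list[dict]) -> list[dict]:
    """Drop unusable tracks and normalize the rest into a uniform shape."""
    cleaned = []
    for row in rows:
        # Hypothetical flag marking tracks that were taken down.
        if not row.get("available", True):
            continue
        # Skip tracks with any missing acoustical feature.
        if any(row.get(f) is None for f in FEATURES):
            continue
        # Reformat: trim names and coerce every feature to float.
        cleaned.append({"id": row["id"], "name": row["name"].strip(),
                        **{f: float(row[f]) for f in FEATURES}})
    return cleaned
```

In pandas, the missing-feature check is roughly `df.dropna(subset=list(FEATURES))`.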
Step 2) Machine Learning Central: Shortest Distance Algorithm
With data on hand, the next step was to determine how to compare the songs.
To do so, we had to decide which features to use to vectorize each song. Though this seemed straightforward, picking the features was a particularly agonizing process: too few properties produced subpar recommendations, while too many resulted in overfitting to a particular set of songs in one genre (e.g., the same five classical songs always came out as the most “pop”).
Ultimately, we chose four Spotify audio features: danceability (beat consistency), energy (excitement), tempo (speed), and valence (happy vs. sad). These seemed to generally cover a song’s “dimensionalities” across the five genres we tested.
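Spotify reports danceability, energy, and valence on a 0.0 to 1.0 scale, but tempo is in beats per minute. Without rescaling, tempo would dominate any distance between songs. The post doesn't say how, or whether, the features were scaled. This sketch assumes min-max scaling of tempo over the candidate pool, and the function names are hypothetical:

```python
FEATURES = ("danceability", "energy", "tempo", "valence")


def fit_tempo_range(songs: list[dict]) -> tuple[float, float]:
    """Learn the tempo range (BPM) of the candidate pool."""
    tempos = [s["tempo"] for s in songs]
    return min(tempos), max(tempos)


def vectorize(song: dict, tempo_range: tuple[float, float]) -> list[float]:
    """Map a song to a 4-D vector with every component in [0, 1]."""
    lo, hi = tempo_range
    span = (hi - lo) or 1.0  # avoid dividing by zero if all tempos are equal
    # Clamp so an input song outside the pool's range still lands in [0, 1].
    tempo = min(max((song["tempo"] - lo) / span, 0.0), 1.0)
    return [tempo if f == "tempo" else song[f] for f in FEATURES]
```

The range is fitted on the candidate pool and then reused for the user's input song, so both sides live on the same scale.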
To compare two songs, we measured how close their feature vectors were using a distance metric. After experimenting with Jaccard, Minkowski, and Euclidean distances, we settled on Euclidean: it performed as well as or better than the others in our trial assessments, and it kept things simple.
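For reference, here are the three candidate metrics as small functions. Euclidean distance is Minkowski distance with p = 2. Jaccard is defined on sets, so for continuous feature vectors this sketch uses the weighted form, 1 - Σmin/Σmax, which assumes non-negative features. The post doesn't say which Jaccard variant was tried:

```python
import math


def minkowski(a: list[float], b: list[float], p: float) -> float:
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance (math.dist requires Python 3.8+)."""
    return math.dist(a, b)


def weighted_jaccard(a: list[float], b: list[float]) -> float:
    """Weighted Jaccard distance for non-negative vectors."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return 0.0 if den == 0 else 1 - num / den
```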
Once the songs were vectorized, we used scikit-learn (sklearn) pipelines to generate recommendations. After training, the pipeline would take a user’s input song and output recommendations sorted by how “close” in distance they were to the input.
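The fit-then-query flow can be sketched as a tiny nearest-neighbor model. `NearestSongs` is a hypothetical stand-in written with the standard library, not the project's sklearn pipeline. It assumes songs are already vectorized and ranks them by Euclidean distance:

```python
import heapq
import math


class NearestSongs:
    """Hypothetical stand-in for a fitted nearest-neighbor recommender."""

    def __init__(self, k: int = 5):
        self.k = k
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def fit(self, ids, vectors) -> "NearestSongs":
        # "Training" a brute-force nearest-neighbor model just stores the pool.
        self.ids = list(ids)
        self.vectors = [list(v) for v in vectors]
        return self

    def recommend(self, query: list[float]) -> list[tuple[str, float]]:
        # Score every candidate by Euclidean distance to the query vector.
        scored = ((math.dist(query, v), song_id)
                  for song_id, v in zip(self.ids, self.vectors))
        # Keep the k closest, sorted from best to worst match.
        return [(song_id, d) for d, song_id in heapq.nsmallest(self.k, scored)]
```

scikit-learn's `NearestNeighbors` offers the same `fit`-then-`kneighbors` pattern. Its default metric is Minkowski with `p=2`, which is Euclidean. It can also use tree-based indexes, which matters for a pool of ~750K songs.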
Step 3) Bridging the Backend and Frontend: Making an API
Now that the data science backend was done, we needed a way to trigger our algorithm and interact with the data. We built an API using Flask & Gunicorn so the website could make HTTP calls to our endpoints.
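The endpoint itself isn't shown in the post. Flask apps are WSGI applications, which is the interface Gunicorn serves. This dependency-free sketch therefore writes the endpoint as a plain WSGI app. The `/recommend` route, the `song` and `genre` query parameters, and the `recommend()` stub are all hypothetical:

```python
import json
from urllib.parse import parse_qs

# Genres offered on the frontend, lowercased for matching.
GENRES = {"classical", "country", "jazz", "r&b", "pop"}


def recommend(song: str, genre: str, k: int = 5) -> list[str]:
    # Hypothetical stub: a real service would query the trained pipeline here.
    return [f"{genre} match {i + 1} for {song}" for i in range(k)]


def _json_response(start_response, status: str, payload: dict) -> list[bytes]:
    """Serialize payload as JSON and send it with the given HTTP status."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI entry point, the same interface a Flask app exposes to Gunicorn."""
    if environ.get("PATH_INFO") != "/recommend":
        return _json_response(start_response, "404 Not Found",
                              {"error": "not found"})
    params = parse_qs(environ.get("QUERY_STRING", ""))
    song = params.get("song", [""])[0].strip()
    genre = params.get("genre", [""])[0].lower()
    if not song or genre not in GENRES:
        return _json_response(start_response, "400 Bad Request",
                              {"error": "expected ?song=...&genre=<supported genre>"})
    return _json_response(start_response, "200 OK",
                          {"input": song, "genre": genre,
                           "recommendations": recommend(song, genre)})
```

Saved as `api.py`, this could be served with `gunicorn api:app`. In Flask, the same logic would live in a function decorated with `@app.route("/recommend")` that reads its parameters from `request.args`.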
To keep recommendations lightweight, we used Python’s pickle module to serialize our Python objects, so they could be saved once and reused across requests instead of being rebuilt each time. This allowed us to constantly reuse our genre-specific dataframes and trained pipeline.
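The save-once, load-once pattern can be sketched with `pickle` and a cache. This assumes the dataframes and pipeline are bundled into one picklable object (a plain dict here). The function names are hypothetical:

```python
import functools
import pickle
from pathlib import Path


def save_artifacts(path: Path, artifacts: dict) -> None:
    """Serialize trained objects to disk once, after training."""
    with path.open("wb") as fh:
        pickle.dump(artifacts, fh, protocol=pickle.HIGHEST_PROTOCOL)


@functools.lru_cache(maxsize=None)
def load_artifacts(path: str) -> dict:
    """Unpickle on first call; later calls in the same process reuse it."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

The cache lives per process, so each Gunicorn worker would unpickle the file once. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.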
Step 4) Take it to the Web: Application Development
Genrecommender is just one of many possible applications of music analytics. Music streaming is a booming industry with an ever-growing amount of data, which opens the door to countless ways of putting that data to use, and to all kinds of music you might stumble across.
And maybe you’ll find that classical music isn’t so boring after all.
Huge kudos to Josh Hayes for his incredible mentorship and for instilling confidence in my work, Tomi Kalmi for overhauling the frontend, and Komala Prabhu for her support in advising the backend development.