I love music, and I like learning about the history of the music industry. One of my hobbies is keeping up with music chart archives and how they develop over time. There is an entire online culture of music fandoms that actively follows the charts from day to day, and I find the game interesting! So, I thought I'd start a miniseries on some of the tools I use to make exploring the charts a little easier.
Here's the first one. I'll introduce the Billboard music charts and provide some scripts for easily scraping and manipulating the publicly available chart archives. The goal is to provide the raw data in a form that's easy to hack on and easy to integrate into other systems.
Billboard is an entertainment media brand which maintains some of the longest-running, and certainly the most widely followed, music charts in the world. These charts try to quantify and document the most popular songs and albums in the United States. The two most popular and important charts are the Hot 100, which is a ranking of the 100 most popular songs in the U.S., and the Billboard 200, a ranking of the 200 most popular albums in the U.S. Both charts are published each week.
The way that we consume music has been revolutionized numerous times over the lifetime of these charts, most recently with the advent of digital streaming. Accordingly, the methodology behind these rankings has evolved over time. Though the exact formula for the Hot 100 is secret, it takes into account single sales as tracked by Nielsen SoundScan, radio airplay as tracked by Nielsen BDS, and a variety of digital streaming outlets ranging from Apple Music to Spotify to YouTube. This evolution is important for any sort of long-term chart analysis, but since this post is just an introduction to the charts and tools for exploring them, I'll ignore that for now.
The Billboard Chart Archives
Billboard recently made their chart archives available online. The Hot 100 archive stretches back to 1958, while the Billboard 200 archive stretches back to 1983. Billboard published album and singles charts before these dates, so the archives don't cover everything, but it's still great to have them at our disposal! You can find the Hot 100 Archives here and the Billboard 200 Archives here.
At the bottom of this post, you'll find a simple Python script for scraping data from the chart archives, as well as the entire chart archives as of the chart week of January 30, 2016 in several formats for easy input to other programs. The script is lightweight, and doesn't seek to provide any classes or mechanisms for exploration, but I figured I'd share a few snippets to show how easy it is anyway.
A chart run is the entire chronological sequence of an album's chart appearances.
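As a rough sketch of how a chart run could be pulled out of the scraped data: the row layout below, `(chart_date, rank, title, artist)`, and the function name `chart_run` are my assumptions for illustration, not necessarily what the scraper script emits, and the album and artist names are placeholders rather than real chart data.

```python
from datetime import date

# Assumed row layout: (chart_date, rank, title, artist).
# Placeholder rows for illustration only, not real chart data.
rows = [
    (date(2016, 1, 16), 3, "Album A", "Artist X"),
    (date(2016, 1, 9), 1, "Album A", "Artist X"),
    (date(2016, 1, 9), 2, "Album B", "Artist Y"),
    (date(2016, 1, 23), 7, "Album A", "Artist X"),
    (date(2016, 1, 16), 5, "Album B", "Artist Y"),
]


def chart_run(rows, title, artist):
    """Return [(chart_date, rank), ...] for one album, oldest week first."""
    appearances = [
        (chart_date, rank)
        for chart_date, rank, row_title, row_artist in rows
        if row_title == title and row_artist == artist
    ]
    # Tuples sort by their first element, so this orders by chart date.
    return sorted(appearances)


run = chart_run(rows, "Album A", "Artist X")
peak = min(rank for _, rank in run)  # best (lowest) position reached
weeks_on_chart = len(run)

for chart_date, rank in run:
    print(chart_date.isoformat(), rank)
```

Matching on both title and artist avoids mixing up different albums that happen to share a name.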
Most Hot 100 Appearances
The artists with the most Hot 100 chart entries.
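One way this count might be computed, sketched under the same kind of assumptions: rows are taken to be `(chart_date, rank, title, artist)` tuples, `entries_per_artist` is a name I made up, and the songs and artists are placeholders. Since a song that charts for many weeks is still a single entry, the sketch deduplicates songs before counting.

```python
from collections import Counter

# Assumed row layout: (chart_date, rank, title, artist).
# Placeholder rows for illustration only, not real chart data.
rows = [
    ("2016-01-09", 1, "Song A", "Artist X"),
    ("2016-01-16", 2, "Song A", "Artist X"),
    ("2016-01-16", 8, "Song B", "Artist X"),
    ("2016-01-23", 4, "Song C", "Artist X"),
    ("2016-01-09", 5, "Song D", "Artist Y"),
    ("2016-01-09", 9, "Song E", "Artist Z"),
    ("2016-01-23", 6, "Song F", "Artist Z"),
]


def entries_per_artist(rows):
    """Count distinct charting songs per credited artist string."""
    # One (title, artist) pair per song, however many weeks it charted.
    songs = {(title, artist) for _, _, title, artist in rows}
    return Counter(artist for _, artist in songs)


counts = entries_per_artist(rows)
for artist, n_entries in counts.most_common(2):
    print(artist, n_entries)
```

Note that this groups by the artist string exactly as credited, so a collaboration credit would count as its own "artist" unless the credits are split first.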
Billboard Chart Scraper
Hot 100 Archive (September 6, 1958 - January 30, 2016)
Billboard 200 Archive (November 5, 1983 - January 30, 2016)
List of dates missing from archives