Personal Project: Finding Link Between Rain and Grunge, Part 1

Whenever my girlfriend and I are out driving and listening to Portland’s alternative radio stations, we’re convinced we hear more grunge songs when it’s rainy. We swear that the dreary weather subconsciously — or consciously, for all we know — causes DJs and program directors to play more songs by Nirvana, Pearl Jam, and the like. It’s become a running joke whenever they play “Black Hole Sun”, and not just because of the irony of “washing away the rain.”

Now, back in the days when Kurt Cobain was still alive, such a preposterous suggestion would be nearly impossible to correlate. Assuming you didn’t know anyone at the stations to feed you the playlists, you would need to listen 24/7 to write down the songs they played. Plus, you would need to track whether or not it’s raining. And you couldn’t really track it from where you lived, so you’d have to camp out next to the station to get the most accurate readings. Hasta la vista, social life. And if you missed just one reading, well… bummer, dude.

But now, thanks to combining the magic of the Interwebs and statistical analysis, we can let computers do the heavy lifting for us. Over the span of a few articles, let’s see if this fake science is actually pseudoscience instead.

Methodology

This is going to be a multi-step process, so here’s my plan of attack:

  1. Obtain song playlists for Portland’s alternative radio stations, 94/7 and Radio 102.3, over a certain period of time
  2. Obtain historical data from the closest weather stations to these stations over that same period
  3. Separate songs into grunge/not-grunge, and weather into rainy/not-rainy
    • “Overcast” may also become a weather category
  4. For each radio/weather station combination, determine the ratio of grunge/non-grunge songs when it’s raining and not raining

I’m sure there’ll be a hypothesis step wedged somewhere in between № 3 and 4. We don’t need to worry about the actual science this second. Now, it’s time to gather some data. On to Step № 1!

Step № 1: Gathering Music Data

Finding Data on 94/7 FM

Alternative station 94/7 FM has been broadcasting in Portland for almost 22 years. Like just about every station website I’ve seen, they include a “now playing” feature on their homepage that shows the current song and artist.

First thing we need to see is if it’s a DOM element. If so, then it’ll likely be updated by AJAX, which means we should be able to track what happens in the browser. I right-click the title to inspect it.

The song and artist are just a couple of divs, exactly what I was hoping for. I wait around for a couple minutes to see if the site hits an external API of some kind. Sure enough, an AJAX request from TuneGenie shows up in the list of network requests.

Following the request URL gives me this data:

jQuery112403641058078830235_1489475504591({"meta": {"status": 200, "base_url": "http://knrk.tunegenie.com"}, "response": [{"artistlink": "/music/the-cranberries/", "sid": 42245, "played_at_display": "12:12 AM", "sslg": "linger", "songlink": "/music/the-cranberries/_/linger/", "concertslink": "/music/the-cranberries/_concerts/", "played_at": "2017-03-14T00:12:25-07:00", "albumslink": "/music/the-cranberries/_albums/", "artist": "The Cranberries", "song": "Linger", "videolink": "/music/the-cranberries/_/linger/_video/", "aslg": "the-cranberries", "lyriclink": "/music/the-cranberries/_/linger/_lyric/", "campaignlink": "/contest/the-cranberries/", "topttrackslink": "/music/the-cranberries/_toptracks/"}, {"artistlink": "/music/empire-of-the-sun/", "sid": 1145676144, "played_at_display": "12:08 AM", "sslg": "high-and-low", "songlink": "/music/empire-of-the-sun/_/high-and-low/", "concertslink": "/music/empire-of-the-sun/_concerts/", "played_at": "2017-03-14T00:08:48-07:00", "albumslink": "/music/empire-of-the-sun/_albums/", "artist": "Empire of the Sun", "song": "High and Low", "videolink": "/music/empire-of-the-sun/_/high-and-low/_video/", "aslg": "empire-of-the-sun", "lyriclink": "/music/empire-of-the-sun/_/high-and-low/_lyric/", "campaignlink": "/contest/empire-of-the-sun/", "topttrackslink": "/music/empire-of-the-sun/_toptracks/"}]});

Let’s strip away the JSONP wrapper from the output and make it pretty:

{
  "meta": {
    "status": 200,
    "base_url": "http://knrk.tunegenie.com"
  },
  "response": [
    {
      "artistlink": "/music/the-cranberries/",
      "sid": 42245,
      "played_at_display": "12:12 AM",
      "sslg": "linger",
      "songlink": "/music/the-cranberries/_/linger/",
      "concertslink": "/music/the-cranberries/_concerts/",
      "played_at": "2017-03-14T00:12:25-07:00",
      "albumslink": "/music/the-cranberries/_albums/",
      "artist": "The Cranberries",
      "song": "Linger",
      "videolink": "/music/the-cranberries/_/linger/_video/",
      "aslg": "the-cranberries",
      "lyriclink": "/music/the-cranberries/_/linger/_lyric/",
      "campaignlink": "/contest/the-cranberries/",
      "topttrackslink": "/music/the-cranberries/_toptracks/"
    },
    {
      "artistlink": "/music/empire-of-the-sun/",
      "sid": 1145676144,
      "played_at_display": "12:08 AM",
      "sslg": "high-and-low",
      "songlink": "/music/empire-of-the-sun/_/high-and-low/",
      "concertslink": "/music/empire-of-the-sun/_concerts/",
      "played_at": "2017-03-14T00:08:48-07:00",
      "albumslink": "/music/empire-of-the-sun/_albums/",
      "artist": "Empire of the Sun",
      "song": "High and Low",
      "videolink": "/music/empire-of-the-sun/_/high-and-low/_video/",
      "aslg": "empire-of-the-sun",
      "lyriclink": "/music/empire-of-the-sun/_/high-and-low/_lyric/",
      "campaignlink": "/contest/empire-of-the-sun/",
      "topttrackslink": "/music/empire-of-the-sun/_toptracks/"
    }
  ]
}

The relevant data that we need is all here: an artist name (artist), a song title (song), and a complete ISO 8601-formatted timestamp (played_at). This is great and all for finding out what’s currently playing, but can we determine what was already played?

Here’s the original TuneGenie API URL, with each query string parameter on its own line:

https://api.tunegenie.com/v1/brand/nowplaying/
    ?callback=jQuery112403641058078830235_1489475504591
    &apiid=m2g_bar
    &b=knrk
    &count=1
    &since=2017-03-14T00:08:48-07:00
    &_=1489475504594
(See endnote 1.)

Immediately two of the query string parameters jump out at me: since and count. since is another complete ISO 8601 timestamp, and count appears to be a limiter on the number of results that could be returned.

…Except we didn’t get only one result back in the JSON response above. We got two. Furthermore, as the night wore on, the list grew and grew, capping at 25 songs regardless of what count was set to. This tells us two things: we can retrieve songs from the past, and since is the only parameter we need to focus on.

Now that we know we can see multiple songs, how far back can we go? We find that out by altering since in the URL.

  • Setting since back an hour, to 2017-03-13T23:08:48-07:00, yields 25 songs ranging from 11:16 PM to 12:56 AM, including the two returned in my original output.
  • Setting it back a day, to 2017-03-13T00:08:48-07:00, gives us songs from 12:08 AM to 1:45 AM the previous day.
  • Going back a week to 2017-03-07T00:08:48-08:00 (stupid DST) gives us a similar list from that day.
  • Same thing going back a month to 2017-02-14T00:08:48-08:00.
  • Same thing going back three months to 2016-12-14T00:08:48-08:00.
  • However, going back six months, to 2016-09-14T00:08:48-07:00, caused me to run up against TuneGenie’s hard limit. As of this writing, the earliest song data I could retrieve was that Bishop Briggs’ “Wild Horses” played at 10:49 PM on December 13, 2016.

Thus, we have a moving 90-day window for retrieving song data from TuneGenie’s API. Delicious.
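To poke at different since values without hand-editing URLs, something like the following could generate them. This is only a sketch: buildNowPlayingUrl() is a hypothetical helper, the America/Los_Angeles time zone is my assumption for producing Portland's UTC offsets, and the _ cache-busting parameter is left off on the assumption that the API doesn't require it.

```php
<?php

// Hypothetical helper: builds a nowplaying URL for a given "since" moment.
// The parameter names and values are copied from the URL seen in the browser.
function buildNowPlayingUrl(DateTimeInterface $since, string $callback = 'stripMe'): string
{
    $params = [
        'callback' => $callback,
        'apiid'    => 'm2g_bar',
        'b'        => 'knrk',
        'count'    => 1,
        // DATE_ATOM yields a full ISO 8601 timestamp, e.g. 2017-03-14T00:08:48-07:00
        'since'    => $since->format(DATE_ATOM),
    ];

    // http_build_query() URL-encodes the values (colons become %3A).
    return 'https://api.tunegenie.com/v1/brand/nowplaying/?' . http_build_query($params);
}

// Assumption: Portland's local time zone, so DST offsets are applied automatically.
$pacific = new DateTimeZone('America/Los_Angeles');
$start   = new DateTimeImmutable('2017-03-14 00:08:48', $pacific);

foreach (['-1 hour', '-1 day', '-1 week', '-1 month', '-3 months'] as $offset) {
    echo buildNowPlayingUrl($start->modify($offset)), PHP_EOL;
}
```

Because the dates are built in a named time zone, the DST switch takes care of itself: the week-ago timestamp comes out with a -08:00 offset while the hour-ago one keeps -07:00.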

Retrieving the Data from 94/7 FM

We now have the acceptable date range, but the 25-song limit makes manual data retrieval out of the question. Doing some rough math, 25 songs appear to take up around an hour and 40 minutes, giving us 1 song played every 4 minutes on average. Presuming that rate stays constant, we can determine how many requests we need to make to get us all the data we can by taking the total number of minutes over 90 days, dividing that by 4 (for minutes per song), and then dividing that by 25 (for songs per API request):

    (90 days × 24 hours × 60 minutes) ÷ 4 ÷ 25 = 129,600 ÷ 4 ÷ 25 = 1,296 requests

So it would take us roughly 1,300 requests to collect this data. Definitely need a script to do the hard work for us. However, now we need to know if we can access it programmatically. It’s possible TuneGenie has some sort of mechanism to prevent any old third-party user from accessing its data outside a browser. Therefore, we need to run a few tests to see if what we want is possible.

I’m going to use PHP for these tests due to its familiarity and ease of use. With any luck, we can complete a request, retrieve the response, strip away the JSONP wrapper and extract the data without running into any problems.

file_get_contents()

My first attempt is by passing the URL into file_get_contents(), like so:

<?php

$url = 'https://api.tunegenie.com/v1/brand/nowplaying/?callback=stripMe&apiid=m2g_bar&b=knrk&count=1&since=2017-03-14T00%3A08%3A48-07%3A00&_=1489475504594';

$output = file_get_contents($url);

var_dump($output);

You may have noticed that I changed callback from the jQuery default callback name to stripMe. This is so that it’s easier for me to regex away the JSONP wrapper from the output later on. callback can more or less be anything when you use this method.

Fortunately, we’re not blocked from requesting data this way. Running this code locally returned the full JSONP string.

Because I’ve always felt file_get_contents() was kinda janky, let’s try some other methods.

cURL Library

My second attempt, using cURL, is as follows:

<?php

$url = 'https://api.tunegenie.com/v1/brand/nowplaying/?callback=stripMeCURL&apiid=m2g_bar&b=knrk&count=1&since=2017-03-14T00%3A08%3A48-07%3A00&_=1489475504594';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);

var_dump($output);

We again get the data back, so they aren’t blocking these kinds of requests either.

Not Linux-y enough for you? Let’s try using wget as well.

wget

To run wget, we’ll need to use one of PHP’s system call functions. I prefer to use exec(), but use whichever one’s appropriate for your situation.

<?php

$url = 'https://api.tunegenie.com/v1/brand/nowplaying/?callback=stripMeWGET&apiid=m2g_bar&b=knrk&count=1&since=2017-03-14T00%3A08%3A48-07%3A00&_=1489475504594';

exec('wget -N -O - "' . $url . '"', $output);

var_dump($output);

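I’ll admit the wget syntax was a bit tricky to set up. The -N flag turns on timestamping and the -O flag specifies an output file (- means print to standard output). Note that wget’s manual says -N is not supported in combination with -O and that a warning is issued when the two are used together, so -N could safely be dropped here. Additionally, because our URL has a query string, and since & is a special character in shell commands, we need to wrap it in quotation marks. It would be akin to writing: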

wget -N -O - "https://api.tunegenie.com/v1/brand/nowplaying/?callback=stripMeWGET&apiid=m2g_bar&b=knrk&count=1&since=2017-03-14T00%3A08%3A48-07%3A00&_=1489475504594"

This call dumps the output into the $output variable. The other two methods stored this output as a string, but exec() fills an array with one element per line of output. Since the JSONP response arrives as a single line, the array here has just one item.
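Whichever method fetches the response, the JSONP wrapper still has to come off before the data is usable. Here is a minimal sketch of how that could look, assuming the callback name is known ahead of time (as with stripMe) and PHP 7.3 or later for JSON_THROW_ON_ERROR. decodeJsonp() is a hypothetical helper, and the sample string is a shortened version of the response shown earlier.

```php
<?php

// Hypothetical helper: removes a JSONP wrapper such as stripMe({...}); and
// decodes the JSON payload inside it into an associative array.
function decodeJsonp(string $jsonp, string $callback): array
{
    // Match "callback(" at the start and ")" plus an optional ";" at the end.
    $pattern = '/^\s*' . preg_quote($callback, '/') . '\((.*)\);?\s*$/s';

    if (!preg_match($pattern, $jsonp, $matches)) {
        throw new RuntimeException('Response is not wrapped in the expected callback.');
    }

    return json_decode($matches[1], true, 512, JSON_THROW_ON_ERROR);
}

// Shortened sample of the response; only the fields we care about are kept.
$output = 'stripMe({"meta": {"status": 200}, "response": ['
    . '{"played_at": "2017-03-14T00:12:25-07:00", "artist": "The Cranberries", "song": "Linger"}, '
    . '{"played_at": "2017-03-14T00:08:48-07:00", "artist": "Empire of the Sun", "song": "High and Low"}'
    . ']});';

// exec() hands back an array of lines, so join it into one string if needed.
$raw = is_array($output) ? implode("\n", $output) : $output;

foreach (decodeJsonp($raw, 'stripMe')['response'] as $play) {
    printf("%s | %s | %s\n", $play['played_at'], $play['artist'], $play['song']);
}
```

Passing the callback name in explicitly means the same helper would work for stripMeCURL and stripMeWGET too.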

Great Success!

So we’ve found out that we can access 94/7’s data in numerous ways. From here on, it’s all a matter of looping through 90 days of data, then storing title, artist and timestamp in a database (the code for which will be shown in a future update). The next post in this series will cover my attempts at extracting data from the other alternative station, Radio 102.3.
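In the meantime, here is a rough idea of what one step of that loop might involve. This is not the final code, just a sketch under a couple of assumptions: that each request returns songs played from since onward, and that the song played exactly at since comes back again (as it did in the first response above), so consecutive batches overlap and need de-duplicating. collectBatch() is a hypothetical helper and the batch is sample data, not a live response.

```php
<?php

// Hypothetical helper: merges one batch of plays into $plays and returns the
// "since" value to use for the next request (the latest played_at seen so far).
function collectBatch(array &$plays, array $batch, string $since): string
{
    foreach ($batch as $play) {
        // Key on the timestamp so a song returned by two overlapping
        // requests is only stored once.
        $plays[$play['played_at']] = [
            'artist'    => $play['artist'],
            'song'      => $play['song'],
            'played_at' => $play['played_at'],
        ];

        // Advance "since" to the most recent play in the batch.
        if (strtotime($play['played_at']) > strtotime($since)) {
            $since = $play['played_at'];
        }
    }

    return $since;
}

// Sample batch, trimmed to the three fields we plan to store.
$batch = [
    ['played_at' => '2017-03-14T00:12:25-07:00', 'artist' => 'The Cranberries', 'song' => 'Linger'],
    ['played_at' => '2017-03-14T00:08:48-07:00', 'artist' => 'Empire of the Sun', 'song' => 'High and Low'],
];

$plays = [];
$next  = collectBatch($plays, $batch, '2017-03-14T00:08:48-07:00');

echo 'Next since: ', $next, PHP_EOL;
echo 'Songs stored: ', count($plays), PHP_EOL;
```

A real loop would fetch a batch, call something like this, and repeat with the returned value until it catches up to the present, ideally pausing between requests to be polite to the API.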

Endnotes

  1. The original URL as copied from the dev console had the datestamp URL-encoded, using %3A instead of colons. Because the URL still worked with colons, I opted for readability and decoded the entities for all non-code URLs in this post.

2 thoughts on “Personal Project: Finding Link Between Rain and Grunge, Part 1”

  1. I know this article is older now, but it was a great help. So let me return the favor. I’ve found you can drop the count parameter altogether, and use the “until” parameter.

    ie, https://api.tunegenie.com/v1/brand/nowplaying/?callback=jQuery112403641058078830235_1489475504591&apiid=m2g_bar&b=knrk&since=2018-12-31T00:00:0-07:00&until=2019-02-01T19:24:00-07:00&_=1489475504594

    Sometimes it’ll error out because it takes a bit to download, but if you fiddle with the dates you’ll get a lot more data at once.
