Music preferences of Spotify users from Poland, Sweden and Great Britain


Let me start by saying that analyzing music is, for me, extremely interesting and also challenging, from gathering the data and preparing the analysis to finally drawing insights. I would like to thank my friends Wojtek Szmidt, Ryszard Latecki, Jakub Krukowski, and Mikołaj Poncylisz for their time and patience in answering my questions.

Why Spotify?

YouTube or Spotify as a source? Even though Spotify is less popular than YouTube, I decided to use Spotify because of the quality of its data (well prepared, ordered and powerful) and the possibility of accessing it through the Spotify API. I assumed that with YouTube data, popularity and the number of views might be affected by features of the video itself: actors, dancers, etc. Unfortunately, Spotify is not free from defects. Not every track is available on Spotify, and some genres are underrepresented, for example Polish rock and disco polo. I suppose that some record labels don't allow their music to be published on Spotify for fear of potential financial losses. In addition, the free version of the Spotify mobile app does not let users pick individual songs: a user can only select an artist and wait for the recommendation algorithm to do its work.

About the data

The data is based on daily reports that list the 200 most frequently streamed songs on Spotify, filtered by country (Great Britain, Poland and Sweden). The selected period is 1/1/2016 - 17/2/2017. Track features, URLs to song previews and all other data used in my reports come from the Spotify API. The dataset has no marker that distinguishes streams generated by premium users from those generated by free-tier users.

The reports are interactive. Click on them to explore the findings.

Report I -  TOP #1 

Do you know how many times Ed Sheeran's top track "Shape of You" was at #1?
Using logistic regression, I tried to find the track feature that has the highest impact on popularity. List of all features is here. I defined the popularity threshold as the top 5% of songs sorted in descending order by number of streams, prepared the data set with data steps in SAS, and finally obtained the most important variable: loudness. The Spotify API documentation defines it as follows:
"Loudness - the overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typical range between -60 and 0 db"
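The modelling step described above can be sketched in Python roughly as follows. This is a minimal illustration and not the original SAS code: the track values are invented, the helper names (`label_top_share`, `standardize`, `fit_logistic`) are hypothetical, and plain gradient descent stands in for whatever fitting procedure SAS uses.

```python
import math

def label_top_share(streams, share=0.05):
    """Mark tracks in the top `share` by stream count with 1, the rest with 0."""
    n_top = max(1, round(len(streams) * share))
    # Stream count of the last track that still makes the cut
    # (tracks tied at the cutoff are all labelled 1).
    cutoff = sorted(streams, reverse=True)[n_top - 1]
    return [1 if s >= cutoff else 0 for s in streams]

def standardize(column):
    """Rescale a feature to zero mean and unit variance so weights are comparable."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column)) or 1.0
    return [(x - mean) / std for x in column]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression with batch gradient descent; returns (weights, bias)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Invented toy data (assumption, not the real charts): 40 tracks whose
# streams rise with loudness, while danceability cycles through the same
# few values regardless of streams.
loudness = [-20.0 + 0.5 * i for i in range(40)]
danceability = [0.40 + 0.05 * ((2 * i) % 5) for i in range(40)]
streams = [1000 + 100 * i for i in range(40)]

labels = label_top_share(streams)  # 1 = top 5% by streams
features = list(zip(standardize(loudness), standardize(danceability)))
weights, bias = fit_logistic(features, labels)
print(dict(zip(["loudness", "danceability"], weights)))
```

Because the features are standardized first, the size of each fitted weight gives a rough ranking of how strongly that feature separates the top 5% from the rest. A real analysis would also look at significance tests, which this sketch leaves out.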

Admittedly surprised, I decided to ask my friends who know sound engineering. It turns out that this variable is a technical parameter used for measuring sound quality. A value closer to 0 means that the track has very good quality: there is no noise and the sounds are clear. Obtaining this effect requires mastering, which nowadays is a common, standard procedure before a track is released. After that, I decided to exclude from the logit model all songs released more than three years ago. The loudness variable stopped being important. Danceability stood out, but not significantly enough. Over 70% of the most popular tracks on Spotify are pop or derivatives of pop, a genre that by nature scores higher on danceability.

I'm inclined to believe that the answer to the question of why a track is in the top 5% of the most popular songs lies in the budgets of marketing campaigns.

Select the aggregation type for the date (daily, weekly, monthly, quarterly or yearly) and click on the tracks. You can compare the parameters of any track with the average of the most popular songs that reached #1.



Find joyful, hot songs or sad and melancholy ones, made for dancing or definitely not. The values of these variables were assigned by Spotify engineers using machine learning methods.
If you want to read more about valence in music, I recommend this article, which describes the relation between personality and music preferences.


The average valence declined gradually at the beginning of 2017. The largest drop was on 14/2 (PL & SE). It can be assumed that the drop is related to Saint Valentine's Day, a time when people show feelings of love and affection and may be looking for more romantic music (which is usually less euphoric).

In GB, the day with the lowest valence was 16/1/2017 (in PL & SE it was the second "saddest" day since the beginning of 2017).
That day was "Blue Monday", which is called the most depressing day of the year.
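Finding the "saddest" day amounts to averaging valence per country and day and then taking the minimum. A minimal Python sketch of that idea follows. The rows are invented for illustration, the function names are hypothetical, and the plain unweighted mean is an assumption: the reports may well weight tracks by stream count.

```python
from collections import defaultdict

def mean_valence_by_day(rows):
    """rows: (country, day, valence) tuples, one per chart entry.
    Returns {(country, day): unweighted mean valence}."""
    totals = defaultdict(lambda: [0.0, 0])  # running [sum, count] per key
    for country, day, valence in rows:
        totals[(country, day)][0] += valence
        totals[(country, day)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

def lowest_valence_day(daily_means, country):
    """Return the day with the lowest mean valence for one country."""
    days = {day: mean for (c, day), mean in daily_means.items() if c == country}
    return min(days, key=days.get)

# Invented values for illustration only, not the real chart data.
rows = [
    ("GB", "2017-01-15", 0.52), ("GB", "2017-01-15", 0.48),
    ("GB", "2017-01-16", 0.41), ("GB", "2017-01-16", 0.45),
    ("PL", "2017-01-15", 0.50), ("PL", "2017-01-16", 0.47),
]
daily = mean_valence_by_day(rows)
print(lowest_valence_day(daily, "GB"))
```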

Report V - Comparing features of the tracks listened to, by user country

From 1/1/2016 to 17/2/2017, the least danceable music was listened to by Swedish users.



Tools used: Tableau, R, SAS Enterprise Guide, Excel, Notepad++.

If you have any questions about the analysis, gathering the data or preparing the Tableau reports, feel free to contact me via email.