Let me start by saying that analyzing music is, for me, extremely interesting and also challenging, from gathering the data, through preparing the analysis, to finally drawing insights. I would like to thank my friends Wojtek Szmidt, Ryszard Latecki, Jakub Krukowski and Mikołaj Poncylisz for their time and patience in answering my questions.
YouTube or Spotify as a source? Even though Spotify is less popular than YouTube, I decided to use Spotify because of the quality of its data (well prepared, well ordered and rich) and the possibility of accessing it through the Spotify API. I assumed that with YouTube data, popularity and number of views might be affected by features of the video itself, such as actors, dancers, etc. Unfortunately, Spotify is not free from defects. Not every track is available on Spotify, and some genres are underrepresented, for example Polish rock and disco polo. I suppose that some record labels don't allow their music to be published on Spotify because they fear potential financial losses. Moreover, in the free version of the Spotify mobile app, users cannot pick specific songs: they can select an artist and then wait for the recommendation algorithm to do its work.
About the data
The data is based on daily reports from www.spotifycharts.com that list the 200 most frequently streamed songs on Spotify, filtered by country (Great Britain, Poland and Sweden). The selected period runs from 1/1/2016 to 17/2/2017. Track features, URLs to song previews and all other data used in my reports come from the Spotify API. The dataset has no marker distinguishing streams generated by premium users from those generated by free users.
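A minimal sketch of the selection step, assuming the daily charts have already been downloaded into records holding a date, a country code, a chart position, a track ID and a stream count. The `ChartRow` type, its field names and the lowercase country codes are hypothetical and do not describe the actual spotifycharts.com file format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChartRow:
    """One entry of a daily Top 200 chart (hypothetical shape)."""
    day: date
    country: str   # assumed lowercase country code, e.g. "gb", "pl", "se"
    position: int  # chart position, 1..200
    track_id: str
    streams: int


COUNTRIES = {"gb", "pl", "se"}
START = date(2016, 1, 1)
END = date(2017, 2, 17)


def select_rows(rows):
    """Keep Top 200 rows from the chosen countries within START..END (inclusive)."""
    return [
        r for r in rows
        if r.country in COUNTRIES
        and START <= r.day <= END
        and 1 <= r.position <= 200
    ]


rows = [
    ChartRow(date(2016, 6, 1), "pl", 1, "track_a", 120_000),
    ChartRow(date(2016, 6, 1), "us", 1, "track_b", 900_000),  # other country
    ChartRow(date(2017, 3, 1), "se", 5, "track_c", 50_000),   # after the period
]
print([r.track_id for r in select_rows(rows)])  # ['track_a']
```

Keeping the filter as one pure function makes it easy to rerun the analysis for another country set or period by changing only the constants.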
The reports are interactive; click on them to explore the findings.
Report I - TOP #1
Loudness, as defined in the Spotify API documentation: "The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB."
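To make the decibel scale more concrete, here is a simplified sketch that computes the level of a digital signal in dB relative to full scale (dBFS) from its RMS amplitude, averaged over the whole signal. This is an illustrative assumption, not Spotify's algorithm, which the definition above does not spell out. A signal at full scale gives 0 dB and anything quieter gives a negative value, which is why the quoted range tops out at 0.

```python
import math


def rms_dbfs(samples):
    """Level in dB relative to full scale, for samples normalised to [-1.0, 1.0].

    Simplified: a plain RMS over the whole signal, with no perceptual weighting.
    """
    if not samples:
        raise ValueError("need at least one sample")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(rms)


full_scale_square = [1.0, -1.0] * 100
sine = [math.sin(2 * math.pi * k / 100) for k in range(100)]
print(rms_dbfs(full_scale_square))  # 0.0
print(round(rms_dbfs(sine), 2))     # -3.01
```

Halving the amplitude of a signal lowers this value by about 6 dB, so a track sitting close to 0 dB is simply one whose average level is near the digital maximum.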
Admittedly surprised that loudness came out as an important variable, I decided to ask my friends who know sound engineering. It turned out that this variable is a technical parameter related to sound quality: a value closer to 0 means that the track has very good quality, with no noise and clear sounds. Achieving this effect requires mastering, which nowadays is a common, standard procedure before a track is released. Since older tracks were less likely to have been mastered this way, I decided to exclude from the logit model all songs released more than three years ago. After that, loudness stopped being important. Danceability stood out, but not significantly enough. Over 70% of the most popular tracks on Spotify are pop or derivatives of pop, and this genre is by nature more danceable.
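The modelling step could look roughly like this: drop tracks released more than three years before a reference date, then fit a one-variable logistic regression predicting whether a track belongs to the most popular group. This is a hypothetical sketch; the field names (`release_date`, `danceability`, `is_top`), the plain gradient-descent fit and its hyperparameters are my assumptions, not the tooling used for the report. It only estimates coefficients and does not test whether they are statistically significant, which a statistics package would do.

```python
import math
from datetime import date


def years_before(d, years):
    """Same calendar day `years` years earlier (29 February falls back to the 28th)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def recent_only(tracks, today, years=3):
    """Keep tracks released within the last `years` years before `today`."""
    cutoff = years_before(today, years)
    return [t for t in tracks if t["release_date"] >= cutoff]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent on the log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Made-up illustrative tracks, not data from the report.
tracks = [
    {"release_date": date(2012, 3, 3), "danceability": 0.40, "is_top": 1},  # too old
    {"release_date": date(2015, 1, 5), "danceability": 0.30, "is_top": 0},
    {"release_date": date(2015, 6, 1), "danceability": 0.40, "is_top": 0},
    {"release_date": date(2016, 2, 9), "danceability": 0.50, "is_top": 1},
    {"release_date": date(2016, 7, 4), "danceability": 0.60, "is_top": 0},
    {"release_date": date(2016, 9, 9), "danceability": 0.70, "is_top": 1},
    {"release_date": date(2017, 1, 2), "danceability": 0.80, "is_top": 1},
]
recent = recent_only(tracks, today=date(2017, 2, 17))
b0, b1 = fit_logit([t["danceability"] for t in recent], [t["is_top"] for t in recent])
print(len(recent), b1 > 0)  # 6 True
```

A positive `b1` only says that, in this toy sample, higher danceability goes with a higher chance of being in the top group; judging whether that effect is significant needs standard errors, which this sketch does not compute.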
I'm inclined to believe that the answer to the question of why a track ends up in the top 5% of the most popular songs lies in the budgets of marketing campaigns.
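One way to define "the top 5% of the most popular songs" as a model target is to rank tracks by their total streams in the period and flag the highest-ranked 5%. A sketch under the assumption that popularity means total streams per track; the function name and the input dictionary are hypothetical.

```python
import math


def label_top_share(streams_by_track, share=0.05):
    """Return {track_id: bool}, True for the top `share` of tracks by total streams.

    sorted() is stable, so ties at the cut-off are broken by insertion order.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    ranked = sorted(streams_by_track, key=streams_by_track.get, reverse=True)
    k = max(1, math.ceil(share * len(ranked)))
    top = set(ranked[:k])
    return {track: track in top for track in streams_by_track}


streams = {f"t{i}": i * 1000 for i in range(30)}  # made-up stream totals
labels = label_top_share(streams)
print(sorted(t for t, is_top in labels.items() if is_top))  # ['t28', 't29']
```

Rounding the cut-off up with `math.ceil` guarantees at least one positive example even for small samples; with 30 tracks, 5% is 1.5, so two tracks are flagged.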
Find joyful, hot songs or sad, melancholic ones, songs to dance to or definitely not. The values of these variables were assigned by Spotify engineers using machine learning methods.
If you want to read more about valence in music, I recommend this article, which describes the relation between personality and music preferences.
The average valence declined gradually at the beginning of 2017. The largest drop was on 14/2 (PL and SE). I assume the drop is related to Saint Valentine's Day, a time when people show feelings of love and affection and may look for more romantic music (which is usually less euphoric).
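Finding such a drop amounts to averaging valence per day and comparing each day with the previous one. A minimal sketch, assuming the input is a list of (day, valence) pairs, one per chart entry of a single country, and that the daily average is an unweighted mean over chart entries; weighting by streams would be an equally reasonable choice. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_average(rows):
    """rows: iterable of (day, valence) pairs. Returns {day: mean valence}, sorted by day."""
    by_day = defaultdict(list)
    for day, valence in rows:
        by_day[day].append(valence)
    return {day: mean(values) for day, values in sorted(by_day.items())}


def largest_drop(avg_by_day):
    """Return (day, drop) for the biggest decrease versus the previous available day."""
    days = sorted(avg_by_day)
    best = None
    for prev, cur in zip(days, days[1:]):
        drop = avg_by_day[prev] - avg_by_day[cur]
        if best is None or drop > best[1]:
            best = (cur, drop)
    return best  # None if there are fewer than two days


# Made-up illustrative values, not data from the report.
rows = [
    (date(2017, 2, 12), 0.5), (date(2017, 2, 12), 0.7),
    (date(2017, 2, 13), 0.6),
    (date(2017, 2, 14), 0.4),
]
day, drop = largest_drop(daily_average(rows))
print(day, round(drop, 2))  # 2017-02-14 0.2
```

Because the comparison is against the previous available day, a missing chart day would silently turn a two-day change into a "daily" drop, so gaps in the data are worth checking first.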
In GB, the day with the lowest valence was 16/1/2017 (in PL and SE it was the second "saddest" day since the beginning of 2017).
That day was "Blue Monday", which is called the most depressing day of the year.
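Finding the "saddest" day, and the rank of a given day, is a matter of sorting the daily averages. A sketch assuming the daily average valence for one country has already been computed as a dictionary mapping dates to values; the function names are hypothetical and the values in the example are made up.

```python
from datetime import date


def _in_range(avg_by_day, since):
    """Keep only days on or after `since` (all days if `since` is None)."""
    return {d: v for d, v in avg_by_day.items() if since is None or d >= since}


def saddest_day(avg_by_day, since=None):
    """Day with the lowest average valence, optionally only counting days >= `since`."""
    candidates = _in_range(avg_by_day, since)
    if not candidates:
        raise ValueError("no days in the selected range")
    return min(candidates, key=candidates.get)


def sadness_rank(avg_by_day, day, since=None):
    """1 for the saddest day in the range, 2 for the second saddest, and so on."""
    candidates = _in_range(avg_by_day, since)
    ordered = sorted(candidates, key=candidates.get)
    return ordered.index(day) + 1


# Made-up illustrative values, not data from the report.
avg = {
    date(2016, 12, 31): 0.52,
    date(2017, 1, 16): 0.41,
    date(2017, 1, 20): 0.47,
    date(2017, 2, 14): 0.38,
}
start_2017 = date(2017, 1, 1)
print(saddest_day(avg, since=start_2017))                     # 2017-02-14
print(sadness_rank(avg, date(2017, 1, 16), since=start_2017))  # 2
```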