Skip to content

Latest commit

 

History

History
126 lines (71 loc) · 6.74 KB

README.md

File metadata and controls

126 lines (71 loc) · 6.74 KB

Spotify-Data-Analysis

A Scala implementation

Introduction

As a part of one of our courses ; Big Data Analytics, we have completed a project in which we have analysed the various features of a Spotify dataset such as danceability,popularity and valence. We have implemented this using Spark which is an open-source, distributed processing system which is used for big data workloads. It utilizes a technique called 'memory caching' in which computer applications temporarily store data in a computer's main memory (RAM) and optimized query execution for fast queries against data of any size. Simply put, Spark is a fast and general engine for large-scale data processing.

In this project, we have focussed majorly on the following four decades:

80s : 1980-1990

90s : 1990-2000

Early 2ks : 2000-2010

Late 2ks : 2010 to present date

image

We are mainly interested in analysing how people’s taste in music has evolved over these decades.

Dataset Overview

This dataset contains more than 25 lakh songs collected from Spotify Web API. The features, as mentioned above include song, artist, release date as well as some characteristics of song such as acousticness, danceability, loudness, tempo and so on. The date ranges from 1921 to 2020. The features and what they indicate are briefly explained as and when they are required along this readme file.

Analysis

1. Number of songs that got released in each decade:

image

Conclusion :

As we can clearly see there is almost two times increase in the number of songs that has been released in the last decade to the previous decades

2. Relation Between the Popularity of the Artists and the Popularity of their song:

image

Conclusion :

It is observed that the popularity of the artists has slightly influenced the popularity of the songs during the first three decades. Surprisingly the percentage of songs from unpopular artists that became popular is very high during this current decade. Therefore we can safely conclude that the current generation’s likes and dislikes depend on the quality of the content and not just mere popularity of the artists.

3. Popularity over the decades:

image

Conclusion :

We can clearly see that there is a very clear and gradual increase in the number of songs that became popular as the decades progress. This can also mean that the number of Spotify listeners has grown significantly over the years.

4. High Valence or Low Valence?

Valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

image

Conclusion :

Notice that in most cases in each decade the songs with high valence is slightly more popular but in the case of the current 2ks it is very evident and the difference in the values are noticeable

5. Instrumentalism Vs Vocalness

Instrumentalness : Predicts whether a track contains no vocals. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content.

image

Conclusion :

In all the four decades the Spotify listeners have distinctly preferred vocal songs over instrumental music. This information is clearly seen as almost ten times of instrumental song listeners listen to vocal songs.

6. Danceability & Energy:

Danceability:
Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. Energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.

image

Conclusion :

In all the four decades the we can safely say that nearly half or more percentage of danceable songs are high energetic songs. This trend can be especially seen in the early 2ks where almost 71.5 % danceable songs were so high on energy.

7. Speechiness every 2 decades

Speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.

image

Conclusion :

The number of songs with high speechiness is significantly high during the first two decades and it gradually decreases. This implies that podcasts was popular then than now

8. Correlation:

Correlation is a statistic that measures the degree to which two variables move in relation to each other.

Pearson’s Correlation Coefficient formula:

image

8.1 ENERGY AND LOUDNESS

image

Conclusion:

Loudness and Energy share a small positive correlation.

8.2 ACOUSTICNESS AND LOUDNESS

image

Conclusion:

Loudness and Acousticness share a small negative correlation.

8.3 TEMPO AND ENERGY

image

Conclusion:

Tempo and Energy share a very small positive correlation which can almost be neglected.

Note: The results have been recorded from the outputs obtained after running the scala code. For visualising the analysed values, the plots are created in jupyter notebook.The part of the dataset obtained from kaggle that has been used for the analysis can be found using the following link:

https://amritavishwavidyapeetham-my.sharepoint.com/:f:/g/personal/cb_en_u4aie19030_cb_students_amrita_edu/EiEqzaNHJy9DhwpXgrAEauQBT9mUd67CDI7waO6j7xmPpQ?e=V2umSY