Analysis and Clustering of Pakistani Music by Lyrics: A Study of CokeStudio Pakistan
Keywords:
Clustering, Lyrics Analysis, Cultural Exploration, Unsupervised Learning, Text Classification, Natural Language Processing, CokeStudio
Abstract
This research applies unsupervised learning to categorize and interpret the lyrical content of CokeStudio songs. Because music crosses cultural boundaries, the study examines the language of the lyrics to surface their emotions, themes, and cultural nuances. We first use Natural Language Processing (NLP) and analysis techniques to uncover the emotional underpinnings of the lyrics; this emotional layer forms the foundation for the subsequent clustering. Several unsupervised learning algorithms, including K-Means, Hierarchical Clustering, and DBSCAN, are used to group songs into thematic clusters. Cluster quality is assessed with the silhouette score; the optimal number of clusters is 5, with a score of 0.41641. To evaluate the clustering, we also build a classification model using Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, and Multinomial Naive Bayes. The model assigns CokeStudio songs to thematic clusters based on the results of topic modeling, which deepens our understanding of the cultural and emotional dimensions of these compositions. Logistic Regression with SMOTE applied to NMF values is the best-performing model, achieving a testing score of 89.47%. The findings shed light on the emotions and narratives in CokeStudio songs and demonstrate a practical application of machine learning to music analysis. By identifying and classifying thematic clusters within song lyrics, the study improves our understanding of cultural expression through music and opens avenues for personalized music recommendations.
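For readers unfamiliar with the silhouette score used above to choose the number of clusters, the following is a minimal sketch of how the metric can be computed. It is not the study's implementation, which the abstract does not show: the function name, the toy 2-D points, and the choice of Euclidean distance are all assumptions made for illustration. For each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to the members of any other cluster, and the point's coefficient is (b - a) / max(a, b).

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Illustrative helper, not taken from the paper.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other points in the same cluster.
        a = mean(dist(points[i], points[j]) for j in own)
        # b: mean distance to the nearest other cluster.
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


# Hypothetical toy data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
```

Scores close to 1 indicate compact, well-separated clusters, and negative scores indicate points that sit closer to another cluster than to their own. In the toy data, the labeling that matches the two groups scores much higher than the one that cuts across them, which is the comparison a search over candidate cluster counts relies on.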
License
This is an open-access article published by Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.