User Preference Segmentation for Movie Recommendation Systems Using Clustering Algorithms

Mirali Mammadzade *

Department of Computer Science, University of Lodz, Lodz, Poland.

*Author to whom correspondence should be addressed.


Abstract

Background: Online streaming platforms generate vast amounts of user interaction data, and recommendation systems use clustering and machine learning techniques to analyze this data and deliver personalized movie suggestions.

Aims: Movie recommendation platforms collect large amounts of user activity data through ratings and viewing behavior. Because users' interests differ, recommendation systems may struggle to provide equally accurate suggestions for everyone. This study examines whether clustering algorithms can group users according to their movie preferences and rating habits. It also explores how unsupervised machine learning methods can help identify hidden behavioral patterns in recommendation data.

Study Design: The research was conducted as a quantitative study using unsupervised machine learning and clustering analysis.

Place and Duration of Study: The study was performed on the MovieLens 25M dataset over a period of 45 days, between the end of February 2026 and the beginning of April 2026.

Methodology: The MovieLens 25M dataset was selected because it contains a large number of movie ratings submitted by users with different viewing interests. Several preprocessing steps were carried out before model implementation: removing incomplete records, filtering out inactive users with insufficient rating activity, scaling numerical variables, grouping genre-related features, and converting sparse rating information into user-based feature matrices. After preprocessing and activity-based filtering, the final analysis was conducted on 861 active users from the original MovieLens dataset.
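The preprocessing steps listed above can be illustrated with a minimal sketch. It is not the study's actual pipeline: the function name build_user_features, the min_ratings threshold, the choice of mean rating per genre as the genre feature, and the tiny inline tables are all assumptions made for illustration. Only the column names (userId, movieId, rating, genres) follow the public MovieLens file layout.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def build_user_features(ratings, movies, min_ratings=3):
    """Turn raw ratings into a scaled user-by-feature matrix (hypothetical helper)."""
    # Remove incomplete records.
    ratings = ratings.dropna(subset=["userId", "movieId", "rating"])

    # Keep only users with enough rating activity (the threshold is an assumption).
    counts = ratings.groupby("userId")["rating"].transform("count")
    ratings = ratings[counts >= min_ratings]

    # MovieLens stores genres as "A|B|C"; expand to one row per (movie, genre).
    genres = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")
    merged = ratings.merge(genres[["movieId", "genre"]], on="movieId", how="inner")

    # Mean rating per user and genre; genres a user never rated become 0.
    features = merged.pivot_table(
        index="userId", columns="genre", values="rating", aggfunc="mean"
    ).fillna(0.0)

    # Add two simple rating-habit features per user.
    per_user = ratings.groupby("userId")["rating"].agg(
        n_ratings="count", mean_rating="mean"
    )
    features = features.join(per_user)

    # Scale every column to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(features)
    return pd.DataFrame(scaled, index=features.index, columns=features.columns)


# Tiny made-up tables for demonstration only (not MovieLens data).
ratings = pd.DataFrame(
    {
        "userId": [1, 1, 1, 1, 2, 2, 2, 3],
        "movieId": [10, 20, 30, 20, 10, 20, 30, 10],
        "rating": [5.0, 4.0, 1.0, np.nan, 1.0, 2.0, 5.0, 3.0],
    }
)
movies = pd.DataFrame(
    {"movieId": [10, 20, 30], "genres": ["Action|Sci-Fi", "Action", "Drama"]}
)

user_features = build_user_features(ratings, movies, min_ratings=3)
print(user_features.round(2))
```

In this toy input, user 3 has a single rating and is dropped by the activity filter, and the row with a missing rating is removed before any counting takes place.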

Principal Component Analysis (PCA) was applied to reduce dimensionality and simplify visualization of user groups. Several clustering algorithms were implemented using Python and scikit-learn, including K-Means Clustering, Hierarchical Clustering, Agglomerative Clustering, and DBSCAN. Cosine similarity was additionally used to compare users according to their rating behavior. The quality of the clustering results was evaluated using the Silhouette Score, the Davies–Bouldin Index, inertia values, and visual interpretation of cluster separation.
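A minimal sketch of this modeling and evaluation workflow in scikit-learn is given below. It runs on synthetic data from make_blobs rather than on MovieLens features, and every parameter value (number of clusters, eps, min_samples, linkage) is an assumption chosen for illustration, not a setting reported by the study. In scikit-learn, hierarchical clustering is provided through AgglomerativeClustering, so a single estimator stands in for both names here.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a scaled user-feature matrix (not MovieLens data).
X, _ = make_blobs(
    n_samples=300, n_features=10, centers=4, cluster_std=1.0, random_state=0
)
X = StandardScaler().fit_transform(X)

# Two principal components, intended for a 2-D scatter plot of the user groups.
X_2d = PCA(n_components=2, random_state=0).fit_transform(X)

# Candidate algorithms; all hyperparameters are illustrative assumptions.
models = {
    "kmeans": KMeans(n_clusters=4, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=4, linkage="ward"),
    "dbscan": DBSCAN(eps=1.5, min_samples=5),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    mask = labels != -1  # DBSCAN labels noise points as -1; exclude them
    n_clusters = len(set(labels[mask]))
    if n_clusters < 2:
        # Both indices are undefined for fewer than two clusters.
        scores[name] = None
        continue
    scores[name] = {
        "n_clusters": n_clusters,
        "n_noise": int((~mask).sum()),
        "silhouette": silhouette_score(X[mask], labels[mask]),  # higher is better
        "davies_bouldin": davies_bouldin_score(X[mask], labels[mask]),  # lower is better
    }

# Inertia (within-cluster sum of squares) is only defined for K-Means.
inertia = models["kmeans"].inertia_

# Pairwise cosine similarity between users (rows of X).
user_similarity = cosine_similarity(X)

for name, result in scores.items():
    print(name, result)
print("kmeans inertia:", round(inertia, 2))
```

Clustering is performed here on the full feature matrix and the PCA projection is used only for plotting; whether the study clustered before or after dimensionality reduction is not stated, so this ordering is also an assumption.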

Results: The analysis showed that clustering methods were able to separate users into several meaningful groups based on movie interests and rating behavior. K-Means produced the clearest and most balanced cluster structure among the selected algorithms. Some user groups mainly preferred action, adventure, and science fiction movies, while others showed stronger interest in drama, romance, thriller, and documentary genres. The PCA visualization showed clear separation between the major user groups after dimensionality reduction. DBSCAN also identified smaller groups of users with unusual or inconsistent rating activity. Overall, the clustering results helped reveal hidden behavioral differences between users within the recommendation dataset.

Conclusion: The findings of this study show that unsupervised machine learning techniques can be useful for analyzing user behavior in movie recommendation systems. Clustering methods make it easier to identify users with similar viewing habits and genre preferences. The results also show that clustering quality depends strongly on how user features are prepared before modeling. In practical applications, this type of analysis may help recommendation platforms better understand audience behavior and provide more personalized movie suggestions.

Keywords: Movie recommendation systems, clustering analysis, unsupervised learning, MovieLens dataset, recommendation systems, behavioral segmentation


How to Cite

Mammadzade, Mirali. 2026. “User Preference Segmentation for Movie Recommendation Systems Using Clustering Algorithms”. Journal of Engineering Research and Reports 28 (6):1-12. https://doi.org/10.9734/jerr/2026/v28i61909.
