Predictive Analysis of Employee Attrition Using Machine Learning
Mirali Mammadzade *
Department of Computer Science, University of Lodz, Lodz, Poland.
*Author to whom correspondence should be addressed.
Abstract
Background: Employee attrition is a critical organizational issue that affects productivity, operational efficiency, and workforce stability, making employee retention an important strategic priority. Advances in machine learning have enabled organizations to analyze complex workforce data and predict employee turnover more accurately, supporting proactive retention strategies.
Aims: Predictive analytics and machine learning are increasingly used in workforce analytics to identify employees at elevated resignation risk before turnover occurs. The primary objective of this study is to evaluate the predictive effectiveness of supervised machine learning classification algorithms for employee attrition prediction using HR analytics data. The study also investigates how workplace conditions, compensation-related variables, employee satisfaction indicators, and professional experience relate to employee turnover and retention patterns.
Study Design: This study used a quantitative experimental design based on supervised machine learning classification and a comparative evaluation of model performance.
Place and Duration of Study: The experimental analysis was performed using the IBM HR Analytics Employee Attrition dataset between February 2026 and April 2026.
Methodology: The dataset contained demographic, financial, behavioral, and workplace-related variables associated with employee retention. Before model training, several preprocessing steps were applied: duplicate inspection, categorical feature transformation, feature scaling, and exploratory data analysis. Employee attrition status was defined as the target variable of the supervised classification task. Four supervised classification algorithms (Support Vector Machine, XGBoost, LightGBM, and CatBoost) were implemented using Python-based machine learning libraries and compared. The dataset was divided into training and testing subsets using an 80:20 ratio to evaluate predictive capability on previously unseen employee observations. Model performance was evaluated using Accuracy, Precision, Recall, F1-score, and ROC-AUC. Feature importance analysis and ROC curve evaluation were additionally performed to identify the organizational and behavioral variables contributing most strongly to employee turnover prediction.
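The split-and-evaluate procedure described in the Methodology can be sketched as follows. This is a minimal illustration and not the study's actual code: it uses scikit-learn with a synthetic dataset from make_classification in place of the HR data, and the sample size, class ratio, RBF kernel, stratified split, and random seed are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the HR data (assumption): 1000 rows, 20 numeric
# features, and an imbalanced binary target where class 1 plays "attrition".
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           weights=[0.84, 0.16], random_state=42)

# 80:20 train/test split; stratification is an assumption that keeps the
# minority-class share similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Feature scaling is fitted on the training subset only (inside the pipeline),
# then a Support Vector Machine classifier is trained.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=42))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
# ROC-AUC needs a continuous score; the SVM decision function provides one.
y_score = model.decision_function(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Placing the scaler inside the pipeline means the test subset never contributes to the scaling statistics, which is one way to keep the held-out evaluation honest. The same loop could be repeated for the other three classifiers, provided their libraries are installed.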
Results: Predictive capability differed noticeably among the evaluated classification models. Support Vector Machine achieved the strongest overall classification performance, with an Accuracy of 0.864 and a ROC-AUC of 0.816. LightGBM also showed stable classification behavior and maintained balanced Precision during testing. In contrast, XGBoost and CatBoost produced comparatively lower Recall and F1-score values when identifying minority-class (attrition) observations. TotalWorkingYears, Age, MonthlyIncome, OverTime, and WorkLifeBalance were among the most influential variables associated with attrition. Employees with less professional experience, lower compensation, heavy overtime, and lower workplace satisfaction appeared more likely to resign.
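The gap between Accuracy and minority-class Recall noted above follows from how the metrics are defined. The sketch below computes them by hand for a hypothetical test set; the counts (100 employees, 16 leavers, 6 detected, 2 false alarms) are invented for illustration and are not results from this study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # leavers caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # leavers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # stayers confirmed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical test set: 16 leavers (label 1) followed by 84 stayers (label 0).
y_true = [1] * 16 + [0] * 84
# Hypothetical classifier: catches 6 of the 16 leavers, raises 2 false alarms.
y_pred = [1] * 6 + [0] * 10 + [1] * 2 + [0] * 82

report = binary_metrics(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.880, precision: 0.750, recall: 0.375, f1: 0.500
```

In this invented case the classifier is right 88% of the time yet finds only 37.5% of the employees who leave, which is why Recall and F1-score are reported alongside Accuracy for an imbalanced target.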
Conclusion: The findings indicate that supervised machine learning techniques can provide effective support for employee attrition prediction and workforce analytics applications. Among the evaluated models, Support Vector Machine demonstrated the strongest overall predictive capability under the implemented experimental conditions. The study also highlights the practical value of predictive HR analytics in supporting employee retention planning, organizational decision-making, and workforce stability management. Finally, the results suggest that consistent preprocessing, balanced model evaluation, and feature importance analysis are important for classification reliability and interpretability in workforce analytics systems.
Keywords: Employee attrition, workforce analytics, supervised learning, predictive analytics, HR analytics, employee turnover, classification models, machine learning.