Reinforcement Learning–driven Retention Strategy Optimisation for Telecom Customer Churn

David U. Ater

Electrical and Electronics Engineering Department, University of Uyo, Akwa Ibom State, Nigeria.

Kufre M. Udofia *

Electrical and Electronics Engineering Department, University of Uyo, Akwa Ibom State, Nigeria.

Akaninyene B. Obot

Electrical and Electronics Engineering Department, University of Uyo, Akwa Ibom State, Nigeria.

*Author to whom correspondence should be addressed.


Abstract

Customer churn mitigation in telecommunications extends beyond identifying at-risk subscribers to determining optimal retention actions under cost and operational constraints. This study formulates churn mitigation as a sequential decision optimisation problem and proposes a reinforcement learning–based framework for adaptive retention management. Customer–operator interactions are modelled as a Markov Decision Process, and a Deep Q-Network (DQN) learns retention policies that maximise long-term outcomes rather than relying on static, rule-based interventions. Unlike traditional linear models, the proposed framework leverages the DQN's high-dimensional feature extraction capabilities to process complex subscriber data, including call detail records, recharge patterns, and service complaint history. The system maintains computational efficiency as the state space expands, enabling application in large-scale telecommunications networks. Predicted churn risk and behavioural indicators are incorporated into the environment state, while retention actions are optimised using a cost-aware reward structure. Experimental evaluation was conducted on the publicly available Kaggle “Telco Customer Churn” dataset (7,043 customers, 26.5% churn rate). The DQN policy was compared against three baselines: no intervention, a static rule-based policy (fixed churn probability threshold), and a uniform incentive. The performance metrics were cumulative reward, average reward per customer, and action distribution. Results show that the DQN-based policy achieves a higher cumulative reward than the rule-based and no-intervention baselines (normalised average reward per customer of 0.85 versus 0.65 and 0.42, respectively), while reducing high-cost incentive use from 50% under the rule-based policy to 10%. The policy converges stably within 1,000 training episodes.
By explicitly separating churn prediction from retention decision‑making, this work advances churn management toward adaptive, data‑driven policy optimisation suitable for real‑world telecommunications environments.

The quantitative results reported herein are illustrative demonstrations based on a simulated environment derived from the Kaggle dataset; they do not constitute empirical evidence of real-world performance. The primary contribution is the methodological formulation of churn mitigation as a sequential decision problem and the architectural design of a reinforcement learning framework for adaptive retention policy optimisation.
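
To make the cost-aware reward, the three baseline policies, and the DQN learning target named in the abstract concrete, the following is a minimal Python sketch. It is not the authors' implementation: the action names, the cost and effect values, the 0.5 threshold, and every function name are hypothetical placeholders chosen for illustration, since the abstract does not specify them. Only the form of the temporal-difference target (reward plus discounted maximum next-state Q-value) is standard for DQN.

```python
# Hypothetical action set; the abstract does not list the actual retention actions.
ACTIONS = ("none", "low_cost_offer", "high_cost_incentive")

# Assumed cost of each action, in units of one retained customer's value.
ACTION_COST = {"none": 0.0, "low_cost_offer": 0.1, "high_cost_incentive": 0.4}

# Assumed multiplicative effect of each action on the churn probability.
ACTION_EFFECT = {"none": 1.0, "low_cost_offer": 0.7, "high_cost_incentive": 0.4}


def expected_reward(churn_prob, action, retained_value=1.0):
    """Expected cost-aware reward: retention value minus the cost of the action."""
    churn_after_action = churn_prob * ACTION_EFFECT[action]
    return (1.0 - churn_after_action) * retained_value - ACTION_COST[action]


def no_intervention(churn_prob):
    """Baseline 1: never act."""
    return "none"


def rule_based(churn_prob, threshold=0.5):
    """Baseline 2: give the costly incentive above a fixed churn threshold."""
    return "high_cost_incentive" if churn_prob >= threshold else "none"


def uniform_incentive(churn_prob):
    """Baseline 3: give every customer the same incentive."""
    return "low_cost_offer"


def average_reward(policy, churn_probs):
    """Average expected reward per customer under a given policy."""
    rewards = [expected_reward(p, policy(p)) for p in churn_probs]
    return sum(rewards) / len(rewards)


def td_target(reward, next_q_values, gamma=0.99, done=False):
    """Standard DQN regression target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    churn_probs = [0.1, 0.3, 0.6, 0.9]  # made-up predicted churn risks
    for policy in (no_intervention, rule_based, uniform_incentive):
        print(policy.__name__, round(average_reward(policy, churn_probs), 3))
```

In a full DQN, a neural network would estimate the Q-values passed to `td_target` from the customer state (predicted churn risk plus behavioural indicators), and the learned policy would select the action with the highest estimated Q-value instead of following a fixed rule.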

Keywords: Telecommunications churn mitigation, reinforcement learning, Deep Q-Network, Markov Decision Process, retention policy optimisation, cost-aware decision-making, customer relationship management.


How to Cite

Ater, David U., Kufre M. Udofia, and Akaninyene B. Obot. 2026. “Reinforcement Learning–driven Retention Strategy Optimisation for Telecom Customer Churn”. Journal of Engineering Research and Reports 28 (6):409-24. https://doi.org/10.9734/jerr/2026/v28i61936.
