Calculating CLV, churn and retention

Research question: What are the lifetime values, churn and retention rates of different segments of online subscribers?


As explained in this article, the customer lifetime value (CLV) calculation attempts to determine the revenue that a customer is expected to produce for a business during their lifetime as a paying customer. This can help the business make decisions about products or services, and about how much to invest in marketing, acquisition and retention. It also offers a better understanding of high churn activity, the customer lifecycle and the profits generated by individual segments. A number of KPIs can be derived from such an analysis, such as retention rate and time before churn. The brief for this project was to calculate CLV for each segment, to identify patterns in the dataset such as retention rates by segment, and to determine whether churn times could be predicted.

The CLV Dataset

The dataset represents a small sample of monthly subscription data from a web hosting provider, covering the period from 2016 to 2021. Three groups of users from a broad value-based segmentation are represented: Hobbyist, Developer and Corporate. This is in place of a simple cohort model, where customers would be grouped by acquisition date. The average monthly spend is detailed, along with the date of acquisition and, where relevant, the date of defection (subscription cancellation). A sample from the dataset can be seen below:

Id  CPM (£)  Acquisition time  Defection time  Segments
1   109.99   18/05/2020        21/07/2020      Corporate
2   74.99    14/12/2020        20/12/2021      Developer
3   52.99    20/10/2016        26/12/2017      Hobbyist
4   89.99    05/06/2019        -               Corporate
5   71.99    11/08/2020        27/11/2020      Developer
6   79.99    20/12/2021        -               Corporate

The Probabilistic Model

The model that has been employed is the General Retention Model (GRM). Expanding on the Simple Retention Model (SRM), the GRM allows for time-varying retention rates and cash flows, as well as cash flows that depend on the time of cancellation, and it provides a way to calculate CLV in such scenarios. In this particular discrete GRM, the hazard rates are estimated from the data. Three quantities describe the model: the retention rate, the probability density function and the survival function. It is crucial to note how they are related: knowing any one of the three, we can calculate the other two.
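The relationship between the three quantities can be sketched in a few lines of Python. This is an illustration rather than the implementation used in the project: it assumes discrete periods t = 1, 2, ..., with S(0) = 1, S(t) = r_1 × ... × r_t and f(t) = S(t-1) × (1 - r_t). The function names and the retention rates in the example are hypothetical.

```python
from itertools import accumulate
from operator import mul


def survival_from_retention(r):
    """S(t) = r_1 * r_2 * ... * r_t for t = 1..n."""
    return list(accumulate(r, mul))


def pdf_from_retention(r):
    """f(t) = S(t-1) * (1 - r_t), taking S(0) = 1."""
    s_prev = [1.0] + survival_from_retention(r)[:-1]
    return [s * (1.0 - r_t) for s, r_t in zip(s_prev, r)]


def retention_from_survival(s):
    """r_t = S(t) / S(t-1), taking S(0) = 1 (requires S(t-1) > 0)."""
    s_prev = [1.0] + list(s[:-1])
    return [s_t / s_p for s_t, s_p in zip(s, s_prev)]


def survival_from_pdf(f):
    """S(t) = 1 - (f(1) + ... + f(t))."""
    return [1.0 - c for c in accumulate(f)]


# Hypothetical retention rates for three periods (illustrative only).
r = [0.90, 0.95, 0.97]
s = survival_from_retention(r)   # survival function
f = pdf_from_retention(r)        # probability of cancelling in each period
```

Starting from any one of r, s or f, the other two can be recovered by chaining these helpers, which is the property the model relies on.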

A business acquires customers who generate cash flows at times t = 0, 1, 2, ... until they cancel; the cancellation time, T, is a random variable. Customers can cancel at any time, possibly for a fee, and never come back. The probability of retaining a customer at time t is denoted by r_t. In contrast to the more stringent assumption made by the SRM, where the retention rate is a constant r, the GRM allows r_t to vary with t. In addition, we assume that a customer cancelling at time t is unrelated to cancelling at time t′ ≠ t. We looked at the following metrics:

  • CLV: For a customer who defects in time period t, there will be t cash flows if the cancellation occurs at the start of the period and t-1 cash flows if it occurs at the end. A discount rate, d, is assumed here.
  • Cancellation/churn time: We assume that all subscribers belonging to a given segment are retained in each subscription period with the same probability r (the retention rate). In addition, we assume that a defection in one period is independent of a defection in any other period. Under these assumptions, the cancellation time T is a random variable that follows a geometric distribution.
  • Estimation of retention rates: In theory, the retention rate r is assumed to be known, but in practice this is not always the case. We therefore estimate it from the available data. In any business, some customers will cancel, but not all. Customers who have not yet cancelled are considered censored; that is, the business has not yet observed their defection.
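The last two points can be made concrete with a short sketch. It assumes the constant-retention (geometric) case described above, where P(T = t) = r^(t-1) × (1 - r), and uses the standard maximum-likelihood estimate for censored geometric data: retained periods divided by retained periods plus cancellations. The function names and the sample observations are hypothetical, not taken from the dataset.

```python
def geometric_pmf(t, r):
    """P(T = t): retained for t-1 periods, then cancels in period t."""
    return r ** (t - 1) * (1.0 - r)


def expected_lifetime(r):
    """Mean of the geometric cancellation time, E[T] = 1 / (1 - r)."""
    return 1.0 / (1.0 - r)


def estimate_retention(observations):
    """Maximum-likelihood estimate of r from (periods, cancelled) pairs.

    cancelled=True  -> the customer cancelled in period `periods`,
                       so they were retained periods - 1 times.
    cancelled=False -> censored: retained `periods` times, no defection seen.
    """
    retained = 0
    cancellations = 0
    for periods, cancelled in observations:
        if cancelled:
            retained += periods - 1
            cancellations += 1
        else:
            retained += periods
    if retained + cancellations == 0:
        raise ValueError("no retention opportunities observed")
    return retained / (retained + cancellations)


# Hypothetical sample: two cancellations and two censored customers.
sample = [(3, True), (12, True), (14, False), (6, False)]
r_hat = estimate_retention(sample)
```

Censored customers still contribute to the estimate through the periods in which they were retained, which is why discarding them would bias the retention rate downwards.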

CLV Analysis and Results

The first set of results illustrated is the average CLV per segment. In figure 1 we are able to see that the least profitable segment is Corporate (£1553.81), followed by Developer (£1713.21) and then Hobbyist (£1768.97).
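As a rough illustration of the kind of calculation behind such figures, the sketch below computes an expected discounted CLV from a sequence of retention rates. It is not the calculation used to produce the results above. It assumes a constant monthly cash flow, payment at the start of each period (so the first payment at t = 0 is certain) and a per-period discount rate d; the function name and all input values are hypothetical.

```python
def expected_clv(monthly_cash_flow, retention_rates, discount_rate):
    """Expected discounted revenue over len(retention_rates) + 1 payments.

    Payment t (t >= 1) is received with probability
    S(t) = r_1 * ... * r_t and discounted by (1 + d) ** t.
    """
    clv = monthly_cash_flow  # payment at t = 0, certain and undiscounted
    survival = 1.0
    for t, r_t in enumerate(retention_rates, start=1):
        survival *= r_t
        clv += monthly_cash_flow * survival / (1.0 + discount_rate) ** t
    return clv


# Hypothetical inputs: £100 per month, 90% retention, 1% monthly discount.
clv_long_run = expected_clv(100.0, [0.9] * 2000, 0.01)
```

With a constant retention rate r and a long horizon, this sum approaches the closed form m(1 + d) / (1 + d - r) of the simple retention model, which is a useful sanity check on any implementation.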

Figure 1 - CLV by Segment

In table 1 and figure 2 we can see that the Corporate segment is the least loyal and the Hobbyist segment is the most loyal, with a monthly retention rate of 97.16%. We can therefore assume, at this point, that those who spend the most money are not the most loyal. The results also show that half of the Corporate segment customers cancel their subscriptions before the end of the first year of subscription.
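To relate a monthly retention rate to a statement about when half of a segment has cancelled, the sketch below finds the median cancellation time. It assumes a constant retention rate (the geometric case), which is a simplification of the GRM, and the function name is hypothetical.

```python
def median_lifetime(r):
    """Smallest whole period t at which cumulative churn 1 - r**t reaches 50%."""
    if not 0.0 < r < 1.0:
        raise ValueError("retention rate must be strictly between 0 and 1")
    t = 1
    survival = r  # probability of still being a customer after period 1
    while survival > 0.5:
        t += 1
        survival *= r
    return t


# Monthly retention rate reported for the Hobbyist segment.
hobbyist_median = median_lifetime(0.9716)
```

Under this simplifying assumption, a 97.16% monthly retention rate corresponds to a median lifetime of about 25 months, roughly twice the horizon within which half of the Corporate customers are reported to cancel.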

Table 1 - Churn and Defection

Figure 2 - Estimated Defection Time

Figure 3 - Cumulative Retention Function

The first columns of table 2 summarise customer behaviour. For each month, we can see the number of customers who cancelled their subscriptions, the number of customers who are no longer being observed (censored), and the corresponding retention and churn rates.

Table 2 - Customer Lifetime Value Analysis

The cumulative churn function provides the proportion of lost customers from the study's inception. Let's examine the Developer segment, in particular the period [5, 6], which corresponds to the sixth month following subscription. The cumulative churn function has a value of 27.9%. This indicates that over a quarter of users cancelled their subscriptions within the first six months of signing up.

The probability density function indicates the likelihood that a subscriber is retained for the first t-1 periods and then cancels in period t. The probability that a subscriber in this segment will cancel their membership during the sixth month is 0.025 (2.5%). The hazard rate is the likelihood that a customer will cancel at time t, given that they have not already cancelled. The likelihood that a Developer segment customer would terminate their membership during the sixth month is 4.2% if they were still a customer during the previous month. CLV (per period) and CLTV estimates are displayed in the final two columns. We can observe that a Developer segment user who cancels their membership in the sixth month would have an average CLTV of £386.78.
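The quantities described above can be derived from raw tenures with a simple discrete life table. The sketch below is one plain way to do this and may differ from the method behind the tables in this article (for example, it applies no actuarial adjustment for censoring within a period). It assumes each customer is a (periods, cancelled) pair, where a censored customer is treated as at risk throughout their last observed period; the function name and the sample data are hypothetical.

```python
from collections import Counter


def life_table(observations):
    """Per-period hazard, retention, density, survival and cumulative churn."""
    observations = list(observations)
    cancels = Counter(t for t, cancelled in observations if cancelled)
    censored = Counter(t for t, cancelled in observations if not cancelled)
    at_risk = len(observations)
    survival = 1.0
    rows = []
    for t in range(1, max(t for t, _ in observations) + 1):
        d, c = cancels[t], censored[t]
        hazard = d / at_risk if at_risk else 0.0  # P(cancel in t | active at t)
        density = survival * hazard               # P(retained t-1, cancel in t)
        survival *= 1.0 - hazard                  # P(still active after t)
        rows.append({
            "period": t, "at_risk": at_risk, "cancelled": d, "censored": c,
            "hazard": hazard, "retention": 1.0 - hazard, "density": density,
            "survival": survival, "cumulative_churn": 1.0 - survival,
        })
        at_risk -= d + c  # both groups leave the risk set for later periods
    return rows


# Hypothetical tenures in months: three cancellations, two censored customers.
table = life_table([(1, True), (2, True), (2, False), (3, True), (4, False)])
```

In each row, the density is the hazard multiplied by the survival at the end of the previous period, and the densities sum to the final cumulative churn.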

Table 3 - Probability Density and Hazard Rate

Table 4 presents a comparison of the segments. It includes the outcomes of three distinct tests: the Log-rank test, the Wilcoxon test, and the Tarone-Ware test. These tests rely on the Chi-square statistic. The more significant the differences between segments, the lower the p-value. None of the tests is significant at the alpha = 5% level.
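For readers who want to see how such a test works, the sketch below implements a two-group version of the weighted log-rank family using only the standard library. It is a simplified illustration, not the procedure used for table 4: comparing three segments at once requires the multi-group form with two degrees of freedom. The weights follow the usual definitions (1 for Log-rank, the number at risk for Wilcoxon, and its square root for Tarone-Ware). The function name and input format are hypothetical.

```python
import math


def weighted_logrank(group_a, group_b, weighting="logrank"):
    """Two-group test on lists of (periods, cancelled) pairs.

    Returns (chi_square, p_value) with one degree of freedom.
    """
    weights = {
        "logrank": lambda n: 1.0,
        "wilcoxon": lambda n: float(n),
        "tarone-ware": lambda n: math.sqrt(n),
    }
    w_fn = weights[weighting]
    pooled = list(group_a) + list(group_b)
    event_times = sorted({t for t, cancelled in pooled if cancelled})
    score = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _ in group_a if u >= t)       # at risk in group A
        n_b = sum(1 for u, _ in group_b if u >= t)       # at risk in group B
        d_a = sum(1 for u, e in group_a if e and u == t)  # cancellations in A
        d_b = sum(1 for u, e in group_b if e and u == t)  # cancellations in B
        n, d = n_a + n_b, d_a + d_b
        w = w_fn(n)
        score += w * (d_a - d * n_a / n)  # observed minus expected in A
        if n > 1:
            variance += w * w * d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("not enough variation to compute the test")
    chi_square = score * score / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value
```

A p-value above 0.05 from such a test means the observed difference between the two retention curves could plausibly arise by chance in a sample of that size.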

Table 4 - Segment Comparison


In conclusion, calculating customer lifetime value is crucial for online subscription businesses that want to optimise their marketing strategies and improve customer retention rates. By understanding the CLV of different customer segments, businesses can allocate resources more effectively and tailor their marketing strategies to target customers with a higher CLV. Machine learning and predictive analytics may improve the accuracy of CLV calculations, and businesses should consider investing in these technologies to gain a competitive advantage. Overall, the study of CLV is an essential aspect of online subscription businesses, and further research in this area can lead to improved business performance and customer satisfaction.

Recommendations for Future Study

Further study of online subscription customer lifetime value could include a more in-depth examination of the different metrics that can be used to calculate CLV, such as purchase frequency and customer retention rates. Additional research could identify the best marketing strategies for retaining subscribers and methods for predicting customer churn. Further exploration of machine learning and predictive analytics could also help to develop more accurate models for predicting CLV. It would also be valuable to compare the results of CLV analyses across different industries and business models to identify commonalities and differences in customer behaviour and trends.