Not all CLV models are the same (series)

This is the first in a series of articles in which I will share a little “inside baseball” on how our data science team here at Theta approaches CLV calculations in a unique way that goes beyond the open-source models. In this first article, I’ll tee up the topics we’ll cover in subsequent posts.

Written by Ethan Anderson, Data Scientist, Theta | September 2022


If you work for a private equity firm assessing an investment opportunity, or for a business aiming to better understand its customer base for a forward-looking strategy, customer lifetime value (CLV) is an invaluable tool. But be aware that, although the CLV metric is widely recognized and used by many businesses to gain insight into customer value, not all CLV models are the same.

When an analyst or data scientist is tasked with measuring CLV, there are many open-source software packages that implement standard probabilistic models built on the Buy ’Til You Die (BTYD) framework, such as the Pareto/NBD model for repeat purchasing and the Gamma-Gamma model for spend. These models demonstrate reasonably good predictive accuracy on academic examples or straightforward, plain-vanilla business problems. However, while we certainly advocate “shaving with Occam’s Razor,” we have found, across the hundreds of companies on which we’ve run these models, that our large and growing collection of model enhancements produces better (oftentimes significantly better) results. We’ve made these improvements over the years to characterize a customer’s journey throughout their lifetime with a company more accurately and more robustly. These enhancements are critical to fully, consistently, and reliably understanding and forecasting customer behavior and, in turn, the future value of a business.

One of the things I find most interesting about our approach at Theta is the way our models layer in a variety of additional data and business context to ensure the insights and stories we weave together give an accurate picture of where the value lies within a company’s customer base. I wholeheartedly believe that CLV analysts and forecasters are not “curve-fitters.” We are storytellers. If we neglect important details about how customers behave at a particular firm, we risk producing CLV estimates that misrepresent those customers (and ultimately, the aggregated insights that arise about the business as a whole).

Cross-Cohort Dynamics

One of the most foundational building blocks of understanding CLV is the core assumption that a customer base is heterogeneous. This heterogeneity manifests itself in a number of ways, including differences among customers within a given cohort and dynamics across customer segments and acquisition cohorts. As we start grappling with these issues, we need to answer numerous questions. How big must each cohort be? How much time should it cover? There are many more, and they are deceptively non-trivial to address. Unlike in open-source packages, this process is automated at Theta. Another important consideration is accurately forecasting CLV for recently acquired customers with a short transaction history. There is no one-size-fits-all solution to this problem, and this is where open-source packages leave you on your own.
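To make the short-history problem concrete, here is a minimal sketch in Python (the language and every name in it are our illustrative choices, not Theta’s code). It assumes the textbook gamma-Poisson setup behind NBD-style models, in which each customer’s purchase rate is drawn from a Gamma(r, alpha) distribution, and it ignores dropout entirely. Under that assumption, the posterior mean rate after x purchases in t time units is (r + x) / (alpha + t), which blends the population mean r / alpha with the customer’s own rate x / t. Customers with little history are pulled strongly toward the population mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaPrior:
    """Population-level Gamma(shape=r, rate=alpha) prior on purchase rates (hypothetical)."""

    r: float
    alpha: float

    def population_mean_rate(self) -> float:
        # Mean of Gamma(r, alpha) with alpha as a rate parameter.
        return self.r / self.alpha


def posterior_mean_rate(prior: GammaPrior, x: int, t: float) -> float:
    """Posterior mean purchase rate after x purchases over t time units.

    Gamma-Poisson conjugacy: lambda | x, t ~ Gamma(r + x, alpha + t).
    Dropout is ignored in this sketch.
    """
    return (prior.r + x) / (prior.alpha + t)


def credibility_weight(prior: GammaPrior, t: float) -> float:
    """Weight on the customer's own rate x / t; the rest goes to the population mean."""
    return t / (prior.alpha + t)


# Made-up prior: population mean of 0.25 purchases per week.
prior = GammaPrior(r=2.0, alpha=8.0)

# Both customers observed one purchase per week, but over very different spans.
new_customer = posterior_mean_rate(prior, x=2, t=2.0)        # (2 + 2) / (8 + 2) = 0.4
tenured_customer = posterior_mean_rate(prior, x=52, t=52.0)  # (2 + 52) / (8 + 52) = 0.9
```

Both customers bought once a week, yet the two-week-old customer is forecast well below the tenured one because the prior dominates a short history. How strongly to shrink new customers, and toward which cohort’s mean, is exactly the kind of choice the base models leave to the analyst.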

Customer Lifecycle Effects

Because not all CLV models are the same, it’s important to remember, as I mentioned before, that our goal as data scientists is not to be “curve-fitters” but to be standout storytellers who capture important details about a customer’s journey. These details show up in cohort- or individual-level behavioral data. The problem with the basic BTYD approach on its own is that its baseline assumptions take a relatively limited view of a customer’s lifetime relationship with any brand or organization and do not capture all the nuances of customer behavior. For example, customers may behave differently at different points in their lifecycle. One of the more distinctive improvements we employ over publicly available models is the inclusion of more customer-specific covariates in our framework. Common BTYD models fall short here: the Pareto/NBD suffers from computational inefficiencies, and the BG/NBD lacks expressions for introducing time-varying covariates, which makes adding complex, customer-base-driven covariates intractable. Using our custom BTYD model and robust covariates that depend directly on an individual customer or group of customers, we tackle customer-dependent issues in a contextually meaningful way.
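As a rough illustration of how time-invariant, customer-specific covariates can enter a BTYD-style model, the sketch below uses one commonly described approach: scaling each customer’s gamma rate parameter as alpha_i = alpha0 * exp(-X_i @ beta), so that the expected purchase rate becomes (r / alpha0) * exp(X_i @ beta). This is not Theta’s custom model. The covariates (a promotion flag and log first-order value), the coefficients, and the function names are all hypothetical. Lifecycle effects such as tenure would require time-varying covariates, which is where the base models run into the difficulties described above.

```python
import numpy as np


def individual_alpha(alpha0: float, beta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Customer-specific gamma rate parameter: alpha_i = alpha0 * exp(-X_i @ beta).

    A larger X_i @ beta shrinks alpha_i, which raises that customer's expected rate.
    """
    return alpha0 * np.exp(-X @ beta)


def expected_purchase_rate(r: float, alpha0: float, beta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Mean of Gamma(r, alpha_i) for each customer: (r / alpha0) * exp(X_i @ beta)."""
    return r / individual_alpha(alpha0, beta, X)


# Hypothetical covariates, one row per customer:
# [acquired_via_promo (0/1), log of first-order value]
X = np.array([
    [0.0, np.log(20.0)],
    [1.0, np.log(20.0)],
    [0.0, np.log(80.0)],
])
beta = np.array([-0.4, 0.3])  # made-up coefficients, not estimates from real data

rates = expected_purchase_rate(r=2.0, alpha0=8.0, beta=beta, X=X)
# Promo-acquired customer: rate scaled by exp(-0.4) relative to customer 0.
# 4x first-order value: rate scaled by 4 ** 0.3 relative to customer 0.
```

Because the covariates act multiplicatively, each coefficient reads as a proportional lift or drag on purchase frequency, which keeps the story behind an individual forecast easy to explain.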

Seasonality and Time Varying Covariates

Many businesses have some form of seasonality, such as holiday sales or seasonal demand spikes and dips. Most open-source CLV modeling packages stick closely to the baseline BTYD structure and don’t allow other useful behavioral information to be incorporated through time-varying covariates. We think it’s critical to capture such effects, so we use an array of bespoke, time-varying covariates that go beyond simple seasonality or one-off spikes and can be tailored to fit the data at hand.
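A minimal sketch of the idea, assuming that an active customer’s purchases follow a Poisson process whose weekly rate is a base rate multiplied by exp(S_w @ gamma), where S_w holds seasonal covariates for week w. The covariates here (one annual Fourier pair and a late-year holiday flag) and all names are hypothetical stand-ins; bespoke covariates would be tailored to each data set. Because the rate is held constant within each week, the expected count over a window is simply the sum of the weekly rates.

```python
import numpy as np


def seasonal_covariates(weeks: np.ndarray) -> np.ndarray:
    """Hypothetical weekly covariates for week-of-year values in 1..52.

    Columns: sin and cos of the annual cycle, plus a holiday flag for weeks 47-52.
    """
    angle = 2.0 * np.pi * weeks / 52.0
    holiday = ((weeks >= 47) & (weeks <= 52)).astype(float)
    return np.column_stack([np.sin(angle), np.cos(angle), holiday])


def expected_transactions(base_rate: float, gamma: np.ndarray, weeks: np.ndarray) -> float:
    """Expected purchases by an active customer over the given weeks.

    The weekly rate is base_rate * exp(S_w @ gamma), constant within each week,
    so the expectation is the sum of weekly rates times a one-week duration.
    """
    multipliers = np.exp(seasonal_covariates(weeks) @ gamma)
    return float(base_rate * multipliers.sum())


weeks = np.arange(1, 53)
# Made-up effects: no smooth seasonality, holiday weeks lifted by exp(0.5).
holiday_only = np.array([0.0, 0.0, 0.5])
yearly_total = expected_transactions(base_rate=0.25, gamma=holiday_only, weeks=weeks)
# = 0.25 * (46 + 6 * exp(0.5))
```

In a full BTYD model, the gamma coefficients would be estimated jointly with the other parameters, and this expectation would also be weighted by the probability that the customer is still active.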

Common Demand Shocks

Seasonality tells only a small part of the granular, customer-level behavioral story. Major irregular events such as recessions, the recent pandemic, and other marketplace changes are equally important to understand and model properly. Time-varying and time-invariant covariates can be used to capture the repeat purchasing curve during periods of temporarily elevated or depressed customer behavior caused by a macro-scale event. However, this introduces significant computational complexity and requires these covariates to be applied in the right way. Our custom BTYD model allows us not only to account efficiently for these macro-driven anomalies but also to gain insight into how they may affect customer behavior over the long term.
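One way to represent a macro shock, sketched below under our own assumptions rather than as Theta’s specification, is with two hypothetical covariates: a transient component that jumps to 1 at the event time t0 and decays with time constant tau, and a persistent step that stays at 1 afterward. Their coefficients (delta) then separate a temporary dip or spike from a lasting shift, which is one way to quantify the long-term effect mentioned above. The resulting multiplier would scale a customer’s purchase rate, as in any proportional covariate specification.

```python
import numpy as np


def shock_covariates(t: np.ndarray, t0: float, tau: float) -> np.ndarray:
    """Hypothetical shock covariates at times t (e.g., weeks since the start of the data).

    Column 0 (transient): 0 before t0, then exp(-(t - t0) / tau).
    Column 1 (persistent): 0 before t0, then 1.
    """
    after = t >= t0
    elapsed = np.clip(t - t0, 0.0, None)  # clipping avoids overflow in exp for t << t0
    transient = np.where(after, np.exp(-elapsed / tau), 0.0)
    persistent = after.astype(float)
    return np.column_stack([transient, persistent])


def shock_rate_multiplier(t: np.ndarray, t0: float, tau: float, delta: np.ndarray) -> np.ndarray:
    """Multiplicative effect exp(Z(t) @ delta) on a customer's purchase rate."""
    return np.exp(shock_covariates(t, t0, tau) @ delta)


t = np.arange(0.0, 156.0)       # three years of weekly time steps
delta = np.array([-1.0, -0.1])  # made-up: sharp initial drop, small lasting decline
multiplier = shock_rate_multiplier(t, t0=60.0, tau=8.0, delta=delta)
# Before week 60: 1.0; at week 60: exp(-1.1); long after: close to exp(-0.1).
```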

Greater Numerical Robustness and Computational Speed

As mentioned above, our custom BTYD model leads to more computationally efficient and robust estimation procedures. Beneath the model framework itself, we implement various mathematical and infrastructure optimizations that further improve speed and numerical stability. The majority of publicly available packages for forecasting CLV are written entirely in the R programming language. Though R is a versatile and powerful language in its own right, refactoring computationally expensive sections of our codebase into a higher-performance language, combined with clever mathematical tricks, has made our estimation more than ten times faster than pure-R counterparts. These enhancements have allowed us to scale the models to run for Fortune 100 companies.
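The specific tricks behind Theta’s speedups are not described here, but a standard example of the kind of mathematical reformulation that buys numerical robustness is evaluating a likelihood entirely in log space. The sketch below does this for the published BG/NBD likelihood, using `scipy.special.gammaln` and `numpy.logaddexp` so that customers with hundreds of transactions do not overflow the gamma functions or the power terms. It is vectorized over customers, which is also where much of the speed of array-based code comes from. It is an illustration, not Theta’s implementation; the parameter values are illustrative only.

```python
import numpy as np
from scipy.special import gammaln


def bgnbd_log_likelihood(params, x, t_x, T) -> np.ndarray:
    """Per-customer BG/NBD log-likelihood, evaluated entirely in log space.

    params = (r, alpha, a, b). x: repeat purchases, t_x: time of the last purchase,
    T: length of each customer's observation window.
    """
    r, alpha, a, b = params
    x = np.asarray(x, dtype=float)
    t_x = np.asarray(t_x, dtype=float)
    T = np.asarray(T, dtype=float)

    # log[Gamma(r + x) * alpha**r / Gamma(r)]
    log_a1 = gammaln(r + x) - gammaln(r) + r * np.log(alpha)
    # log[B(a, b + x) / B(a, b)]
    log_a2 = gammaln(a + b) + gammaln(b + x) - gammaln(b) - gammaln(a + b + x)
    # Customer still active at the end of the window.
    log_a3 = -(r + x) * np.log(alpha + T)
    # Customer dropped out right after the last purchase (only possible when x > 0).
    has_repeat = x > 0
    safe_denom = np.where(has_repeat, b + x - 1.0, 1.0)  # avoids log of a non-positive value
    log_a4 = np.where(
        has_repeat,
        np.log(a) - np.log(safe_denom) - (r + x) * np.log(alpha + t_x),
        -np.inf,
    )
    # log(exp(log_a3) + exp(log_a4)) without overflow or underflow.
    return log_a1 + log_a2 + np.logaddexp(log_a3, log_a4)


# Illustrative parameters and three hypothetical customers, one with 1,000 repeats.
ll = bgnbd_log_likelihood(
    (0.24, 4.41, 0.79, 2.43),
    x=[0, 1, 1000],
    t_x=[0.0, 10.0, 30.0],
    T=[38.86, 38.86, 38.86],
)
total_log_likelihood = ll.sum()  # the quantity an optimizer would maximize
```

A direct implementation would compute Gamma(r + 1000) and (alpha + T) ** 1000.24, both of which overflow double precision; the log-space version stays finite.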

Enhanced Model Estimation

In all open-source packages, Maximum Likelihood Estimation (MLE) carries the entire weight of model estimation. Though MLE can be an efficient procedure for simultaneously estimating all the parameters of a repeat purchase or spend model, there are cases where it struggles to capture the aggregate metrics of interest in the forecast period. Because these aggregate metrics are the primary measure of a model’s predictive capability, it may sometimes be appropriate to adjust the estimation procedure based on what we know about the data and our models.
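For reference, here is what plain MLE looks like, sketched in Python for the simpler NBD count model (gamma-distributed Poisson rates, no dropout) rather than a full BTYD model, to keep it short. Parameters are optimized on the log scale with `scipy.optimize.minimize`. The function names are our own, the data are simulated, and this is not Theta’s adjusted estimation procedure.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def nbd_neg_log_likelihood(log_params: np.ndarray, x: np.ndarray, t: np.ndarray) -> float:
    """Negative NBD log-likelihood for purchase counts x over windows of length t.

    lambda_i ~ Gamma(r, alpha) and x_i | lambda_i ~ Poisson(lambda_i * t_i).
    Working with log(r) and log(alpha) keeps both parameters positive.
    """
    r, alpha = np.exp(log_params)
    ll = (
        gammaln(r + x) - gammaln(r) - gammaln(x + 1.0)
        + r * np.log(alpha / (alpha + t))
        + x * np.log(t / (alpha + t))
    )
    return float(-ll.sum())


def fit_nbd(x, t) -> tuple[float, float]:
    """Maximum likelihood estimates (r, alpha) via Nelder-Mead on the log scale."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    result = minimize(
        nbd_neg_log_likelihood,
        x0=np.zeros(2),
        args=(x, t),
        method="Nelder-Mead",
        options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 2000},
    )
    r_hat, alpha_hat = np.exp(result.x)
    return float(r_hat), float(alpha_hat)


# Simulated calibration data (not real customers): true r = 2, alpha = 8, 26-week window.
rng = np.random.default_rng(0)
sim_rates = rng.gamma(shape=2.0, scale=1 / 8.0, size=5000)
sim_x = rng.poisson(sim_rates * 26.0)
r_hat, alpha_hat = fit_nbd(sim_x, np.full(5000, 26.0))
```

One property worth noticing: when every customer has the same window t, the NBD maximum likelihood estimates satisfy r / alpha * t = mean(x) exactly, so the fit always reproduces the calibration-period aggregate. Nothing in the likelihood guarantees the same for the forecast period, which is where MLE can struggle.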

Spend Dynamics

I’ve shared how important it is not to simply fit a tracking curve and assume it will adequately describe an entire customer base’s behavior. This is no less true when forecasting spend than when forecasting repeat purchasing. The primary sub-model that open-source packages use to forecast a customer base’s spend is the Gamma-Gamma (GG) model. The GG model historically does a good job of capturing the average transaction value observed across a customer base and projecting it forward. However, it has several key pitfalls, including the inability to introduce covariates. We have improved on the spend models used by publicly available packages so that both time-varying and time-invariant covariates can be incorporated, working around the base model’s pitfalls.
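For context, the sketch below shows what the base Gamma-Gamma model computes: the expected average transaction value for a customer with x transactions averaging m_x, given population parameters p, q, and gamma. The formula p * (gamma + x * m_x) / (p * x + q - 1) is the standard Gamma-Gamma conditional expectation, a weighted average of the population mean p * gamma / (q - 1) and the customer’s own average. Nothing in it can respond to covariates, which is the limitation discussed above. Function names and parameter values are illustrative only.

```python
import numpy as np


def gg_population_mean_spend(p: float, q: float, gamma: float) -> float:
    """Population mean of average transaction value: p * gamma / (q - 1), requires q > 1."""
    return p * gamma / (q - 1.0)


def gg_conditional_expected_spend(p: float, q: float, gamma: float, x, m_x) -> np.ndarray:
    """Expected average transaction value given x transactions with observed mean m_x.

    Equivalent to w * m_x + (1 - w) * population mean, with w = p * x / (p * x + q - 1),
    so customers with few transactions are pulled toward the population mean.
    """
    x = np.asarray(x, dtype=float)
    m_x = np.asarray(m_x, dtype=float)
    return p * (gamma + x * m_x) / (p * x + q - 1.0)


# Illustrative parameters (not fitted to any real data set).
p, q, gamma = 6.25, 3.74, 15.44
expected = gg_conditional_expected_spend(p, q, gamma, x=[1, 5, 50], m_x=[50.0, 50.0, 50.0])
```

All three hypothetical customers averaged 50 per transaction, but the forecasts rise toward 50 only as the number of transactions grows, because the population mean here is about 35.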


So, because not all CLV models are the same, we have developed many critical enhancements to typical BTYD frameworks, with custom models for purchasing and spend that characterize a customer’s journey throughout their lifetime with a company more accurately and robustly. Our framework introduces bespoke covariates into both the purchasing and spend models, telling a realistic story about customer behavior while staying computationally efficient. In future posts, we will dive deeper into each of these enhancements.