# Theta Glossary

#### Talk About CLV and CBCV

CLV and CBCV draw from the worlds of data science, analytics, finance and marketing, all of which have a language of their own. This glossary contains some of the common terms you may encounter in discussions about CLV and CBCV.

- Algorithm A mathematical formula or statistical process used to analyze data.
- Alpha Risk The maximum probability of making a Type I error. This probability is established by the experimenter and often set at 5%.
- Alternative Hypothesis Statement of a change or difference; assumed to be true if the null hypothesis is rejected.
- Analytics The systematic analysis of data. Analytics applications and toolkits contain mathematical algorithms and computational engines that can manipulate large datasets to uncover patterns, trends, relationships, and other intelligence that allow users to ask questions and gain useful insights about their business, operations, and markets.
- Anomaly Detection The process of identifying rare or unexpected items or events in a dataset that do not conform to other items in the dataset and do not match a projected pattern or expected behavior. Anomalies are also called outliers, exceptions, surprises or contaminants and they often provide critical and actionable information.
- Anonymization Making data anonymous; severing of links between people in a database and their records to prevent the discovery of the source of the records.
- ANOVA One-way ANOVA is a generalization of the 2-sample t-test, used to compare the means of more than two samples to each other.
- ANOVA Table The ANOVA table is the standard method of organizing the many calculations necessary for conducting an analysis of variance.
- Artificial Intelligence The apparent ability of a machine to apply information gained from previous experience accurately to new situations in a way that a human would.
- AI The apparent ability of a machine to apply information gained from previous experience accurately to new situations in a way that a human would.
- Batch Processing Batch data processing is an efficient way of processing high volumes of data where a group of transactions is collected over a period of time.
- Bayes Theorem A theorem based on conditional probabilities. It uses relevant evidence, also known as conditional probability, to determine the probability of an event, based on prior knowledge of conditions that might be related to the event.
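As an illustration with hypothetical numbers (a 99%-sensitive test, a 1% base rate, and a 5.94% overall positive rate), the theorem P(A|B) = P(B|A) · P(A) / P(B) can be computed directly:

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Posterior P(A|B) via Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical example: a test is 99% sensitive (P(positive | condition)),
# the condition's base rate is 1%, and 5.94% of all tests come back positive.
posterior = bayes_posterior(p_b_given_a=0.99, p_a=0.01, p_b=0.0594)
print(round(posterior, 3))  # 0.167 -- a positive result implies only ~17% odds
```

Despite the highly sensitive test, the low base rate keeps the posterior probability modest, which is exactly the kind of intuition the theorem formalizes.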
- Behavioral Analytics Using data about people’s behavior to understand intent and predict future actions.
- Behavioral Data Behavioral data provides information about a customer’s interaction with your business.
- Behavioral Marketing Creates more targeted and personalized offers to customers and prospective customers based on the knowledge of actions they previously performed.
- Beta Distribution A family of continuous probability distributions set on the interval [0, 1] having two positive parameters denoted by alpha (α) and beta (β). These parameters appear as exponents of the random variable and control the shape of the distribution.
- Beta-Geometric/NBD Model Models the dropout process as a geometric distribution with a beta mixing distribution, and models the purchase frequency process as a negative binomial distribution.
- Beta Risk The risk or probability of making a Type II error.
- Big Data Extremely large datasets of structured, unstructured, and semi-structured data that are often characterized by the five Vs: volume of data collected, variety of data types, velocity at which the data is generated, veracity of the data, and value of it.
- Binary Logistic Regression The Y variable takes on one of two outcomes (levels), e.g. pass/fail, agree/disagree.
- BI The general term used for the identification, extraction and analysis of data.
- Business Intelligence The general term used for the identification, extraction and analysis of data.
- Business Valuation The process of determining the economic value of a business; used to determine the fair value of a business for a variety of reasons, such as sale value and mergers and acquisition.
- Buy Till You Die A model that helps explain the buying patterns of non-contractual customers by describing the rate at which customers make purchases and the rate at which they drop out or “die.” All BTYD models jointly model two processes: (1) a repeat purchase process, which explains how frequently customers make purchases while they are still "alive"; and (2) a dropout process, which models how likely a customer is to churn in any given time period. Common versions of the BTYD model include the Pareto/NBD model and the Beta-Geometric/NBD model.
- BTYD A model that helps explain the buying patterns of non-contractual customers by describing the rate at which customers make purchases and the rate at which they drop out or “die.” All BTYD models jointly model two processes: (1) a repeat purchase process, which explains how frequently customers make purchases while they are still "alive"; and (2) a dropout process, which models how likely a customer is to churn in any given time period. Common versions of the BTYD model include the Pareto/NBD model and the Beta-Geometric/NBD model.
- Buying Behavior The series of actions and interactions that a consumer performs before, during, and after making a commercial transaction.
- Churn The rate at which customers stop purchasing or doing business with your company. It is sometimes called customer attrition, customer turnover or customer defection. To calculate the rate of customer churn, divide the number of churned customers (over a given period) by the initial number of total customers: Churn rate = Churned Customers / Total Initial Customers
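The churn-rate formula in the definition can be sketched in Python (the counts are hypothetical):

```python
def churn_rate(churned: int, total_initial: int) -> float:
    """Churn rate = churned customers / total customers at period start."""
    if total_initial == 0:
        raise ValueError("initial customer count must be positive")
    return churned / total_initial

# Hypothetical example: 50 of 1,000 customers churned this quarter -> 5% churn.
print(churn_rate(50, 1000))  # 0.05
```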
- Classification Analysis A systematic process for obtaining important and relevant information about data and assigning data to a particular group or class.
- Clickstream Analytics The analysis of web activity by users through the items they click on a page.
- Clinical Data Repository An aggregation of granular patient-centric health data usually collected from multiple-source IT systems and intended to support multiple uses. When a CDR holds data specifically organized for analytics it meets the definition of a clinical data warehouse.
- CDR An aggregation of granular patient-centric health data usually collected from multiple-source IT systems and intended to support multiple uses. When a CDR holds data specifically organized for analytics it meets the definition of a clinical data warehouse.
- Closed-loop Action A Voice of the Customer (VOC) term that refers to the act of following up with customers who have provided either extremely positive or negative feedback.
- Closed-loop Feedback The process of acting on direct or indirect customer input collected from sources such as customer surveys, contact center interactions, and social media comments.
- Clustering Analysis The process of identifying objects that are similar to each other and clustering them in order to understand the differences as well as the similarities within the data.
- CLV to CAC The ratio of Customer Lifetime Value to Customer Acquisition Cost. This metric helps you optimize your marketing campaigns to acquire profitable customers.
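A minimal sketch of the ratio, with hypothetical dollar figures:

```python
def clv_to_cac(clv: float, cac: float) -> float:
    """Ratio of Customer Lifetime Value to Customer Acquisition Cost."""
    if cac <= 0:
        raise ValueError("CAC must be positive")
    return clv / cac

# Hypothetical example: $300 of lifetime value against $100 of acquisition cost.
print(clv_to_cac(300.0, 100.0))  # 3.0
```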
- Coefficient of Variation A standardized measure of dispersion of a probability distribution or frequency distribution.
- Cohort A group of customers that shares some relevant acquisition characteristic(s). In virtually every case, cohorts are defined primarily by the time period of acquisition. A cohort definition can also include other acquisition-related characteristics, such as channel, promotional campaign, product purchased, etc.
- Common Data Element Standardized, precisely defined questions paired with a set of specific allowable responses, used systematically across different sites, studies, or clinical trials to ensure consistent data collection. CDEs may consist of a single data element, such as height, gender, or date of birth, or a collection of connected questions, such as a survey instrument used as a depression index or a quality-of-life scale.
- CDE Standardized, precisely defined questions paired with a set of specific allowable responses, used systematically across different sites, studies, or clinical trials to ensure consistent data collection. CDEs may consist of a single data element, such as height, gender, or date of birth, or a collection of connected questions, such as a survey instrument used as a depression index or a quality-of-life scale.
- Comparative Analysis Data analysis that compares two or more data sets or processes to detect patterns within very large data sets.
- Confidence Interval A range of values which is likely to contain the population parameter of interest with a given level of confidence.
- Continuous Data Data from a measurement scale that can be divided into finer and finer increments such as temperature, time or pressure.
- Correlation Analysis A means to determine a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables. A technique for quantifying the strength of the linear relationship between two variables.
- Covariate A variable that may affect the result of what is being studied.
- Customer Acquisition Cost The total cost of acquiring a new customer. To calculate average CAC, total up all of the costs that go into your customer acquisition tactics for a given period. Then divide by the number of customers acquired during that period. CAC can also be calculated for individual customers.
- CAC The total cost of acquiring a new customer. To calculate average CAC, total up all of the costs that go into your customer acquisition tactics for a given period. Then divide by the number of customers acquired during that period. CAC can also be calculated for individual customers.
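The averaging described above, as a short Python sketch (the spend and customer counts are hypothetical):

```python
def average_cac(total_acquisition_spend: float, customers_acquired: int) -> float:
    """Average CAC = total acquisition spend / customers acquired in the period."""
    if customers_acquired == 0:
        raise ValueError("no customers acquired in the period")
    return total_acquisition_spend / customers_acquired

# Hypothetical example: $50,000 of acquisition spend yielding 250 new customers.
print(average_cac(50_000, 250))  # 200.0
```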
- Customer-Base Audit A systematic review of the buying behavior of a firm’s customers using data captured by its transaction systems. It provides an understanding of how customers differ in their buying behavior and how their buying behavior evolves over time.
- Customer Base Value Analysis A process that employs various CLV models and customer behavior data to assess the lifetime value of a customer or cohort and the overall health of a company’s customer base. (See also Predictive Customer Base Analysis.)
- Customer-Based Corporate Valuation A method that uses customer metrics to assess a firm’s underlying value; exploits basic accounting principles to make revenue projections from the bottom up instead of from the top down.
- CBCV A method that uses customer metrics to assess a firm’s underlying value; exploits basic accounting principles to make revenue projections from the bottom up instead of from the top down.
- Customer Centricity Identifying your most valuable customers, then focusing most of your firm’s efforts on them and finding more customers like them.
- Customer Engagement Refers to the level of interaction between your brand and your customers across all channels over the course of your entire relationship. It can refer to what customers do to interact with your brand, as well as, how loyal they are towards your brand.
- Customer Experience The customer’s perception of their interactions with your business, including all the actions, messages, and engagement you perform across all customer touchpoints.
- CX The customer’s perception of their interactions with your business, including all the actions, messages, and engagement you perform across all customer touchpoints.
- Customer Journey The full-cycle relationship between a business and its customers, from brand awareness to sales.
- Customer Lifecycle The various stages a consumer goes through before, during and after they complete a transaction. It can also refer to the phases a customer passes through during the course of an ongoing relationship with a brand.
- Customer Lifetime Value The net present value of all variable profits and costs associated with a customer, inclusive of customer acquisition costs, until the customer ends his/her relationship with the firm. The discount rate should be equal to the relevant cost of capital associated with the business unit the customer was acquired into.
- CLV The net present value of all variable profits and costs associated with a customer, inclusive of customer acquisition costs, until the customer ends his/her relationship with the firm. The discount rate should be equal to the relevant cost of capital associated with the business unit the customer was acquired into.
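A simplified sketch of CLV as a net present value, assuming the per-period variable profits, per-period discount rate, and upfront CAC are already known (all figures hypothetical; real CLV models forecast these quantities rather than taking them as given):

```python
def clv(profits: list[float], discount_rate: float, cac: float = 0.0) -> float:
    """Net present value of a stream of per-period variable profits, net of CAC.

    profits[t-1] is the expected variable profit in period t (t = 1, 2, ...);
    discount_rate is the per-period cost of capital.
    """
    npv = sum(p / (1 + discount_rate) ** t for t, p in enumerate(profits, start=1))
    return npv - cac

# Hypothetical customer: $100 profit/year for 3 years, 10% discount rate, $80 CAC.
value = clv([100, 100, 100], discount_rate=0.10, cac=80)
print(round(value, 2))  # 168.69
```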
- Customer Loyalty The measure of a customer’s likelihood to do repeat business with a company or brand.
- Customer Retention A measure of an organization’s ability to keep customers.
- Customer Segmentation The process of segmenting individuals into groups based on common characteristics and attributes.
- Dark Data Information assets that organizations collect, process and store during regular business activities, but generally fail to use for other purposes such as analytics.
- Dashboard A graphical representation of analyses performed by algorithms.
- Data Aggregation The process of collecting data from multiple sources for the purpose of reporting or analysis.
- Data Analyst A person responsible for the tasks of modeling, preparing and cleaning data for the purpose of deriving actionable information from it.
- Data Architecture The overall design for the structure, policies, and rules that define an organization’s data and how it will be used and managed.
- Data Cleansing The process of removing or correcting errors from a dataset, table, or database. Also called data scrubbing, the process finds duplicate data and other inconsistencies, like typos and numerical sets that don’t add up.
- Data Footprint The total amount of storage space across all storage systems required to accommodate an organization’s full collection of data. Also known as Storage Footprint.
- Data Governance A framework of people, policies, processes and technologies that define how you manage your organization’s data.
- Data Hygiene The ongoing processes to ensure data is free of errors and in a format that enables it to be used for a variety of purposes.
- Data Identification The process of locating data – structured, unstructured or semi-structured ─ and specifying its key qualities that identify what it is.
- Data Indexing A process for organizing data to maximize a query’s efficiency while searching.
- Data Integration The process of combining data from different sources into a single, unified view. Integration begins with the ingestion process, and includes steps such as cleansing, ETL (extract, transform, load) mapping, and transformation.
- Data Integrity The overall accuracy, completeness, and consistency of data. Data integrity also refers to how well the data complies to regulatory requirements and security standards.
- Data Lake A vast pool of raw data — unstructured, semi-structured, or structured — in an organization whose purpose is not yet defined, either at the company or department level. There are no limits on data sizes, files, sources, or structure; data is kept in its original format and machine-to-machine data logs flow in real time.
- Data Mining The act of extracting useful information from large datasets. Data mining is often done by business users employing analytics tools to uncover patterns, trends, anomalies, relationships, dependencies, and other useful intelligence.
- Data Orchestration The process of taking siloed data from multiple data storage locations, combining and organizing it, and making it available for data analysis tools.
- Data Pipeline A set of tools and processes used to automate the movement and transformation of data between a source system and a target repository.
- Data Preparation A task that prepares data for analytics. It involves combining data from various sources, then cleansing and transforming it. If done via a self-service interface, business users can access and manipulate the data they need with minimal training – and without asking IT for help.
- Data Privacy The policies and practices for handling data in ways that protect it from unauthorized access or disclosure.
- Data Repository A place to hold data, make data available for use, and organize data in a logical manner. Data repositories may have specific requirements concerning subject or research domain; data re-use and access; file format and data structure; and the types of metadata that can be used.
- Data Retention Storing of information for a specified period. Data retention is primarily relevant to businesses that store data to service their customers and comply with government or industry regulations.
- Data Science A discipline that incorporates statistics, data visualization, computer programming, data mining, machine learning and database engineering to solve complex problems.
- Data Scientist A person employed to analyze and interpret complex data, especially for assisting a business in its decision-making.
- Data Security The tools, technologies, tactics and other methods used to keep data safe from harm, alteration, unauthorized access or exposure, disaster, or system failure, and, at the same time, readily accessible to legitimate users and applications.
- Data Sharing The ability and practice of distributing the same sets of data resources with multiple users or applications while maintaining data accuracy across all entities consuming the data.
- Data Silo The existence of data that is maintained and used by one group or department, but which is not easily or fully accessible by other groups in the same organization.
- Data Storage Covers the implementation and maintenance of the physical hardware or cloud-based infrastructure you use to collect, store, and manage your data, such as servers, data management platforms, data warehouses, and data lakes.
- Data Warehouse A central repository for structured, filtered data in an organization or department that has already been processed for a specific purpose (is currently in use). Data is stored in files and folders that help organize the data, provide a multidimensional view, and support strategic decision-making.
- Data Wrangling The process of taking raw data and transforming it into a format that is compatible with established databases and applications. The process may include structuring, cleaning, enriching, and validating data as necessary to make raw data useful.
- Database A facility for organizing, storing, managing, safeguarding, and controlling access to data. Common types of databases include relational database management systems (RDBMS), in-memory databases, object-oriented databases (OODBMS), NoSQL databases, and SQL databases – each with its own advantages.
- Demographic Data Data relating to the characteristics of a human population.
- Descriptive Analytics Condensing large volumes of data into smaller, more digestible pieces of information, similar to summarizing the data's story: rather than listing every single number and detail, it conveys a general theme and narrative.
- Diagnostic Analytics Reviewing past performance to determine what happened and why. Businesses use this type of analytics to complete root cause analysis.
- Direct to Consumer A sales strategy where manufacturers and CPG (consumer packaged goods) brands sell their products directly to their customers instead of selling them through retailers and wholesalers.
- DTC A sales strategy where manufacturers and CPG (consumer packaged goods) brands sell their products directly to their customers instead of selling them through retailers and wholesalers.
- Discrete Data Data which is not measured on a continuous scale. Examples are binomial (pass/fail), Counts per unit, Ordinal (small/medium/large), and Nominal (red/green/blue). Also known as attribute or categorical data.
- Discriminant Analysis A statistical analysis technique used to predict cluster membership from labeled data.
- Due Diligence Reasonable steps taken to satisfy a legal requirement, especially in buying or selling something; the process a potential investor engages in to assess the desirability, value, financial viability and potential of an investment before committing capital.
- Empirical Model An equation derived from the data that expresses a relationship between the inputs and an output (Y=f(x)).
- Event A set of outcomes of an experiment to which a probability is assigned.
- Evidential data Data that is used as evidence to support a specific belief or proposition. It can be analyzed, presented, converted, etc. to validate, prove, or disprove a specific position.
- Exit Strategy A plan for a venture capitalist or private equity firm to get out of financial investment or sell off tangible business assets once certain predetermined conditions have been met.
- Exploratory Analysis An approach to data analysis focused on identifying general patterns in data, including outliers and features of the data that are not anticipated by the experimenter’s current knowledge or preconceptions.
- Extract, Load, Transform The processes a data pipeline uses to replicate data from a source system into a target system such as a data warehouse. It’s a modern variation of the older process of extract, transform, and load (ETL), in which transformations take place before the data is loaded.
- ELT The processes a data pipeline uses to replicate data from a source system into a target system such as a data warehouse. It’s a modern variation of the older process of extract, transform, and load (ETL), in which transformations take place before the data is loaded.
- ETL A three-phase process in which data is extracted, transformed and loaded into an output data container.
- Extract, Transform, Load A three-phase process in which data is extracted, transformed and loaded into an output data container.
- F-test A hypothesis test for comparing variances.
- Finite-horizon CLV A set period of time for estimating the value of a customer.
- Fit The average outcome predicted by a model. (See also overfitting and underfitting.)
- Gamma Distribution A family of continuous probability distributions defined in terms of two positive parameters, a shape parameter denoted by k and a scale parameter denoted by theta (θ). The exponential distribution, Erlang distribution, and chi-square distribution are all special cases of the gamma distribution.
- Heavy-Tailed Distribution Distributions whose tails aren’t exponentially bounded. Unlike the bell curve with a normal distribution, heavy-tailed distributions approach zero at a slower rate and can have outliers with very high values.
- Heuristically Involving or serving as an aid to learning, discovery, or problem-solving by experimental and trial-and-error methods.
- Histograms Representation of frequency of values by intervals.
- Historic CLV The sum of all the transactions made by a customer in the past, multiplied by the average gross margin. It can be calculated using this formula: Historic CLV = (Transaction 1 + Transaction 2 + Transaction 3 + … + Transaction N) x AGM, where N is the latest transaction and AGM is the Average Gross Margin (your profit margin after accounting for the cost of producing the product).
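The Historic CLV formula can be sketched directly (transaction amounts and margin are hypothetical):

```python
def historic_clv(transactions: list[float], avg_gross_margin: float) -> float:
    """Historic CLV = (sum of a customer's past transactions) * average gross margin."""
    return sum(transactions) * avg_gross_margin

# Hypothetical customer with four past orders and a 25% average gross margin.
print(historic_clv([120.0, 80.0, 150.0, 50.0], avg_gross_margin=0.25))  # 100.0
```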
- Ideal Buyer Profile The ideal customer that your business can sell to; a fictitious persona or company profile whose pain points precisely match the ones that your product seeks to solve. Also known as an Ideal Customer Profile.
- Ideal Customer Profile The ideal customer that your business can sell to; a fictitious persona or company profile whose pain points precisely match the ones that your product seeks to solve. Also known as an Ideal Buyer Profile.
- Key Performance Indicator A metric used to periodically track and evaluate performance toward the achievement of a specific objective or target.
- KPI A metric used to periodically track and evaluate performance toward the achievement of a specific objective or target.
- Linear Regression A type of regression analysis used to analyze the direct association between a dependent variable that must be continuous and one or more independent variable(s) that can be any level of measurement, nominal, ordinal, interval, or ratio. A linear regression tests the changes in the mean of the dependent variable by the predictors included in our model, the independent variable(s).
- Logistic Regression Investigates the relationship between a response (Y) and one or more predictors (Xs), where Y is categorical, not continuous, and the Xs can be either continuous or categorical. Types of logistic regression are:
  - Binary logistic regression: The response variable can only belong to one of two categories.
  - Multinomial logistic regression: The response variable can belong to one of three or more categories, and there is no natural ordering among the categories.
  - Ordinal logistic regression: The response variable can belong to one of three or more categories, and there is a natural ordering among the categories.
- Lookalike Audience A group of people on a social network that are similar to a group of your existing customers, based on factors like demographics, behavior, location, etc.
- LTV The lifetime spend of customers in aggregate. LTV is an aggregate metric, unlike CLV, which is calculated at the individual-customer level.
- Lifetime Value The lifetime spend of customers in aggregate. LTV is an aggregate metric, unlike CLV, which is calculated at the individual-customer level.
- Machine Learning A type of artificial intelligence that provides computers the ability to learn from new data without being explicitly programmed. The computer can identify patterns of behavior that a human would likely not see.
- ML A type of artificial intelligence that provides computers the ability to learn from new data without being explicitly programmed. The computer can identify patterns of behavior that a human would likely not see.
- Marketing Attribution The process of associating specific marketing activities to customer sales and conversions.
- Mean The average of a data set.
- Median The middle of the set of numbers.
- Mixture Density A probability density function that is parameterized by two or more density functions and mixture parameters.
- Mode The most common number in a data set.
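The three measures above, computed with Python's standard `statistics` module on a small hypothetical dataset:

```python
import statistics

data = [3, 7, 7, 2, 9, 7, 4]

print(statistics.mean(data))    # 39/7, approximately 5.5714
print(statistics.median(data))  # 7 (middle value of the sorted list)
print(statistics.mode(data))    # 7 (appears three times)
```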
- Naïve Bayes A classification technique based on Bayes Theorem with an assumption of independence among predictors. In simple terms, it assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
- Negative Binomial Distribution A negative binomial distribution (also called the Pascal Distribution) is a discrete probability distribution for random variables in a negative binomial experiment.
- Net Promoter Score The industry standard measure of customer experience (CX) and customer loyalty.
- NPS The industry standard measure of customer experience (CX) and customer loyalty.
- Nominal Logistic Regression Models the relationship between a set of predictors and a nominal response variable. A nominal response has at least three groups that don’t have a natural order.
- Nominal Variable A variable that has two or more categories, but there is no intrinsic ordering to the categories. Also known as Categorical Variable.
- Normal Distribution A probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean; the normal distribution appears as a bell curve.
- NoSQL Database A database designed to handle unstructured data that traditional relational (SQL) databases cannot easily support because of its lack of structure.
- H0 Statement of no change or difference; assumed to be true until sufficient evidence is presented to reject it.
- Null Hypothesis Statement of no change or difference; assumed to be true until sufficient evidence is presented to reject it.
- Object Storage Data storage that manages data as objects, as opposed to other storage architectures like file systems that manage data as a file hierarchy, and block storage which manages data as blocks within sectors and tracks. Also known as S3 or S3-compatible storage. S3 is from Amazon Simple Storage Service, which is the most well-known object storage system. S3-compatible storage means that the storage employs the S3 API as its “language.”
- Omnichannel Marketing The seamless integration of online and offline marketing channels a company uses to interact with customers.
- One Sample T-test Statistical test to compare the mean of one sample of data to a target. Uses the t-distribution.
- Optimization Analysis The process of finding optimal problem parameters subject to constraints. Optimization algorithms heuristically test a large number of parameter configurations in order to find an optimal result, determined by a characteristic function (also called a fitness function).
- Ordinal Logistic Regression A statistical analysis method that can be used to model the relationship between an ordinal response variable and one or more explanatory variables.
- Ordinal Variable Similar to a nominal variable but has a clear ordering of the categories.
- Outlier Detection The identification of data points that deviate significantly from the general average within a dataset or a combination of data. An outlier is numerically distant from the rest of the data, indicates something unusual, and generally requires additional analysis.
- Overfitting Producing an analysis that corresponds too closely or exactly to a particular set of data, so that it may fail to reliably fit to additional data or predict future observations.
- Paired T-test A test used to compare the average difference between two samples of data that are linked in pairs. Special case of the 1-sample t-test. Uses the t-distribution.
- Pareto Distribution Skewed distribution that has a ski slope shape — right skewed where mode and minimum are equal; has the longest tail of all probability distributions. It’s used to model any variable that has a minimum value and for which the probability density decreases geometrically towards zero.
- Pareto/GGG Model A variation of the Pareto/NBD model that accounts for some level of regularity for inter-transaction times.
- Pareto/NBD Model Models the dropout process as a Pareto Type II distribution and the purchase frequency process as a negative binomial distribution.
- Pareto Type II Distribution A heavy-tail probability distribution used in business, economics, actuarial science, queueing theory and Internet traffic modeling; essentially a Pareto distribution that has been shifted so that its support begins at zero. Also known as a Lomax Distribution.
- Pattern Recognition Identifying patterns in data via algorithms to make predictions about new data coming from the same source.
- Persona A representation of a group of leads or customers with similar characteristics, features, and/or behaviors.
- Personalization The process of tailoring communications or customer experiences taking into account the unique aspects of each individual.
- Personally Identifiable Information Information (data) that can be used to verify a person's identity.
- PII Information (data) that can be used to verify a person's identity.
- Poisson Distribution A discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.
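The Poisson probability mass function, P(X = k) = λ^k · e^(−λ) / k!, can be computed directly (the purchase-rate example is hypothetical):

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = lam**k * exp(-lam) / k! for a Poisson-distributed X."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical example: probability of exactly 2 purchases in a period,
# when customers average 1.5 purchases per period.
print(round(poisson_pmf(2, 1.5), 4))  # 0.251
```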
- PAV The net present value of variable profits, ignoring customer acquisition costs (CAC).
- Post-acquisition value The net present value of variable profits, ignoring customer acquisition costs (CAC).
- Power (1-beta) The ability of a statistical test to detect a real difference when there is one; the probability of correctly rejecting the null hypothesis. Determined by alpha and sample size.
- Predictive Analytics A form of advanced analytics which examines data or content to answer the question “What is going to happen?” or more precisely, “What is likely to happen?,” and is characterized by techniques such as regression analysis, forecasting, multivariate statistics, pattern matching, predictive modeling, and forecasting.
- Predictive Customer-Based Analysis A predictive analytics process that employs various customer lifetime value (CLV) models and customer behavior data to assess CLV and the overall health of a company’s customer base, as well as predict the future behaviors of a customer or cohort.
- Predictive Modeling The process of developing a model that will most likely predict a trend or outcome.
- Prescriptive Analytics Builds on predictive analytics by recommending actions and evaluating the likely impact of each, enabling data-driven decisions.
- Probabilistic Modeling A statistical approach that uses the effect of random occurrences or actions to forecast the possibility of future results.
- Probability Distribution A statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. Probability distributions may be discrete or continuous.
- Product Centricity The practice of bringing new products to market and finding customers who want to buy them (as opposed to customer centricity).
- Psychographics The study of consumers based on their ideas, actions, and interests.
- Purchase Behavior The series of actions and interactions that a consumer performs before, during, and after making a commercial transaction.
- Qualitative data A type of data that cannot be counted, measured, or easily expressed using numbers; it is usually textual or descriptive.
- Quantitative data Information that can be counted or measured and expressed numerically for analysis.
- Query A request input by a user and executed by a database management system (DBMS). It can be a request for data results from a database, action on the data or both.
- Range Difference between the largest and smallest measurement in a data set.
- Real-Time Data Data that is created, processed, stored, analyzed and visualized within milliseconds.
- Recency, Frequency, and Monetary value A simple method to determine customer value along three dimensions. Recency: how long ago did the customer last purchase? Frequency: how often/consistently does the customer purchase? Monetary: how much does the customer spend on average?
- RFM A simple method to determine customer value along three dimensions. Recency: how long ago did the customer last purchase? Frequency: how often/consistently does the customer purchase? Monetary: how much does the customer spend on average?
- Regression Analysis Statistical processes for estimating the relationships between a dependent variable and one or more independent variables; can be used to explain the past and predict future events; the most common form is linear regression.
- RDBMS The software used to store, manage, query, and retrieve data stored in a relational database, where the data points are related to one another. A RDBMS is an advanced version of a database management system where data is typically stored in the form of tables rather than files.
- Relational Database Management System The software used to store, manage, query, and retrieve data stored in a relational database, where the data points are related to one another. A RDBMS is an advanced version of a database management system where data is typically stored in the form of tables rather than files.
- Residual The difference between reality (an actual measurement) and the fit (model output).
- RLV The amount of additional value we expect to collect from a customer over a specific and defined time period.
- Residual Lifetime Value The amount of additional value we expect to collect from a customer over a specific and defined time period.
- Scalability The ability of a system or process to maintain acceptable performance levels as workload or scope increases.
- Semi-structured data Information that doesn’t reside in a relational database but that has some organizational properties that make it easier to analyze. With some processes, it may be possible to store it in a relational database. For example, an email or a text file.
- Sentiment Analysis The application of statistical functions and probability theory to comments people make on the web or social networks to determine how they feel about a product, service or company.
- SCV A complete set of the data an organization has about a customer, which can be accessed in a single location.
- Single Customer View A complete set of the data an organization has about a customer, which can be accessed in a single location.
- Single-variance Test Compares the variance of one sample of data to a target. Uses the Chi-square distribution.
- SaaS A software licensing and delivery model in which a software provider delivers an application to users over the internet via a website or app. Unlike traditional software products, SaaS software is licensed on a subscription basis and is centrally hosted.
- Software-as-a-service A software licensing and delivery model in which a software provider delivers an application to users over the internet via a website or app. Unlike traditional software products, SaaS software is licensed on a subscription basis and is centrally hosted.
- SQL Database A relational database that stores data in tables and rows. Data items (rows) are linked based on common data items to enable efficiency, avoid redundancy, and facilitate easy, flexible retrieval.
- Stochastic Model A model for estimating probability distributions of potential outcomes by allowing for random variation in one or more inputs over time.
- Structured Data Data that can be neatly formatted into rows and columns and mapped to predefined fields. It’s typically stored in Excel spreadsheets or relational databases.
- T-Distribution A way of describing a set of observations where most observations fall close to the mean, and the rest of the observations make up the tails on either side; used for smaller sample sizes, where the variance in the data is unknown.
- T-Test A statistical test that compares the means of two samples.
- Test for Equal Variance Compares the variance of two samples of data against each other. Uses the F distribution.
- Time Series Analysis Analysis of data measured at repeated, well-defined points in time to identify time-based patterns.
- Time Series Model A specific way of analyzing a sequence of data points collected over an interval of time; data points are recorded at consistent intervals over a set period of time rather than just recording the data points intermittently or randomly.
- Transaction Log A record of all transaction activity.
- Transactional Data Data that relates to the conducting of business, such as accounts payable and receivable data or product shipments data.
- Two Sample t-test A statistical test to compare the means of two samples of data against each other. Uses the t-distribution.
- Type I Error The error that occurs when the null hypothesis is rejected when, in fact, it is true.
- Type II Error The error that occurs when the null hypothesis is not rejected when it is, in fact, false.
- Underfitting Describes a data model that cannot capture the relationship between input and output variables; most often occurs when there is insufficient data or the wrong type of data for the task at hand.
- Unstructured Data Data that can’t be organized into rows and columns or processed and analyzed via conventional tools and methods. It is stored in its native format, and is often qualitative rather than quantitative, such as documents, web pages, email, social media content, mobile data, images, audio, video, and more.
- Valuation The process of determining the worth of an asset or company.
- Variance The average squared deviation of all values from the mean.
- Voice of the Customer The process of collecting and analyzing customer input to determine their needs, preferences, and expectations about a brand’s products, services, and experiences.
- VOC The process of collecting and analyzing customer input to determine their needs, preferences, and expectations about a brand’s products, services, and experiences.
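The RFM measures defined above can be computed directly from a transaction log. A minimal sketch in Python, using a hypothetical customer's transaction history (the dates, amounts, and analysis date are illustrative, not real data):

```python
from datetime import date

# Hypothetical transaction history for one customer: (purchase date, amount).
transactions = [
    (date(2023, 1, 10), 50.0),
    (date(2023, 3, 5), 75.0),
    (date(2023, 6, 20), 60.0),
]

analysis_date = date(2023, 7, 1)  # assumed "today" for the analysis

# Recency: days since the most recent purchase.
recency_days = (analysis_date - max(t[0] for t in transactions)).days
# Frequency: total number of purchases in the observation window.
frequency = len(transactions)
# Monetary: average spend per purchase.
monetary = sum(t[1] for t in transactions) / len(transactions)

print(recency_days, frequency, round(monetary, 2))  # → 11 3 61.67
```

In practice these three scores are computed per customer across the full transaction log and then binned (e.g., into quintiles) to segment the customer base.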
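The Poisson distribution entry above has a closed-form probability mass function that is easy to evaluate. A short sketch, assuming a hypothetical customer who averages two purchases per quarter:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events in an interval
    when events occur independently at a constant mean rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 2.0  # hypothetical mean purchase rate per quarter
# Probability the customer makes 0, 1, 2, ... purchases next quarter.
probs = {k: poisson_pmf(k, lam) for k in range(5)}
```

Models such as the Pareto/NBD build on this idea: each customer's purchase count is Poisson-distributed, with the rate itself varying across the customer base.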

If you have questions about these terms or any others, Theta has the answers. Submit your questions or contact us to discuss how we can put CBCV to work for you.