Integrating Experimentation into Marketing Measurement

Introduction

Understanding advertising effectiveness is crucial for any marketing strategy because it directly impacts resource allocation, campaign optimization, and overall return on investment (ROI). By measuring how well advertisements perform, marketers can determine which messages resonate with their target audience, identify underperforming channels, and refine their creative approach to boost engagement. Effective ad analysis also helps pinpoint the ideal balance between reach, frequency, and targeting precision, ensuring that budgets are not wasted on ads that fail to drive revenue. Moreover, it provides valuable insights into consumer behavior, helping businesses adjust to changing preferences and trends. Ultimately, understanding ad effectiveness enables data-driven decision-making, empowering marketers to create more impactful campaigns that achieve measurable outcomes and foster long-term brand growth.

Integrating experimentation into marketing measurement is one of the most effective ways to achieve advertising effectiveness. You can optimize resource allocation and improve ROI by embedding controlled experiments, such as A/B tests or randomized controlled trials (RCTs), into your marketing processes and analytics. In a recent study, advertisers on an online advertising platform who used ad experiments for measurement saw substantially higher performance than those who did not. An e-commerce advertiser running 15 experiments (versus none) saw about 30% higher ad performance in the same year and 45% higher in the year after. While this evidence is correlational, it is reasonable to conclude that, in today’s data-driven landscape, experimentation, personalization, and automation are not just best practices; they are becoming a competitive necessity.

However, integrating an experimentation strategy into marketing measurement can be complex, often requiring large-scale organizational changes and careful planning. This means clearly articulating objectives, establishing a hierarchy for measurement and analytics, selecting the right types of metrics, and determining a system of ground truths and methodologies. You must decide on your marketing and business goals, such as prioritizing ROI or top-line growth. By clearly understanding these goals, you can more effectively design experiments and integrate these with observational analytics to refine your strategies. This ensures that the integration of experimentation is not just a technical procedure but a crucial part of a larger, comprehensive strategy to achieve business success.

In this article, we provide high-level guidance on how you can succeed with integrating experimentation into your marketing measurement.

Why Experimentation is Necessary

In the 20th century, the field of marketing experienced a dramatic transformation driven by advancements in data collection, analytics, and communication technologies. Early in the century, marketing effectiveness was primarily assessed through anecdotal evidence and crude measures, such as sales increases and consumer feedback. The rise of mass media—newspapers, radio, and television—ushered in an era of broad audience outreach, leading to the development of audience metrics such as radio ratings and TV viewership statistics. The mid-century saw a growing interest in market research, with the establishment of industry giants like Nielsen providing quantitative insights into consumer behavior.

By the late 20th century, computers had revolutionized data analysis, enabling sophisticated consumer segmentation and predictive modeling. It became common practice to use econometric models to determine the relationships between the various factors in a marketing model. In particular, the field of Observational Causal Inference (OCI) seeks to identify causal relationships from observational data when no experimental variation and randomization are present.

However, as two of the authors recently noted: “Despite its widespread use, a growing body of evidence indicates that OCI techniques often stray from correctly identifying true causal effects [in marketing analytics].[1] This is a critical issue because incorrect inferences can lead to misguided business decisions, resulting in financial losses, inefficient marketing strategies, or misaligned product development efforts.” One of the most common and longstanding OCI techniques in marketing measurement is the media and marketing mix model (MMM).

In our recent note, we called on the business and marketing analytics community to embrace experimentation and to use experimental estimates to validate and calibrate OCI models. The community response was lively, including a contextualizing piece on AdExchanger.

It should be pointed out that this is not a new observation. Many early papers in OCI advocated for experimental validation of modeling results. For example, Figure 1 shows the abstract from a paper by M. L. Vidale and H.B. Wolfe in 1957.

Figure 1. The abstract from “An Operations-Research Study of Sales Response to Advertising” (Vidale and Wolfe).

What is new is that, in the modern internet era, wide-scale experimentation is now both possible and widely accessible. It’s still not easy, but it is doable.

Types of Experiments in Marketing

In its broadest sense, marketing experimentation refers to any intentionally designed intervention that can help marketers measure the effects of their actions. This includes deliberate variations in spend, share, allocation, or other strategic and tactical decisions made for the purpose of measurement.

For instance, a marketer might introduce intentional variation in daily or weekly spending for a specific channel to estimate its impact on outcomes. By analyzing how performance changes with these fluctuations, marketers can better isolate and quantify the channel’s true effect.
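
To make this concrete, here is a minimal Python sketch of how such a deliberately varied spend schedule could be analyzed. All numbers are simulated for illustration, and the analysis assumes the spend perturbations are genuinely random:

```python
# Minimal sketch: estimate a channel's marginal effect from deliberately
# randomized daily spend. All names and data here are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
days = 90

# Planned baseline spend, perturbed by a random multiplier each day
# (the intentional variation that makes the later regression meaningful).
base_spend = 1_000.0
spend = base_spend * rng.uniform(0.7, 1.3, size=days)

# Simulated outcome: unknown true effect of 2.5 incremental sales per dollar,
# plus noise from everything else that drives sales.
true_effect = 2.5
sales = 5_000 + true_effect * spend + rng.normal(0, 500, size=days)

# Because spend was varied at random, a simple regression recovers the effect.
model = sm.OLS(sales, sm.add_constant(spend)).fit()
print(f"Estimated incremental sales per dollar: {model.params[1]:.2f}")
print(model.conf_int()[1])  # 95% confidence interval for the spend coefficient
```

Because the variation was introduced at random, a simple regression suffices here; without that deliberate variation, the same regression would mostly pick up correlations.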

In more extreme cases, experimentation might involve “going dark”—completely halting marketing activity in a specific channel or geographic location. By observing the performance drop (or lack thereof) when marketing is paused, marketers can try to measure the incremental impact of that channel. While this approach can yield insights, it comes with risks (such as bias from confounding variables), particularly in high-stakes environments where even short-term losses are undesirable. It is also clearly not an RCT, where we know that effect estimates will be unbiased on average.

Tests with Treatment and Control Groups

Narrowing the focus, experimentation can be defined as specifically designed tests that involve treatment and control groups to estimate effects. Under this definition, experimentation encompasses a wide spectrum of tests, ranging from basic ad platform tools to more rigorous methodologies.

Many advertising platforms, like Google and Facebook/Meta, provide split (or A/B) testing tools. These tools, which are often self-serve, enable marketers to compare various tactics or creative assets without the need for control groups, using only exposed, non-overlapping audiences. Split testing tools divide the audience into two or more groups, each receiving a different version of the ad. Marketers might also run simultaneous campaigns with varying parameters to observe performance differences.

While these tools can be useful for directional insights, split tests are best suited to optimizing specific campaign elements, because they fall short of delivering incrementality measurements.
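
For illustration, the readout of such a split test is typically a comparison of rates between exposed groups. A minimal sketch with hypothetical click and impression counts could look like this:

```python
# Minimal sketch: comparing two ad variants from a platform split test.
# Counts are hypothetical; the test only compares exposed groups and says
# nothing about incrementality versus not advertising at all.
from statsmodels.stats.proportion import proportions_ztest

clicks = [450, 510]            # clicks for variant A and variant B
impressions = [30_000, 30_200]

z_stat, p_value = proportions_ztest(count=clicks, nobs=impressions)
ctr_a, ctr_b = clicks[0] / impressions[0], clicks[1] / impressions[1]
print(f"CTR A: {ctr_a:.3%}, CTR B: {ctr_b:.3%}, p-value: {p_value:.3f}")
```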

The Gold Standard: Randomized Controlled Trials (RCTs)

Randomized Controlled Trials (RCTs) are often called the gold standard of effectiveness research. In an RCT, ad exposure is fully randomized across users, with some users serving as a control group who do not see the ad or campaign being measured. This level of rigor ensures that the treatment effect (the ad’s impact) can be isolated and measured without bias on average.

RCTs are widely recognized as the most reliable method for causal inference. However, RCTs are often challenging to execute. Many marketers lack the ability to control ad exposure at the user level, particularly when working across multiple platforms or channels. Privacy regulations and restrictions on user-level data access have further complicated the implementation of RCTs in recent years.

Most ad platforms offer RCTs, but these sometimes require dedicated support personnel to use, and they often take more effort to implement successfully.

A Practical Middle Ground: Cluster-Level Randomized Experiments

When user-level randomization is not feasible, cluster-level randomized experiments can offer a practical alternative. In cluster-level randomization, the assignment of experimental ads is managed at broader levels, such as geographic regions, rather than at the level of the individual user. In geo experiments, the most common type of cluster experiment, ad exposure is varied at a geographic level – such as ZIP codes, designated market areas (DMAs), or cities – rather than at the level of individual consumers. Some regions serve as test groups, receiving the ad campaign, while others act as controls.

Geo experiments allow marketers to measure the incremental impact of campaigns while avoiding some of the complexities of user-level RCTs. They are particularly valuable when privacy or technological restrictions limit access to granular user data, or when there might be spillover effects. A spillover effect is an unintended impact of a marketing intervention or campaign on individuals, groups, or regions that were not directly targeted; it occurs when the influence of an advertisement, message, or promotion “spills over” to adjacent groups or regions, leading to indirect exposure and potential behavior changes outside the intended treatment group. Figure 2 below provides an overview of the different types of experiments available to marketers in different situations (source: Figure 1 in this article):

Figure 2. Taken from “It’s time to close the experimentation gap in advertising: Confronting myths surrounding ad testing.”

There is another reason that clustered experimentation is sometimes desirable: restricting the experiment to a small sub-population, demographic, or geography is often a way to mitigate perceived risk. If key stakeholders are uncomfortable experimenting on the entire population, or worried about the potential impact of spillover effects, isolating the test to a small sub-population can be a good compromise.

However, clustered experiments are not without challenges. They require careful planning, significant resources, and rigorous execution to ensure clean results. Marketers must account for regional differences, external factors, and spillover effects (where the impact of a campaign in one region influences neighboring regions). It can also be difficult to hold out large cities with attractive contiguous market areas from campaigns, which makes it harder to create balanced test and control market groups.
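
For illustration, a very simple geo-experiment readout can be computed as a difference-in-differences between test and control regions. The sketch below uses hypothetical regions and sales figures and leaves out the market matching and power analysis a real design would need:

```python
# Minimal sketch: a difference-in-differences read of a geo experiment.
# Region assignments and sales figures are hypothetical; a production design
# would also use power analysis and balanced market matching.
import pandas as pd

data = pd.DataFrame({
    "region":     ["geo_1", "geo_2", "geo_3", "geo_4"],
    "group":      ["test", "test", "control", "control"],
    "pre_sales":  [100_000, 120_000, 110_000, 95_000],
    "post_sales": [118_000, 139_000, 113_000, 98_000],
})

# Change in sales per region, then compare the average change across groups.
data["delta"] = data["post_sales"] - data["pre_sales"]
lift = (
    data.loc[data["group"] == "test", "delta"].mean()
    - data.loc[data["group"] == "control", "delta"].mean()
)
print(f"Estimated incremental sales per test region: {lift:,.0f}")
```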

Successful Experimentation Requires a Commitment Across the Organization

Organizational success with experimentation requires more than just tools and processes; most of the time, it requires a cultural shift and sustained support. Executives must encourage teams to test hypotheses, embrace failure as a learning opportunity, and prioritize data-driven decision-making. Executive buy-in is critical to ensure experimentation becomes a core part of your marketing strategy. Here is a set of essential steps that can help you succeed:

Staff and Endorse Marketing Analytics Appropriately

The foundation of a successful experimentation program lies in having the right people and organizational support. This starts with hiring a dedicated data scientist or analytics team with expertise in marketing measurement and experimental design and analysis. These experts will be responsible for designing, running, and analyzing experiments and ensuring that insights are actionable.

Equally important is securing executive endorsement. A dotted reporting line to a C-level executive can signal the strategic importance of marketing analytics and experimentation. This endorsement helps prioritize the initiative across the organization and ensures that resources are allocated effectively.

Foster a Culture of Experimentation

For experimentation to thrive, firms must embed it into their organizational culture. This means fostering curiosity, encouraging data-driven decision-making, and rewarding teams for testing assumptions – even when experiments don’t yield the desired outcomes.

Leadership plays a critical role in shaping this culture. By promoting the value of experimentation and celebrating learnings from both successes and failures, executives can inspire teams to embrace testing as a core part of their workflow.

Depending on the setup of your wider analytics organization, and whether there is a central experimentation team and platform, it can be wise to formally link the marketing analytics group with the platform team. Research suggests that organizations with mostly decentralized decisions but a single authority that sets consistent implementation thresholds achieve more robust returns to experimentation. Experiment-based innovation and learning further thrive on cross-pollination, which the central team can facilitate.

One of the most challenging obstacles within an organization is overcoming the silos that exist between various departments, such as analytics, planning, strategy, marketing, finance, and leadership. These silos can hinder communication, collaboration, and the flow of information, ultimately impairing the organization’s ability to make data-informed decisions and execute effective strategies.

Commit to a Learning Agenda and Hold the Marketing Analytics Team Accountable

Bridging these departmental gaps requires a concerted effort to foster a culture of collaboration and open communication. One powerful approach to breaking down barriers is committing to a learning agenda that encourages cross-departmental engagement with shared objectives. By aligning all teams around common goals and promoting continuous learning, commitment to a joint learning agenda can be the single most important step in transforming organizational dynamics.

Ask the marketing analytics team to set clear objectives and a roadmap for experimentation. Every experiment should begin with a specific, measurable goal. The team needs to be able to answer questions like: What do we want to learn? What are the hypotheses we are testing? How will the results influence our decisions? How will we use the results in the wider measurement framework, e.g., to validate and calibrate OCI models? Clear objectives ensure that experiments are focused and actionable. They also help prioritize testing efforts, directing resources toward questions with the highest potential impact.

Create Feedback Loops

The true value of experimentation lies in its ability to inform decision-making. Firms need to establish feedback loops where insights from experiments inform future campaigns, strategies, and even the design of new experiments. Regularly reviewing and acting on experimental results, possibly following a fixed-timed process, ensures that insights drive tangible business outcomes. This iterative approach fosters continuous improvement and adaptation to changing market dynamics.

Tactics that Lead to Successful Experimentation

To integrate experimentation into marketing measurement effectively, marketing analytics teams must establish a clear framework that balances rigor and practicality. Here’s how marketers can get started:

Align Hypotheses, Objectives, and Governance

Commit to a learning agenda as a practical first step that fosters cross-departmental collaboration and aligns all relevant teams around shared objectives, helping to overcome organizational or communicational silos.

Start with Broad Interventions


If your team is new to experimentation, begin with simpler interventions, such as introducing controlled variations in spending or campaign parameters. For example, randomly adjusting daily spending across campaigns can help identify baseline performance trends and directional insights.

Leverage Platform Tools and External Know-How


Modern marketing platforms like Google Ads and Meta Ads Manager include built-in experimentation tools. These platforms allow firms to test different variables – such as targeting criteria or bidding strategies – directly within their campaigns. Use these tools as a stepping stone. While these tests may not meet the highest standards of rigor, they can provide valuable learnings when executed thoughtfully. Ensure you understand the limitations of these tools, particularly around randomization and confounding.

Similarly, if you are primarily active on one or a few ad platforms, the platforms’ attribution tools can provide reasonably reliable estimates of your advertising effectiveness. Build on these insights directly to validate and calibrate OCI models if you have them.

Firms can also turn to specialized vendors like Optimizely, Eppo, Adobe Target, or Game Data Pros for more complex needs. These vendors provide advanced capabilities for designing and analyzing experiments and building related software tools. Investing in these tools can streamline the experimentation process and make it easier to scale testing efforts.

Prioritize RCTs and Incorporate Cluster-Level Experiments


Whenever feasible, prioritize RCTs. Collaborate with platforms, publishers, or third-party measurement providers to implement RCTs that deliver unbiased causal estimates. RCTs may not always be practical, but they should remain the gold standard you aspire to. One particular caveat is to make sure there is enough statistical power: insufficient budget or duration can undermine the reliability of the experiment and its results. To address this, ensure that an adequate budget, duration, and holdout size are applied, based on power calculations.
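
As a rough illustration of such a power calculation, the sketch below sizes a user-level test for a hypothetical baseline conversion rate and minimum detectable effect; the numbers are placeholders, not recommendations:

```python
# Minimal sketch: a pre-test power calculation to size an experiment.
# The baseline rate and minimum detectable effect are hypothetical inputs.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_cvr = 0.020          # current conversion rate
min_detectable_cvr = 0.022    # smallest lift worth detecting (10% relative)

effect_size = proportion_effectsize(min_detectable_cvr, baseline_cvr)
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, ratio=1.0
)
print(f"Required sample size per group: {n_per_group:,.0f}")
```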

As your experimentation capabilities mature, explore geo and other cluster-level randomized experiments to measure the incremental impact of campaigns. Partner with data scientists or measurement specialists to effectively design and execute these tests. Geo experiments can bridge the gap between observational measurement and user-level RCTs.

Set Up OCI Model(s)

Once your marketing efforts involve more than two channels and you’re looking to scale up, it is time to build a comprehensive measurement framework that captures the full scope of these marketing activities. This involves cataloging marketing activities, i.e., listing all current and upcoming campaigns, channels, and tactics, along with their associated costs and KPIs. The figure in this article may be helpful for this exercise. Then set up a holistic measurement model, e.g., a media or marketing mix model, that includes all these activities plus control variables, trends, and adstock. This article provides an introduction to how you can do this using an open-source package.

A holistic model serves as the baseline for measuring the incremental impact of experiments and provides a framework for interpreting results in the context of broader marketing dynamics. Figure 3, taken from a presentation by Meta, visualizes how different OCI approaches can come together with experimentation.

Figure 3. Taken from a presentation by Meta.

Validate OCI Model(s)

Take the outputs from split tests, trusted attribution models, geo experiments, and RCTs to validate and calibrate your observational measurement models. To start, you can compare experimental and observational model results to ensure that they are “similar.” Similar can mean that both approaches pick the same winning ad variant/strategy or directionally agree. If the results are inconsistent, update the observational model to achieve similarity.

A somewhat more advanced approach uses experiment results to choose between OCI models. The marketing analytics team can build an ensemble of different models and then pick the one that agrees most closely with the ad experiment results for the KPI of interest, e.g., cost per incremental conversion or sales.
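
A minimal sketch of this model-selection idea, using entirely hypothetical model names and estimates, could look like this:

```python
# Minimal sketch: selecting among candidate OCI/MMM models by their agreement
# with experimental results. The model names, their estimates, and the
# experiment readout are hypothetical, for a single channel's cost per
# incremental conversion.
experiment_cpi = 12.0  # cost per incremental conversion from a geo test

candidate_models = {
    "mmm_v1_weekly":   {"search_cpi": 9.5},
    "mmm_v2_adstock":  {"search_cpi": 11.4},
    "mmm_v3_bayesian": {"search_cpi": 15.8},
}

# Pick the model whose estimate deviates least from the experimental benchmark.
best_model = min(
    candidate_models,
    key=lambda name: abs(candidate_models[name]["search_cpi"] - experiment_cpi),
)
print(f"Model closest to the experiment: {best_model}")
```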

Calibrate OCI Model(s)

The most advanced and quantitative approach incorporates experiment results into the OCI model directly. Getting this right requires a robust understanding of statistical modeling. In a Bayesian modeling framework, the experimental results can enter your model as a prior. In a frequentist model, they can serve to define a permissible range for the coefficient estimates: say your experiment shows a 150% return on ad spend with a 120% lower and a 180% upper confidence bound; you can then constrain your model’s estimate for that channel to that range.
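
One way to implement such a constraint is a bounded regression. The sketch below is illustrative only: it uses simulated data and a bare-bones linear model rather than a full MMM specification with adstock and saturation:

```python
# Minimal sketch: a bounded regression that forces one channel's ROAS estimate
# to stay within an experiment's confidence interval (1.2 to 1.8, per the
# example above). Data is simulated; a real MMM would add adstock, saturation,
# seasonality, and control variables.
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
weeks = 104
spend_a = rng.uniform(5_000, 15_000, weeks)   # channel with an experiment
spend_b = rng.uniform(2_000, 10_000, weeks)   # channel without one
sales = 20_000 + 1.5 * spend_a + 0.8 * spend_b + rng.normal(0, 3_000, weeks)

X = np.column_stack([np.ones(weeks), spend_a, spend_b])
# Bounds per coefficient: intercept free, channel A constrained to [1.2, 1.8],
# channel B free.
result = lsq_linear(X, sales, bounds=([-np.inf, 1.2, -np.inf],
                                      [np.inf, 1.8, np.inf]))
print("Intercept, ROAS A (constrained), ROAS B:", result.x.round(2))
```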

Under a machine learning approach, you can use multi-objective optimization. Meta’s Robyn package does this: You can set it to not only optimize for statistical fit to observational data but also for minimal deviation from experimental results. This article provides a detailed walk-through of this relatively novel idea.
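
To convey the idea without relying on Robyn’s actual interface (Robyn is an R package with its own objective functions), here is a hypothetical Python sketch of a combined objective that trades off observational fit against deviation from an experimental ROAS estimate:

```python
# Minimal sketch of the multi-objective idea: score a candidate MMM not only on
# fit to observational data but also on deviation from an experimental result.
# The weighting and loss terms are illustrative, not Robyn's implementation.
import numpy as np

def combined_loss(y_true, y_pred, model_roas, experiment_roas, weight=0.5):
    """Blend predictive fit with calibration error against an experiment."""
    # Normalized root-mean-squared error for the observational fit.
    nrmse = np.sqrt(np.mean((y_true - y_pred) ** 2)) / (y_true.max() - y_true.min())
    # Relative gap between the model's ROAS estimate and the experiment's.
    calibration_error = abs(model_roas - experiment_roas) / experiment_roas
    return (1 - weight) * nrmse + weight * calibration_error

# Example: a model that fits well but overstates ROAS scores worse once the
# calibration term carries weight.
y_true = np.array([100.0, 120.0, 90.0, 110.0])
y_pred = np.array([101.0, 118.0, 92.0, 109.0])
print(combined_loss(y_true, y_pred, model_roas=2.4, experiment_roas=1.5))
```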

Identify Channels that Have Too Little Data for OCI Models to Work

OCI models, like all machine learning models, require data for creation and calibration. For example, an advertising channel must have a volume of historical data above a minimal threshold, along with variation in spend and exposure, in order to be meaningfully incorporated into an MMM.

If an MMM includes an advertising channel with too little data, several strategies can help address the issue. For example, incorporating prior knowledge through Bayesian methods can help stabilize estimates when data is sparse. Grouping similar channels with shared characteristics also allows performance to be estimated collectively, assuming similar behavior. In either case, experiments can quickly generate additional data to validate assumptions.

Integrating Experiments Pays Off

In conclusion, integrating experimentation into marketing measurement is essential for improving the accuracy and reliability of advertising effectiveness insights. While observational methods like MMM and OCI models provide valuable insights, they can suffer from biases without experimental validation. Controlled experiments can help calibrate and enhance these models by offering unbiased causal estimates.

However, success with experimentation requires work and planning. It requires an organizational commitment to data-driven decision-making, cross-departmental collaboration, and continuous learning. By aligning hypotheses, leveraging platform tools, fostering a culture of testing, and iteratively improving OCI models with experimental data, organizations can optimize resource allocation, better measure performance, and seize new growth opportunities across channels. Ultimately, experimentation transforms marketing from intuition-based strategies to a rigorously tested framework that drives both short-term results and long-term growth.

The effort is worth it, though. Evidence is mounting that OCI can often stray far from the estimates of RCTs and that firms that embrace experimentation as an analytics strategy do better. It’s not either OCI or ad experiments; it’s OCI and ad experiments.

We hope our article will help you get started.


[1] For the case of advertising, e.g., see Blake, Nosko & Tadelis (2015), Gordon et al. (2019), or Gordon, Moakler & Zettelmeyer (2022); for the case of pricing, Bray, Sanders & Stamatopoulos (2024).

Web Stores Industry Survey – 2025 Edition

Best practices for monetizing mobile games as part of a larger revenue optimization strategy are changing. A recent Appcharge report indicated that 72% of the 100 top-grossing mobile games have a web store for off-platform purchases. Solutions for operating and managing web stores, and approaches to monetization, vary widely.

Because this is a fast-moving space, we periodically conduct surveys to checkpoint industry adoption. This survey digs into emerging best practices for operating web stores in the industry. When the survey is completed, we will publish the results and insights via our blog and our Substack, and at the Game Revenue Optimization Mini-Summit at GDC this year. The more people who participate in the survey, the better the insights for everyone.

Stick with us to the end of the survey for a special surprise!

Combating Misinformation in Business Analytics: Experiment, Calibrate, Validate

This article originally appeared as a guest post on Eric Seufert’s Mobile Dev Memo, written by Dr. Julian Runge, an Assistant Professor of Marketing at Northwestern University, and William Grosso, the CEO of Game Data Pros.

Observational Causal Inference (OCI) seeks to identify causal relationships from observational data, when no experimental variation and randomization are present. OCI is used in digital product and marketing analytics to deduce the impact of different strategies on outcomes like sales, customer engagement, and product adoption. OCI commonly models the relationship between variables observed in real-world data.

In marketing, one of the most common applications of OCI is in Media and Marketing Mix Modeling (m/MMM). m/MMM leverages historical sales and marketing data to estimate the effect of various actions across the marketing mix, such as TV, digital ads, promotions, pricing, or product changes, on business outcomes. Hypothetically, m/MMM enables companies to allocate budgets, optimize campaigns, and predict future marketing and product performance. m/MMM typically uses regression-based models to estimate these impacts, assuming that other relevant factors are either controlled for or can be accounted for through statistical methods.

However, MMM and similar observational approaches often fall into the trap of correlating inputs and outputs without guaranteeing that the relationship is truly causal. For instance, if advertising spend spikes during a particular holiday season and sales also rise, an MMM might attribute this increase to advertising, even if it was primarily driven by seasonality or other external factors.

When a new drug is tested in a clinical trial, randomized control trials are the gold standard because they eliminate bias and confounding, ensuring that any observed effect is truly caused by the treatment. No one would trust observational data alone to conclude that a new medication is safe and effective. While not usually dealing in questions of life and death, the stakes in business analytics can also be very high. Solely relying on observational causal inference is a risk that needs to be taken in full awareness of the limitations of the approach. (Photo by Michał Parzuchowski on Unsplash) 

Observational Causal Inference Regularly Fails to Identify True Effects

Despite its widespread use, a growing body of evidence indicates that OCI techniques often stray from correctly identifying true causal effects. This is a critical issue because incorrect inferences can lead to misguided business decisions, resulting in financial losses, inefficient marketing strategies, or misaligned product development efforts.

Gordon et al. (2019) provide a comprehensive critique of marketing measurement models in digital advertising. They highlight that most OCI models are vulnerable to endogeneity (where causality flows in both directions between variables) and omitted variable bias (where missing variables distort the estimated effect of a treatment). These issues are not just theoretical: the study finds that models frequently misattribute causality, leading to incorrect conclusions about the effectiveness of marketing interventions, highlighting a need to run experiments instead.

A more recent study by Gordon, Moakler, and Zettelmeyer (2023) goes a step further, demonstrating that even sophisticated causal inference methods often fail to replicate true treatment effects when compared to results from randomized controlled trials. Their findings call into question the validity of many commonly used business analytics techniques. These methods, despite their complexity, often yield biased estimates when the assumptions underpinning them (e.g., no unobserved confounders) are violated—a common occurrence in business settings.

Beyond the context of digital advertising, a recent working paper by Bray, Sanders and Stamatopoulos (2024) notes that “observational price variation […] cannot reproduce experimental price elasticities.” To contextualize the severity of this problem, consider the context of clinical trials in medicine.

When a new drug is tested, RCTs are the gold standard because they eliminate bias and confounding, ensuring that any observed effect is truly caused by the treatment. No one would trust observational data alone to conclude that a new medication is safe and effective. So why should businesses trust OCI techniques when millions of dollars are at stake in digital marketing or product design?

Indeed, OCI approaches in business often rely on assumptions that are easily violated. For instance, when modeling the effect of a price change on sales, an analyst must assume that no unobserved factors are influencing both the price and sales simultaneously. If a competitor launches a similar product during a promotion period, failing to account for this will likely lead to overestimating the promotion’s effectiveness. Such flawed insights can prompt marketers to double down on a strategy that’s ineffective or even detrimental in reality.

Prescriptive Recommendations from Observational Causal Inference May Be Misinformed

If OCI techniques fail to identify treatment effects correctly, the situation may be even worse when it comes to the policies these models inform and recommend. Business and marketing analytics are not just descriptive—they are often used prescriptively. Managers use them to decide how to allocate millions in ad spend, how to design and when to run promotions, or how to personalize product experiences for users. When these decisions are based on flawed causal inferences, the business consequences could be severe.

A prime example of this issue is in m/MMM, where marketing measurement not only estimates past performance but also directly informs a company’s actions for the next period. Suppose an m/MMM incorrectly estimates that increasing spend on display ads drives sales significantly. The firm may decide to shift more budget to display ads, potentially diverting funds from channels like search or TV, which may actually have a stronger (but underestimated) causal impact. Over time, such misguided actions can lead to suboptimal marketing performance, deteriorating return on investment, and distorted assessments of channel effectiveness. What’s more, as the models fail to accurately inform business strategy, executive confidence in m/MMM techniques can be significantly eroded.

Another context where flawed OCI insights can backfire is in personalized UX design for digital products like apps, games, and social media. Companies often use data-driven models to determine what type of content or features to present to users, aiming to maximize engagement, retention, or conversion. If these models incorrectly infer that a certain feature causes users to stay longer, the company might overinvest in enhancing that feature while neglecting others that have a true impact. Worse, they may even make changes that reduce user satisfaction and drive churn.

The Problem Is Serious – And Its Extent Is Currently Not Fully Appreciated

Nascent large-scale real-world evidence suggests that, even when OCI is implemented on vast, rich, and granular datasets, the core issue of incorrect estimates remains. Contrary to popular belief, having more data does not solve the fundamental issues of confounding and bias. Gordon et al. (2023) show that increasing the volume of data without experimental validation does not necessarily improve the accuracy of OCI techniques. It may even amplify biases, making analysts more confident in flawed results.

The key point to restate is this: Without experimental validation, OCI is at risk of being incorrect, either in magnitude or in sign. That is, the model may not just fail to measure the size of the effect correctly—it may even get the direction of the effect wrong. A company could end up cutting a channel that is actually highly profitable or investing heavily in a strategy that has a negative impact. Ultimately, this is the worst-case scenario for a company deeply embracing data-driven decision-making.

A/B tests, geo-based experiments, and incrementality tests can help establish causality with high confidence and calibrate and validate observational models. For a decision tree guiding your choice of method, e.g., consider Figure 1 here. In digital environments, the gold standard of conducting a randomized control trial is often feasible, for example, testing different versions of a web page or varying the targeting criteria for ads. (Photo by Jason Dent on Unsplash) 

Mitigation Strategies

Given the limitations and risks associated with OCI, what can companies do to ensure they make decisions informed by sound causal insights? There are several remedial strategies.

The most straightforward solution is to conduct experiments wherever possible. A/B tests, geo-based experiments, and incrementality tests can all help establish causality with high confidence. (For a decision tree guiding your choice of method, please see Figure 1 here.)

For digital products, RCTs are often feasible: for example, testing different versions of a web page or varying the targeting criteria for ads. Running experiments, even on a small scale, can provide ground truth for causal effects, which can then be used to validate or calibrate observational models.

Another approach is bandit algorithms, which conduct randomized trials in conjunction with policy learning and execution. Their ability to learn policies “on the go” is the key advantage they bring. However, leveraging them successfully requires a lot of premeditation and careful planning. We mention them here, but advise starting with simpler approaches when first getting into experimentation.

In reality, running experiments (or bandits) across all business areas is not always practical or possible. To help ensure that OCI models produce accurate estimates in these situations, you can calibrate observational models using experimental results. For example, if a firm has run an A/B test to measure the effect of a discount campaign, the results can be used to validate an m/MMM’s estimates of the same campaign. This process, known as calibrating observational models with experimental benchmarks, helps to adjust for biases in the observational estimates. This article in Harvard Business Review summarizes different ways calibration can be implemented, emphasizing the need for continuous validation of observational models using RCTs. This iterative process ensures that the models remain grounded in accurate empirical evidence.
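
A very simple version of this calibration, shown below with hypothetical numbers, scales the observational estimates by the ratio of experimental to modeled lift for a campaign that was tested; it assumes the bias is roughly proportional across comparable campaigns, which should itself be validated over time:

```python
# Minimal sketch: derive a calibration factor from one tested campaign and
# apply it to the observational model's estimates for comparable campaigns.
# All numbers and campaign names are hypothetical.
experimental_lift = 4_200      # incremental conversions measured by the test
mmm_estimated_lift = 6_000     # the m/MMM's estimate for the same campaign

calibration_factor = experimental_lift / mmm_estimated_lift  # 0.7 here

mmm_estimates = {"summer_promo": 3_500, "fall_promo": 5_100}
calibrated = {name: est * calibration_factor for name, est in mmm_estimates.items()}
print(calibrated)
```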

In certain instances, you may be highly confident that the assumptions for OCI to produce valid causal estimates are met. An example could be the results of a tried-and-tested attribution model. Calibration and validation of OCI models against such results can also be a sensible strategy.

Another related approach can be to develop a dedicated model that is trained on all available experimental results to provide causal assessments across other business analytics decisions and use cases. In a way, such a model can be framed as a “causal attribution model.”

In some situations, experiments and calibrations may not be feasible due to budget constraints, time limitations, or operational challenges. In such cases, we recommend using well-established business strategies to cross-check and validate policy recommendations derived from OCI. If the models’ inferences are not aligned with these strategies, double- and triple-check. Examples of such strategies are:

  • Pricing: Purchase history, geo-location, or value-based pricing models that have been extensively validated in the academic literature
  • Advertising Strategies: Focus on smart creative strategies that align with your brand values rather than blindly following model outputs
  • Product Development: Prioritize features and functionalities based on proven theories of consumer behavior rather than purely data-driven inferences

By leaning into time-tested strategies, businesses can minimize the risk of adopting flawed policies suggested by potentially biased models.

If in doubt, err on the side of caution and stick with a currently successful strategy rather than implementing ineffective or harmful changes. For recent computational advances in this regard, take a look at the m/MMM package Robyn. It provides the ability to formalize a preference for non-extreme results in addition to experiment calibration in a multi-objective optimization framework.

To see clearly and avoid costly mistakes, treat observational causal inference as a starting point, not the final word. Wherever possible, run experiments to validate your models and calibrate your estimates. If experimentation is not feasible, be critical of your models’ outputs and cross-check with established business strategies and internal expertise. Without such safeguards, your business strategy could be built on misinformation, leading to misguided decisions and wasted resources. (Photo by Nathan Dumlao on Unsplash)

A Call to Action: Experiment, Calibrate, Validate

In conclusion, while OCI techniques are valuable for exploratory analysis and generating hypotheses, current evidence suggests that relying on them without further validation is risky. In marketing and business analytics, where decisions directly impact revenue, brand equity, and customer experiences, businesses cannot afford to act on misleading insights.

“Combating Misinformation” may be a strong frame for our call to action. However, even misinformation on social media is sometimes shared without the originator knowing the information is false. Similarly, a data scientist who invested weeks of work into OCI-based modeling may deeply believe in the accuracy of their results. Those results would, however, still misinform business decisions, with the potential to negatively impact shareholders and stakeholders.

To avoid costly mistakes, companies should treat OCI as a starting point, not the final word.

Wherever possible, run experiments to validate your models and calibrate your estimates. If experimentation is not feasible, be critical of your models’ outputs and always cross-check with established business strategies and internal expertise. Without such safeguards, your business strategy could be built on misinformation, leading to misguided decisions and wasted resources.

Optimizing Across the Free-to-Play Marketing Mix with Bandit Algorithms

The authors would like to acknowledge the invaluable assistance of the following team members in achieving the results discussed in this article: Bill Grosso, Pallas Horwitz, Jordan Nafa, Chase Ruyle, James Sprinkle, Taylor Steil, and John Szeder.

In game publishing, revenue optimization allows development teams, designers, and artists to make money from their creative work. The proceeds pay for ongoing work and fund future work. A key challenge with this undertaking is that players’ preferences and behaviors continuously evolve, so marketing strategies must adapt to engage players and monetize game content effectively.

Multi-armed bandit (MAB) algorithms have emerged as a powerful tool in this quest [1, 2]. MABs optimize across a number of given variants, e.g., different ads, offers, content, or other elements of the player experience. They roll out the variant that maximizes a specified reward, e.g., views, click-throughs, or purchases. The more arms an MAB has, the more options it can consider for optimization. Besbes, Gur, and Zeevi [3] succinctly summarize the challenge for the bandit: to optimize effectively, it needs “to acquire information about arms (exploration) while simultaneously optimizing immediate rewards (exploitation); the price paid due to this trade-off is often referred to as the regret, and the main question is how small can this price be as a function of the horizon length T.”
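
In the standard formulation (written here in our own notation rather than quoted from [3]), the cumulative regret after T rounds compares the expected reward of always playing the best arm with the expected reward actually collected:

```latex
% Cumulative regret after T rounds: \mu^{*} is the expected reward of the
% best arm, and a_t is the arm played at round t.
R_T = T\,\mu^{*} - \mathbb{E}\left[\sum_{t=1}^{T} \mu_{a_t}\right]
```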

If the bandit also has access to contextual data describing different contexts within which it can optimize, it is called a contextual MAB. Then, the bandit can find the optimal variant conditional on this data, converging to an optimal personalized strategy [4]. Li et al. [4] describe this for the case of news article recommendation: the “algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.”

In this article, we explore the application of MABs across the 4Ps of the marketing mix in games: Product, Price, Promotion, and Place. We first address MAB applications to each of the 4Ps in the context of digitally published games, then discuss how bandits can help orchestrate activities “on the fly,” drawing on John White’s “Moving Worlds” concept [2] and a recent case study conducted by Game Data Pros.

Digital games today are often published under the free-to-play model. Throughout this article, when we refer to games, we generally mean free-to-play games, even when we do not say so explicitly.

Figure 1: The more arms a multiarmed bandit has, the more options it can consider for optimization. If it also has access to contextual data, the bandit can find the optimal variant conditional on this data, learning an optimal personalized strategy [4]. (Photo by Hiroshi Tsubono on Unsplash)

Bandit Algorithms in Game Product Optimization

The product is at the heart of the game experience, encompassing game mechanics, storytelling, difficulty levels, and progression systems. MABs can support product optimization by dynamically personalizing the gaming experience based on player interactions. E.g., a key optimization problem in the product dimension of the game marketing mix is game difficulty, often personalized via so-called dynamic difficulty adaptation (DDA) systems [5].

Bandit algorithms can help steer such systems [6]. They can adjust game difficulty on the fly, ensuring that players are constantly challenged yet not frustrated. By analyzing player performance data, these algorithms can identify patterns and modulate the game’s difficulty to maintain an optimal balance between challenge and enjoyment. This approach can lead to higher player retention and increased revenue through sustained engagement [5].

Similarly, MAB algorithms can tailor other parts of a game, e.g., design elements, narratives, and character interactions, to align with individual player preferences [7].

Game Price and Promotion Optimization with Bandit Algorithms

Pricing and promotion strategies are critical components of a game’s revenue model, especially in free-to-play games using the in-app purchase model [8]. MAB algorithms offer a robust framework for optimizing these elements by continuously learning from player responses to various pricing and promotional offers [9].

In the realm of in-game purchases, bandit algorithms can target and customize offers based on player behavior and purchasing patterns. For example, a player who frequently buys cosmetic items might be more responsive to exclusive, time-limited offers on new skins or outfits. MABs can dynamically adjust the frequency, type, and timing of promotional offers to maximize in-game purchase conversion rates and revenue. Additionally, MABs can optimize shop designs, ensuring that the most appealing and profitable items are prominently featured based on real-time player preferences.

Starter packs (or beginner bundles) are crucial elements of free-to-play game monetization. They provide a premium experience tailored to the early game and help players onboard successfully with a game. Game Data Pros just concluded a research collaboration with academics where we investigated the application of bandit methods for the targeting of such packs. The paper was published a week ago as part of the proceedings of the 17th IEEE Conference on Games [10]. Give it a read!

Figure 2: The levers of the marketing mix—and their interactions—offer a plethora of ways to reach your target audience in gaming. Optimally orchestrating these levers and ways can be a daunting task. Bandit algorithms can help! (Photo by José Martín Ramírez Carrasco on Unsplash)

Optimizing Place in the Game Marketing Mix with Bandit Algorithms

The concept of “place” in the marketing mix refers to the distribution channels and touchpoints through which players interact with and learn about (new) game content. MAB algorithms can optimize across these touchpoints to enhance the overall player experience and drive revenue growth. E.g., advertising and cross-promotion are key areas in which MABs can excel. By analyzing player engagement data, these algorithms can determine the most effective ad creatives, formats, and placements [11]. This ensures that players are exposed to ads that are not only relevant but also likely to generate higher click-through rates and conversions.

One key application in this area is the timing and frequency of in-game pop-ups, ads, and informational content. Bandit algorithms can analyze player interaction data to determine the optimal moments to present these elements. By doing so, they ensure that players receive relevant and timely content without feeling overwhelmed or interrupted, thereby enhancing engagement and reducing churn.

Moreover, contextual bandit algorithms can optimize the distribution of story elements, ads, and game content across different player segments [4]. By identifying the most effective touchpoints for each segment, these algorithms ensure that players receive content that resonates with their preferences and enhances their overall gaming experience.

“Moving Worlds” and Orchestration in Multidimensional Optimization

The coordination of different marketing activities is crucial to achieving a cohesive strategy. Effective orchestration ensures that the various dimensions—Product, Price, Promotion, and Place—work harmoniously to achieve overarching marketing goals. For instance, aligning user acquisition campaigns with in-game monetization strategies can lead to more efficient spending and higher returns on investment [12].

This orchestration is a multidimensional optimization problem whose full structure is not known in advance. In practice, the hope is that many smaller, separate optimizations will come close to an overall optimal solution. Across these optimizations, interactions and relevant factors often only emerge at runtime and are unknown ex ante.

John White calls this phenomenon “Moving Worlds” [2] and notes that “the value of different arms in a bandit problem can easily change over time” (p. 63). MABs afford us an important advantage over more static methods here: if appropriately configured, they can learn about emergent external factors and adapt the optimization strategy accordingly.

We just ran a large-scale MAB for a cross-promotion campaign that highlighted this advantage. Let’s dive in and take a closer look at how this worked.

Figure 3: If appropriately configured, multi-armed bandits can learn about a changing world and adapt the optimization strategy accordingly. (Photo by Javier Allegue Barros on Unsplash)

Case Study: How a Bandit Adapted to a “Moving World”

To illustrate a practical application, let’s delve into a recent case study in which Game Data Pros utilized an MAB algorithm to optimize ad creatives for the cross-promotion of a new major game title. This case study demonstrates how bandit algorithms can adapt to a “Moving World” and achieve effective orchestration of emergent interaction effects between different marketing initiatives.

Working with a large game publisher, we faced the challenge of promoting a new major game title to the existing player base. The goal was to identify the most effective ad creatives for cross-promotion, maximizing player engagement and conversion rates. We used an MAB algorithm to optimize across different ad creatives, focusing on the color scheme and the in-game character used to advertise the new game.

As the bandit algorithm went live, it initially identified a creative variant that resonated well with the target audience. However, during the campaign, other marketing activities, such as social media promotions and influencer partnerships, impacted and shifted player preferences. These activities highlighted specific features of the new game title, making players more receptive to ad creatives that aligned with this new messaging.

Recognizing this shift, the MAB adapted quickly, reallocating exposure to the creatives that reflected the updated messaging. This dynamic adaptation ensured that the cross-promotion campaign rolled out a creative variant with 30% higher click-through than the worst-performing variant. Under a naïve strategy, e.g., mixing across ad variants with equal probability, player engagement with the ads (as measured by players clicking the ad) would have been substantially lower.

This case study underscores the importance of using MABs in a “Moving World.” The ability to adapt to changing player preferences and align with other marketing activities is crucial for maximizing the effectiveness of cross-promotion campaigns. By leveraging MAB algorithms, game developers can ensure that their marketing strategies remain agile and responsive, driving sustained revenue growth.

Figure 4: The share of traffic allocated to different ad variants in an MAB test we ran recently. The top panel shows traffic allocation for the first 24 hours, the bottom panel for the first ~80 hours. Character B, on a dark background, achieved the highest click-through (and looked like the possible winner) initially. Over time, Character A—fueled by other marketing activities that “moved the world”—took over and drove home the win as shown in the bottom panel.

Method: How Did the Bandit Do It?

In any decision-making process, we face a dilemma between two strategies: exploration, in which we learn as much as possible about the available options, and exploitation, in which we choose the best option based on our current knowledge of the available options [3]. These strategies are naturally opposed. By exploring possible options, we are potentially missing out on rewards that could be exploited now. By exploiting our current knowledge of the best outcome, we are potentially missing out on future rewards that could be discovered through exploration. The goal of any bandit algorithm is to strike a balance between exploration and exploitation such that we maximize the total reward (minimize the total regret [3]).

In this case study, we used four ads—two in-game characters on either a light or dark background—and a simple MAB algorithm called Thompson Sampling [13] to balance exploration and exploitation and optimize the click-through rate. In short, Thompson Sampling allows the bandit to balance the exploration/exploitation tradeoff by assigning users ads based on their probability of having the best click-through rate. For example, if we are 75% sure that an ad has the best click-through rate, the bandit will show that ad to 75% of players, while the remaining 25% of traffic will be used to explore other options.

Initially, all ads are shown to players in equal proportion: 25% of traffic for each (see Figure 4). As players are shown the ad variants, the bandit updates the traffic distribution based on observed click-through behavior. As more data is collected, the bandit detects differences in click-through rates between different ads. Eventually, the algorithm converges and directs nearly all incoming traffic to the best-performing ad.
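
For readers who want to see the mechanics, here is a minimal Beta-Bernoulli Thompson Sampling sketch in Python with simulated click-through rates. The production system updated from live click data and allowed for a changing world, but the core update logic is the same in spirit:

```python
# Minimal sketch of Beta-Bernoulli Thompson Sampling for ad selection, in the
# spirit of the case study above. Click probabilities are simulated here; the
# real system learned from live click-through feedback.
import numpy as np

rng = np.random.default_rng(7)
true_ctr = [0.030, 0.042, 0.025, 0.050]   # hypothetical CTRs for four ads
n_arms = len(true_ctr)
alpha = np.ones(n_arms)                   # Beta(1, 1) priors: clicks + 1
beta = np.ones(n_arms)                    # non-clicks + 1

for impression in range(50_000):
    # Sample a plausible CTR for each ad from its posterior and show the ad
    # that sampled highest -- this is the exploration/exploitation balance.
    sampled = rng.beta(alpha, beta)
    arm = int(np.argmax(sampled))
    clicked = rng.random() < true_ctr[arm]
    alpha[arm] += clicked
    beta[arm] += 1 - clicked

posterior_mean = alpha / (alpha + beta)
print("Impressions per ad:", (alpha + beta - 2).astype(int))
print("Posterior mean CTRs:", posterior_mean.round(4))
```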

In this case study, the light ad variants had a lower click-through rate than the dark variants and were, thus, discarded quickly. Character B, on a dark background, was ahead initially, then Character A—fueled by other marketing activities—took over and drove home the win. The MAB automatically orchestrated with the other marketing activities and changed what ad variant received the most traffic.

Conclusion

MAB algorithms offer a powerful solution for optimizing revenue across the 4Ps of the free-to-play game marketing mix. By dynamically personalizing product experiences, optimizing pricing and promotions, and enhancing the effectiveness of distribution touchpoints, these algorithms can drive sustainable engagement and revenue growth. Moreover, the importance of orchestration cannot be overstated. The implicit capability of MABs to adapt to emergent interactions and a “Moving World” [2] makes them a must-have tool in free-to-play engagement and revenue optimization.

References

[1] Rothschild, Michael. “A two-armed bandit theory of market pricing.” Journal of Economic Theory 9, no. 2 (1974): 185-202.

[2] White, John. “Bandit algorithms for website optimization.” O’Reilly Media, Inc., 2013.

[3] Besbes, Omar, Yonatan Gur, and Assaf Zeevi. “Stochastic multi-armed-bandit problem with non-stationary rewards.” Advances in Neural Information Processing Systems 27 (2014).

[4] Li, Lihong, Wei Chu, John Langford, and Robert E. Schapire. “A contextual-bandit approach to personalized news article recommendation.” In Proceedings of the 19th International Conference on World Wide Web, pp. 661-670. 2010.

[5] Ascarza, Eva, Oded Netzer, and Julian Runge. “Personalized Game Design for Improved User Retention and Monetization in Freemium Mobile Games.” Available at SSRN 4653319 (2023).

[6] Missura, Olana. “Dynamic difficulty adjustment.” PhD diss., Universitäts- und Landesbibliothek Bonn, 2015.

[7] Amiri, Zahra, and Yoones A. Sekhavat. “Intelligent adjustment of game properties at run time using multi-armed bandits.” The Computer Games Journal 8, no. 3 (2019): 143-156.

[8] Waikar, S. “Why Free-to-Play Apps Can Ignore the Old Rules About Cutting Prices.” Stanford Business Insights, 2022.

[9] Misra, Kanishka, Eric M. Schwartz, and Jacob Abernethy. “Dynamic online pricing with incomplete information using multiarmed bandit experiments.” Marketing Science 38, no. 2 (2019): 226-252.

[10] Runge, Julian, Anders Drachen, and William Grosso. “Exploratory Bandit Experiments with ‘Starter Packs’ in a Free-to-Play Mobile Game.” Proceedings of the 17th IEEE Conference on Games (COG), 2024.

[11] Schwartz, Eric M., Eric T. Bradlow, and Peter S. Fader. “Customer acquisition via display advertising using multi-armed bandit experiments.” Marketing Science 36, no. 4 (2017): 500-522.

[12] Grosso, William. “The Origins of Revenue Optimization.” Game Data Pros Blog, 2024.

[13] Russo, Daniel J., Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. “A tutorial on Thompson sampling.” Foundations and Trends® in Machine Learning 11, no. 1 (2018): 1-96.

Is Mobile App Revenue Moving Off-Platform? Industry Survey Indicates Landslide Changes in Web Store Adoption

During GDC 2024 in San Francisco, we hosted the Revenue Optimization in Games Mini-Summit. Industry leaders gave four fascinating presentations about revenue optimization in gaming, including an overview of the complete survey.

See our Reporting from the Game Revenue Optimization Mini-Summit follow-up post to learn more!

At Game Data Pros, a lot of our recent work on personalization has focused on what the Deconstructor of Fun podcast refers to as “Off-Platform Payments” and what Liquid & Grit calls “Web Stores”. We think it’s a big and important trend in the games industry. But how big? And how important?

To find out, we distributed a survey on LinkedIn, Twitter, and in the Deconstructor of Fun and Mobile Dev Memo communities. While this sampling approach is imperfect, it should yield decent indications of what’s happening in the marketplace. We collected a large number of responses over about two weeks. After cleaning the data of fraudulent responses, based on the provided e-mail addresses and patterns in timing and response behavior, we had a sample of 26 high-quality responses across different companies and backgrounds. While that number is too small to draw firm conclusions, it is a good start for gathering indications.

As a little introductory data point, here’s where respondents in the sample say they get their mobile gaming news (multiple responses possible):

The top news sources among survey respondents are Deconstructor of Fun, LinkedIn, and Mobile Dev Memo. Professional communities for the win, yay!

Of course, the responses here might be impacted by how we distributed the survey. But it’s nice to see two mobile / gaming communities — that I personally trust and frequent — land in the top three.

Now, let’s dive in.

Web stores are a major market trend

Respondents believe that the adoption of web stores in the market is far from complete and that there is still ample potential for mobile game developers to move payments off-platform. The community is split on the question of how widespread adoption is: Half of respondents think that at least 50% of companies have started running a web store, while the other half thinks that most companies are not yet doing it:

The community is split in their beliefs on how widespread web store adoption is. By the way, you can see all questions and the full survey here.

Another question we asked provides us with a more direct read:

Actual web store adoption among respondents outpaces respondents’ beliefs about adoption in the wider market.

17 respondents are live with a web store in one or more games in their company / their company portfolio. Another four indicate that they’re planning to go live soon, and three are not live and don’t seem to be planning to go live with a web store. Beliefs about adoption, i.e. the results of the previous question, may hence underestimate how many companies are actually already live with a web store.

(Side note: Our sample likely overestimates actual web store adoption as people with interest in the topic are more likely to respond.)

Off-platform payment activity expected to be significant

Now, being live with a web store doesn’t mean that a lot of revenue is going through it. To assess what the market thinks about the economic significance of web stores, we asked respondents for their estimates of what share of revenue will be moving off-platform in one and in five years:

Three quarters of respondents believe that 30%+ of mobile game revenue will be generated off-platform in five years. Wow.

Only 27% of respondents believe that only 10% or less of mobile game revenue will be off-platform in a year from now. And nobody thinks that off-platform payment activity will be that low in five years.

73% of respondents believe that 30% or more of mobile game revenue will be generated off-platform in five years. 15% even think that a staggering 70% or more of overall mobile game revenue will run through web stores in five years. Mull on that.

A windfall for game creators?

Next, we asked participants about their expectations for the revenue impact of web stores:

Three quarters of respondents say that revenue for mobile game devs — after platform fees — will increase by 10%+.

76% of respondents indicate that revenue after platform fees for mobile game developers will increase by 10% or more. A third thinks that the revenue windfall will clock in at 30% or more, with two respondents expecting a post-fee revenue jump of 70% plus!

The exact impact for different game developers will certainly depend on the genres and monetization behaviors in the respective publishing portfolio. If a game’s revenue is driven by a relatively small set of high-value and high-spending players, and the company very successfully entices these players to use a (personalized) web store, such outcomes seem possible. They are, however, unlikely to materialize for the broader market to this extent.

Nonetheless, these results serve to show how much is at stake. Up to 30% of overall mobile game revenue is bound to be re-distributed, by law and/or through strategic maneuvering by major market participants.

Is everybody of the same opinion?

No. Opinions diverge on the importance of web stores and on what revenue share should go to content creators versus platform operators. Our sample is a little too small to slice and dice it. However, if we look at indicators of “web store bullishness” across the two most important community news sources in our sample, we notice an interesting pattern:

While they’re trending strong, not everyone is equally bullish on web stores and off-platform payments.

Respondents who list Mobile Dev Memo (MDM) as their most important source of mobile (gaming) news appear much more bullish on web stores than respondents who primarily follow Deconstructor of Fun (DoF). 75% of MDMers think that current web store adoption sits at 50% or more, while only 12.5% of DoFers think so. 100% of MDMers believe that 30% or more of revenue will go through web stores a year from now, while only 25% of DoFers do. Expectations start converging in the longer term: 75% of MDMers see 50%+ of revenue off-platform in five years, and almost 40% of DoFers agree with that perspective.

Bear in mind that these are indications at best. The sample is simply too small for anything more. They would align with this perspective though: The MDM community has on average more business-minded and less purely gaming-focused members — which seems reasonable. After all, off-platform payments may become even more critical for app developers outside gaming, such as in health, news, music, and other content distribution.

So, is this it?

And, no, again. Our survey also asked respondents about the main challenges they face in web store adoption and how they plan to overcome them. For a talk covering the full results, join us for the Revenue Optimization in Games Mini-Summit and Happy Hour on March 20, 2 pm, in downtown San Francisco. Four experts from different corners of the industry will talk about their recent work and what they see in the market. During a reception following the talks, you will have a chance to connect with the speakers and us to discuss game monetization and its future.

We’re excited to see you there!

Off-Platform Payments and Web Stores – Industry Survey

At Game Data Pros, a lot of our recent work on personalization has focused on what the Deconstructor of Fun podcast refers to as “Off-Platform Payments” and what Liquid & Grit calls “Web Stores”.

We think it’s a big and important trend in the games industry. But how big? And how important?

That’s what this survey wants to find out. We’ll report back on our blog in early March with what we’ve learned. And, as an extra incentive, we’ll be giving out $100 Amazon gift cards to two respondents. This survey is 12-15 questions long, depending on your answers, and should take ~5 minutes to complete.

Dear Digital-First Advertisers, Are You Media or Marketing Mix Modeling?

As the adoption of MMM among digitally native businesses increases and matures, awareness of the differences between the two can open up new pathways for excellence in marketing analytics.

(Scroll to the end of the article for a TL;DR.)

MMM, commonly used to abbreviate marketing mix modeling, is experiencing a surge in interest among digital-first advertisers. App publishers, game companies, direct-to-consumer businesses, and others are all embracing a new measurement standard as private-sector and regulatory privacy initiatives rock the data infrastructure of digital advertising. In lieu of deterministic attribution and measurement based on user-level data and identity graphs, advertisers are flocking to probabilistic measurement from coarser data, such as at the campaign, state, DMA, or country level. MMM in particular, as the most comprehensive and holistic of these probabilistic measurement methods, is finding adoption as marketers want to mitigate the risk of “flying blind” should user-level data access continue to deteriorate at the current pace.

Now, as everyone in digital advertising starts talking about MMM, there seems to be a conflation of the terms marketing mix modeling and media mix modeling. While the two are highly related and make use of similar and in many ways identical methods, they are not the same. A recent report by the Marketing Science Institute nicely brings this point home by distinguishing MMM (marketing mix modeling) and mMM (media mix modeling). The key difference between the two is that MMM really is about supporting a firm’s decisions on the full marketing mix (see Figure 1), so product, price, promotion, and place/distribution, while mMM is about informing its decisions on the media mix, i.e., how it sets and allocates its media budget across media and advertising channels (see upper part of Figure 1).

This blog post aims to achieve three things:

(1) Revisit and summarize differences between MMM and mMM, mostly to help inform current industry conversations in digital advertising;

(2) Talk a little bit about why the concepts of MMM and mMM are often used synonymously and may have fused in digitally native business especially;

(3) Highlight that there may be valuable lessons to be gleaned for digital-first advertisers from the distinction of MMM and mMM.

Figure 1: This overview published by Harvard Business Review nicely summarizes the levers firms can work with to impact their marketing strategy and success. It also provides a succinct summary of the related analytics chain. The only lever I would add is a company’s own (new) product releases and launches. (Source: https://hbr.org/2013/03/advertising-analytics-20).

Differences between MMM and mMM

Both MMM and mMM are analytical approaches used by companies to understand the effectiveness of their marketing and advertising efforts. While they share similarities, they have distinct focuses and differences. MMM is a broader approach that analyzes the overall impact of various marketing elements on a company’s sales and other key performance indicators (KPIs). These marketing elements typically include a combination of the “Four Ps” of the marketing mix: Product, Price, Promotion, and Place (distribution). MMM aims to quantify the contributions of each of these elements, and their interactions, to overall sales.

As illustrated in Figure 2, media used for marketing is a subset of all modeling variables used in MMM. In this vein, mMM focuses on analyzing the effectiveness of different advertising media channels in driving sales and other KPIs and determining the optimal allocation of media budget across various channels to achieve the best return on marketing investment (ROMI). It thereby attributes sales or conversions to specific media channels, helping marketers understand which channels are driving the most value. In this way, mMM can sometimes offer insights at a more granular level, such as the impact of specific ad placements, time slots, or online platforms.

Due to their different scopes as shown in Figure 2, the two approaches require different historical data coverage. MMM requires data inputs addressing all the various marketing activities of interest, e.g., on all Four Ps (product, price, promotion, place), in addition to sales data, other relevant external factors (e.g., competitive and macroeconomic), and potentially media spend. While data on the Four Ps are often added to mMM as control variables, mMM does not require them per se and can work from media spend and sales data alone.

Figure 2: Media mix modeling (mMM) addresses a subset of the analytical scope of marketing mix modeling (MMM). The author believes that awareness of this difference in scope can hold valuable lessons for digital-first advertisers. (Image source: https://hbr.org/2013/03/advertising-analytics-20)

Similarities between MMM and mMM

In terms of model specification and the methodological approaches used for estimation, MMM and mMM are very similar and often use identical methods. An mMM can also be included in a company’s MMM, meaning a more comprehensive MMM covers media spend evaluation and optimization as a subset of its overall analytical scope. In both MMM and mMM, a simple starting point can be to estimate a parametric model of sales explained by investments in different actions on the Four Ps and in media. Usually, as mentioned above, such a model will also include variables addressing the competitive and macroeconomic landscape. From there, modeling for both MMM and mMM can become more sophisticated by modeling dynamic (e.g., ad stock) effects, interactions between different marketing levers, engineering specific features, using experiments to calibrate the model, and performing other tweaks. More advanced modelers also like to specify, possibly marketing-action-specific, response curves that address diminishing returns to scale, e.g., due to saturation of an advertising medium.
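To make this shared modeling core concrete, here is a minimal, self-contained sketch of such a parametric model in Python (pandas and statsmodels). The column names, the simulated data, and the chosen transforms (a geometric adstock and a log saturation) are illustrative assumptions, not a reference implementation; dropping the price and promotion terms would turn this MMM-style specification into an mMM-style one.

```python
# Minimal sketch of a parametric marketing-mix regression (illustrative assumptions only).
# Hypothetical weekly columns: sales, tv_spend, social_spend, price, promo_depth, holiday.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 156  # three years of weekly data
df = pd.DataFrame({
    "tv_spend": rng.gamma(2.0, 50.0, n),
    "social_spend": rng.gamma(2.0, 30.0, n),
    "price": 9.99 + rng.normal(0, 0.5, n),
    "promo_depth": rng.uniform(0, 0.3, n),
    "holiday": (rng.uniform(size=n) < 0.1).astype(int),
})

def adstock(x, decay=0.5):
    """Geometric adstock: carry over a decayed share of previous weeks' spend."""
    out = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        out[t] = x[t] + (decay * out[t - 1] if t > 0 else 0.0)
    return out

# Diminishing returns via a log(1 + x) saturation transform of adstocked spend
df["tv_ad"] = np.log1p(adstock(df["tv_spend"].to_numpy(), decay=0.6))
df["social_ad"] = np.log1p(adstock(df["social_spend"].to_numpy(), decay=0.3))

# Simulated outcome so the sketch runs end to end; in practice this is observed sales
df["sales"] = (100 + 8 * df["tv_ad"] + 5 * df["social_ad"] - 4 * df["price"]
               + 60 * df["promo_depth"] + 15 * df["holiday"] + rng.normal(0, 5, n))

# MMM-style specification: media terms plus price and promotion (Four Ps) plus controls
mmm = smf.ols("sales ~ tv_ad + social_ad + price + promo_depth + holiday", data=df).fit()
print(mmm.params)
```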

While a simple use case of mMM and MMM can be to evaluate past marketing strategy, more advanced uses commonly include forecasting of future sales and optimization of future marketing strategy and actions. These more advanced use cases require explicit assumptions and accommodations in the model. E.g., is the data generating process stationary? Did the competitive or macroeconomic landscape change? Are there new advertising media, product line extensions, or other changes that may require specific adjustments to allow the model to generalize from the past and present to the future? If we increase spending on this medium threefold, how quickly should we expect the returns to that investment to diminish? If we scale down advertising on TV, will sales in the next period be unaffected, but might we see a major drop in future periods? If we run large-scale promotions in the next period, how will this increase or decrease our sales and shift them between future periods? A model’s architecture will need to be finessed to appropriately reflect these complexities. The larger the model’s scope (MMM > mMM) and the more advanced the use case (optimization > forecasting > evaluation), the more effortful and challenging this task becomes, and the more insightful the resulting model.

In summary, MMM is a comprehensive analysis of various marketing elements, while mMM specifically focuses on assessing the impact of advertising across different media channels. Figure 2 succinctly captures this difference in analytical scope. Both approaches aim to provide data-driven insights to help companies make informed decisions about resource allocation and strategy in marketing.

Why are MMM and mMM often used synonymously, especially among digitally native advertisers?

By digitally native advertisers, I mean companies that were started and grew with the increased digitization of the production and delivery of consumer goods through the proliferation of the web, personal computers, social media, and then handheld devices. Examples are web-based and mobile gaming companies, direct-to-consumer businesses, app developers, digital (social) media platforms, or e-commerce operations. I believe there are a few factors that may have contributed to a conflation of MMM and mMM among these digital-first advertisers:

  • A distinction of mMM and MMM was simply not needed or relevant: Digitally native businesses primarily operate in the digital realm, relying heavily on online platforms, social media, and digital advertising for their marketing efforts. Since their marketing activities are predominantly digital, they often equate marketing with media, considering digital media as the core component of their overall marketing strategy.
  • Many digital media are priced “freemium”: Very much related to the previous point, digital consumer goods are predominantly offered under freemium pricing, where initial product adoption and use are free. Price is hence much less of a relevant decision criterion for consumers, in turn affecting its importance in a firm’s marketing decision-making.
  • Digitization was accompanied by further significant shifts in the salience of the marketing mix’s Four Ps: As freemium pricing reduced the relevance of price in product adoption decisions, promotion became much less relevant as well. Plus, recent research suggests that the effects of price promotions may be very different for digital freemium consumer goods. Distribution collapsed to digital platforms and media or, in direct-to-consumer commerce, was replaced by targeted advertising and simply disappeared as an essential consideration.
  • On digital media, A/B tests and experiments can be conducted with ease: Publishers of digital goods did not need an MMM to inform their product, price, promotion, and place/distribution decisions. As illustrated in Figure 3, they had (and still have) access to granular, user-level data allowing them to run user-level A/B tests and other experiments to inform marketing and product initiatives. A/B tests and other experiments can be run at the user-level to get “gold standard” reads on price elasticity, inter-temporal substitution, and the effectiveness of promotions.
  • User-level data enable(d) granular analytics and decision support: Similarly, the available detailed first-party and often third-party data could fuel MTA (multi-touch attribution) models or elaborate product analytics efforts to evaluate and attribute merit to different product and marketing strategies and tactics. In digital advertising, this level of data access is currently under siege (so, for the third-party use cases in Figure 3), but it is likely to remain in place for the foreseeable future for first-party data. Thus, it can continue to support decision-making for product, price, and promotion on a firm’s proprietary digital offerings. When the only reasonable use case of an MMM is to support advertising decisions, it becomes an mMM (see Figure 2).

I want to note that, while these factors might lead to the perception that MMM and mMM are the same, recognizing the distinction between assessments of the overall marketing strategy and of media channel allocation holds valuable lessons. A well-rounded approach considers all marketing elements, even in digitally native businesses, to enable a comprehensive and holistic understanding of the factors driving business growth. A more holistic and comprehensive model is also likely to provide more accurate estimates, e.g., of ROMI, for each individual marketing lever. Further, while user-level data and experimentation may still provide more accurate and reliable decision support in product, price, and promotion to digitally native businesses, setting up an MMM to complement, cross-check, and build on these other analytics tools is a worthwhile effort. It can bring “everything together” in one holistic model and provide valuable higher-level insights, e.g., on longer-term strategic and interaction effects that might otherwise go undetected.
Figure 3: Digitally native businesses have grown accustomed to using first-party experimentation and user-level analytics to support decisions in product, price, promotion, and third-party experimentation and user-level analytics in digital advertising. MMM-type modeling is hence mostly/only relevant to support media-related decisions. This may help explain why MMM and mMM seem to have collapsed to meaning the same for many digital-first advertisers. My inclusion of new product releases in the first-party experiment scope intends to refer to a company’s own product releases. (Image source: https://hbr.org/2013/03/advertising-analytics-20)

TL;DR / Take-Aways

Using the terms marketing mix modeling (MMM) and media mix modeling (mMM) synonymously is not really a mistake if you’re running a fully digitally-centric business. Doing so, however, may lead to confusion (1) when you operate both online and offline products and distribution, and (2) if you interface with traditional brand advertisers. So, keep the differences between traditional mMM and MMM in mind and see if you can learn anything for your digital-first MMM from “old school” brick-and-mortar marketing mix modeling:

  • Could you include data on price and promotion and inform your pricing and promotional strategies from your MMM? Could resulting estimates substitute and complement your existing price and promotion analytics, e.g., by reducing the need to run experiments?
  • Are there distribution and advertising channels that you have not considered so far and that could meaningfully increase demand for your product(s)?
  • Can a model that more comprehensively addresses your actions on the marketing mix surface insights on synergistic effects that you were so far unaware of? E.g., do promotional efforts increase the effectiveness of your advertising (see the sketch after this list)? Is there evidence that lowered prices in certain territories may increase product usage and, in turn, word-of-mouth in these regions?
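One way to probe the synergy question from the last bullet is to add an interaction term between two marketing levers to the regression. Below is a hedged, self-contained sketch on simulated data; the column names (ad_spend, promo_depth) and effect sizes are assumptions made purely for illustration.

```python
# Illustrative sketch: testing for a promotion x advertising synergy with an interaction term.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 156  # hypothetical weekly observations
df = pd.DataFrame({"ad_spend": rng.gamma(2.0, 40.0, n),
                   "promo_depth": rng.uniform(0, 0.3, n)})
# Simulated sales with a built-in synergy so the example has something to find
df["sales"] = (100 + 0.4 * df["ad_spend"] + 50 * df["promo_depth"]
               + 0.6 * df["ad_spend"] * df["promo_depth"] + rng.normal(0, 10, n))

# A positive, credible coefficient on ad_spend:promo_depth would indicate that
# promotions amplify the measured effect of advertising (and vice versa)
model = smf.ols("sales ~ ad_spend + promo_depth + ad_spend:promo_depth", data=df).fit()
print(model.params)
```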

In this way, as MMM adoption among digital-first advertisers matures, awareness of the differences between MMM and mMM can open up new pathways for excellence in marketing analytics. Once your mMM is in (a good) place, strive to complement it with an MMM as the next frontier of digital marketing analytics. MMM and mMM can work nicely together: E.g., you can use a more comprehensive MMM to assess your overall marketing strategy and set a media budget that you then allocate based on your mMM. Your media tactics can additionally be informed by further lower-level analytics such as an MTA model or campaign optimization tools. You can also use outputs from granular product analytics and experiments across product, price, promotion, and advertising to calibrate and fine-tune your marketing and media mix model. And you may be able to inform the design of treatments and strategies that you test experimentally using the insights provided by your MMM.

Incrementality in Game Analytics: Beyond AB Tests, on to Bandits and Marketing Mix Models

(For a TL;DR scroll to the end of the article)

Incrementality is a hot topic in marketing analytics, referring to “the measurement and analysis of the incremental impact of a marketing campaign or initiative. It aims to determine whether the marketing efforts are actually driving additional value or revenue beyond what would have occurred naturally without the campaign.” That’s the first paragraph of ChatGPT’s answer when prompted “What is incrementality in marketing analytics?”

When asked the same thing, Google delivers a similar answer drawing on the website Marketing Evolution: “Incrementality refers to growth that can be directly attributed to specific marketing efforts above and beyond the existing brand equity. For example, how much a certain channel, tactic, or overall campaign helped influence an increase in sales, newsletter sign-ups, etc.” (see Figure 1) Google’s answer confirms that incrementality is a thing in marketing analytics and not something that ChatGPT hallucinated.

Figure 1: The ideal way of measuring the incremental impact of a business action, e.g., a marketing campaign or a new game feature, on an outcome of interest, e.g., conversions or retention, is through a randomized control trial. Source: the author’s mind and hand (sorry if it’s not pretty).

Incrementality programs are so important in analytics because they aim to quantify the incremental causal effect that different actions, tactics, and strategies had on relevant outcomes for the firm. Such precise causal measurement ensures that each action is attributed the right amount and sort of credit, in turn crucially informing the firm’s future actions and strategies. It can make all the difference between a fast path to increased profitability and the opposite.

Now let’s see what we find about incrementality in game analytics:

ChatGPT: “In the context of game analytics, incrementality refers to measuring the incremental impact of a specific game feature, update, or intervention on player behavior, engagement, monetization, or other key performance indicators (KPIs). It aims to understand whether the implemented changes or additions have a positive effect on the game’s metrics beyond what would have occurred naturally without those modifications.”

Google: N/A, see Figure 2. Google is unable to answer the question, surfacing a link to a resource by the well-known company GameAnalytics that has nothing to do with incrementality. Then it offers material related to incrementality in marketing. Also Google Scholar has no clear leads.

In many ways, this tells us that –

  • Incrementality is not yet a concept of importance in game analytics;
  • It should be, both due to its clear value proposition articulated above and due to ChatGPT’s plausible and succinct response. (I find ChatGPT very useful to check the plausibility and merit of new ideas in this way. The exercise is further helpful as ChatGPT commonly surfaces relevant definitions, references, and related concepts.)

Based on these two points, this article aims to introduce the incrementality concept to game analytics. We will first define in more detail what incrementality means in the context of game analytics and then I will try to provide an overview of what I believe are important incrementality methods for game analytics and game analysts.

Figure 2: The concept of incrementality is foreign to game analytics as it stands. At a minimum, it has not been formally introduced — something that this blog post endeavors to change. Source: Google Search at www.google.com and scholar.google.com

Defining Incrementality in Game Analytics

While ChatGPT’s answer on incrementality in marketing analytics mentions a spectrum of methods that “aim to minimize biases and control for confounding variables, ensuring that the observed differences can be attributed to the marketing efforts with a reasonable degree of confidence,” its answer for incrementality in game analytics pretty squarely centers on A/B testing (experimentation):

Game developers and analysts often employ incrementality analysis to evaluate the effectiveness of specific game design choices, marketing campaigns, pricing strategies, or other initiatives. The goal is to isolate the impact of the intervention by comparing the behavior of two groups: a test group that experiences the new feature or change and a control group that does not.

Typically, the test group consists of players who have been exposed to the modified game element, while the control group comprises players who continue to experience the game in its original state. By analyzing the differences in player behavior and performance between these two groups, game developers can assess the incremental impact of the introduced changes.

It further writes that “insights gained from incrementality analysis in game analytics can help developers make data-driven decisions to optimize game design, improve player engagement, enhance monetization strategies, and refine the overall player experience. By understanding the true impact of specific game elements or changes, developers can focus their efforts on features and updates that lead to measurable improvements in game metrics and player satisfaction.”

Leaning into these elaborations, let’s define incrementality in game analytics:

Def. Incrementality in game analytics: The measurement of the incremental impact of specific game design choices or features, marketing campaigns, pricing strategies, technical updates, or other interventions on player behavior, engagement, monetization, or other key performance indicators (KPIs) of a game or game portfolio. Incrementality efforts aim to understand whether the implemented changes or additions have a positive effect on the game’s metrics beyond what would have occurred without those modifications. It thereby employs various methods of causal inference that help minimize biases and control for confounding variables, ensuring that the observed differences can be attributed to the intervention in question with a quantifiable degree of confidence.

This definition heavily draws on ChatGPT’s output but extends the space of admissible methods considerably beyond AB testing and experimentation. Incrementality methods in game analytics need to, as they do in marketing analytics, encompass all that causal inference has to offer! A further addendum to the definition is the quantification of uncertainty to help analysts, designers and product managers decide which measurements to rely on and which ones to assess further or abandon.

(For completeness, I should mention that, during my online search, I found this blog post titled “Incremental Data Science for Mobile Game Development.” The title is promising, and the covered applications are actually well selected and outlined, but the post fails to deliver a definition or even touch on the subject again. There is no further mention of incrementality or related concepts like experimentation, AB testing, causal inference, randomization. I am unclear what the author intended, but as it stands, the post’s content and title are simply disjointed.)

The Game Analytics Incrementality Matrix

There is a plethora of analytical tools available for incrementality measurement. Figure 3 tries to provide an initial overview positioning the different tools on a two-dimensional matrix. The horizontal dimension addresses the degree of intervention necessary to use a specific incrementality technique. E.g., AB testing requires randomly exposing different treatments (e.g., versions of the game) to different users, so a high degree of intervention in the user experience. Propensity score matching or marketing mix modeling (MMM) on the other hand work from observational data, requiring no or almost no dedicated intervention and leveraging naturally occurring variation in exposure. Note that not requiring intervention is of course an advantage, but non-interventional methods also tend to be less precise and flexible in detecting incrementality.

The second, vertical axis covers the spectrum from low-level product to high-level market touchpoints with users. At higher-level market touchpoints such as an ad platform or (Connected-)TV, a game developer clearly has less control over a user’s experience and in fact might not be able to act at the user level at all, instead deciding on spend levels and strategy for a specific marketing channel.

Figure 3: The Game Analytics Incrementality Matrix, showing different tools for incrementality measurement in game analytics. The horizontal axis depicts the degree of needed intervention in the user experience and the vertical axis the proximity to market versus product. A serious game analytics effort should entail the underlined methods at a minimum.

Per the matrix shown in Figure 3, AB testing becomes less applicable as you move away from high levels of control over a user’s experience at granular product touchpoints to low levels of control, e.g., on an ad platform. Here, the applicability of AB testing as a tool for incrementality measurement will depend on the ad platform and whether it offers AB test-based measurement. Similarly, algorithmic personalization becomes less applicable the less you can control the user experience at the individual level. It can get analytically involved with reinforcement learning approaches like bandits and is also usually technically costly to implement. AB testing and algorithmic personalization overlap because a simple form of the latter can involve estimating linear models with interaction terms (of the sort outcome ~ treatment + treatment*covariate) on the data of a randomized (control) trial or AB test. All of these approaches leverage the idea of treatment effect heterogeneity, i.e., that the incrementality effect of an intervention (read: marketing campaign, game feature) will often differ across users, with the differences captured and measured in the observed covariates about those users.
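As a hedged illustration of that last point, the sketch below estimates exactly such an interaction model on simulated A/B test data. The outcome (spend_7d), the covariate (is_payer_pre), and all effect sizes are hypothetical.

```python
# Minimal sketch: treatment effect heterogeneity from A/B test data via an interaction term.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 10_000
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),     # randomized assignment in the A/B test
    "is_payer_pre": rng.integers(0, 2, n),  # observed pre-treatment covariate
})
# Simulated outcome: the treatment lifts spend more for previous payers
df["spend_7d"] = (1.0 + 0.3 * df["treatment"] + 2.0 * df["is_payer_pre"]
                  + 1.5 * df["treatment"] * df["is_payer_pre"] + rng.exponential(1.0, n))

# outcome ~ treatment + covariate + treatment*covariate
hte = smf.ols("spend_7d ~ treatment * is_payer_pre", data=df).fit()
# treatment:is_payer_pre estimates how much larger the lift is for previous payers,
# a simple form of the treatment effect heterogeneity used for personalization
print(hte.params)
```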

So far, we discussed methods of “interventional causal inference,” i.e., where we need to intervene to produce the data we need to perform incrementality measurement. We will now turn to observational causal inference, i.e., methods that operate from naturally occurring data without explicit intervention on our part. Difference-in-differences and synthetic control estimators try to identify effects of an event of interest from differences over time. E.g., should you release a new game feature to different countries at different points in time, these methods could produce an estimate of the feature’s incremental effect on your players from this data. They can do so both in the realm of low-level product and higher-level market touchpoints. Synthetic control methods work a bit better when granular data are available, which is why they don’t reach as far up into the market territory. As both methods benefit from a certain level of intervention, they reach into the right half (the intervention territory) of the chart.
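For a concrete, hedged picture of a basic difference-in-differences read, consider the sketch below: a feature launches in one country but not another, and the treated-times-post coefficient recovers its incremental effect. The countries, weeks, and outcome are simulated assumptions.

```python
# Hedged sketch of a 2x2 difference-in-differences estimate for a staggered feature launch.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
rows = []
for country, treated in [("A", 1), ("B", 0)]:
    for week in range(20):
        post = int(week >= 10)                  # feature launches in week 10, in country A only
        base = 100 if country == "A" else 80    # level differences between groups are fine in DiD
        lift = 12 if (treated and post) else 0  # the true incremental effect we want to recover
        rows.append({"country": country, "week": week, "treated": treated, "post": post,
                     "sessions": base + 0.5 * week + lift + rng.normal(0, 3)})
df = pd.DataFrame(rows)

# treated:post is the difference-in-differences estimate of the feature's incremental
# effect, valid under the parallel-trends assumption
did = smf.ols("sessions ~ treated + post + treated:post", data=df).fit()
print(did.params["treated:post"])
```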

Regression discontinuity leverages the fact that experience assignment can be arbitrary within narrow bounds of certain user characteristics. E.g., say players need a score of 10,000 to get access to a specific feature. Regression discontinuity would then estimate the feature’s incremental effect by comparing players who reached a score of 9,999, and didn’t get access to the feature, with players who reached a score of 10,000, and did. The idea is that these players must be very similar other than missing one point out of 10,000. Likewise, matching methods aim to compare instances that are as similar as possible, but where some were exposed to the treatment of interest and some were not. They essentially aim to control for selection effects by matching up instances based on available non-endogenous covariates.
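A sharp regression discontinuity read on the 10,000-score example could look like the following sketch. The bandwidth, the outcome (sessions), and the simulated jump at the cutoff are all assumptions for illustration.

```python
# Illustrative sketch: sharp regression discontinuity around a hypothetical score cutoff.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 5_000
score = rng.uniform(8_000, 12_000, n)
unlocked = (score >= 10_000).astype(int)  # feature access is fully determined by the cutoff
# Simulated outcome with a jump of ~0.8 sessions at the cutoff
sessions = 3 + 0.0004 * (score - 10_000) + 0.8 * unlocked + rng.normal(0, 1, n)
df = pd.DataFrame({"score": score, "unlocked": unlocked, "sessions": sessions})

# Local linear regression within a narrow bandwidth around the cutoff; the coefficient
# on `unlocked` estimates the feature's incremental effect for players at the threshold
bandwidth = 500
local = df[(df["score"] >= 10_000 - bandwidth) & (df["score"] <= 10_000 + bandwidth)].copy()
local["centered"] = local["score"] - 10_000
rd = smf.ols("sessions ~ unlocked + centered + unlocked:centered", data=local).fit()
print(rd.params["unlocked"])
```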

Again, I urge you to note that non-interventional incrementality methods are great because they work from naturally occurring data, but they are also limited in their precision and flexibility. True experiments, i.e., randomized control trials, are the gold standard for incrementality measurement and causal inference. Whenever implementable at acceptable cost, they should be your incrementality method of choice. In many cases, however, you cannot intervene in an environment or system, and non-interventional methods are your only shot at incrementality measurement. E.g., when Apple changes its App Store ranking algorithm, you cannot run an experiment to determine what impact this had on organic adoption of your apps, but you can use difference-in-differences-style estimators to try and quantify the effect.

Marketing Mix Modeling in the M(atr)ix?

Now, you may be surprised to see marketing and media mix modeling in a figure about incrementality measurement in game analytics. Let me elaborate.

This class of methods was originally developed to produce estimates of the elasticity of sales with respect to advertising on different channels and media, from aggregate (high-level) observational data. That is why it is positioned at the opposite end from AB testing in Figure 3. It can, however, take different actions in a firm’s marketing mix into account, including pricing, promotion, and major product changes. When a model comprehensively covers a firm’s action space across the marketing mix (the 4Ps: product, price, place, promotion), it is commonly called a marketing mix model (MMM).

You may notice that, while MMM was conceived for estimation from aggregate observational market data, its area in Figure 3 reaches into the product territory. That is because a comprehensive MMM can include measures for major product changes (the first of the 4Ps of the marketing mix) and produce estimates of the incremental effect of these changes on sales and other outcomes. The MMM area further reaches into the territory of interventional causal inference. This is because modern MMM implementations can commonly be calibrated using the precise incrementality measurement outputs from ad experiments.

A simple MMM can boil down to a linear regression of sales on ad spend across different channels, plus some trends for competition and indicators for holidays and other key events, which is a rather simple analytics approach. But a reliable, well-calibrated, and trusted MMM can take a lot of effort in data preparation, model estimation, and on the organizational level, e.g., to be well integrated into a company’s marketing analytics operations.
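For completeness, here is what such a bare-bones specification can look like; the channel names, the holiday indicator, and the simulated data are assumptions, and a production MMM would add adstock, saturation, seasonality and competition controls, and experiment-based calibration.

```python
# Deliberately minimal sketch of the simple MMM described above (illustrative data only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 104  # two years of weekly observations
df = pd.DataFrame({
    "ua_spend": rng.gamma(2.0, 25.0, n),          # paid user acquisition
    "influencer_spend": rng.gamma(2.0, 10.0, n),
    "holiday": (rng.uniform(size=n) < 0.08).astype(int),
})
df["sales"] = (50 + 0.8 * df["ua_spend"] + 0.5 * df["influencer_spend"]
               + 20 * df["holiday"] + rng.normal(0, 8, n))

# Coefficients read as naive per-dollar sales effects of each channel
simple_mmm = smf.ols("sales ~ ua_spend + influencer_spend + holiday", data=df).fit()
print(simple_mmm.params)
```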

Finally, Figure 3 shows multi-touch attribution (MTA). MTA provides estimates of the fractional contribution of customers’ touchpoints with a company’s marketing efforts. To the extent that a product (= game) produces touchpoints with new customers (think word-of-mouth), its area reaches into product territory. MTA models draw on many different methods, ranging from MMM-style models to game-theoretic approaches such as Shapley values, which is why their area overlaps with that of other methods. Complementarities between MTA models and MMM can be particularly high, e.g., reflected in Nielsen’s definition of MTA: “[MTA] is a marketing effectiveness measurement technique that takes all of the touchpoints on the consumer journey into consideration and assigns fractional credit to each so that a marketer can see how much influence each channel has on a sale.”
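To ground the Shapley-value idea, here is a hedged, self-contained toy sketch of Shapley attribution across three channels. The channel names and conversion counts are made up, and the coalition value is simplified to conversions observed among users exposed to exactly that channel set; real MTA implementations use richer path data and value definitions.

```python
# Toy sketch of Shapley-value channel attribution (made-up channels and conversion counts).
from itertools import combinations
from math import factorial

channels = ["search", "social", "tv"]
# Hypothetical conversions by exposed channel set (empty set = baseline/organic)
conversions = {
    frozenset(): 20,
    frozenset({"search"}): 120,
    frozenset({"social"}): 60,
    frozenset({"tv"}): 40,
    frozenset({"search", "social"}): 200,
    frozenset({"search", "tv"}): 150,
    frozenset({"social", "tv"}): 90,
    frozenset({"search", "social", "tv"}): 260,
}

def value(coalition: frozenset) -> float:
    """Simplified coalition value: conversions among users exposed to exactly this set."""
    return conversions.get(coalition, 0)

def shapley(channel: str) -> float:
    """Average marginal contribution of `channel` over all orderings of the channels."""
    others = [c for c in channels if c != channel]
    n = len(channels)
    total = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            s = frozenset(subset)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (value(s | {channel}) - value(s))
    return total

for c in channels:
    print(c, round(shapley(c), 1))  # fractional credit assigned to each channel
```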

TL;DR / Why Does This Matter for Game Development?

I said in the beginning of this article that incrementality programs are so important in analytics because they ensure that each action taken by a team is attributed the right amount and sort of credit. This exercise is crucially important for the team to know what design and marketing choices worked and which ones didn’t, which ones your players liked and which ones they didn’t (see Figure 1), to in turn inform future actions and strategies. Getting this right can make all the difference between building an awesome game that players love and a game that is no fun and struggles with player retention and engagement.

Leaning into the incrementality concept in marketing analytics, this article defines incrementality for game analytics and provides an initial overview of methods (Figure 3), structured along the dimensions of needed intervention in users’ experience and proximity to product versus market. The second dimension in turn influences the granularity of the available data.

Game analytics can benefit from a formal introduction of the concept of incrementality: Game design, management (e.g., live operations), and marketing need to work in complementarity, as a team, to ensure success for a game. Principled and rigorous incrementality measurement processes and tools can quantify the location and extent of these complementarities and direct the symphony of everyone coming together to build an amazing game.

A serious game analytics effort should entail the underlined methods in Figure 3 at a minimum: AB testing / experimentation, simple forms of algorithmic personalization, and marketing mix modeling. Especially, MMM-style methods may currently be underleveraged in game analytics. They can not only provide guidance for marketing efforts but also inform larger product, live operations, and marketing initiatives, especially in conjunction with a strong and well-defined experimentation roadmap.

Reach out if you want to know how, and more: julian.runge@gamedatapros.com

 Like our blog? Join our substack.
