Sometimes, Everyone Agrees

I recently published a blog post arguing that over the next 5 years many commercial MMM engine developers might face an uncomfortable truth: their code and algorithms are not defensible. As part of that article, I separated the “MMM Vendor Value Prop” into four components:

  1. The core computational engine and algorithms (aka “engine and modeling capabilities”).
  2. A set of applications that use the trained model provided by the MMM to make recommendations (e.g., spend optimization and revenue forecasting).
  3. A structural model and set of data definitions.
  4. A set of integrations into data sources and production processes to run the engine and algorithms.

I then sketched out an argument that because the first two bullet points are very hard to defend, durable value will move “up the stack” into domain-and-vertical-specific intelligence, operational reliability, and ease of integration (into both other product-based components of the marketing stack and with internal toolchains and processes).

Here’s the actual statement:

The first claim I’m making is that open source will take over the first two bullet points. And the second claim I’m making is that, depending on company size, companies will either do the work associated to the last two bullet points themselves, or use an industry/vertical specific provider that leverages the open-source frameworks from the first two bullets (Larger companies will roll their own; smaller companies will use a vendor).

I also summarized that idea in a LinkedIn post with a buyer’s point-of-view question for MMM vendors: Why, concretely, is using their product a better idea than custom-coding a purpose-built solution on top of PyMC? (I used PyMC as a stand-in for open-source tooling.)

To my mind, this is the key question that any vendor should be able to answer very concretely (and the answer should be on their website).

Two MMM CEOs, Henry Innis (Mutinex) and Charles F. Manning (Kochava), disagreed publicly with the blog post. I’m genuinely happy they did. This industry needs more transparent debate, and both of their responses were professional, substantive, and worthwhile contributions to the conversation. I also want to say clearly: I respect Henry and Charles, and nothing here is meant as a criticism of them or their teams.

Henry Innis’s Point: Incentives and Money Keep Vendors Ahead

Henry’s core disagreement is direct: he believes third-party MMM vendors are (and will remain) “far, far ahead” of open-source implementations (largely because commercial incentives fund product maturity).

Two specific points stood out:

  • The value is in the product around MMM, not the algorithm. Henry says most MMM value comes from solving product problems around the model, not from the modeling technique itself.

I think Henry and I are in complete agreement here.

  • AI may reduce the incentive to open source. He argues that many open-source efforts are sustained because they monetize elsewhere (implementation partnerships, customization, consulting, benchmarked data). If AI-assisted development reduces the “end state” that needs to be maintained, that value may shift into new SaaS surfaces rather than staying tied to open-source projects in their current form.

This second point is an interesting prediction in its own right. Many open-source efforts will struggle in the years ahead; an early warning sign is that Tailwind recently laid off 75% of its engineers.

In essence, Henry’s argument is that generative AI will cause open-source projects to falter, and commercial engines (funded by customer revenue) will be able to stay ahead.

This is a place where reasonable people can disagree. And, to be clear, I disagree with Henry: corporate-backed open source, foundations, and vendor-adjacent ecosystems can sustain maintenance even if smaller OSS projects struggle.

Charles F. Manning’s Point: Trust is Built Outside the Engine

Charles’s response was about “trust and defensibility” – the key idea being that commercial MMM vendors, collectively, have established a basis for customer trust that enables them to defend their market (and that, because of this, the open-source engines will not get additional traction).

Using his numbering, the core of his argument is the following three objections:

  • Objection 3: Optimization is the Moat. In Charles’s view, the defensible layer is optimization: forecasting outcomes under constraints and balancing short-term performance with long-term value. He claims that commercial MMM optimization is sophisticated and delivers substantial enterprise value and that similar optimization layers don’t exist in typical open-source stacks today.

The disagreement Charles and I have is twofold. First, I am making a set of predictions about what will happen, and what will be true 5 years from now, and he’s talking about what exists in the market today (to some extent, we are talking about different things). For other points of view on the current state of open-source MMM, I recommend the discussions from Digiday, Search Engine Land, and EMarketer.

And, second, I simply don’t think optimizers and spend forecasters are defensible technologies.

  • Objection 4: Domain Expertise > Generic Modeling. Charles also emphasizes that domains like mobile advertising have unique constraints (attribution nuances, conversion lags, SKAdNetwork gaps, and so on). You can’t model what you don’t understand, and “generic MMM” will miss important real-world structure. Kochava’s product bakes in domain-specific intelligence based on more than a dozen years in the market.

I don’t think Charles and I disagree on this at all. This is actually a foundational thesis for Game Data Pros: effective optimization requires domain expertise and verticalization. A substantial part of the value-add is knowing what to do in a specific domain, not the core engine or modeling capabilities.

  • Objection 5: Modeling Code is not the Product. Charles states that “Model architecture is only ~10% of the challenge.” The rest is data reliability, validation, uplift testing, attribution reconciliation, and governance: the operational “scaffolding” that makes results defensible.

Here too, I think Charles and I are in complete agreement. And we both agree with Henry.

Charles concludes his response by saying:

 “Moving Up the Stack” Is What We Already Do. The article claims value will shift from algorithms to integration, QA, and scenario planning. That’s already our model. AIM is SaaS MMM built for action, not academic benchmarking. StationOne is next.

To which I can only say: Great. We are in total agreement.

Except, of course, that I think performance standards and benchmarks matter, and that the phrase “academic benchmarking” could be viewed as somewhat dismissive. Without performance standards and benchmarks, I don’t see how a customer can make an informed choice between the 50 or so providers in Marketing Science Today’s Provider map.

There’s a Lot of Common Ground Here

Henry’s and Charles’s objections align pretty closely with each other and with what I actually wrote.

  • Henry: the value is mostly in the product around MMM, and commercial incentives fund that product.
  • Charles: the moat is optimization, domain intelligence, reliability, QA, validation, integrations, governance (i.e., everything around the model and algorithms).

That’s extremely close to my claim that engines and algorithms are becoming commodities while value mostly becomes verticalized and domain-specific.

So, where’s the disagreement? I think it’s mostly about what “open source replaces” actually means.

When I say open source “replaces” commercial MMM implementations, I don’t mean the world stops buying (or leasing) MMM engines in the short term. I mean that the core modeling and optimization stack will increasingly be based on open source, and that, over time, we will have open baseline implementations (increasingly good, increasingly automated).

Faced with that, some commercial vendors will continue to develop their engines. But most will try to win by layering value on top of open-source platforms (and not by asking customers to trust a proprietary system without independent evidence).

In much the same way that 60% of developers build on PostgreSQL, I would be willing to bet that, in 5 years’ time, 80% of new MMMs will be built on an open-source framework.

About Benchmarks and Test Suites

In a separate LinkedIn post, I praised Mutinex for building an open-source framework for evaluating MMMs and publishing “rough benchmarks” for what good performance looks like. We can argue about whether they chose the right metrics, and whether or not their performance thresholds are the right ones, but I love the fact that they jump-started a conversation about which metrics matter and what good performance means. Their rough bands:

  • MAPE / sMAPE: excellent <5%, good 5–10%, acceptable 10–15%, poor >15%
  • R²: excellent >0.9, good 0.8–0.9, acceptable 0.6–0.8, poor <0.6
  • Stability & sanity checks: parameter change, perturbation change, and placebo ROI bands
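
To make those bands concrete, here is a minimal sketch of how a scorecard against them could be computed from holdout predictions. The thresholds are the ones listed above; the function names and the toy numbers are my own.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error (assumes no zero actuals)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs((actual - predicted) / actual)))

def smape(actual, predicted):
    """Symmetric MAPE; bounded and better behaved near zero."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(2 * np.abs(predicted - actual) / (np.abs(actual) + np.abs(predicted))))

def r_squared(actual, predicted):
    """Coefficient of determination on the holdout window."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(1 - np.sum((actual - predicted) ** 2) / np.sum((actual - actual.mean()) ** 2))

def grade(value, bands):
    """Map a metric value onto the published threshold bands."""
    for label, test in bands:
        if test(value):
            return label

MAPE_BANDS = [("excellent", lambda v: v < 0.05), ("good", lambda v: v < 0.10),
              ("acceptable", lambda v: v < 0.15), ("poor", lambda v: True)]
R2_BANDS = [("excellent", lambda v: v > 0.9), ("good", lambda v: v >= 0.8),
            ("acceptable", lambda v: v >= 0.6), ("poor", lambda v: True)]

# Toy holdout: weekly actual vs. predicted revenue.
actual = [100, 120, 95, 140, 160]
predicted = [104, 115, 99, 133, 158]
print("MAPE :", grade(mape(actual, predicted), MAPE_BANDS))     # excellent (~3.7%)
print("sMAPE:", grade(smape(actual, predicted), MAPE_BANDS))    # excellent
print("R²   :", grade(r_squared(actual, predicted), R2_BANDS))  # excellent (~0.96)
```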

Even more commendably, Henry publicly praised Recast for pioneering the public discussion of MMM performance. And he was right to do so: Recast’s Accuracy Dashboards, their discussion of their model validation process, and their writing on how to do backtesting are exemplary.

Simply put, if we think MMMs are a critical part of the marketing infrastructure, and we think there are substantial performance differences between them, then we ought to be able to define objective performance standards and metrics, and then compare different MMMs using publicly available test suites in exactly the same way that people compare databases.

What we shouldn’t do is claim that the open-source frameworks (or our competitors) aren’t very good, but not have a public test suite or standardized definitions of what good means.

The Path Forward is Open Source and Test Suites

My original article was long (~4,000 words). Here’s a simplified form of the predictions.

  • The modeling and optimization core will become mostly open. I don’t see any reason to recant any of the predictions. The trajectory is the same: better libraries, better tooling, and (with AI) faster iteration and adoption.
  • Without a shared test suite and standards of accuracy, open source will win “the engine wars” by default. Without hard evidence, customers have no objective reason to believe a specific proprietary engine is better, and plenty of reasons to prefer an open implementation. And, over time, for the reasons outlined in the original article, the open-source implementations will pull ahead and become the default engines that get plugged into enterprise marketing architectures.
  • Vendors will differentiate above the core. Domain-specific models, priors, and constraints; automated QA; data pipelines; experimentation and uplift integration; governance; and workflow UX are all important pieces of an overall marketing architecture, and they’re the place where differentiation and value creation will happen.

In an upcoming article, I’m going to focus on the second of these bullet points and write more about what credible MMM engine validation should look like (and what a public test harness could include).

But for now, I’m just happy we’re all talking about this in public.

Reporting from the 2025 Game Revenue Optimization Mini-Summit

(To learn how Game Data Pros can help you optimize your games, contact us)

In 2024, we held the first-ever Revenue Optimization in Games Mini-Summit at GDC. We did it because we didn’t like that there weren’t many revenue optimization talks at GDC and that, in general, the idea of “Game Revenue Optimization” didn’t seem to get much, if any, mindshare at industry conferences.

So, instead of grousing, we organized our own summit. The feedback we got was incredible — the attendees loved the event, they thought the talks were amazing, and, more generally, they spent the next year asking us if we were going to do it again.

Spoiler alert: we did. We rented the same venue (the incredible American Bookbinders Museum), ordered a few thousand dollars’ worth of goat-cheese tarts and coconut macaroons, invited the world, and put on a show.

And what a show it was!

First and foremost, we had a set of world-class talks

After a brief introduction by Pallas Horwitz, the day’s emcee, the talks began at 2:15. We had five speakers.

  • Our CEO, Bill Grosso, opened the show with “10 Reasons MMMs Are More Interesting Than You Think” — an overview of how Generative AI combines with open-source libraries like Meridian to make building a good and useful MMM much more accessible to small companies than it was even 5 years ago (slides). 
  • Then Ryo Shima, CEO of JetSynthesys Japan, presented “How Game Revenue Optimization is Different in Japan” — an in-depth discussion of the behavioral differences between Japanese and Western gamers, and how that impacts monetization strategies (slides).
  • Tiffany Keller, one of the superstars at Liquid and Grit, followed Ryo and gave a talk on “Advanced Hybrid Monetization.” This was the graduate seminar version of the roundtable she held last February and was a comprehensive overview of the state of the art in hybrid monetization. 
  • And, finally, Joost Van Dreunen and Julian Runge closed the presentation part of the day with a sweeping overview of the future of brand engagement with gaming (slides).
Speakers, from left: Pallas Horwitz, Bill Grosso, Ryo Shima, Tiffany Keller, Julian Runge, and Joost Van Dreunen.

Second, we had an amazing audience

Like last year, we were slightly nervous about this — the room only holds 105 people, and we had 340 people registered. Ultimately, we decided to issue 180 tickets. 75 people came, most stayed for the entire summit, and the event turned into a caffeine-fueled group conversation about revenue optimization. 

As a side note, the audience included at least one certified game design legend among the other luminaries. 

Third, the happy hour was delightful

“Most awesome part of GDC.” — Evan Van Zelfden

The combination of the incredible speakers and the amazing audience meant that the happy hour was more than just an excuse to have salmon brioches and artichoke salad while downing plastic glasses of red wine. The food was good, but the conversations were excellent and lasted until the museum closed.

Incrementality in Game Analytics: Beyond AB Tests, on to Bandits and Marketing Mix Models

(For a TL;DR scroll to the end of the article)

Incrementality is a hot topic in marketing analytics, referring to “the measurement and analysis of the incremental impact of a marketing campaign or initiative. It aims to determine whether the marketing efforts are actually driving additional value or revenue beyond what would have occurred naturally without the campaign.” That’s the first paragraph of ChatGPT’s answer when prompted “What is incrementality in marketing analytics?”

When asked the same thing, Google delivers a similar answer drawing on the website Marketing Evolution: “Incrementality refers to growth that can be directly attributed to specific marketing efforts above and beyond the existing brand equity. For example, how much a certain channel, tactic, or overall campaign helped influence an increase in sales, newsletter sign-ups, etc.” (see Figure 1). Google’s answer confirms that incrementality is a thing in marketing analytics and not something that ChatGPT hallucinated.

Figure 1: The ideal way of measuring the incremental impact of a business action, e.g., a marketing campaign or a new game feature, on an outcome of interest, e.g., conversions or retention, is through a randomized control trial. Source: the author’s mind and hand (sorry if it’s not pretty).

Incrementality programs are so important in analytics because they aim to quantify the incremental causal effect that different actions, tactics, and strategies had on relevant outcomes for the firm. Such precise causal measurement ensures that each action is attributed the right amount and sort of credit, which in turn crucially informs the firm’s future actions and strategies. It can make all the difference between a fast path to increased profitability and the inverse.

Now let’s see what we find about incrementality in game analytics:

ChatGPT: “In the context of game analytics, incrementality refers to measuring the incremental impact of a specific game feature, update, or intervention on player behavior, engagement, monetization, or other key performance indicators (KPIs). It aims to understand whether the implemented changes or additions have a positive effect on the game’s metrics beyond what would have occurred naturally without those modifications.”

Google: N/A, see Figure 2. Google is unable to answer the question, surfacing a link to a resource by the well-known company GameAnalytics that has nothing to do with incrementality, and then offering material related to incrementality in marketing. Google Scholar also has no clear leads.

In many ways, this tells us that:

  • Incrementality is not yet a concept of importance in game analytics;
  • It should be, given the clear value proposition articulated above and ChatGPT’s plausible and succinct response. (I find ChatGPT very useful for checking the plausibility and merit of new ideas in this way. The exercise is further helpful because ChatGPT commonly surfaces relevant definitions, references, and related concepts.)

Based on these two points, this article aims to introduce the incrementality concept to game analytics. We will first define in more detail what incrementality means in the context of game analytics and then I will try to provide an overview of what I believe are important incrementality methods for game analytics and game analysts.

Figure 2: The concept of incrementality is foreign to game analytics as it stands. At a minimum, it has not been formally introduced — something that this blog post endeavors to change. Source: Google Search at www.google.com and scholar.google.com

Defining Incrementality in Game Analytics

While ChatGPT’s answer on incrementality in marketing analytics mentions a spectrum of methods that “aim to minimize biases and control for confounding variables, ensuring that the observed differences can be attributed to the marketing efforts with a reasonable degree of confidence,” its answer for incrementality in game analytics pretty squarely centers on A/B testing (experimentation):

Game developers and analysts often employ incrementality analysis to evaluate the effectiveness of specific game design choices, marketing campaigns, pricing strategies, or other initiatives. The goal is to isolate the impact of the intervention by comparing the behavior of two groups: a test group that experiences the new feature or change and a control group that does not.

Typically, the test group consists of players who have been exposed to the modified game element, while the control group comprises players who continue to experience the game in its original state. By analyzing the differences in player behavior and performance between these two groups, game developers can assess the incremental impact of the introduced changes.

It further writes that “insights gained from incrementality analysis in game analytics can help developers make data-driven decisions to optimize game design, improve player engagement, enhance monetization strategies, and refine the overall player experience. By understanding the true impact of specific game elements or changes, developers can focus their efforts on features and updates that lead to measurable improvements in game metrics and player satisfaction.”

Leaning into these elaborations, let’s define incrementality in game analytics:

Def. Incrementality in game analytics: The measurement of the incremental impact of specific game design choices or features, marketing campaigns, pricing strategies, technical updates, or other interventions on player behavior, engagement, monetization, or other key performance indicators (KPIs) of a game or game portfolio. Incrementality efforts aim to understand whether the implemented changes or additions have a positive effect on the game’s metrics beyond what would have occurred without those modifications. They thereby employ various methods of causal inference that help minimize biases and control for confounding variables, ensuring that the observed differences can be attributed to the intervention in question with a quantifiable degree of confidence.

This definition heavily draws on ChatGPT’s output but extends the space of admissible methods considerably beyond AB testing and experimentation. Incrementality methods in game analytics need to, as they do in marketing analytics, encompass all that causal inference has to offer! A further addendum to the definition is the quantification of uncertainty to help analysts, designers and product managers decide which measurements to rely on and which ones to assess further or abandon.

(For completeness, I should mention that, during my online search, I found this blog post titled “Incremental Data Science for Mobile Game Development.” The title is promising, and the covered applications are actually well selected and outlined, but the post fails to deliver a definition or even touch on the subject again. There is no further mention of incrementality or related concepts like experimentation, AB testing, causal inference, or randomization. I am unclear about what the author intended, but as it stands, the post’s content and title are simply disjointed.)

The Game Analytics Incrementality Matrix

There is a plethora of analytical tools available for incrementality measurement. Figure 3 provides an initial overview, positioning the different tools on a two-dimensional matrix. The horizontal dimension addresses the degree of intervention necessary to use a specific incrementality technique. AB testing, for example, requires randomly exposing different users to different treatments (e.g., versions of the game), which is a high degree of intervention in the user experience. Propensity score matching or marketing mix modeling (MMM), on the other hand, work from observational data, requiring little or no dedicated intervention and leveraging naturally occurring variation in exposure. Note that not requiring intervention is of course an advantage, but non-interventional methods also tend to be less precise and flexible in detecting incrementality.

The second, vertical axis covers the spectrum from low-level product to high-level market touchpoints with users. At higher-level market touchpoints such as an ad platform or (Connected-)TV, a game developer clearly has less control over a user’s experience and in fact might not be able to act at the user level at all, instead deciding on spend level and strategy for a specific marketing channel.

Figure 3: The Game Analytics Incrementality Matrix, showing different tools for incrementality measurement in game analytics. The horizontal axis depicts the degree of needed intervention in the user experience and the vertical axis the proximity to market versus product. A serious game analytics effort should entail the underlined methods at a minimum.

Per the matrix shown in Figure 3, AB testing becomes less applicable as you move from high levels of control over a user’s experience at granular product touchpoints to low levels of control, e.g., on an ad platform. There, the applicability of AB testing as a tool for incrementality measurement depends on the ad platform and whether it offers AB test-based measurement. Similarly, algorithmic personalization becomes less applicable the less you can control the user experience at the individual level. It can get analytically involved with reinforcement learning approaches like bandits and is also usually technically costly to implement. AB testing and algorithmic personalization overlap because a simple form of the latter can involve estimating linear models with interaction terms (of the sort outcome ~ treatment + treatment*covariate) on the data of a randomized (control) trial or AB test, as sketched below. All of these approaches leverage the idea of treatment effect heterogeneity, i.e., that the incrementality effect of an intervention (read: marketing campaign, game feature) will often differ across users, with the differences captured and measured in the observed covariates about those users.
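
To make that concrete, here is a minimal sketch of such an interaction-term model using statsmodels on simulated data standing in for an A/B test export; the column names and effect sizes are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated A/B test export: one row per player (names invented).
rng = np.random.default_rng(7)
n = 5_000
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),   # randomized assignment
    "days_played": rng.poisson(12, n),    # pre-treatment covariate
})
# Outcome whose treatment effect grows with prior engagement.
df["revenue"] = (1.0 + 0.3 * df["treatment"]
                 + 0.05 * df["days_played"]
                 + 0.02 * df["treatment"] * df["days_played"]
                 + rng.normal(0, 1, n))

# outcome ~ treatment + covariate + treatment:covariate
model = smf.ols("revenue ~ treatment * days_played", data=df).fit()
print(model.summary().tables[1])
# A significant treatment:days_played coefficient is evidence of
# treatment effect heterogeneity across engagement levels, the hook
# that simple algorithmic personalization builds on.
```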

So far, we have discussed methods of “interventional causal inference,” i.e., methods where we need to intervene to produce the data we need for incrementality measurement. We will now turn to observational causal inference, i.e., methods that operate on naturally occurring data without explicit intervention on our part. Difference-in-differences and synthetic control estimators try to identify the effect of an event of interest from differences over time. E.g., should you release a new game feature to different countries at different points in time, these methods could produce an estimate of the feature’s incremental effect on your players from that data (see the sketch below). They can do so both in the realm of low-level product and higher-level market touchpoints. Synthetic control methods work a bit better with granular data available, which is why they don’t reach as far up into the market territory. As both methods benefit from a certain level of intervention, they reach into the right half (the intervention territory) of the chart.
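
A minimal sketch of that staggered-release example as a two-way fixed-effects difference-in-differences regression; the file and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per country and week.
# Columns: country, week, feature_live (0/1), retention
panel = pd.read_csv("country_week_retention.csv")

# Country dummies absorb level differences between markets, week dummies
# absorb common shocks; the feature_live coefficient is the DiD estimate
# of the feature's incremental effect on retention.
did = smf.ols("retention ~ feature_live + C(country) + C(week)", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["country"]}
)
print(did.params["feature_live"], did.bse["feature_live"])
```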

Regression discontinuity leverages the fact that experience assignment can be arbitrary within narrow bounds of certain user characteristics. Say players need a score of 10,000 to get access to a specific feature. Regression discontinuity would then estimate the feature’s incremental effect by comparing players who reached a score of 9,999, and didn’t get access to the feature, with players who reached a score of 10,000, and did (see the sketch below). The idea is that these players must be very similar other than missing one point out of 10,000. Likewise, matching methods aim to compare instances that are as similar as possible, where some were exposed to the treatment of interest and some were not. They essentially aim to control for selection effects by matching up instances based on available non-endogenous covariates.
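
And a sketch of a simple local-linear regression discontinuity for that score-threshold example; the bandwidth and the file and column names are assumptions.

```python
import pandas as pd
import statsmodels.formula.api as smf

CUTOFF = 10_000
BANDWIDTH = 500  # only compare players within +/- 500 points of the cutoff

# Hypothetical export with columns: score, spend_next_30d
players = pd.read_csv("player_scores.csv")
window = players[(players["score"] - CUTOFF).abs() <= BANDWIDTH].copy()
window["above"] = (window["score"] >= CUTOFF).astype(int)  # got the feature
window["centered"] = window["score"] - CUTOFF

# Local linear regression with separate slopes on each side of the cutoff;
# the 'above' coefficient estimates the jump in spend at the threshold.
rdd = smf.ols("spend_next_30d ~ above * centered", data=window).fit()
print(rdd.params["above"], rdd.bse["above"])
```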

Again, I urge you to note that non-interventional incrementality methods are great because they work from naturally occurring data, but they are also limited in their precision and flexibility. True experiments, randomized control trials, are the gold standard for incrementality measurement and causal inference. Whenever implementable at acceptable cost, they should be your incrementality method of choice. In many cases, however, you cannot intervene in an environment or system, and non-interventional methods are your only shot at incrementality measurement. E.g., when Apple changes its App Store ranking algorithm, you cannot run an experiment to determine what impact this had on organic adoption of your apps — but you can use difference-in-differences-style estimators to try and quantify the effect.

Marketing Mix Modeling in the M(atr)ix?

Now, you may be surprised to see marketing and media mix modeling in a figure about incrementality measurement in game analytics. Let me elaborate.

This class of methods was originally developed to produce estimates of the elasticity of sales to advertising on different channels and media from aggregate (high-level) observational data. That is why it is positioned at the opposite end from AB testing in Figure 3. It can, however, take different actions in a firm’s marketing mix into account, including pricing, promotion, and major product changes. When a model comprehensively covers a firm’s action space across the marketing mix (the 4Ps: product, price, place, promotion), it is commonly called a marketing mix model (MMM).

You may notice that, while MMM was conceived for estimation from aggregate observational market data, its area in Figure 3 reaches into the product territory — that is because a comprehensive MMM can include measures for major product changes (the first of the 4Ps of the marketing mix) and produce estimates of the incremental effect of these changes on sales and other outcomes. The MMM area further reaches into the territory of interventional causal inference. This is because modern MMM implementations can commonly be calibrated using the precise incrementality measurement outputs from ad experiments.

A simple MMM can boil down to a linear regression of sales on ad spend across different channels, plus some trends for competition and indicators for holidays and other key events, which is a rather simple analytics approach (see the sketch below). But a reliable, well-calibrated, and trusted MMM can take a lot of effort in data preparation, model estimation, and at the organizational level, e.g., to be well integrated into a company’s marketing analytics operations.
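
Here is what that baseline could look like in practice; the data file and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical weekly aggregates with columns: sales, spend_search,
# spend_social, spend_tv, competitor_index, is_holiday
weekly = pd.read_csv("weekly_marketing.csv")

mmm = smf.ols(
    "sales ~ spend_search + spend_social + spend_tv"
    " + competitor_index + is_holiday",
    data=weekly,
).fit()
print(mmm.params)  # each spend coefficient: incremental sales per unit of spend

# Production-grade MMMs typically add adstock (carryover) and saturation
# transforms of spend, Bayesian priors, and calibration against experiments.
```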

Finally, Figure 3 shows multi-touch attribution (MTA). MTA provides estimates of the fractional contribution of customers’ touchpoints with a company’s marketing efforts. To the extent that a product (= game) produces touchpoints with new customers (think word-of-mouth), its area reaches into product territory. MTA models draw on many different methods, ranging from MMM-style approaches to game-theoretic ones such as Shapley values (a toy example follows below), which is why the MTA area overlaps with other methods. Complementarities between MTA models and MMM can be particularly high, e.g., as reflected in Nielsen’s definition of MTA: “[MTA] is a marketing effectiveness measurement technique that takes all of the touchpoints on the consumer journey into consideration and assigns fractional credit to each so that a marketer can see how much influence each channel has on a sale.”
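
To make the Shapley idea concrete, here is a toy sketch that assigns fractional credit to three channels from conversion rates observed for each exposed subset; all numbers are invented.

```python
from itertools import permutations
from math import factorial

# Toy value function: conversion rate observed for users who touched
# exactly this subset of channels (numbers invented for illustration).
v = {
    frozenset(): 0.00,
    frozenset({"search"}): 0.04,
    frozenset({"social"}): 0.02,
    frozenset({"tv"}): 0.01,
    frozenset({"search", "social"}): 0.07,
    frozenset({"search", "tv"}): 0.06,
    frozenset({"social", "tv"}): 0.04,
    frozenset({"search", "social", "tv"}): 0.09,
}
channels = ["search", "social", "tv"]

def shapley(channel):
    """Average marginal contribution of `channel` over all orderings."""
    total = 0.0
    for order in permutations(channels):
        before = frozenset(order[: order.index(channel)])
        total += v[before | {channel}] - v[before]
    return total / factorial(len(channels))

for c in channels:
    print(c, round(shapley(c), 4))
# The credits sum to v(all channels) = 0.09, i.e., fractional attribution.
```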

TL;DR / Why Does This Matter for Game Development?

I said at the beginning of this article that incrementality programs are so important in analytics because they ensure that each action taken by a team is attributed the right amount and sort of credit. This exercise is crucial for the team to know which design and marketing choices worked and which ones didn’t, which ones your players liked and which ones they didn’t (see Figure 1), and in turn to inform future actions and strategies. Getting this right can make all the difference between building an awesome game that players love and a game that is no fun and struggles with player retention and engagement.

Leaning into the incrementality concept in marketing analytics, this article defines incrementality for game analytics and provides an initial overview of methods (Figure 3), structured along the dimensions of needed intervention in users’ experience and proximity to product versus market. The second dimension in turn influences the granularity of the available data.

Game analytics can benefit from a formal introduction of the concept of incrementality: Game design, management (e.g., live operations), and marketing need to work in complementarity, as a team, to ensure success for a game. Principled and rigorous incrementality measurement processes and tools can quantify the location and extent of these complementarities and direct the symphony of everyone coming together to build an amazing game.

A serious game analytics effort should entail the underlined methods in Figure 3 at a minimum: AB testing / experimentation, simple forms of algorithmic personalization, and marketing mix modeling. MMM-style methods especially may currently be underleveraged in game analytics. They can not only provide guidance for marketing efforts but also inform larger product, live operations, and marketing initiatives, particularly in conjunction with a strong and well-defined experimentation roadmap.

Reach out if you want to know how and more: julian.runge@gamedatapros.com

Like our blog? Join our Substack.
