(AKA Why “Trust Us, It’s Better” Is the Wrong Way to Ship Measurement Algorithms)
A new algorithm just dropped in the marketing measurement ecosystem.
And I had a very mixed reaction.
On one hand: Heck Yeah. I love seeing teams invest in making geo-testing and incrementality analysis more reliable. This stuff is hard. The world is noisy. Decision-makers want something actionable, not an uncertainty interval that’s wider than a barn door.
On the other hand: Sigh. The announcement was basically "we built a new thing, it's better than the open-source equivalents, trust us." The claim is that it produces less biased estimates, narrower intervals, and better calibration, but without sharing much about the method or the test suite that backs those claims up.
I’m excited about innovations in incrementality. I’m less excited about black-box measurement claims—especially when teams and budgets depend on them.
Why I care (and why this keeps coming up)
For context: GDP sometimes does UA benchmarking and measurement diagnostics—assessing a customer’s current setup (MMM, incrementality, vendor stack), and then giving them a roadmap for improvement.
These engagements usually include some form of vendor analysis (e.g. helping our customers make build vs buy decisions). And in my experience, a meaningful fraction of MMM / incrementality vendors do not share technical details about how their models work.
That usually triggers three red flags for me:
- One evolutionary (the “Bill Joy” problem)
- One practical (the “don’t grade your own homework” problem)
- One ideological (the “measurement is a shared scientific inheritance” problem)
Let’s talk through them.
Red Flag #1: The World Will Out-Innovate You (aka Joy’s Law + Open Innovation)
“No matter who you are, most of the smartest people work for someone else.”
A straightforward consequence of Joy's law is that most innovation happens elsewhere, and that your job, as a company, should include systematically surveying and learning from the wider ecosystem. (As a side note, it's not a coincidence that Bill Joy's work on Unix was a key enabler of the open-source revolution.)
The academic literature goes one step further: you should not only systematically survey and learn from the wider ecosystem. You should also give back. Knowledge-sharing can be a rational, value-creating strategy, because it attracts external effort and converts it into your advantage. Here are two important papers on this subject:
- Lerner & Tirole’s economics work explains how open source participation can be motivated by things like career concerns and reputation—and why that’s not just altruism.
- Their later synthesis makes the case that a lot of open source dynamics are explainable using standard economics (theories of labor and industrial organization), i.e., it’s not “vibes,” it’s incentives.
Note also that firms don’t necessarily have to go “fully open.” There’s a well-studied middle ground: selective revealing. Joachim Henkel’s work on embedded Linux describes firms revealing selectively—sharing meaningful chunks of firm-developed innovation to get external support and ecosystem benefits, while still protecting some competitive IP.
In a later paper, Henkel, Schöberl, and Alexy go further: they describe how customer demand for openness can trigger a positive feedback loop, and eventually openness becomes a new dimension of competition.
That last sentence is the key: once open-source takes hold, and once openness becomes competitive, “closed by default” becomes an evolutionary dead-end. Which means that when a vendor, any vendor, announces a new algorithm but won’t disclose method details or evaluation infrastructure, in an area where there is a credible and growing open-source presence, my spidey-sense starts to tingle. Is this actually an improvement on the state of the art? And, even if it is, is it about to get obsoleted by the open ecosystem?
This question matters a lot because switching costs are real. How can I recommend a vendor to my customers if I suspect that their systems are going to become obsolete in short order?
Red Flag #2: Don’t Grade Your Own Homework (Especially in Measurement)
The second pragmatic red flag is simpler:
When a vendor is the only one who can evaluate the vendor’s product, we’re doing marketing and not science.
The combination of a black-box algorithm, a proprietary test suite, and self-reported "we're better" claims is very hard to assess responsibly (especially if, like GDP, you're advising customers and their budgets). And while a potential customer could, in theory, use a free-trial period to do the comparison themselves, most customers don't have the ability to do that (and especially don't have the ability to do that across a suite of potential vendors). Not to mention that having every potential customer run a bakeoff against a suite of vendors is monumentally inefficient.
Acknowledging the importance of verification is a great starting point. But it’s almost meaningless if the details of the verification aren’t fully available.
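To make "verification" concrete: here's a minimal, hypothetical sketch of the kind of calibration check an open test suite enables. We simulate paired geo-experiments with a known true lift, estimate the lift with a naive bootstrap interval, and measure whether the nominal 95% intervals actually cover the truth about 95% of the time. Everything here (the effect size, the noise model, the estimator) is illustrative, not any vendor's method.

```python
import numpy as np

def simulate_geo_experiment(rng, n_geos=50, true_lift=0.10, noise=0.3):
    """Simulate paired per-geo outcomes: shared baseline, treated geos get (1 + lift)."""
    baseline = rng.lognormal(mean=3.0, sigma=0.5, size=n_geos)
    control = baseline * (1 + rng.normal(0, noise, n_geos))
    treated = baseline * (1 + true_lift) * (1 + rng.normal(0, noise, n_geos))
    return control, treated

def lift_estimate_with_ci(control, treated, alpha=0.05, n_boot=1000):
    """Naive ratio-of-means lift estimate with a paired bootstrap percentile CI."""
    rng = np.random.default_rng(0)
    n = len(control)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample geo pairs with replacement
        boots.append(treated[idx].mean() / control[idx].mean() - 1)
    est = treated.mean() / control.mean() - 1
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return est, lo, hi

# Calibration check: how often does the nominal 95% interval cover the true lift?
rng = np.random.default_rng(42)
true_lift = 0.10
trials = 200
covered = 0
for _ in range(trials):
    control, treated = simulate_geo_experiment(rng, true_lift=true_lift)
    _, lo, hi = lift_estimate_with_ci(control, treated)
    covered += lo <= true_lift <= hi
coverage = covered / trials
print(f"empirical coverage: {coverage:.2f}")  # a calibrated method lands near 0.95
```

This is the whole point of an open test suite: anyone can rerun it, swap in a different estimator, and see whether "better calibration" survives contact with simulated ground truth.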
The PyMC case study: a rare “this is how you do it” moment
This is why I was genuinely happy to see the PyMC Labs team publish an apples-to-apples MMM benchmark comparing PyMC‑Marketing and Google’s Meridian.
What they did right:
- They explicitly set out to create and publish a rigorous technical benchmark on realistic synthetic datasets covering different scales (from “startup” to “enterprise”).
- They aligned model structures and used identical priors and sampling configurations to keep it fair.
- They held a webinar and then published a video walking through the comparison.
- They made the benchmark code publicly available on GitHub for reproducibility. The repo itself is not just a toy notebook. It’s a benchmarking suite with data generation and parameter recovery tooling, and it explicitly supports comparing inference methods/libraries.
Even if you disagree with their modeling choices, you can do something extremely powerful:
You can argue with the algorithms and the code instead of arguing with the vibes.
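In the same spirit, the core idea behind a parameter-recovery benchmark is simple enough to sketch in a few lines: generate synthetic data from known parameters, fit the model, and check how close the estimates land across many simulated datasets. This toy version uses a one-channel linear model fit with ordinary least squares; the actual PyMC/Meridian suite is far richer, and every name and number below is illustrative.

```python
import numpy as np

def generate_synthetic(rng, n_weeks=104, true_beta=0.8, true_intercept=50.0, noise_sd=5.0):
    """Toy 'MMM' data: weekly sales driven by one media channel plus noise."""
    spend = rng.gamma(shape=2.0, scale=10.0, size=n_weeks)
    sales = true_intercept + true_beta * spend + rng.normal(0, noise_sd, n_weeks)
    return spend, sales

def fit_ols(spend, sales):
    """Fit sales ~ intercept + beta * spend by least squares."""
    X = np.column_stack([np.ones_like(spend), spend])
    coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
    return coef  # [intercept, beta]

# Parameter recovery: repeat across simulated datasets and report bias and RMSE.
rng = np.random.default_rng(7)
true_beta = 0.8
betas = []
for _ in range(500):
    spend, sales = generate_synthetic(rng, true_beta=true_beta)
    _, beta_hat = fit_ols(spend, sales)
    betas.append(beta_hat)
betas = np.asarray(betas)
bias = betas.mean() - true_beta
rmse = np.sqrt(((betas - true_beta) ** 2).mean())
print(f"mean estimate: {betas.mean():.3f} (truth: {true_beta})")
print(f"bias: {bias:+.4f}, RMSE: {rmse:.4f}")
```

Because the data-generating process is published alongside the results, a skeptic can change the noise level, add adstock or saturation, or plug in a competing library, and the argument stays about code and numbers rather than claims.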
Nico Neumann deserves a special callout here. His LinkedIn post about the PyMC <-> Meridian comparison generated one of the most informative LinkedIn threads in recent memory.
That’s how a field levels up. And it’s the ecosystem pattern I’d love incrementality vendors to lean into more often:
- Publish methodology details
- Publish test suites
- Publish failure modes
- And compete on product + implementation + support, not secrecy
Red Flag #3: Measurement is a Shared Scientific Inheritance (and Secrecy Slows the Whole Genre)
Here’s the more ideological point:
MMM and incrementality aren’t new. These tools are built on decades of rigorous academic work, plus a growing body of open-source implementations:
- Google’s open-source Meridian.
- Meta’s open-source Robyn.
- The entire PyMC marketing toolkit, which includes both a vanilla MMM class and a multidimensional model.
- Open geo-testing toolkits like GeoLift.
- Bayesian data-science toolkits like ArviZ.
- ….
We are all building on shared foundations. And when vendors keep core methodology opaque, the science progresses slower, trust erodes faster, and customers lose confidence more easily.



