Are all donors the same? Hard to tell looking at sector "best practice"
Kevin Schulman
Founder, DonorVoice, DVCanvass/DVCalling. Managing Editor, The Agitator
(reader note: this post will end with a control-beating finding with a huge impact on the otherwise stubborn retention metric, but take the ride to get there; it is, hopefully, worth it.)
The random nth, that tried-and-true approach to testing, ensures (in theory) that the test group matches the control, so the only difference is our treatment of the test group. We can therefore conclude that any difference in response is because of the test idea rather than unknown, unseen differences between the test and control groups.
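To make the mechanics concrete, here is a minimal sketch of a random nth split; the donor IDs and the Python are purely illustrative, not anyone's production code:

```python
import random

# Minimal illustrative sketch of a random nth split (not DonorVoice code).
# Shuffle the file so order carries no meaning, then take every nth record
# as the test group and keep the rest as the control.
def random_nth_split(donor_ids, n=2, seed=42):
    ids = list(donor_ids)
    random.Random(seed).shuffle(ids)
    test = ids[::n]                                      # every nth record
    control = [d for i, d in enumerate(ids) if i % n != 0]
    return test, control

test, control = random_nth_split(range(10_000), n=2)
print(len(test), len(control))   # two groups that should match on average
```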
This approach makes one massive assumption that is never talked about and, we'd argue, never understood: conceptually, mathematically, or otherwise.
The assumption is that the differences between human beings are smaller than the anticipated impact of our treatment/test/idea. Said differently, everybody is the same.
How is this being assumed? What if men love the test idea but women hate it? Because your file skews female, their reaction pulls down the test average and it fails to beat the control. And yet there was a control-beating idea hidden in the losing results, because you assumed everyone is the same; men and women in this case.
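A quick back-of-the-envelope sketch shows how that happens. The numbers below are made up purely to illustrate the point; they are not from the actual test described later in this post:

```python
# Hypothetical numbers, purely for illustration.
female_share, male_share = 0.7, 0.3      # the file skews female
control_response = 0.050                 # 5% response to the control for everyone

male_test_response = control_response + 0.020    # men love the test idea (+2 pts)
female_test_response = control_response - 0.010  # women dislike it (-1 pt)

blended_test = male_share * male_test_response + female_share * female_test_response
print(round(blended_test, 3))   # 0.049 -> the test "loses" to the 0.050 control
# A +2 point winner for men is sitting inside a losing average.
```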
But wait a second, nobody in fundraising thinks all donors are the same; the sector segments the crap out of the file. There are cash donors, monthly donors, lapsed donors, donors who also do advocacy, nonsense RFM buckets of donors, and the even more nonsensical demographics and psychographics used to highlight all the lovely differences between donors. Then there is the crème de la crème, personas: either statistically derived groups of donors based on a potpourri of random attributes or, worse, just made-up archetypes that likely describe no single donor well (your first indication to ignore them entirely) and, more to the point, fail to explain the behavior of a single donor (your second indication to ignore them entirely).
On the one hand, we segment like crazy on random but deceptively alluring crap. On the other, we do the vast majority of our testing with the random nth assignment that assumes our test idea is so powerful it will supersede all these differences we think exist and be a winner because most folks like it better than the control idea; because most folks are the same…
Human beings are complex and, yes, very different from each other, but not in the ways we traditionally think. (We’ve written extensively about why demographics are garbage here.)
The takeaway point, which may represent the most misapplied and misunderstood concept in fundraising and the biggest missed opportunity: the main reason to segment human beings is that you have discovered groups with different reasons for supporting you that warrant different treatments.
Instead, most charities stick to the generic average, which is exactly what most controls represent: an average idea that is average for most people. We then bestow an enormous gift on this average idea, one that gives it enormous power having nothing to do with the quality of the idea itself: exposure via volume over time.
So how the hell do you beat the (average) control with all this artificial power?
In short, think time. Upfront think time to formulate a test that is based on a hypothesis about human behavior. Put some desk research, a lit review, or a consultation with a subject-matter expert (i.e., rigor) into what is too often a slavish adherence to the idea and process of testing instead of the ideas themselves.
Here are test results and an illustration of the upside this type of thinking and rigor can provide. This is a year-long test with newly acquired donors. The metric that matters is retention rate.
Two test groups were created using the standard, random nth approach and visually illustrated with our blue, green, yellow and orange people. The treatment is sending additional, no-ask communications, beyond the control number of comms, to newly acquired donors; the heavy group gets 12 more during the year, the light group gets 6.
At the end of the year, the yellow bars reflect failure. No difference in retention rate between the test groups and the control, but more money and time were spent on the additional comms, so the tests are big losers.
But what if, hypothetically, the blue people loved it and the orange people hated it, as the callout suggests? We had a winning idea for an important, sizeable subgroup (blue people) but, because we treated everyone the same, that finding was lost. This happens all the time.
You cannot find your blue people after the fact by breaking out the average results (i.e. the yellow bar) with all the data on your CRM. Why? It isn’t because you won’t find differences. In fact, the more data you have appended to your CRM, the more differences you will find. This approach is what is referred to as a fishing expedition: random hunting for what amounts to statistical noise. Lots and lots of noise.
To find your blue people you need an upfront hypothesis about what makes blue people different from orange, and then a test designed to find support for this theory.
This is exactly what we did. Our hypothesis was not that sending more no-ask comms would increase retention.
Our hypothesis was two-fold:
1) Higher Commitment (our proprietary measure of relationship strength, which we publish so it ain't black box) donors require less content designed to build a relationship that is already built. In fact, this additional content will create irritation, suggesting the charity doesn’t know them as donors and, by extension, is wasting their money, which means sending more stuff will make retention worse.
2) Low Commitment donors would benefit, to a point, from some additional content designed to build that relationship.
At the point of acquisition we measured Commitment (and satisfaction with the experience) via a short, purposeful, five-question survey for every single donor in the control and test groups.
The real analysis, not the ‘fake news’ analysis that assumes everybody is the same, required analyzing the High Commitment donors (blue and green people) separately from the Low Commitment donors (orange and yellow people).
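For those who want to see the shape of that analysis, here is a minimal sketch assuming a hypothetical donor table with a Commitment segment, a treatment arm and a retained flag; the column names and data are invented for illustration, not DonorVoice's actual schema:

```python
import pandas as pd

# Hypothetical donor table, one row per newly acquired donor (illustration only).
donors = pd.DataFrame({
    "commitment": ["High", "High", "High", "Low", "Low", "Low"],
    "treatment":  ["control", "light", "heavy", "control", "light", "heavy"],
    "retained":   [1, 1, 0, 0, 1, 1],
})

# The 'everybody is the same' read: one blended retention rate per arm.
blended = donors.groupby("treatment")["retained"].mean()

# The hypothesis-driven read: retention by arm *within* each Commitment
# segment, so opposite effects in High vs. Low Commitment donors cannot
# cancel each other out in the average.
by_segment = donors.groupby(["commitment", "treatment"])["retained"].mean()

print(blended)
print(by_segment)
```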
As you can see, the results are exactly as hypothesized, and the effect is massive. All of this impact (positive and negative) was hidden in the random nth, ‘fake news’ world where we assume everyone is the same.
If any charity (or agency) has discovered a way to increase first-year retention by 12 points, let us know. This sort of outcome is highly unlikely to come from an internal brainstorm, a whiteboard session or an agency response to a brief. It only comes about with rigor and subject-matter expertise about why people behave as they do, a well-designed test and the proper analysis.
But the starting point is accepting that people are indeed different, just hardly ever in the traditional ways we currently slice and dice the world.