Visitor Bloat: Empty Calories and Your Digital Data
For Ihab El-Waly, who keeps teaching us things
Most marketers probably think a new visitor is someone visiting your site for the first time. That would make a returning visitor someone who's coming back for a second+ time.
So far, so good.
Most analytics tools, including Google Analytics 4 (GA4), use sessions to distinguish between new and returning visitors.
And this is where things get interesting.
Session-based tracking is highly unstable.
Generally, GA4 ends a session after 30 minutes of inactivity. If a user visits your site again after this period, GA4 will count this visitor as a new visitor.
Additionally, GA4 often counts visits from different campaigns or ads as new visitors. Sessions also reset at midnight, so a user active on your site from 11:50 p.m. to 12:10 a.m. may be counted as two separate visitors.
This gives us two specific vectors, time and source, that can harm our dataset, fragment visitor identities, and create confusion about whether visitors are, in fact, new or returning (and about how many total visitors there really are in a given time period).
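To make the two vectors concrete, here is a minimal Python sketch of the naive rule described above, where every new session counts as a new visitor. It is a simplified, hypothetical model, not GA4's actual sessionization logic. The function name, timestamps, and source labels are illustrative only.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified model of the session rules described above.
# NOT GA4's real implementation; it only shows how time- and
# source-based session breaks can split one person into several "visitors".
SESSION_TIMEOUT = timedelta(minutes=30)


def count_session_visitors(hits):
    """Count 'visitors' for ONE real person when every session is a new visitor.

    hits: list of (timestamp, source) tuples, sorted by time.
    A new session starts when more than 30 minutes have passed since the
    previous hit, the calendar date changes (midnight reset), or the
    traffic source changes (e.g., a different campaign).
    """
    sessions = 0
    prev_time, prev_source = None, None
    for ts, source in hits:
        if (
            prev_time is None
            or ts - prev_time > SESSION_TIMEOUT
            or ts.date() != prev_time.date()
            or source != prev_source
        ):
            sessions += 1
        prev_time, prev_source = ts, source
    return sessions


# One person, four pageviews.
hits = [
    (datetime(2024, 5, 1, 23, 50), "organic"),
    (datetime(2024, 5, 2, 0, 5), "organic"),     # crosses midnight
    (datetime(2024, 5, 2, 0, 10), "spring_ad"),  # different campaign
    (datetime(2024, 5, 2, 9, 0), "spring_ad"),   # more than 30 minutes idle
]
print(count_session_visitors(hits))  # → 4
```

Under these assumptions, one person with four pageviews is logged as four separate visitors: one per session break.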
Persistent UUID-based tracking is much more stable.
Confection takes a different approach.
We define a "new visitor" as a UUID with exactly one pageview within the specified timeframe and no historical pageviews before this timeframe. Conversely, a "returning visitor" is a UUID that has at least one pageview in any historical timeframe, as well as one or more pageviews within the specified timeframe. A UUID with a historical pageview from a year ago (or ten years ago) will be correctly identified as a returning visitor.
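A minimal Python sketch of these definitions follows. It assumes a log of (UUID, date) pageview pairs covering all history. The function and variable names are hypothetical, and this is not Confection's implementation. As worded, the definitions don't cover a UUID first seen inside the window with two or more pageviews. Rather than guess, the sketch labels that case "unclassified".

```python
from collections import Counter
from datetime import date


def classify_visitors(pageviews, start, end):
    """Label UUIDs seen in the window [start, end] as new or returning.

    pageviews: iterable of (uuid, date) pairs covering ALL history.
    new       -> exactly one pageview in the window, none before it
    returning -> at least one pageview before the window and at least
                 one pageview in the window
    Anything else seen in the window is labeled 'unclassified'.
    """
    before = Counter()  # uuid -> pageviews before the window
    within = Counter()  # uuid -> pageviews inside the window
    for uuid, day in pageviews:
        if day < start:
            before[uuid] += 1
        elif day <= end:
            within[uuid] += 1
        # Pageviews after the window are ignored.

    labels = {}
    for uuid, n_within in within.items():
        if before[uuid] > 0:
            labels[uuid] = "returning"  # history wins, however old it is
        elif n_within == 1:
            labels[uuid] = "new"
        else:
            labels[uuid] = "unclassified"
    return labels


pageviews = [
    ("a1", date(2015, 3, 9)),   # a pageview ten years earlier
    ("a1", date(2025, 3, 10)),
    ("b2", date(2025, 3, 11)),
    ("c3", date(2025, 3, 12)),
    ("c3", date(2025, 3, 13)),
]
print(classify_visitors(pageviews, date(2025, 3, 1), date(2025, 3, 31)))
# → {'a1': 'returning', 'b2': 'new', 'c3': 'unclassified'}
```

The key design choice is that classification keys on the persistent UUID and its full history, not on session boundaries. That is why "a1" stays returning despite a ten-year gap.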
This approach provides a much more precise view of visitor behavior than other analytics tools. It gives our customers a more cohesive and accurate understanding of visitor activity over time, and it enables sharper visitor targeting, more reliable conversion tracking, and deeper insight into customer journeys.
It's also far more aligned with the spirit of the definition of new/returning visitors that most marketers live and work with each day:
A new visitor is someone visiting your site for the first time. A returning visitor is someone who's coming back for a second+ time.
Now, this is where things get very, very interesting.
1. Inflated Counts
If GA4 counts the same visitor multiple times across different sessions or devices, it inflates not only the new visitor count but also the overall visitor count. (A new visitor is always a +1 to the total, after all.) This could very well inflate your overall visitor count by a comically large degree.
Some of our customers have seen total "visitor" counts in GA4 that are 15x higher than in Confection. The reason: virtually all users were being logged as new, bloating the overall visitor count to absurd levels.
2. Significant Identity Resolution Issues
In these cases, roughly 93% of the visitors GA4 logged were surplus relative to Confection's count: if GA4 reports 15 "visitors" for every one person Confection identifies, then 14 of every 15 (about 93%) are repeat counts of the same people. This suggests GA4 is able to resolve identities less than 10% of the time. The session-based model doesn't just over-count visitors; it struggles to tell who is, and who is not, the same person. This implies that other attributes tied to visitors (demographics, behavior patterns, engagement data) may also be inaccurate or fragmented.
We'd need a controlled comparison to be sure. But taken at face value, this would mean Confection is accurately resolving identities more than 90% of the time, which is exciting and encouraging.
Why does all this matter? What are the takeaway lessons here?
I spend a lot of time making the claim that digital marketing data is, paradoxically, one of the most important and least valued/understood/well-defined assets on planet Earth. People shrug at digital analytics and allow it to be noisy in a way we'd never tolerate in finance, healthcare, or mechanical engineering.
This is a great example of this in action.
Consider the profound difference in a company's lead conversion rate if it's getting 26,000 visitors vs. 1,700. If you're averaging 100 leads per month, you go from 0.38% to 5.88%. That's someone's bonus, someone's promotion, someone's professional capital, someone's job.
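The arithmetic behind those two rates is shown below as a quick Python check (the function name is illustrative). It also prints the ratio between the two visitor counts, which works out to roughly the same 15x gap described earlier.

```python
def conversion_rate(leads, visitors):
    """Leads as a percentage of visitors."""
    return 100 * leads / visitors


leads_per_month = 100  # from the example above
for visitors in (26_000, 1_700):
    rate = conversion_rate(leads_per_month, visitors)
    print(f"{visitors:>6,} visitors -> {rate:.2f}%")
# → 26,000 visitors -> 0.38%
# →  1,700 visitors -> 5.88%

# Ratio between the two visitor counts
print(f"{26_000 / 1_700:.1f}x")  # → 15.3x
```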
These are also two completely different strategic paths.
The first scenario suggests a content/persona optimization problem: "We're not bringing the right people to our site, and/or our content doesn't interest them." The second is a traffic problem: "Our content and personas are dialed in. We just need more visitors."
For comparison's sake, imagine the CFO of a publicly traded company who doesn't know how to segment revenue into new and retained buckets but still has to prepare GAAP-compliant reports for tax purposes and securities filings, and advise his team on next term's strategy. The CFO couldn't do it, and whatever he tried to cobble together would be misleading and inexact.
This simple new/returning visitor issue affects millions of companies, tens of millions of marketing professionals, and billions of customers. The confusion is real and ever-present.
Digital marketing data quality really matters. Definitions really matter. Methodologies matter. Data independence matters. These contribute to, or detract from, quantifiable, material outcomes.
Marketers, we are here working for you. If you want to talk, let's talk.