Attribution 101
In its simplest form, attribution is a way to divide credit for a conversion among the marketing channels. The more sophisticated methods of attribution can also help explain causation. Before getting into the various attribution models, here are some basic concepts.
Conversion
Conversion is equivalent to the dependent variable in a regression analysis. It can be any user activity of interest. For an eCommerce site, checkout is a key conversion event. For a gaming app, conversion can be the purchase of paid features. For a content provider, a click-through is a strong indicator of user intent and therefore a conversion event.
While every user activity can be a conversion, companies usually align the conversion events with their KPIs. After all, the purpose of attribution is to understand what’s driving the conversions, and ultimately the business.
Touch points
Touch points can also be any user activity, so a single user activity can be both a touch point and a conversion. Opening an email is one example.
The rule of thumb is to track as many touch points as possible. The rich data can be mined to show hidden behavior patterns and previously unknown decision points. The bigger the data set, the more important good tracking becomes. Tracking of touch points includes these data elements: what is it, where did it come from, when did it happen, and who did it.
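As a rough sketch of the four data elements above, a touch point record could look like the following Python dataclass. The class name, field names, and sample values are assumptions made for illustration, not a standard tracking schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TouchPoint:
    """One tracked user activity (all field names are illustrative)."""
    event: str           # what is it, e.g. "email_open"
    source: str          # where did it come from, e.g. a referrer or campaign tag
    timestamp: datetime  # when did it happen
    user_key: str        # who did it: a cookie ID or a user ID


# Made-up example record
tp = TouchPoint(
    event="email_open",
    source="newsletter",
    timestamp=datetime(2024, 1, 5, 9, 30),
    user_key="cookie-123",
)
```

Keeping the record immutable (frozen) is one way to make sure downstream steps do not silently alter the raw tracking data.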
Many advertising, analytics and tracking service providers offer standard tracking templates. For example, Google Analytics tracks dozens of data variables by default, and also offers customizable variables and dimensions. Adopting such templates greatly reduces the development time. However, even with the best tracking data design, implementing the code consistently across the organization and over time can still be an ongoing challenge.
Channel
A user can have many touch points before converting, which makes each user path unique. Paths of the whole user population can have endless variations. To streamline the data for better presentation, touch points with similar characteristics are grouped into “channels”. The most commonly used channels are the marketing ones: SEM (paid search), SEO (organic search), Social Media, Email, Affiliates, Partnerships.
The “Direct” channel is used to identify users who come directly to the site, for example, by typing in the site URL in the browser. The Direct channel is detected because there is no referring URL. The reliance on the referrer creates a unique challenge for tracking mobile apps. Apps need more customized code to identify the source of traffic without the presence of a referrer.
Companies often make the distinction between “internal” and “external” channels. Internal channels can be product related, while external channels are marketing related. This distinction is motivated by the channel ROI. Therefore the cost structure is another way to group the channels.
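The grouping described above, including the "no referrer means Direct" rule, can be sketched in a few lines of Python. The host lists, the `paid` flag, and the "Other" fallback are hypothetical; a real mapping would be maintained by the business and would likely rely on campaign tags as well.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical lookup tables, for illustration only
SEARCH_HOSTS = {"www.google.com", "www.bing.com"}
SOCIAL_HOSTS = {"www.facebook.com", "twitter.com"}


def classify_channel(referrer: Optional[str], paid: bool = False) -> str:
    """Group a touch point into a channel based on its referring URL."""
    if not referrer:
        return "Direct"  # no referring URL, e.g. the user typed the site URL
    host = urlparse(referrer).netloc.lower()
    if host in SEARCH_HOSTS:
        return "SEM" if paid else "SEO"
    if host in SOCIAL_HOSTS:
        return "Social Media"
    return "Other"  # assumed catch-all for unmapped referrers
```

For example, `classify_channel(None)` falls into the Direct channel, while a Google referrer is split into SEM or SEO depending on whether the click was paid.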
User identifier
Touch points and conversions are connected by the same user identifier to reconstruct the user path, sometimes referred to as the "path to conversion". The user identifier is the single most important factor in attribution, and the trickiest one.
Most companies follow legal requirements and industry best practices to protect user privacy. This means companies will make trade-offs between complete and accurate tracking data and users’ need for privacy. The latter should always take priority.
Within the legal boundary, technology can help to improve the data collected. However, such data transformation, transportation and storage need to be well documented and legally validated. This is especially critical when first-party data is shared with a third party.
With the legal and user privacy requirements satisfied, the remaining challenges facing user identifier tracking are mostly technical. For example, part of the user’s path is associated with a cookie ID before the user signs up or signs in. The cookie ID to user ID matching is a standard data processing step in attribution. Such matching takes into account the change of cookie IDs associated with devices and browsers, as well as any cookie deletion done by users.
Technical solutions are truly amazing in both improving the data quality and drawing deep insights. Data engineering and data science teams can be “carried away” by such possibilities. When a new legal requirement, such as GDPR, appears, it can take the teams weeks, if not months, to track down all the data assets. Data governance is an operational excellence function often overlooked by fast growing companies.
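To make the cookie ID to user ID matching step concrete, here is a minimal sketch. It assumes events arrive as `(identifier, timestamp, channel)` tuples and that a cookie-to-user lookup already exists; both structures, and all the sample values, are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs: (identifier, timestamp, channel) events and a cookie-to-user map
events = [
    ("cookie-a", 1, "SEM"),
    ("cookie-b", 2, "Email"),
    ("user-1", 3, "Direct"),
    ("cookie-c", 4, "SEO"),
]
cookie_to_user = {"cookie-a": "user-1", "cookie-b": "user-1"}


def build_paths(events, cookie_to_user):
    """Resolve cookie IDs to user IDs, then order each user's touch points by time."""
    paths = defaultdict(list)
    for identifier, ts, channel in events:
        user = cookie_to_user.get(identifier, identifier)  # unmatched IDs stay as-is
        paths[user].append((ts, channel))
    return {user: [ch for _, ch in sorted(tps)] for user, tps in paths.items()}


paths = build_paths(events, cookie_to_user)
```

Here the two cookies seen before sign-in are folded into the path of `user-1`, while the unmatched `cookie-c` keeps its own single-step path.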
Frequency
In the following two example user paths, User 1 has more frequent touch points than User 2.
There is no set rule for judging which path is better. It depends on the nature of the business. If the conversion decision takes longer, such as in the case of buying a car, users likely have more touch points as they conduct research and comparisons, while ordering a delivered dinner may take only one click.
Frequency is an important measure for companies using integrated multi-channel marketing. Powerful insights can emerge by looking at the frequency alone. For example, some users have many touch points, but all engaged with a single channel. Some have multiple touch points with a combination of channels. Such user behavior is an important data feature to consider in user segmentation and channel ROI calculation.
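One possible way to turn frequency into a data feature is sketched below. It assumes a path is simply a list of channel names, one per touch point; the function and key names are made up for illustration.

```python
from collections import Counter


def frequency_profile(path):
    """Summarize how many touch points a path has and how many channels it spans."""
    counts = Counter(path)
    return {
        "touch_points": len(path),
        "channels": len(counts),
        "single_channel": len(counts) == 1,
    }


# Made-up paths: one user engaged only through Email, another across channels
email_only = frequency_profile(["Email", "Email", "Email"])
mixed = frequency_profile(["SEM", "Email", "SEM", "Direct"])
```

The `single_channel` flag separates the two behaviors mentioned above, and could feed into user segmentation or channel ROI calculations.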
Latency
In the context of attribution, latency is defined as the “distance” between two touch points. Generally speaking, the shorter the latency, the stronger the connection. In the following example, the email User 3 received more likely led to the subsequent organic search than in the case of User 1.
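Latency can be computed directly from the touch point timestamps. The sketch below uses made-up timestamps, and the one-day cut-off for a "close" connection is purely an assumption; the right threshold depends on the business.

```python
from datetime import datetime, timedelta


def latencies(timestamps):
    """Return the time gaps between consecutive touch points, earliest first."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Made-up path: an email, an organic search one hour later, a visit two days after that
touches = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 3, 10, 0),
]
gaps = latencies(touches)

# Hypothetical cut-off: treat gaps of a day or less as closely connected
close_pairs = [gap <= timedelta(days=1) for gap in gaps]
```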
Recency
If latency looks at the time elapsed between two touch points, recency examines the distance between each touch point and the conversion. Recency is an important factor in determining the methods of attribution, together with the look-back window.
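As an illustration of how recency can feed into an attribution method, the sketch below measures recency in days and turns it into time-decay weights, so that touch points closer to the conversion get more credit. The exponential form and the 7-day half-life are assumptions chosen for the example, not a recommendation.

```python
def recency_days(touch_days, conversion_day):
    """Distance in days between each touch point and the conversion."""
    return [conversion_day - day for day in touch_days]


def time_decay_weights(recencies, half_life_days=7.0):
    """Weights that halve every `half_life_days`, normalized to sum to 1."""
    if not recencies:
        return []
    raw = [0.5 ** (r / half_life_days) for r in recencies]
    total = sum(raw)
    return [w / total for w in raw]


# Made-up path: touch points on days 0, 7 and 14, conversion on day 14
recencies = recency_days([0, 7, 14], conversion_day=14)
weights = time_decay_weights(recencies)
```

With these numbers the most recent touch point receives four times the weight of the oldest one.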
Look-back window
How far back shall we look into the past for touch points prior to a conversion? Preferably as far back as possible if we know when the very first user interaction happened.
In the age of big data, storing all the users’ history is possible. But the insight drawn from data is not always proportional to the data volume. Bigger data can be noisier. The best practice is to start with a small(er) data set whose quality you have high confidence in.
While some third-party service providers set a maximum look-back window (90 days in Google Analytics), companies can invest in a one-time data exploration to determine the “right” look-back window for their businesses.
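Applying a look-back window amounts to a simple filter on touch point timestamps. In the sketch below, the function name is hypothetical and the 90-day default only mirrors the figure mentioned above; the dates are made up.

```python
from datetime import datetime, timedelta


def within_lookback(touch_times, conversion_time, window_days=90):
    """Keep touch points that fall inside the look-back window before a conversion."""
    start = conversion_time - timedelta(days=window_days)
    return [t for t in touch_times if start <= t <= conversion_time]


# Made-up example: one touch point is too old, one happens after the conversion
conversion = datetime(2024, 6, 1)
touches = [
    datetime(2024, 1, 1),
    datetime(2024, 4, 1),
    datetime(2024, 5, 31),
    datetime(2024, 6, 2),
]
kept = within_lookback(touches, conversion)
```

Rerunning the same filter with different `window_days` values is one cheap way to explore how sensitive the resulting paths are to the window choice.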
Attribution
With all the necessary data tracked, cleaned and organized, attribution can start. The simplest attribution models are easy to develop. What can be confusing is the variety of methods (models) available.
Attribution models fall into two major categories: rule-based or data-driven. For example, Google Analytics provides 7 attribution models plus a customizable “data-driven” one. Last touch, first touch, and linear models are rule-based. They “favor” certain touch points over the others.
The data-driven models take into account the value of the conversions, and have incrementality built in. For example, if user paths that contain Channel A always lead to higher-value conversions than paths that do not, Channel A can take extra weight. The weight is decided by the incremental conversion value.
Companies are careful in choosing the attribution model to operate on because it makes a big difference in investment decisions. Before integrated tracking became the industry standard, each advertising publisher tracked its own data and claimed full credit for a single conversion. This led to “double counting” and duplicate investment.
Should all companies strive for the most sophisticated attribution model? The answer is no. Decision makers need to weigh the cost of development against the potential revenue lift or cost saving. They also need to consider the cost of maintenance. More complex models need constant monitoring and continuous updating as the business conditions change.
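The three rule-based models named above are simple enough to sketch in one function. This is an illustrative implementation, not the one used by any particular analytics product; the function name, the `model` strings, and the sample path are assumptions.

```python
from collections import defaultdict


def attribute(path, value=1.0, model="linear"):
    """Split the value of one conversion across the channels in its path."""
    if not path:
        return {}
    credit = defaultdict(float)
    if model == "first_touch":
        credit[path[0]] += value        # all credit to the first touch point
    elif model == "last_touch":
        credit[path[-1]] += value       # all credit to the last touch point
    elif model == "linear":
        share = value / len(path)       # equal credit to every touch point
        for channel in path:
            credit[channel] += share
    else:
        raise ValueError(f"unknown model: {model}")
    return dict(credit)


# Made-up path to conversion
path = ["SEM", "Email", "SEM", "Direct"]
first = attribute(path, model="first_touch")
last = attribute(path, model="last_touch")
linear = attribute(path, model="linear")
```

On this path, first touch gives everything to SEM, last touch gives everything to Direct, and linear gives SEM half the credit because it appears twice.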
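The Channel A example can be illustrated with a deliberately naive calculation: compare the average conversion value of paths that contain a channel with the average of paths that do not. This difference in means is only a sketch of the idea of incremental value; real data-driven models are considerably more involved, and the data below is made up.

```python
from statistics import mean


def incremental_value(paths_with_values, channel):
    """Mean conversion value of paths with the channel minus mean of paths without it."""
    with_channel = [v for path, v in paths_with_values if channel in path]
    without_channel = [v for path, v in paths_with_values if channel not in path]
    if not with_channel or not without_channel:
        return None  # no comparison group available
    return mean(with_channel) - mean(without_channel)


# Made-up (path, conversion value) pairs
data = [
    (["A", "B"], 120.0),
    (["A"], 100.0),
    (["B"], 60.0),
    (["C"], 40.0),
]
lift_a = incremental_value(data, "A")
lift_b = incremental_value(data, "B")
```

In this toy data set Channel A shows a larger lift than Channel B, so it would take more weight. Note that such a comparison does not control for differences between the users on each path.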
In a nutshell
In a nutshell, three things matter in attribution:
- Track well and thoughtfully, protect users, and don’t break the law.
- Design good data models so new elements can be added like Lego pieces. Make the data good before attributing.
- Use the attribution model you need: do data exploration before committing to a model, and adapt when business conditions change.