How Audience Building Works in a DMP [Technical]
Clearcode.cc

Almost every data-management platform (DMP) on the market allows advertisers to create audiences and use them for different use cases, such as improved online ad targeting and advanced analytics.

To create audiences in a DMP, the platform must first create user profiles, each of which comprises numerous profile identifiers.

As part of a recent internal project carried out by one of the AdTech development teams at Clearcode, we researched the topics of audience building and profile merging and included some of our findings below.

To provide some context about the goal and purpose of profile merging, we first need to explain what audience building is and what profiles and profile identifiers are.

Audience building is one of the main data processes in a DMP. 

Once advertisers create an audience in a DMP, they can export it to other systems, such as a demand-side platform (DSP), for improved ad targeting.

An audience is a group of user profiles that share a common attribute.

For example, an advertiser might create an audience in its DMP called “Visitors from the USA.” The audience would then contain profiles that have an attribute such as “country = USA.”

How the profile merging process looks in a DMP.

Here’s an overview of what’s happening in the image above:

  1. A new event occurs – in this case, a website visit.
  2. The event contains several profile identifiers: cookie_id, country and click_id.
  3. The profile identifiers are identified as belonging to an existing profile. Any new identifiers, in this case the click_id, are added to the profile.
  4. The profile is added to any existing audience whose conditions it meets. In this case, it would be added to the Visitors from the USA audience because of the country = USA attribute.
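
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular DMP implements it: the `profile` layout, the `audiences` mapping, the `handle_event` function and the identifier values are all hypothetical.

```python
# Hypothetical in-memory state: one existing profile and one audience rule.
profile = {
    "identifiers": {"cookie_id": {"abc123"}},  # identifier type -> known values
    "attributes": {},                          # e.g. country (last seen)
}
audiences = {"Visitors from the USA": lambda attrs: attrs.get("country") == "USA"}
memberships = set()  # names of the audiences this profile belongs to


def handle_event(event: dict) -> bool:
    """Apply one event to the profile; return False if it does not match."""
    # Step 3: the event belongs to the profile if its cookie_id is already known.
    if event.get("cookie_id") not in profile["identifiers"]["cookie_id"]:
        return False
    # Attach identifiers the profile has not seen yet (here: click_id).
    for id_type in ("cookie_id", "click_id"):
        if id_type in event:
            profile["identifiers"].setdefault(id_type, set()).add(event[id_type])
    # Attributes such as country are stored as "last seen" values.
    if "country" in event:
        profile["attributes"]["country"] = event["country"]
    # Step 4: re-evaluate every audience rule against the updated profile.
    for name, rule in audiences.items():
        if rule(profile["attributes"]):
            memberships.add(name)
    return True


# Steps 1-2: a website visit carrying cookie_id, country and click_id.
matched = handle_event({"cookie_id": "abc123", "country": "USA", "click_id": "clk-42"})
```

A real system would look the profile up in a store rather than hold a single one in memory, and would also remove the profile from audiences whose conditions it no longer meets.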

Note: Most DMPs hash personally identifiable information (PII) such as email addresses. To keep things simple, we’ll use examples of unhashed email addresses in this article.

Audiences are built on numerous processing assumptions, with the process starting from an input event (e.g. web visit), which may contain different user identifiers.

To create profiles, and subsequently audiences, every event generally needs to have at least one profile identifier.

What Are Profiles and Profile Identifiers?

A profile is a set of data collected from events tracked by a DMP. It represents a user and may contain the following pieces of information:

  • profile id
  • cookie id (list)
  • hashed email (list)
  • sid / uuid (list)
  • country (last seen)
  • name (nullable)
  • device_type (last seen)
  • device_vendor (last seen)
  • device_os (last seen)
  • browser_vendor (last seen)
  • gender (nullable)
  • company (nullable)
  • company size (nullable)
  • matching ids (list)

The list above can be extended to suit the specific use cases of a DMP. Some of the fields are empty at first and are filled in as events arrive.
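
As a rough illustration, the fields listed above could be modeled as a Python dataclass. The class and field names below are assumptions derived from the list, not a schema taken from a real DMP; list fields start empty and nullable or "last seen" fields start as `None`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Profile:
    profile_id: str
    # Identifier fields hold every value seen so far.
    cookie_ids: List[str] = field(default_factory=list)
    hashed_emails: List[str] = field(default_factory=list)
    sids: List[str] = field(default_factory=list)  # sid / uuid
    matching_ids: List[str] = field(default_factory=list)
    # "Last seen" fields are overwritten by each new event.
    country: Optional[str] = None
    device_type: Optional[str] = None
    device_vendor: Optional[str] = None
    device_os: Optional[str] = None
    browser_vendor: Optional[str] = None
    # Nullable fields stay empty until an event provides them.
    name: Optional[str] = None
    gender: Optional[str] = None
    company: Optional[str] = None
    company_size: Optional[str] = None


new_profile = Profile(profile_id="p-1", cookie_ids=["7M-Q1P8-6AWG-1N3I"])
```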

Generally speaking, if an input event contains an unknown identifier (i.e. one that is not in the DMP already), a new profile is created. 

On the other hand, if the input event contains an identifier that is already known to the DMP, the profile is updated with incoming data from the event.
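
The create-or-update decision can be sketched with an index that maps each identifier to the profile that owns it. The names `profiles`, `id_index` and `upsert` are hypothetical, and the sketch deliberately ignores the case where one event matches two different profiles, which is the merging problem covered next.

```python
import uuid

profiles = {}  # profile_id -> {"identifiers": set, "attributes": dict}
id_index = {}  # (identifier type, value) -> profile_id


def upsert(event_ids: dict, attributes: dict) -> str:
    """Create a profile for unknown identifiers, otherwise update the match."""
    keys = list(event_ids.items())
    known = [id_index[k] for k in keys if k in id_index]
    if known:
        # Assumption: all matches point to one profile; otherwise a merge is due.
        profile_id = known[0]
    else:
        profile_id = uuid.uuid4().hex
        profiles[profile_id] = {"identifiers": set(), "attributes": {}}
    profile = profiles[profile_id]
    profile["identifiers"].update(keys)
    profile["attributes"].update(attributes)  # incoming data wins
    for k in keys:
        id_index.setdefault(k, profile_id)
    return profile_id


first = upsert({"cookie_id": "c1"}, {"country": "USA"})
second = upsert({"cookie_id": "c1", "click_id": "k1"}, {"country": "PL"})
third = upsert({"cookie_id": "c2"}, {})
```

Here `first` and `second` resolve to the same profile because they share `cookie_id = c1`, while `third` creates a new one.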

After updating the profiles with event data, two profiles may often share a common identifier. 

If this occurs, the DMP will have to perform an operation known as profile merging.

What Is Profile Merging Exactly?

The profile-merging operation ensures there are no duplicate identifiers or attributes within a given profile and that no two profiles have the same unique identifiers (such as email addresses). It achieves this by converting all profiles sharing a common identifier into one profile. 

As events can have multiple identifiers, they can arrive from the same user/profile but with different identifiers. 

For example, consider the following three events:

Event 1: A user visits publisher.com using Firefox: {cookie_id = 7M-Q1P8-6AWG-1N3I}

Event 2: The same user subscribes to a newsletter on publisher.com using Chrome: {email = [email protected], cookie_id = eyJraWQiOiJzZXN}

Event 3: The user fills in a form on publisher.com using Firefox: {email = [email protected], cookie_id = 7M-Q1P8-6AWG-1N3I}

All three are from the same user, but before the third event arrives in the system, this isn’t known, and they are treated as two totally separate profiles.
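
One way to see why the third event changes the picture is to link identifiers with a union-find (disjoint-set) structure. This is a sketch chosen for illustration rather than a method described by the research: the helper names are hypothetical, and `user@example.com` is a placeholder standing in for the email address in the events above.

```python
parent = {}  # identifier -> parent identifier; a root points to itself


def find(x):
    """Return the root identifier of the group that x belongs to."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps lookups short
        x = parent[x]
    return x


def union(a, b):
    parent[find(a)] = find(b)


def ingest(event: dict) -> None:
    """Link every identifier that appears together in one event."""
    ids = list(event.items())
    find(ids[0])  # register the first identifier even if it stands alone
    for other in ids[1:]:
        union(ids[0], other)


def profile_count() -> int:
    return len({find(x) for x in parent})


EMAIL = "user@example.com"  # placeholder value
ingest({"cookie_id": "7M-Q1P8-6AWG-1N3I"})                # Event 1
ingest({"email": EMAIL, "cookie_id": "eyJraWQiOiJzZXN"})  # Event 2
count_after_two = profile_count()
ingest({"email": EMAIL, "cookie_id": "7M-Q1P8-6AWG-1N3I"})  # Event 3
count_after_three = profile_count()
```

After the first two events there are two separate groups of identifiers; the third event shares an identifier with each group, so they collapse into one.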

Once it’s known that all three are from the same person, it would be advisable to treat them as the same object (profile). Otherwise, we would have multiple profiles assigned to one user, and those profiles wouldn’t contain the latest and most up-to-date information.

At a minimum, profile merging requires joining the IDs and profile attributes together. 

Due to the large number of IDs and attributes that can be collected via events, it’s possible to merge and use only a small percentage of the collected data for audience creation.

Also, if multiple user identifiers are found between profiles, we need to determine which identifier is the proper one – i.e. a single ID that will be used as the master ID after the data has been merged. This master ID will also be used to assign new data from events to a given profile.

To make things easier, it is assumed that a master identifier can be computed. This means that when an event with multiple IDs arrives in the system, it will be assigned a single ID calculated on the basis of the event IDs plus any other known IDs. 

A simple implementation would be to construct a list of all known IDs, sort it, and use the first element as the master ID. This is the simplest approach; the right choice depends on the business use case of the DMP.
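
The sort-and-take-first rule is short enough to write out directly. The function name `master_id` and its parameters are hypothetical; the point is only that the result is deterministic and does not depend on the order in which the IDs arrive.

```python
def master_id(event_ids, known_ids=()):
    """Pick the master ID as the first element of the sorted list of all IDs."""
    candidates = sorted(set(event_ids) | set(known_ids))
    if not candidates:
        raise ValueError("at least one identifier is required")
    return candidates[0]


from_event = master_id(["eyJraWQiOiJzZXN", "7M-Q1P8-6AWG-1N3I"])
with_known = master_id(["cookie-b"], known_ids=["cookie-a", "cookie-c"])
```

One consequence of this rule is that the master ID can change when a new, lexicographically smaller ID is linked to the profile, so anything keyed by the old master ID has to be re-keyed.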

The merged profile can be assigned to segments or audiences different from those its original profiles were assigned to.

After the profile-merging operation, DMP taxonomies, segments and audiences also need to be regenerated. 

How to Merge Profiles Together

To effectively carry out the profile-merging operation, a proper way of merging must be determined.

Imagine the merging operation between two profiles, both containing information entered by the user, where linking fields were found.

Two different names were provided by the user during registration on two different platforms.

The profile-merging operation has to decide which name is correct. 

There are a few ways to conduct profile merging. Below, we list four possible options.

Sort By Overwriting Existing IDs and Attributes

One of the simplest ways to merge profiles would be to overwrite all existing IDs and attributes with new, incoming ones. 

This can be done either by defining a master ID that will remain consistent (meaning it won’t be updated) or replacing the master ID each time a new ID is collected.
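
A minimal sketch of the overwrite strategy, assuming profiles are plain dictionaries with a `master_id` key (both the layout and the `merge_overwrite` name are hypothetical). The `keep_master_id` flag switches between the two variants described above.

```python
def merge_overwrite(existing: dict, incoming: dict, keep_master_id: bool = True) -> dict:
    """Overwrite existing values with incoming ones; optionally pin the master ID."""
    merged = {**existing, **incoming}  # incoming values win on conflict
    if keep_master_id:
        merged["master_id"] = existing["master_id"]
    return merged


existing = {"master_id": "7M-Q1P8-6AWG-1N3I", "name": "Ben", "country": "USA"}
incoming = {"master_id": "eyJraWQiOiJzZXN", "name": "Obi-Wan"}

stable = merge_overwrite(existing, incoming)
replaced = merge_overwrite(existing, incoming, keep_master_id=False)
```

In both results the name becomes Obi-Wan and the country is kept because the incoming profile has no value for it; only the master ID differs.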

Alphabetical Sorting

Alphabetical sorting is another simple option for merging different profiles together. 

With this method, the data between profiles is sorted alphabetically and the first value is used. 

According to our example, we have two names: Ben and Obi-Wan. With alphabetical sorting, the name Ben is defined as the correct one.
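
In code, alphabetical merging of a single field reduces to taking the smallest string. The `merge_alphabetical` helper is hypothetical, and treating the comparison as case-insensitive and skipping missing values are assumptions of this sketch.

```python
def merge_alphabetical(values):
    """Return the alphabetically first non-empty value, or None if there is none."""
    present = [v for v in values if v is not None]
    return min(present, key=str.casefold) if present else None


chosen_name = merge_alphabetical(["Obi-Wan", "Ben", None])
```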

Timestamp Sorting

Another approach would be to use the value that has the first or last recorded timestamp. 

In most cases, timestamp sorting will be the preferred method.

Again, according to the example, the event containing the name Ben was received first, so we use it instead of Obi-Wan. 

It’s important to note that timestamp sorting is determined by the event time, rather than the processing time.
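
The difference between event time and processing time can be made concrete with a small sketch. The `Observation` type, the `merge_by_timestamp` function and the dates are all made up for illustration; note that the event carrying Ben happened first but was processed last.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Observation(NamedTuple):
    value: str
    event_time: datetime    # when the event happened
    processed_at: datetime  # when the DMP handled it (ignored for merging)


def merge_by_timestamp(observations, keep="first"):
    """Pick the value with the earliest or latest event time."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    pick = min if keep == "first" else max
    return pick(observations, key=lambda o: o.event_time).value


utc = timezone.utc
observations = [
    Observation("Obi-Wan", datetime(2020, 1, 2, tzinfo=utc), datetime(2020, 1, 3, tzinfo=utc)),
    Observation("Ben", datetime(2020, 1, 1, tzinfo=utc), datetime(2020, 1, 4, tzinfo=utc)),
]
first_seen = merge_by_timestamp(observations, keep="first")
last_seen = merge_by_timestamp(observations, keep="last")
```

Sorting by `processed_at` instead would make the result depend on delivery delays, for example when data arrives late through a batch import.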

Wait-and-See Sorting

A more complex solution would be to keep all conflicting values for reference until another sorting method (e.g. timestamp) becomes applicable. That method can then confirm whether the provisional assumption was correct and determine the final value after the merge.
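
One possible reading of the wait-and-see approach is a field that stores every candidate value and refuses to pick one until each candidate has an event time. The `DeferredField` class is a hypothetical sketch of that reading, with timestamps given as plain epoch seconds.

```python
class DeferredField:
    """Keeps every candidate value until all of them can be ordered."""

    def __init__(self):
        self.candidates = {}  # value -> event time (None while unknown)

    def add(self, value, event_time=None):
        # Adding a value again with a timestamp fills in the missing time.
        if event_time is not None or value not in self.candidates:
            self.candidates[value] = event_time

    def resolve(self):
        """Return the earliest value, or None while the decision is deferred."""
        if not self.candidates or None in self.candidates.values():
            return None
        return min(self.candidates, key=self.candidates.get)


name = DeferredField()
name.add("Ben")                         # no event time known yet
name.add("Obi-Wan", event_time=20)
undecided = name.resolve()              # still deferred
name.add("Ben", event_time=10)          # the missing time becomes available
decided = name.resolve()
```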

Which Profile-Merging Option Do You Use?

Most of the time, the choice of profile-merging algorithm is based on the DMP’s use case, but it also depends on the type of data being merged and, in most cases, will need business justification.

Another aspect to consider is the order of profile-merging operations. 

When two profiles are found with linking fields, the profile-merging operation is performed. There will likely be cases where more than two profiles will need to be merged during a single operation. 

For example, if there are three profiles that need to be merged, the first two profiles would be merged and the third would be merged with the result of the first merger. 

To properly carry out this process, a merge order must be determined. 

For example, we can assume that the order is based on the timestamp of each of the events. 

Taking this into account, we may face a situation where a different order of merge operations produces a different final profile.
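
The effect of merge order is easy to reproduce with a pairwise merge in which the later profile overwrites the earlier one. The three profiles below are invented for illustration (including the `ts` field and the name Kenobi), and `merge_pair` is a hypothetical helper.

```python
from functools import reduce


def merge_pair(a: dict, b: dict) -> dict:
    """Non-commutative merge: values from b overwrite values from a."""
    return {**a, **b}


incoming = [
    {"ts": 3, "name": "Kenobi"},
    {"ts": 1, "name": "Ben"},
    {"ts": 2, "name": "Obi-Wan", "country": "USA"},
]

# Merge the first two, then merge the third into the result, and so on.
arrival_order = reduce(merge_pair, incoming)
timestamp_order = reduce(merge_pair, sorted(incoming, key=lambda p: p["ts"]))
```

Merging in arrival order ends with the name Obi-Wan, while merging in timestamp order ends with Kenobi, even though the same three profiles were combined. Fixing the order (here, by timestamp) is what makes the result reproducible.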

Depending on the business use case, an additional service that will periodically verify the profile merging may be required in order to guarantee proper merges.

How to Handle Concurrent Merging

DMP systems often face very high processing requirements in terms of speed and data volume.

Concurrent profile merging is a solution that enables us to perform profile merging in a short amount of time. 

However, in this case, multiple processes are evaluating events and the merges become a lot more complicated. 

The main problem with concurrent merging is deciding how to handle multiple events that the DMP is processing at the same time.

A simple approach is for the process that first receives an event to create a new profile, which should then be used in the second process. 

However, this causes synchronization problems. Creating a new profile often takes a while, so before the first process finishes creating one, the second process may decide that it should also create one, resulting in two profiles that should really have been merged.

While this might seem unlikely to happen, considering the scale of data processed by a DMP, such problems will no doubt appear.

To avoid such problems, we decided to route events to different processes through a master process (a router) that handles all identifiers.

When a new event arrives in the DMP, the router checks its ID and decides which processor it should go to, which (with a decent routing algorithm) should allow for even load distribution. 

This reduces the problem of concurrent profile-merging to multiple, simple merges such as those described above, at the cost of a single point of sequential processing. 

Even if two events arrive at the same time from the same profile but with different identifiers (and therefore should be merged), they will be processed one after another and both will go to the same processor.
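
A rough sketch of such a router is shown below. It is an assumption-laden illustration, not the design used in the project: the `Router` class is hypothetical, it picks the smallest linked identifier as the master ID, and it hashes that master ID to choose a processor. `user@example.com` is again a placeholder email.

```python
import hashlib


class Router:
    """Sequential router: maps every identifier to a master ID, then to a processor."""

    def __init__(self, n_processors: int):
        self.n = n_processors
        self.master_of = {}  # identifier -> master identifier

    def route(self, event_ids) -> int:
        known = {self.master_of[i] for i in event_ids if i in self.master_of}
        master = min(known) if known else min(event_ids)
        # If the event links several groups, repoint the losing groups.
        for ident, current in list(self.master_of.items()):
            if current in known:
                self.master_of[ident] = master
        for ident in event_ids:
            self.master_of[ident] = master
        # A stable hash keeps the mapping identical across runs and machines.
        digest = hashlib.sha256(master.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.n


EMAIL = "user@example.com"  # placeholder value
router = Router(n_processors=4)
p1 = router.route(["7M-Q1P8-6AWG-1N3I"])
p2 = router.route([EMAIL, "eyJraWQiOiJzZXN"])
p3 = router.route([EMAIL, "7M-Q1P8-6AWG-1N3I"])  # links both groups
p4 = router.route(["eyJraWQiOiJzZXN"])           # now follows the merged group
```

Once the third event links the two groups, every identifier of that user is routed to the same processor as the first event. The profile built from the second event may live on a different processor, so it has to be copied over before the merge.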

To ensure the process runs smoothly, each process should have access to all profiles. If each process has its own profile store (e.g. a database), this will require copying profiles from one process to another. 

Key Takeaways

Below are a few key takeaways from our profile-merging research:

  • Choosing the right profile-merging algorithm is not just a matter of technical implementation; it also depends on the DMP and its business use case.
  • As there are multiple ways to carry out profile merging, the user profiles may change over time according to the different merging operations and the time at which the data is collected. It’s important to remember that DMPs collect user data at different times: sometimes in real time (e.g. data collection from a website) and at other times via a data-import operation (e.g. first-party data onboarding).
  • To be sure that user data is not contaminated with false information, we need a proper merging algorithm for each of the collected and populated pieces of profile information.
David Jenkins

Chief Commercial Officer | NXD / Trustee | Former CEO, MD and Sales & Account Director from Leisure / Entertainment / Media / Technology | Strategic | Stakeholder driven | Growth focused

5 years ago

Detailed - thanks! Does a DMP have to merge these as surely everything is valuable (even dodgy names like Obi)? Surely some good data is potentially lost, and it still seems to be that you can't store multiple identifiers - such as emails - on any one profile (or have I misread that?)

Alok Pabalkar

CoFounder & CTO - GIDE.AI

5 years ago

Good stuff. The main challenge and fun is in generating dynamic segments based on the limited available information from impression, click and event streams and then ensuring segregation of public and private data clouds.

Britta Behrens

LinkedIn Expert | Social Selling Virtuoso | B2B Marketing & Sales Nerd | Speaker | Author | Golf Geek | Gin Taster | Always on Fire

5 years ago

Thank you Maciej Zawadzinski for the great article.

Steve Dunlop

Founder and CEO of AMA

5 years ago

This is so helpful, thanks Maciej Zawadzinski

Maciej Zawadzinski

3x Exited Founder-Turned-VC | Investing in Hard-to-Beat Founders

5 years ago

Co-authored by Łukasz Małecki, Daniel O'Connell & Michael Sweeney. Great stuff!
