Enhancing transparency with LinkedIn’s Ad Library

Aanchal Somani

January 8, 2025

Online safety and transparency laws around the world increasingly require that platforms, like LinkedIn, provide searchable tools and databases of ads that have run on the platform.

As part of our ongoing commitment to transparency in advertising, LinkedIn’s Ad Library is available to all LinkedIn members and the public. It contains ads that LinkedIn has served at least once in the last year, along with information about those ads.

In this blog post, we’ll outline how we built our Ad Library to provide an easy-to-use repository of searchable ads.

Design and flow of the ad library

Members can tune their search criteria using filters like company, keyword, country, and date range, and can view the details of a particular ad. The search results are sorted and paginated, and each ad is shown with information such as:

ad preview;
company name;
the entity who paid for the ad;
first and last impression date for the ad;
targeting parameters of the ad;
total impression count of the ad to date;
impression breakdown by country; and
restricted status of the ad (shown for restricted ads only).

Figure 2. LinkedIn’s ad library high level design and flow

When members click the “Search” button, the system makes an API call to the mid-tier to fetch the data to be rendered on the portal. The mid-tier first collects the ad IDs one page at a time and, for each of these IDs, retrieves the ad rendering data.

To serve the member’s query, LinkedIn’s in-house search solution (similar to Elasticsearch) indexes multiple fields from our dataset and returns the matching ad IDs. For these ad IDs, the mid-tier makes another API call to fetch the preview rendering details so that each ad is displayed with its text and assets, such as images, videos, and URLs.
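The two-step flow described above (one page of ad IDs first, then rendering data per ID) can be sketched as follows. This is only an illustration: `fetch_search_page`, `AdCard`, and the in-memory `fake_*` helpers are hypothetical names standing in for the mid-tier, the search index, and the preview store, none of which are specified in this post.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class AdCard:
    ad_id: int
    preview: Dict[str, str]  # e.g. text, image URL, landing page URL


def fetch_search_page(
    search_ids: Callable[[str, int, int], Sequence[int]],
    fetch_preview: Callable[[int], Dict[str, str]],
    query: str,
    start: int = 0,
    count: int = 10,
) -> List[AdCard]:
    """Step 1: get one page of ad IDs. Step 2: fetch rendering data per ID."""
    ad_ids = search_ids(query, start, count)
    return [AdCard(ad_id=i, preview=fetch_preview(i)) for i in ad_ids]


# Hypothetical in-memory stand-ins for the search index and the preview store.
_FAKE_INDEX = {
    101: "cloud hiring event",
    102: "cloud security webinar",
    103: "sales training",
}


def fake_search_ids(query: str, start: int, count: int) -> List[int]:
    # Substring match, sorted by ad ID, then sliced to the requested page.
    matches = sorted(
        ad_id for ad_id, text in _FAKE_INDEX.items() if query.lower() in text
    )
    return matches[start:start + count]


def fake_fetch_preview(ad_id: int) -> Dict[str, str]:
    return {"text": _FAKE_INDEX[ad_id]}


page = fetch_search_page(fake_search_ids, fake_fetch_preview, "cloud", start=0, count=1)
```

Passing the two lookups in as functions keeps the pagination logic independent of whichever search and preview services sit behind it.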

Members can also click the “View Details” button to see more details about the ad, including:

Ad preview;
Advertiser details, including the name of the advertiser and name of the entity who paid for the ad; and
For countries where the law requires it, ad analytics and ad targeting details.

We’ll dig into each module in more detail in the following sections.

Developing the search feature

To facilitate the search requirements, we leverage Hosted Search, LinkedIn’s in-house search solution. Hosted Search relies on ETL data from our Espresso table (ads dataset) to build the offline index. The nearline index is updated using Brooklin change capture. This ensures that changes made to the table are reflected on the Hosted Search index as soon as possible.

Figure 3. Search flow within the ad library

We indexed multiple fields from the Espresso data store to support searching by text in the ad content, advertiser or company name, countries, and the date range in which the ad had impressions. These indexed fields are inverted fields used for filtering content. Hosted Search can also sort the results on different fields, such as an ad’s impression time or creation time.

Here is an overview of the index types we used:

Index type	Details to note
NUMERIC_RANGE	A range query is required on a numeric value.
STEMMED_TEXT	When all words of query text need to exist in a sentence. Common stop words like "a", "the", "of" etc., are ignored. The tokens generated are converted to lowercase and stemmed (reduced to the base or root form).
STRING_PREFIX	A case insensitive prefix search is required on the entire string.
STRING	An exact match of the field value is required. The entire, case sensitive, exact field value has to be specified to match the records.
URL	Designed specifically to index URLs. Tokens generated during query are similar and all tokens must exist in the indexed field for a match.
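To make the matching semantics in the table more concrete, here is a rough sketch of how four of the index types could behave. It is not Hosted Search code: the function names, the stop-word list, the inclusive range bounds, and the `naive_stem` suffix stripper are all assumptions made for illustration, and the URL type is left out because its tokenization is not described here.

```python
import re
from typing import Set

# Assumed stop-word list; the real list is not given in this post.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in"}


def naive_stem(token: str) -> str:
    """Very rough suffix stripping; a real index would use a proper stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stemmed_tokens(text: str) -> Set[str]:
    # Lowercase, split into alphanumeric tokens, drop stop words, then stem.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {naive_stem(w) for w in words if w not in STOP_WORDS}


def match_stemmed_text(query: str, field: str) -> bool:
    # STEMMED_TEXT: every remaining query token must appear in the field.
    query_tokens = stemmed_tokens(query)
    return bool(query_tokens) and query_tokens <= stemmed_tokens(field)


def match_string_prefix(query: str, field: str) -> bool:
    # STRING_PREFIX: case-insensitive prefix match on the entire string.
    return field.lower().startswith(query.lower())


def match_string(query: str, field: str) -> bool:
    # STRING: exact, case-sensitive match of the whole value.
    return query == field


def match_numeric_range(low: int, high: int, value: int) -> bool:
    # NUMERIC_RANGE: bounds are assumed to be inclusive in this sketch.
    return low <= value <= high
```

For example, under these assumptions the query "the hiring engineers" matches the field "We hired an engineer in Dublin", because both reduce to tokens that include "hir" and "engineer".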

We considered several options for the ad search. We ruled out using the existing ad search because it lacked features like date range filters and coverage for all ad types. Leveraging the user-generated content search was also not ideal, as ads may not always be linked to user-generated content.

Meanwhile, adding a new dataset customized for the Ad Library would require significant time and introduce maintenance overhead. Therefore, we settled on an existing standardized dataset for ads, which already has most of the necessary information, though it required onboarding additional ad types.

Displaying ad previews

We display ad previews on both the search results page and the details page. The preview includes the ad’s text content, assets, landing page URL, and so on, which can vary depending on the ad type (for example, Single Image Ad, Video Ad, Job Ad, Carousel Ad). As we developed this feature, we considered using the existing Ad Preview, which provides authentication support for members but requires adjustments for guest users. However, it doesn't support ads that are not based on user-generated content, and it would need changes such as modifying tracking and removing CTAs and interactions for the Ad Library use case.

We also explored using the ad search dataset (mentioned above in the search section), but it only stores header and non-header data without clear rendering specifications. Ultimately, we opted for an internal dataset that handles all ad formats with a strict schema, incorporating projections for more efficient data extraction.

About the ad

We provide details about the ad type, company name with a link, the entity that paid for the ad and the date range during which the ad was served.

Ad analytics

We show the following ad analytics on the details page:

First impression date
Latest impression date
Total impressions
Impression breakdown by country

Ad analytics data is calculated daily through an offline job that is triggered once the upstream data, including impressions and demographics, becomes available. The results are then stored in the Venice data store, where the key is the ad ID, and the value includes key metrics such as the first and last impression timestamps, total impressions, and impressions per country. In parallel, the offline job triggers Kafka events for all ads with updated data. These Kafka events are consumed to update the Ads Standardized dataset, which subsequently updates the hosted search indexes.
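The aggregation step of such an offline job can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `ImpressionEvent` tuple layout, the `AdAnalytics` record, and the `aggregate` function are hypothetical, a plain dictionary stands in for the Venice store, and the Kafka notification step is omitted.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

# Assumed event shape: (ad_id, country_code, impression_timestamp_epoch_seconds)
ImpressionEvent = Tuple[int, str, int]


@dataclass
class AdAnalytics:
    first_impression_ts: int
    last_impression_ts: int
    total_impressions: int = 0
    impressions_by_country: Counter = field(default_factory=Counter)


def aggregate(events: Iterable[ImpressionEvent]) -> Dict[int, AdAnalytics]:
    """Build one record per ad ID, mirroring the key/value layout described above."""
    store: Dict[int, AdAnalytics] = {}
    for ad_id, country, ts in events:
        record = store.get(ad_id)
        if record is None:
            record = store[ad_id] = AdAnalytics(ts, ts)
        record.first_impression_ts = min(record.first_impression_ts, ts)
        record.last_impression_ts = max(record.last_impression_ts, ts)
        record.total_impressions += 1
        record.impressions_by_country[country] += 1
    return store


events = [(1, "US", 100), (1, "DE", 50), (2, "US", 70), (1, "US", 200)]
analytics = aggregate(events)
```

Keying the result by ad ID means the details page needs only a single lookup to render all four metrics for an ad.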

We considered using the existing internal /adAnalyticsV2 endpoint since it offers direct data access and avoids logic duplication. However, it has limitations like a 5k QPS cap and restricted data availability (e.g., only the top 100 countries for the past 2 years). Storing the data in the ad search dataset would reduce system complexity by avoiding Venice, but it risks mixing Ad Library-specific data with standardized data, which violates separation of concerns. Finally, using the Espresso Bulk Update Job would reduce external system costs, but it offers less flexibility for Ad Library-specific information and has potential data loss issues.

The selected approach is to use an offline flow to store data in a new Venice store, allowing us to add necessary Ad Library fields while keeping the ad search dataset stateless and minimizing costs by storing data only for entities that served impressions in the last year.

Ad targeting details

Figure 5. Ad Library details page with Ad targeting

The details page also shows the targeting parameters applied by the advertiser to target their ad. For each targeting parameter, the Ad Library displays whether the advertiser used that parameter for inclusion targeting, exclusion targeting, or both inclusion and exclusion targeting.
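The inclusion/exclusion labeling can be expressed as a simple classification over two sets. The sketch below is illustrative only: the function name `targeting_usage`, the label strings, and the parameter names in the example are assumptions, not the Ad Library's actual schema.

```python
from typing import Dict, Iterable


def targeting_usage(included: Iterable[str], excluded: Iterable[str]) -> Dict[str, str]:
    """Label each targeting parameter by how the advertiser used it."""
    inc, exc = set(included), set(excluded)
    usage: Dict[str, str] = {}
    for param in sorted(inc | exc):
        if param in inc and param in exc:
            usage[param] = "INCLUDED_AND_EXCLUDED"
        elif param in inc:
            usage[param] = "INCLUDED"
        else:
            usage[param] = "EXCLUDED"
    return usage


labels = targeting_usage(
    included=["location", "job_title"],
    excluded=["job_title", "company"],
)
```

Note that only the usage of each parameter is reported here, not the values the advertiser selected for it.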

Protections around restricted ads

The Ad Library also displays information for ads that have been restricted by LinkedIn. For these restricted ads, to comply with relevant laws, we do not show the advertiser's name, the payer's name, or the ad preview.
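One way to enforce this rule is to strip the protected fields before a record leaves the mid-tier. The following is a sketch under assumptions: the field names (`advertiser_name`, `payer_name`, `preview`, `restricted`) and the `to_public_view` helper are hypothetical and do not reflect the actual data model.

```python
from typing import Any, Dict

# Assumed names for the fields that must be hidden on restricted ads.
REDACTED_FIELDS = ("advertiser_name", "payer_name", "preview")


def to_public_view(ad: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the ad record that is safe to display."""
    if not ad.get("restricted", False):
        return dict(ad)
    return {key: value for key, value in ad.items() if key not in REDACTED_FIELDS}


restricted_ad = {
    "ad_id": 7,
    "advertiser_name": "Acme",
    "payer_name": "Acme Holdings",
    "preview": {"text": "hi"},
    "restricted": True,
    "total_impressions": 10,
}
public_view = to_public_view(restricted_ad)
```

Applying the redaction in one place, rather than in each rendering path, reduces the chance that a protected field is shown on one page but hidden on another.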

Making an external API available

In order to meet transparency obligations, we also provide an external Ad Library API to search ads, which can be accessed by requesting the product from our developer portal.

Conclusion

The Ad Library has enabled us to achieve compliance with laws like the European Union’s Digital Services Act (DSA) and increase transparency across LinkedIn's ad platform, supporting our efforts to uphold member safety and trust in LinkedIn.

Acknowledgments

It takes a village to build a product that impacts so many members across LinkedIn. A big note of thanks to (in alphabetical order):

Core Engineering Team: Aanchal Somani, Dhruv Bansal, Jasmeet Singh, Mohnish Satidasani, Sneha Dewan, Swathi Shenoy and Vikrant Mahajan.

User Council: Adam Botzenhart, Sahila Agarwal, and Brian Waxham, for helping us bring a user view and for representing the product designer, product managers, and several others across the engineering team, and for their enthusiastic and continuous input.

Leadership: We would also like to thank Balaji Srinivasan and Mohan Nellore from the leadership team for their continued support and investment in the project.

Topics: Marketing Member/Customer Experience