LinkedIn's A.I. Breakthrough in Real-Time Personalization Explained

LinkedIn Showcases Real-Time Features for Near Real-Time Personalization

Making relevant recommendations to LinkedIn members: a blog summary.

If you enjoy articles about A.I. at the intersection of breaking news, join AiSupremacy here. I cannot continue to write without community support (follow the link below).

https://aisupremacy.substack.com/subscribe

AiSupremacy is a Newsletter at the intersection of A.I. and breaking news. You can keep up to date with the articles here.

Fair warning: this article summary gets a bit technical, as it is based on LinkedIn's AI engineering blog, which is full of technical details.

In a world of streaming, videos, stories, and notifications competing for human attention, real-time personalization of app feeds is more important than ever. For busy professionals hunting for a job, someone in their network, or a topic they are interested in, speed is of the essence.

At LinkedIn, enabling this kind of economic mobility at scale is their job—that is, they continually want to connect every member of the professional workforce in the world to opportunity.

The quality of their social feed is debatable, with many people having their own points of view. However, the majority of people who log in to LinkedIn are job seekers, sales professionals, or HR professionals with their own particular goals.

LinkedIn’s Growing Relationship with A.I.

In this article, we're mostly going to talk about the machine learning behind how the recommendation engine is improving.

LinkedIn, which is now becoming more popular in South Asia, even has a course on Artificial Intelligence. In fact, today in 2022 there are many such courses on LinkedIn Learning. I don't know if they are any good.

AI Courses on LinkedIn Learning

Machine learning is one of the liveliest areas in artificial intelligence. Machine learning algorithms allow computers to learn new things without being explicitly programmed. Even in an era of deepfake LinkedIn profiles, LinkedIn is updating its feed with new AI technologies, making it faster and more personalized.

Organizations applying real-time machine learning are reportedly seeing increased return on investment, especially when it comes to real-time personalization. LinkedIn's AI engineering team is thinking about a lot of issues as they pertain to security and personalization, among other factors.

See AI Engineering Topics at LinkedIn

From Bayesian optimization by Yunbo Ouyang to fairness in its AI products by Heloise Logan, you get a sense of the diversity of projects that LinkedIn's AI team is working on. Recently, a more exciting topic came across my desk to do with real-time personalization. So the rest of this article will attempt to summarize it in some detail.

Real-Time Features for Near Real-Time Personalization

According to a basic definition, AI is the science and engineering of building intelligent computer programs that can achieve complex goals. LinkedIn, now under Microsoft, is doing just that.

Looking back on AI's history at LinkedIn, some context can be helpful.

In order to understand how AI systems help LinkedIn achieve its goals, it's important to step back and look at how these algorithms work.

  • You identify a broad objective for the AI system, like “provide new job opportunities for our members that match their skills and interests” or “provide recruiters with a list of candidates that both match a given search criteria and are likely to result in a successful hire.”
  • You have a set of intermediate metrics (called “relevance” metrics in Figure 1) that are used as a proxy for how well the system is achieving its goal. This is often necessary because the original product metrics (for example, successful hires) are not something that a machine learning algorithm can directly optimize easily. In our example, these metrics could include the number of members that apply to the jobs they are given, the number of confirmed hires, the number of members that click on job listings, etc. (A toy sketch of such a proxy metric follows this list.)
  • You create an algorithm that improves (according to your relevance metrics) upon your existing method of generating results from data. For example, a model could use different criteria to recommend job opportunities to members, resulting in an increased number of members clicking on job listings, which is used as an indication that the job recommendations have improved.
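To make the proxy-metric idea a bit more concrete, here is a toy Python sketch that computes an apply-through rate as one possible relevance metric. The field names and data shapes are invented for illustration and are not LinkedIn's actual logging schema.

```python
# Toy sketch of a proxy "relevance" metric: apply-through rate on job
# recommendations. Field names and data layout are hypothetical.

def apply_through_rate(impressions):
    """impressions: iterable of dicts like {"job_id": ..., "applied": bool}."""
    impressions = list(impressions)
    if not impressions:
        return 0.0
    applies = sum(1 for imp in impressions if imp["applied"])
    return applies / len(impressions)

# Two candidate ranking models could be compared by this proxy metric; the one
# with the higher apply-through rate is taken as producing better recommendations.
model_a_logs = [{"job_id": 1, "applied": True}, {"job_id": 2, "applied": False}]
model_b_logs = [{"job_id": 3, "applied": False}, {"job_id": 4, "applied": False}]
print(apply_through_rate(model_a_logs), apply_through_rate(model_b_logs))  # 0.5 0.0
```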

So what’s new in 2022?

The blog article that we are going to summarize was co-authored by Rupesh Gupta, Sasha Ovsankin, Qing Li, Seunghyun Lee, Benjamin Le, and Sunil Khanal.

If you enjoy A.I. articles at the intersection of breaking news, then help me continue to write on the subject. I cannot continue to write without support. I am grateful for all tips, patronage, and community contributions.

Join 43 other paying subscribers

Improving LinkedIn Recommendation Engine

At LinkedIn, they are striving to serve the most relevant recommendations to their members, whether that’s a job they may be interested in, a member they may want to connect with, or another type of suggestion.

  • In order to do that, we need to know their intent and preferences, which may be revealed through their actions on our application. For example, if a member visits LinkedIn and applies for a web developer job in San Francisco, then this action can reveal their intent to find a job, with a preference for web developer positions in San Francisco.
  • Such information is leveraged by our various recommenders to better personalize recommendations for the member. For example, our job recommender can recommend other web developer jobs in San Francisco, our connection recommender can recommend web developers to connect with in San Francisco, and our feed recommender can recommend content related to the job market in San Francisco to this member.

Speed Approaching Real-Time Personalization

However, there is usually a delay between when a member takes an action and when it can be leveraged to adapt recommendations for that member.

  • This is because member activity data is typically processed periodically into features in a batch environment and then made available to recommender systems.
  • Every time a member takes an action, an event containing information about the action is emitted to a data stream (such as Apache Kafka). These action events are periodically ETLed into an offline data store (such as HDFS). A batch job (such as an Apache Spark job) periodically reads the activity data of all members in this offline store and processes it into features.
  • These features are then pushed to an online store (such as a key-value Venice store). When a member lands on a page that contains a recommendations module, the corresponding recommender system reads features from the online features store and uses them in a model to score candidate recommendation items and return recommendations ordered by their scores. (A simplified sketch of this batch pipeline follows this list.)
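As a rough illustration of this conventional pipeline, the sketch below compresses the batch-aggregation and push-to-online-store stages into plain Python. The function names, event fields, and in-memory stores are hypothetical stand-ins for the Kafka, HDFS, Spark, and Venice components named above, not LinkedIn's actual code.

```python
# Illustrative sketch of the conventional (batch) feature pipeline: action
# events -> offline store -> periodic batch aggregation -> online key-value
# store. All names and schemas here are hypothetical.
from collections import defaultdict

def batch_compute_features(action_events):
    """Periodically aggregate all members' activity into per-member features.

    action_events: iterable of dicts like
        {"member_id": 123, "verb": "job-apply", "job_geo": "San Francisco"}
    Returns a dict of member_id -> feature dict, ready to push online.
    """
    applies_by_member = defaultdict(lambda: defaultdict(int))
    for event in action_events:
        if event["verb"] == "job-apply":
            applies_by_member[event["member_id"]][event["job_geo"]] += 1
    return {
        member_id: {"job_applies_by_geo": dict(geo_counts)}
        for member_id, geo_counts in applies_by_member.items()
    }

def push_to_online_store(features, online_store):
    """Stand-in for writing batch-computed features to a key-value store."""
    for member_id, feature_dict in features.items():
        online_store[member_id] = feature_dict

# Usage (in-memory stand-ins for the offline input and the online store):
events = [{"member_id": 7, "verb": "job-apply", "job_geo": "San Francisco"}]
store = {}
push_to_online_store(batch_compute_features(events), store)
print(store)  # {7: {'job_applies_by_geo': {'San Francisco': 1}}}
```

Because this job only runs periodically, any action a member takes after the last run is invisible to the recommender until the next run, which is the delay the rest of the article addresses.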

Figure 1. A conventional feature pipeline for leveraging past actions of a member to personalize recommendations.

LinkedIn’s solution is based on the following two ideas:

  1. Recent actions of a member may be ingested into an online store; only actions taken within the last few days need to be retained there, because features computed from this store are meant to complement the features computed through the conventional (batch) feature pipeline, which already covers older activity.
  2. Rather than precomputing features based on recent actions of a member, these features may be computed on-demand when recommendations need to be generated. The computation is fast, as typically only a small amount of data needs to be processed for these features. This allows us to use the most recent actions of a member for computing these features.

Understanding Past Behavior

Computation of features based on a member’s past actions

The team involved surveyed several AI teams at LinkedIn to understand how they compute features (through the conventional feature pipeline) based on a member’s past actions. They noticed a generic pattern in computation of a majority of these features. This computation comprises three steps:

Step 1: Get relevant actions taken by a member over a duration of time. For example, get all the job-apply actions taken by a member over the last 7 days.

Step 2: Look up certain attributes of the entities on which the above actions were taken. For example, look up the embedding (a numeric vector representation) of each job which the member applied to from the previous step.

Step 3: Perform a summarization operation on the attributes of all the entities. For example, compute the average of the embeddings of all the jobs from the previous step.
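A minimal sketch of this three-step pattern, assuming a hypothetical per-member action store and a job attribute store keyed by job ID (neither is LinkedIn's actual API), might look like this:

```python
# Sketch of the generic three-step pattern: (1) fetch relevant actions in a
# time window, (2) join entity attributes (job embeddings), (3) summarize.
# Store access and schemas are hypothetical placeholders.
import time

def recent_job_apply_embedding(member_id, action_store, job_attribute_store,
                               window_days=7):
    window_start = time.time() - window_days * 24 * 3600

    # Step 1: relevant actions taken by the member within the window.
    actions = [
        a for a in action_store.get(member_id, [])
        if a["verb"] == "job-apply" and a["timestamp"] >= window_start
    ]
    if not actions:
        return None

    # Step 2: look up attributes (embeddings) of the jobs acted on.
    embeddings = [job_attribute_store[a["object"]]["embedding"] for a in actions]

    # Step 3: summarize, here an element-wise average of the embeddings.
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
```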

For data science and AI generally, it's pretty interesting to see how they conceptualize this.

A recommender system might also compute a (member, item) pair feature based on the member’s past actions.

For example, the job recommender might use the following pair feature when scoring a candidate job recommendation job_i for a member: the number of times this member applied to any job in the same geographic location as job_i in the last 7 days.
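The same pattern extends to a (member, item) pair feature. Here is a sketch of the example above, again with invented store interfaces and field names rather than LinkedIn's actual code:

```python
# Sketch of a (member, item) pair feature: number of times the member applied
# to any job in the same geographic location as the candidate job in the last
# 7 days. Schemas and store access are illustrative only.
import time

def applies_in_same_geo(member_id, candidate_job, action_store,
                        job_attribute_store, window_days=7):
    window_start = time.time() - window_days * 24 * 3600
    candidate_geo = job_attribute_store[candidate_job]["geo"]
    return sum(
        1
        for a in action_store.get(member_id, [])
        if a["verb"] == "job-apply"
        and a["timestamp"] >= window_start
        and job_attribute_store[a["object"]]["geo"] == candidate_geo
    )
```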

Summary of requirements

The generic computation pattern above helped the team define the requirements for their desired solution. These requirements were:

Requirement 1: Ability to record any member action of interest within a few seconds.

Requirement 2: Ability to join any attributes of the entity on which an action was taken.

Requirement 3: Ability to retrieve actions taken by a member (along with joined attributes) that meet certain criteria and compute features from those actions in less than 100 milliseconds.

So while our experience of LinkedIn's feed might not be any better on a UX or content level, the personalization and speed behind it have improved significantly in recent months.

Design of solution

With the standard schema for representing any member action, we designed our solution as shown in Figure 2.

Figure 2. LinkedIn AI Engineering team’s solution leveraging actions of a member in near real-time to adapt recommendations for that member in near real-time.

They introduced an?Apache Samza?stream processor to listen for and process events corresponding to member actions of interest from Kafka.

They chose to support the Samza SQL API for writing the processing logic in this processor. It limits processing logic to simple operations such as filtering, stream-table joins, and projections. This limitation helped them ensure that the stream processor is always simple and lightweight.

The exact processing logic can differ between use cases, but generally looks like the following (a rough code sketch follows this list):

  1. Read an event corresponding to an action.
  2. Filter the event out if the action is not worth recording. For example, filter out an event if the ID of the member who took the action is null in the event. Such an event might be triggered when an action is taken by a bot.
  3. Join any required attributes of the actor, verb, or the object of the action. These attributes may be stored in external stores. For example, a Venice store may contain attributes of each job, such as its embedding and/or the geographic location where it is based.
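The real processor is written with the Samza SQL API, but the filter-and-join logic it performs can be sketched in plain Python roughly as follows. The event fields and the attribute store are illustrative placeholders, not LinkedIn's actual schema.

```python
# Rough Python sketch of the stream-processing logic described above (the real
# implementation uses Samza SQL). Event fields and stores are hypothetical.

def process_action_event(event, job_attribute_store):
    """Filter an action event and join entity attributes; return the record
    to write to the online actions store, or None to drop the event."""
    # Step 2: drop events that are not worth recording (e.g., bot traffic
    # can surface as a null member ID).
    if event.get("actor") is None:
        return None

    # Step 3: join attributes of the object the action was taken on, e.g., the
    # job's embedding and geographic location from a Venice-like store.
    job = job_attribute_store.get(event["object"], {})
    return {
        "actor": event["actor"],          # member who took the action
        "verb": event["verb"],            # e.g., "job-apply"
        "object": event["object"],        # e.g., a job ID
        "timestamp": event["timestamp"],
        "object_embedding": job.get("embedding"),
        "object_geo": job.get("geo"),
    }
```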

The processed action events are then ingested into an online store. The store is configured to retain data for 96 hours, which means that an action is deleted from the store 96 hours after it is ingested; this keeps the size of the store under control.

The store is also configured to use “actor” as the primary key, so that the data is partitioned and sorted based on the “actor” column, which allows quick retrieval of actions taken by a specific member.
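For illustration only, the retention and sorting described above could be expressed along these lines in the table configuration of the store (Apache Pinot, introduced just below). The table and column names are invented, and this is a rough, abbreviated sketch rather than LinkedIn's actual configuration.

```python
# Illustrative (not verbatim) Pinot-style table settings capturing the two
# points above: a 96-hour retention window, and "actor" as the sort/partition
# column ("primary key" in the text) for fast per-member lookups.
actions_table_config = {
    "tableName": "member_actions",
    "tableType": "REALTIME",
    "segmentsConfig": {
        "retentionTimeUnit": "HOURS",
        "retentionTimeValue": "96",   # actions older than 96 hours are purged
    },
    "tableIndexConfig": {
        "sortedColumn": ["actor"],    # sorted by member for quick retrieval
    },
}
```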

They chose Pinot as their store for several reasons. The main ones were:

  1. It supports near real-time ingestion of data from Kafka.
  2. It can answer analytical queries with low latency. This allows computation of a variety of features from activity data in the store in less than 100 milliseconds.
  3. It is horizontally scalable. This allows multiple recommender systems to query from the same store.
  4. It supports purging of old data.

A recommender system can now query this Pinot store when recommendations need to be generated for a member.

Depending on the types of features required, the recommender system appropriately queries the Pinot and attributes stores to compute the near real-time features. It then uses these features, along with other features (such as those computed through the conventional feature pipeline) in a model to score candidate recommendations.
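As a hypothetical example of such an on-demand feature computation, a recommender could issue a small analytical query against the actions store at scoring time. The table and column names below are invented for illustration, and a production client would bind parameters rather than interpolate strings.

```python
# Hypothetical example of computing a near real-time feature at request time
# with an analytical (Pinot-SQL-style) query. Table/column names are invented.
import time

def same_geo_apply_count_query(member_id, candidate_geo, window_days=7):
    """Build a query string for the 'applies in the same geo in the last N
    days' pair feature; the result would be fetched through a Pinot client
    and fed to the ranking model alongside batch-computed features."""
    window_start_ms = int((time.time() - window_days * 24 * 3600) * 1000)
    return (
        "SELECT COUNT(*) FROM member_actions "
        f"WHERE actor = {int(member_id)} "
        "AND verb = 'job-apply' "
        f"AND object_geo = '{candidate_geo}' "
        f"AND timestamp >= {window_start_ms}"
    )

print(same_geo_apply_count_query(12345, "San Francisco"))
```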

The near real-time features can capture the short-term intent and preferences of a member, while the other features can capture the longer-term intent and preferences.

After scoring, the recommender system also emits an event to log the computed features to HDFS. Since features based on a member’s actions can be very time sensitive (for example, a member may apply to two jobs within a minute, or click on two feed articles within seconds), logging them ensures that they have the correct value of these features associated with each impression of recommendations. This makes it easy to prepare training data for future iterations of the model.
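A minimal sketch of this feature-logging step, assuming a generic tracking stream interface rather than LinkedIn's actual tracking infrastructure, could look like this:

```python
# Sketch of logging the features used at scoring time. Because near real-time
# features change quickly, the values are captured per impression so that
# training data reflects exactly what the model saw. Event shape is hypothetical.
import json
import time

class _StdoutStream:
    """Trivial stand-in for a tracking/event stream (e.g., a Kafka producer)."""
    def emit(self, message):
        print(message)

def log_scoring_features(tracking_stream, member_id, item_id, features):
    # Capture the exact feature values used for this impression.
    tracking_stream.emit(json.dumps({
        "member_id": member_id,
        "item_id": item_id,
        "scoring_timestamp_ms": int(time.time() * 1000),
        "features": features,  # e.g., {"recent_same_geo_applies": 2}
    }))

log_scoring_features(_StdoutStream(), 7, "job:42", {"recent_same_geo_applies": 2})
```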

Results

Their solution has been successfully adopted by several recommender systems at LinkedIn to leverage actions of a member in near real-time to adapt recommendations for that member in near real-time.

So it seems even if you might find LinkedIn’s Feed cringe at times, the tech behind it is truly moving ahead with A.I. research by the engineering team. As far as functional things go, like finding a job, the personalization of the user experience has been considerably refined.

The solution has been able to meet all the requirements:

  • Member actions of interest can be recorded in the Pinot store within 0.1 to 15 seconds, depending on the frequency of the type of action. This means that if a member takes an action now, it can be leveraged to adapt recommendations for that member within the next few seconds.
  • Actions can be retrieved (along with attributes) in less than 50 milliseconds at a rate of over 20,000 queries per second with the appropriate number of Pinot servers.
  • New use cases can be onboarded within a few days.
  • The maintenance cost has been small.

It has also resulted in significant gains in business metrics, with gains realized from the job, feed, and search typeahead recommenders, among others.

This team also thanks fellow contributors and collaborators such as: Jiaqi Ge, Aditya Toomula, Mayank Shrivastava, Minhtu Nguyen, Justin Zhang, Xin Yang, Ali Hooshmand, Yuankun Xue, Xin Hu, Qian Li, Hongyi Zhang, Marco V. Varela, Manas Somaiya, Shraddha Sahay, Raghavan Muthuregunathan, Anand Kishore, Daniel Gmach, Joshua Hartman, Shipeng Yu, Abhimanyu Lad, Tim Jurka, Romer Rosales, and many others who helped them.

To read more of the details and workflow about how they achieved the solution, you can read the original blog here:

Real time Personalization LinkedIn

At LinkedIn, they really do strive to serve the most relevant recommendations to their members, whether that’s a job you may be interested in, a member you may want to connect with, or another type of suggestion.

It’s pretty neat to try to understand some of the A.I. behind that recommendation process and how it’s improving even years after LinkedIn was born and grew to over 800 million users.

This is not a sponsored post, just something I was interested in seeing. I try to cover Google AI, Meta AI, and Microsoft AI and Microsoft Research, along with DeepMind and OpenAI, in equal distribution so far as is possible.

The lead writer of the blog was Rupesh Gupta, who has been a Senior Staff Engineer at LinkedIn for over 9 years. Finally, if you are interested in following this sort of thing, you can follow LinkedIn’s Engineering blog here.

LinkedIn's Engineering Blog

NOTE FROM THE AUTHOR

I cannot continue to write without tips, patronage and community support from you, my readers and audience. I want to keep my articles free for the majority of my readers.

Join 43 other paying subscribers

So by subscribing you are essentially helping fund a network of Newsletters whose aim is to inspire and inform. This is my only job and stream of income.

See My Writing Feed

AiSupremacy is the fastest Substack Newsletter in AI at the intersection of breaking news. It’s ranked #1 in Machine Learning as of January 22nd, 2022.

Thanks for reading!
