Medium Blog Recommendation

“Why do we need a recommender system for blogs?”

Recommender systems are an important class of machine learning algorithms that offer “relevant” suggestions to users. Netflix, YouTube, Tinder, and Amazon all rely on recommender systems.

Machine learning algorithms in recommender systems are typically classified into two categories, content-based and collaborative filtering methods, although modern recommenders combine both approaches. Content-based methods rely on the similarity of item attributes, while collaborative methods calculate similarity from user-item interactions.

  • There are around 1,800,000 new articles posted on Medium.com every month.
  • There are currently more than 250,000 writers on Medium.

These numbers are increasing day by day, so recommending the right stories to the right user is important for giving readers the best experience.

Just as a product recommendation system helps increase a company’s revenue, a blog recommendation system helps increase the popularity of blogs and gives readers a good experience.

Data is the most important raw material for building any machine learning model, so we first need to collect data before building a blog recommender system. Medium has data on each of its users, whether they are readers or writers. From that, we can extract features that serve our purpose.

  1. User Data: age, gender, location, user bio, number of followers.
  2. Content Data: the title of the blog, published date-time, category, author, blog publication, tags, number of reactions, number of views, average reading time, predicted reading time.
  3. User-Content Interaction: number of claps, comments.

These features then need to be prepared according to our needs, so let’s start with feature engineering.

Feature Engineering

Selecting the right set of features is a must when building any machine learning model, and recommender systems are no exception. After collecting the user data, content data, and user-content interaction data, we explored all the features and engineered some new ones by combining them.

Title:

The title of the blog is an important factor on Medium: it enables readers to find blogs and makes them want to click through to read more. Headlines that catch visitors’ attention and spark their curiosity encourage them to stick around longer and come back for more. We can extract the title of each blog post, vectorize it with relevant NLP techniques, and use the resulting vectors in our recommendation system.
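One possible way to vectorize titles is TF-IDF followed by cosine similarity. The article does not name a specific NLP technique, so the choice of TF-IDF, the smoothed IDF formula, and the sample titles below are all assumptions made for illustration. The sketch uses only the standard library.

```python
import math
from collections import Counter

def tokenize(title):
    """Lowercase a title and split it into alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return cleaned.split()

def tfidf_vectors(titles):
    """Return one sparse TF-IDF vector (dict: term -> weight) per title."""
    docs = [tokenize(t) for t in titles]
    n_docs = len(docs)
    # Document frequency: the number of titles in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        vec = {}
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF (assumed variant)
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up titles, purely for illustration.
titles = [
    "A Gentle Introduction to Recommender Systems",
    "Recommender Systems with Collaborative Filtering",
    "Ten Tips for Better Sourdough Bread",
]
vecs = tfidf_vectors(titles)
sim_related = cosine(vecs[0], vecs[1])    # shares "recommender" and "systems"
sim_unrelated = cosine(vecs[0], vecs[2])  # shares no tokens
```

Titles that share informative words end up with a higher cosine similarity than titles that share none, which is the property a title-based recommender would exploit.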

Tags:

When it comes to getting blogs noticed, tags are the second most important factor after the title. Picking the right tags can make the difference between an article becoming popular and an article dying on delivery. Writers generally choose topic tags that reflect the content they have written, and Medium limits each article to 5 tags. Tags are an important feature for this recommendation system because they can be used to group articles by topic.
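As a rough illustration of how tags could be compared and used for grouping, the sketch below computes Jaccard similarity between two tag sets and builds a tag-to-article index. Jaccard similarity is an assumption here (the article does not say how tags are compared), and the article ids and tags are made up.

```python
from collections import defaultdict

def normalize(tags):
    """Lowercase and trim tags so 'Python' and ' python' compare equal."""
    return {t.strip().lower() for t in tags}

def tag_similarity(tags_a, tags_b):
    """Jaccard similarity: shared tags divided by all distinct tags."""
    a, b = normalize(tags_a), normalize(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def group_by_tag(articles):
    """Map each tag to the ids of the articles that carry it."""
    index = defaultdict(list)
    for article_id, tags in articles.items():
        for tag in sorted(normalize(tags)):
            index[tag].append(article_id)
    return dict(index)

# Hypothetical articles, each with at most 5 tags as on Medium.
articles = {
    "a1": ["Machine Learning", "Data Science", "Python"],
    "a2": ["data science", "python", "Pandas"],
    "a3": ["Cooking"],
}
similarity = tag_similarity(articles["a1"], articles["a2"])  # 2 shared / 4 distinct
groups = group_by_tag(articles)
```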

Claps and Comments:

The claps and comments on a blog reflect readers’ engagement with it and, in turn, its popularity. We can combine these two features into a new feature called Readers reaction.
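The article does not specify how claps and comments are combined. One plausible sketch, under the assumption that a comment signals more effort than a clap and that raw counts should be damped, is a weighted sum of log-scaled counts. The weight of 2.0 and the use of `log1p` are illustrative choices, not the author's formula.

```python
import math

def readers_reaction(claps, comments, comment_weight=2.0):
    """Combine claps and comments into one score.

    log1p damps very large counts so that a single viral post does not
    dominate; comment_weight is a hypothetical tuning parameter.
    """
    if claps < 0 or comments < 0:
        raise ValueError("counts must be non-negative")
    return math.log1p(claps) + comment_weight * math.log1p(comments)

quiet_post = readers_reaction(claps=0, comments=0)
clapped_post = readers_reaction(claps=100, comments=0)
discussed_post = readers_reaction(claps=100, comments=5)
```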

Article Read Time and Average Reading Time:

Read time and average reading time are two important features of Medium blogs.

  • Read Time, or Article Read Time, appears at the top of any Medium article and is an estimate of how long it will take the reader to complete the article. It is based on the average reading speed of an adult (roughly 265 WPM). Medium takes the total word count of a post and translates it into minutes, with an adjustment made for images.
  • Average Reading Time is Medium’s way of measuring how long a reader actively engages with an article. It is the average amount of time all readers (members and non-members) spent reading the blog.

We can combine these two features to generate a new feature called Readers engagement, which is the ratio of Average Reading Time to Read Time.
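The two quantities and their ratio can be sketched as follows. The 265 WPM figure comes from the text above; rounding up to whole minutes and ignoring the image adjustment are simplifying assumptions, and the function names are hypothetical.

```python
import math

WORDS_PER_MINUTE = 265  # average adult reading speed quoted in the text

def estimated_read_minutes(word_count, wpm=WORDS_PER_MINUTE):
    """Estimate Read Time in minutes from the word count.

    Assumption: round up to a whole minute and ignore the image
    adjustment that Medium applies.
    """
    return max(1, math.ceil(word_count / wpm))

def readers_engagement(average_reading_minutes, read_minutes):
    """Ratio of Average Reading Time to Read Time."""
    if read_minutes <= 0:
        raise ValueError("read_minutes must be positive")
    return average_reading_minutes / read_minutes

read_time = estimated_read_minutes(1325)        # 1325 / 265 = 5 minutes
engagement = readers_engagement(3.0, read_time)  # readers spent 3 of 5 minutes
```

A ratio close to 1 suggests that readers typically finish the article, while a ratio near 0 suggests they leave early.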

Modelling of System

Popularity Recommendation:

Medium also shows a “Trending on Medium” column of blog posts. These recommendations are based entirely on claps, followers, views, and average read time, from which user-blog interaction scores can be derived; the top 6 blogs are recommended.

This recommendation does not depend on the content a user usually interacts with or the topics the user follows. It is a general recommendation for everyone, based on trending blogs.
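A popularity ranking of this kind could be sketched as a weighted score followed by a top-6 cut. The weights, the log scaling, and the field names are assumptions for illustration; the article only states which signals are used.

```python
import math

# Hypothetical weights; in practice these would be tuned.
WEIGHTS = {"claps": 1.0, "views": 0.5, "followers": 0.25, "engagement": 2.0}

def popularity_score(blog):
    """Weighted sum of log-scaled counts plus the engagement ratio."""
    return (WEIGHTS["claps"] * math.log1p(blog["claps"])
            + WEIGHTS["views"] * math.log1p(blog["views"])
            + WEIGHTS["followers"] * math.log1p(blog["followers"])
            + WEIGHTS["engagement"] * blog["engagement"])

def trending(blogs, n=6):
    """Return the n highest-scoring blogs, the same list for every user."""
    return sorted(blogs, key=popularity_score, reverse=True)[:n]

# Made-up data: blog i has 10*i claps and 100*i views.
blogs = [
    {"id": i, "claps": 10 * i, "views": 100 * i, "followers": 50, "engagement": 0.5}
    for i in range(1, 9)
]
top_ids = [blog["id"] for blog in trending(blogs)]
```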

Personalized recommendation:

These recommendations differ for every reader, based on their interactions with blogs on Medium. There are broadly two types.

The first one is collaborative filtering. It is a method for obtaining automatic predictions (filtering) about a user’s interests by collecting preferences or taste information from a large number of users (collaborating). In the context of Medium, this means finding patterns among multiple readers.

If a group of articles attracts the attention of multiple readers, it is highly probable that a reader who begins reading one of these pieces will want to read the others in the group. As a result, suggestions are offered to similar readers based on the reading behaviour of other readers. Collaborative filtering comes in two variants:

  1. Memory-based
  2. Model-Based

Memory-based:

This approach either computes user similarities based on the blogs they have interacted with (user-based approach) or computes blog similarities based on the users who have interacted with them (item-based approach).
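The item-based variant can be sketched with cosine similarity over interaction vectors. The interaction data, the user and blog names, and the scoring rule (sum of similarity times interaction strength) are assumptions for illustration.

```python
import math
from collections import defaultdict

# Hypothetical interactions: user -> {blog: interaction strength}.
interactions = {
    "ana":  {"b1": 1.0, "b2": 1.0},
    "ben":  {"b1": 1.0, "b2": 1.0, "b3": 1.0},
    "cara": {"b3": 1.0, "b4": 1.0},
    "dev":  {"b1": 1.0},
}

def item_vectors(interactions):
    """Invert the data: blog -> {user: strength}."""
    items = defaultdict(dict)
    for user, blogs in interactions.items():
        for blog, strength in blogs.items():
            items[blog][user] = strength
    return items

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, interactions, n=3):
    """Rank unseen blogs by their similarity to the blogs the user has read."""
    items = item_vectors(interactions)
    seen = interactions[user]
    scores = defaultdict(float)
    for candidate, candidate_vec in items.items():
        if candidate in seen:
            continue
        for blog, strength in seen.items():
            scores[candidate] += cosine(candidate_vec, items[blog]) * strength
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [blog for blog, score in ranked if score > 0][:n]

dev_recs = recommend("dev", interactions)
```

Here "dev" has only read b1. Blog b2 shares two readers with b1 and b3 shares one, so b2 ranks first; b4 shares none and is not recommended.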

Model-based:

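This approach uses a user-item matrix whose entries are interaction strengths calculated from claps, comments, and the ratio of read time to expected read time. A model fitted to this matrix estimates strengths for blogs the user has not yet interacted with, and the blogs with the highest estimated strength are recommended to that user.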
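The article does not say which model is fitted to the user-item matrix. A common choice is matrix factorization trained with stochastic gradient descent, sketched below under that assumption. The strengths, hyperparameters, and names are illustrative only.

```python
import random

def predict(P, Q, user, blog):
    """Predicted strength: dot product of user and blog factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[blog]))

def mse(P, Q, strengths):
    """Mean squared error over the observed (user, blog, strength) triples."""
    return sum((s - predict(P, Q, u, b)) ** 2 for u, b, s in strengths) / len(strengths)

def factorize(strengths, n_factors=2, epochs=500, lr=0.05, reg=0.02, seed=0):
    """Fit factor vectors to observed strengths with SGD and L2 regularization."""
    rng = random.Random(seed)
    P, Q = {}, {}
    for user, blog, _ in strengths:
        if user not in P:
            P[user] = [rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
        if blog not in Q:
            Q[blog] = [rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
    for _ in range(epochs):
        for user, blog, strength in strengths:
            err = strength - predict(P, Q, user, blog)
            for f in range(n_factors):
                p, q = P[user][f], Q[blog][f]
                P[user][f] += lr * (err * q - reg * p)
                Q[blog][f] += lr * (err * p - reg * q)
    return P, Q

def recommend(P, Q, user, seen, n=3):
    """Rank blogs the user has not seen by predicted strength."""
    candidates = [b for b in Q if b not in seen]
    return sorted(candidates, key=lambda b: predict(P, Q, user, b), reverse=True)[:n]

# Hypothetical strengths in [0, 1] derived from claps, comments, and read ratio.
observed = [
    ("ana", "b1", 0.9), ("ana", "b2", 0.8),
    ("ben", "b1", 0.8), ("ben", "b2", 0.9), ("ben", "b3", 0.2),
    ("cara", "b3", 0.9), ("cara", "b4", 0.8),
    ("dev", "b3", 0.7), ("dev", "b4", 0.9), ("dev", "b1", 0.1),
]
P0, Q0 = factorize(observed, epochs=0)  # untrained baseline
P, Q = factorize(observed)
error_before, error_after = mse(P0, Q0, observed), mse(P, Q, observed)
ana_recs = recommend(P, Q, "ana", seen={"b1", "b2"})
```

Training should drive the error on observed strengths below that of the untrained baseline. With so little data the ranking itself is not meaningful, so the example is only a shape for the technique.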

The other type is the content-based recommendation engine. It differs in that it provides recommendations based on the blog articles themselves and the words associated with them (mostly tags). If a person reads an article that includes the terms Machine learning and Data Science, it is likely that the same user enjoys reading additional articles that include these terms.

For each article, three other relevant articles are offered to the reader as suggestions for continued reading on Medium. The suggested articles were recently published by the same publication and contain any of the tags from the user’s current article.
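The rule described above (same publication, at least one shared tag, most recent first, three suggestions) can be sketched directly. The catalog, field names, and publication names are made up for illustration.

```python
from datetime import date

def related_articles(current, catalog, n=3):
    """Suggest the n most recent articles from the same publication
    that share at least one tag with the current article."""
    current_tags = set(current["tags"])
    candidates = [
        article for article in catalog
        if article["id"] != current["id"]
        and article["publication"] == current["publication"]
        and current_tags & set(article["tags"])
    ]
    candidates.sort(key=lambda article: article["published"], reverse=True)
    return candidates[:n]

# Hypothetical catalog.
catalog = [
    {"id": 1, "publication": "Data Notes", "tags": {"machine learning", "python"}, "published": date(2021, 5, 1)},
    {"id": 2, "publication": "Data Notes", "tags": {"data science"}, "published": date(2021, 6, 1)},
    {"id": 3, "publication": "Data Notes", "tags": {"machine learning"}, "published": date(2021, 6, 15)},
    {"id": 4, "publication": "Data Notes", "tags": {"cooking"}, "published": date(2021, 7, 1)},
    {"id": 5, "publication": "Other Pub", "tags": {"machine learning"}, "published": date(2021, 7, 2)},
    {"id": 6, "publication": "Data Notes", "tags": {"data science", "python"}, "published": date(2021, 4, 1)},
]
current = {"id": 0, "publication": "Data Notes",
           "tags": {"machine learning", "data science"}, "published": date(2021, 7, 3)}
suggested_ids = [article["id"] for article in related_articles(current, catalog)]
```

Article 4 is excluded because it shares no tag, article 5 because it belongs to a different publication, and article 6 because three more recent matches exist.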

Evaluation Metrics:

There are 2 primary ways of checking whether your recommendation system is performing as per expectations.

  1. Online A/B Testing
  2. Offline Evaluation Metrics

Online metrics are the empirical results observed in your users’ interactions with real-time recommendations in a live environment. The most effective way to collect them is an A/B test: in a live environment, the Control is your existing system and the Version is the recommender system under test. Online evaluation matters because user behaviour is the ultimate test of our work.
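One standard way to compare the Control and the Version is a two-proportion z-test on a click-through rate. The article does not prescribe a particular test, so the choice of test, the metric, and the traffic numbers below are assumptions.

```python
import math

def two_proportion_z_test(clicks_control, n_control, clicks_version, n_version):
    """Two-sided z-test for a difference in click-through rates.

    Returns (z, p_value). A positive z means the Version has the higher rate.
    """
    rate_control = clicks_control / n_control
    rate_version = clicks_version / n_version
    pooled = (clicks_control + clicks_version) / (n_control + n_version)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_version))
    z = (rate_version - rate_control) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 1,000 users per arm.
z, p_value = two_proportion_z_test(clicks_control=100, n_control=1000,
                                   clicks_version=150, n_version=1000)
z_same, p_same = two_proportion_z_test(100, 1000, 100, 1000)
```

A small p-value suggests the difference in click-through rate is unlikely to be due to chance alone; identical rates give z = 0 and a p-value of 1.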

Then why use offline techniques for evaluation?

Because they are the only indicators you can look at while developing your recommendation system. Always relying on online metrics to collect user behaviour and score your system is expensive and time-consuming. Moreover, when users are continuously asked for feedback, they may become hesitant to use the platform, or stop using it altogether. Good accuracy on offline metrics followed by good online A/B scores is what you are looking for.

Accuracy in offline evaluation depends on historical data: the model tries to predict what actual users have already seen. If the data collected is too old, then however high the accuracy may be, it won’t mean much, because your interests a year ago will not be the same as your interests today!

Some of the offline techniques for evaluation are as follows:

Root Mean Squared Error (RMSE):

RMSE is similar to Mean Absolute Error (MAE). MAE averages the absolute value of each residual (the difference between the actual and the predicted value); RMSE instead squares each residual, averages the squares, and takes the square root of the result.

The advantage of using RMSE over MAE is that it penalizes large errors more heavily. (Note that RMSE is always greater than or equal to MAE; the two are equal only when every residual has the same magnitude.)
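The two error metrics can be computed as follows. The ratings are made-up numbers chosen so that one large error shows how RMSE is pulled above MAE.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the average squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical ratings; the last prediction is off by 2.
actual = [4, 3, 5, 2]
predicted = [3.5, 3, 4, 4]
mae_value = mae(actual, predicted)    # (0.5 + 0 + 1 + 2) / 4
rmse_value = rmse(actual, predicted)  # sqrt((0.25 + 0 + 1 + 4) / 4)
```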

Hit Rate:

Hit Rate is often a better alternative to MAE or RMSE for top-N recommendation. To measure Hit Rate, we first generate the top N recommendations for every user in our test data set. If a user’s top N recommendations contain something that user rated, that counts as 1 hit. The Hit Rate is the number of hits divided by the number of users. There are several variants of this metric, one being the cumulative hit rate.
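A minimal sketch of the hit rate calculation, assuming each user's top-N list and the set of items that user rated in the test data are already available. The user names and items are hypothetical.

```python
def hit_rate(top_n_by_user, rated_by_user):
    """Fraction of test users whose top-N list contains an item they rated."""
    users = list(rated_by_user)
    hits = sum(
        1 for user in users
        if set(top_n_by_user.get(user, [])) & set(rated_by_user[user])
    )
    return hits / len(users)

# Hypothetical top-3 lists and held-out rated items.
top_n = {
    "ana":  ["b1", "b2", "b3"],
    "ben":  ["b4", "b5", "b6"],
    "cara": ["b2", "b7", "b8"],
}
rated = {
    "ana":  {"b3"},        # hit: b3 is in ana's list
    "ben":  {"b1"},        # miss
    "cara": {"b7", "b9"},  # hit: b7 is in cara's list
}
rate = hit_rate(top_n, rated)
```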

MAP@K / MAR@K:

A recommender system typically produces an ordered list of recommendations for each user in the test set. MAP@K gives insight into how relevant the list of recommended items is, whereas MAR@K gives insight into how well the recommender is able to recall all the items the user has rated positively in the test set.
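Both metrics can be sketched as per-user scores averaged over the test users. Definitions of average precision and of MAR@K vary between libraries; the version below normalizes average precision by min(K, number of relevant items) and uses plain recall at K, which is one common convention and an assumption here. Recommendation lists are assumed to contain no duplicates.

```python
def average_precision_at_k(recommended, relevant, k):
    """Average of precision values at each rank where a relevant item appears."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(k, len(relevant))

def recall_at_k(recommended, relevant, k):
    """Share of the user's relevant items found in the top k."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(recommended[:k]) & relevant) / len(relevant)

def mean_over_users(metric, recs_by_user, relevant_by_user, k):
    """Average a per-user metric over all users in the test set."""
    users = list(relevant_by_user)
    return sum(metric(recs_by_user[u], relevant_by_user[u], k) for u in users) / len(users)

# Hypothetical ordered recommendations and positively rated test items.
recs = {"ana": ["a", "b", "c", "d"], "ben": ["x", "y", "z", "w"]}
liked = {"ana": {"a", "c", "e"}, "ben": {"y"}}
map_at_4 = mean_over_users(average_precision_at_k, recs, liked, k=4)
mar_at_4 = mean_over_users(recall_at_k, recs, liked, k=4)
```

For "ana" the relevant items appear at ranks 1 and 3, giving (1/1 + 2/3) / 3; for "ben" the single relevant item appears at rank 2, giving 1/2.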

Other important metrics include Coverage, which is the percentage of items in the training data that the model is able to recommend on a test set. Personalization uses the dissimilarity (1 - cosine similarity) between users’ lists of recommendations. The higher the score, the higher the dissimilarity, meaning the recommendations are more personalized.
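Both metrics can be sketched in a few lines. Treating each recommendation list as a binary item vector (so cosine similarity reduces to overlap divided by the geometric mean of the list sizes) is an assumption about how the lists are encoded, and the data is made up.

```python
import math
from itertools import combinations

def coverage(recs_by_user, catalog):
    """Percentage of catalog items that appear in at least one recommendation list."""
    recommended = {item for recs in recs_by_user.values() for item in recs}
    return 100.0 * len(recommended & set(catalog)) / len(catalog)

def personalization(recs_by_user):
    """1 minus the mean pairwise cosine similarity between users' lists.

    Requires at least two users with non-empty lists.
    """
    lists = [set(recs) for recs in recs_by_user.values()]
    similarities = [
        len(a & b) / math.sqrt(len(a) * len(b))
        for a, b in combinations(lists, 2)
    ]
    return 1.0 - sum(similarities) / len(similarities)

# Hypothetical recommendations: ana and ben get identical lists.
recs = {"ana": ["a", "b"], "ben": ["a", "b"], "cara": ["c", "d"]}
catalog = ["a", "b", "c", "d", "e"]
coverage_pct = coverage(recs, catalog)        # 4 of 5 items are recommended
personalization_score = personalization(recs)  # one identical pair out of three
```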

Deployment

If we can’t make a machine learning model work in production, then that model is of no use. It’s like a written blog that never got published. That is why deployment is one of the most important steps in getting value out of the developed model, and why the recommender system needs to be deployed once testing and tuning are done.

A few factors make a difference in the deployment of a recommender system, e.g. the scalability of the system, its latency, and whether it runs offline or online. Let’s look at a few of these. Scalability describes how the system responds as the number of users and the amount of content increase. Scalability problems have grown significantly with the rapid growth of the e-commerce industry: modern recommendation engines are required to generate real-time results for large-scale applications. In other words, the performance of the recommendation model is measured in terms of throughput (number of inferences per second) and latency (time for each inference).
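Throughput and latency can be measured with a simple timing loop around the inference call. The `dummy_recommender` function below is a hypothetical stand-in for a real model, and a single-threaded loop is only a rough first approximation of production load.

```python
import time

def measure(predict, requests):
    """Time each call to predict and report throughput and mean latency."""
    latencies = []
    start = time.perf_counter()
    for request in requests:
        t0 = time.perf_counter()
        predict(request)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "throughput_per_s": len(requests) / elapsed,  # inferences per second
        "mean_latency_s": sum(latencies) / len(latencies),  # time per inference
    }

def dummy_recommender(user_id):
    """Hypothetical stand-in for a real model: returns 6 item ids."""
    return sorted(range(1000), key=lambda i: (i * 31 + user_id) % 97)[:6]

stats = measure(dummy_recommender, list(range(200)))
```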

Conclusion

In this article, we have discussed a complete methodology for designing a Blog Recommendation System. Other improvements and modifications to this design are possible, and a few of them are listed in the Future Scope section below. In future blogs, we plan to integrate deep learning, computer vision, and other advanced techniques to improve the efficiency of the Blog Recommendation System.

Future Scope

  1. Explore various DL techniques in order to utilize the image and video data which are an integral part of the blogs.
  2. Leverage blog content like keywords, references etc. in order to provide better recommendations.

More articles by Gaurav Gade