Rate a business - apply Newton's law to a review-based system
Sean Zhao, MBA, MSIS
Senior Product / Analytic Manager | Leader Building Data-Driven Culture for Organizations | Analytics Expert | Product Management with Strong Technical Background
A couple of days ago I joined the Yelp Dataset Challenge. As I was digging through the raw data, I noticed that for businesses that have been operating for a long time, customer reviews are not always consistent. For example:
This is an interesting phenomenon. The business received more than 2,000 reviews over a 12-year span. From this figure we can easily tell, without any additional math, that the trend is heading downhill. However, if we take a closer look at the last four years, it's obvious that this business has been performing below its own average, yet the rating customers see on the Yelp website is still relatively high. The huge number of historical reviews pays this business a great dividend. Customers, on the other hand, get a misleading picture and therefore set inaccurate expectations. That costs customers time and money and hurts Yelp's credibility.
If we want to value every customer's opinion while keeping the information relevant to the present, a better approach is to weight recent reviews more heavily and reduce the weight of older ones. Newton's law of cooling can help solve this problem. To put the law simply: reviews from today are hot and have more impact on a customer's decision tomorrow, while reviews from ten years ago are cold and carry less weight in helping today's customer. Newton's law of cooling is:
T(t) = T(a) + (T(0) - T(a))*exp(-kt)
In this article, I won't bore you with mathematical details. All you need to know is that this formula relates the current temperature of an object T(t), the room temperature T(a), the starting temperature T(0) and the elapsed time t. In this case, temperature is the weight of a customer's review. A review made today should have a weight of 1, while a review made ten years ago should weigh far less than 1. I assume old reviews will eventually drop to a weight of 0.01. With T(0) = 1 and T(a) = 0.01, the formula simplifies to:
T(t) = 0.01 + 0.99*exp(-kt)
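As a rough illustration, the simplified formula can be written as a small Python function. This is only a sketch: the function name `review_weight`, the choice of days as the time unit, and the value of `K_ASSUMED` are assumptions made for the example, not the k-value used in this article.

```python
import math


def review_weight(age_days: float, k: float, floor: float = 0.01) -> float:
    """Weight of a review that is `age_days` old.

    Implements T(t) = floor + (1 - floor) * exp(-k * t), so a brand-new
    review weighs 1 and a very old review approaches `floor`.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return floor + (1.0 - floor) * math.exp(-k * age_days)


# Hypothetical decay rate per day, chosen only for illustration.
K_ASSUMED = 0.001

for age in (0, 365, 3650):
    print(f"{age:>5} days old -> weight {review_weight(age, K_ASSUMED):.3f}")
```

A larger k makes old reviews cool down faster; a smaller k keeps them influential for longer.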
The k-value is another complicated story that I'll explain later; I've found some k-values within an optimal range. For now, we can calculate each review's weight according to when it was written. For example, if a review was written around 18-Sep-2013 (that's one day after my birthday, just saying), its weight will be 0.098. Then we can calculate the weighted average for this business. Here's what I got:
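The weighted average described above can be sketched in Python as the sum of weight times stars divided by the sum of weights. Everything named here is an assumption for illustration: the `weighted_rating` helper, the per-day decay rate `K_ASSUMED`, and the sample reviews are made up and are not taken from the Yelp dataset.

```python
import math
from datetime import date


def cooling_weight(age_days, k, floor=0.01):
    """Simplified cooling formula: floor + (1 - floor) * exp(-k * t)."""
    return floor + (1.0 - floor) * math.exp(-k * age_days)


def weighted_rating(reviews, today, k):
    """Time-weighted average of (stars, review_date) pairs."""
    weighted_sum = 0.0
    weight_total = 0.0
    for stars, review_date in reviews:
        # Reviews dated after `today` are treated as brand new.
        age_days = max((today - review_date).days, 0)
        w = cooling_weight(age_days, k)
        weighted_sum += w * stars
        weight_total += w
    if weight_total == 0:
        raise ValueError("no reviews to average")
    return weighted_sum / weight_total


# Hypothetical decay rate per day and made-up sample reviews.
K_ASSUMED = 0.001
sample = [
    (5, date(2008, 3, 1)),
    (4, date(2011, 7, 15)),
    (2, date(2015, 9, 2)),
    (1, date(2016, 5, 20)),
]
today = date(2016, 6, 1)

plain = sum(stars for stars, _ in sample) / len(sample)
print("plain average:   ", round(plain, 2))
print("weighted average:", round(weighted_rating(sample, today, K_ASSUMED), 2))
```

Because the recent low ratings carry more weight than the old high ones, the weighted figure for this made-up sample comes out below the plain average, which is the same effect seen in the chart.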
The blue line is still the plain average, which is 2.54. The red line is the weighted average, which is 2.28. For a business that has been underperforming for four years, the red line could be a better reflection of the current situation. When you build a recommendation system, you want to give more recent updates and activity more influence. This method helps keep information fresh and therefore keeps customers better informed. This example may not be the best one, because we are looking at a business whose quality of service is declining. But what if a new business made some small mistakes at the beginning, worked through all the obstacles and became a 5-star business? Such businesses still have to carry the baggage from their early days. It would be a huge encouragement if we used this method to reduce the impact of those old negative reviews as well.