Amazon Customer Satisfaction Analysis by Using Machine Learning
Rohit Saraiwala
VECV (A Joint Venture between Volvo Group and Eicher Motors) || Summer Intern at Maithan Ispat Limited || IMI Bhubaneswar || PGDM Batch 2022-24 || Joint Member Secretary-Sports Committee
ABSTRACT
In recent years, customer satisfaction has become an increasingly important metric for businesses to measure and improve. One such business is Amazon, a global e-commerce platform that serves millions of customers around the world. In this report, we analyze Amazon customer satisfaction using machine learning techniques. To do so, we use data from the Kaggle website and apply techniques such as regression, classification, correlation analysis, and clustering to identify patterns and insights within the data. The main goal of this report is to identify the key drivers of customer satisfaction for Amazon, such as product quality, pricing, shipping speed, and customer service. By understanding these drivers, Amazon can better focus its efforts on improving customer satisfaction and increasing customer loyalty. Overall, this report demonstrates the potential of machine learning techniques for analyzing large volumes of customer feedback and identifying insights that can improve business operations and customer satisfaction.
Keywords: Customer Satisfaction, Regression, Correlation, Clustering, Machine Learning
INTRODUCTION
Customer satisfaction analysis is the process of measuring and analyzing customer satisfaction levels to gain insights into customer needs and preferences, as well as to identify areas for improvement. With the help of machine learning techniques, businesses can gain deeper insights into customer satisfaction levels by analyzing customer feedback, behavior, and demographic data. Machine learning algorithms can be used to automate the process of analyzing large volumes of customer data, including customer feedback from surveys, online reviews, social media, and other sources. This data can then be used to create predictive models that can help businesses identify patterns in customer behavior and predict future customer satisfaction levels. One popular machine-learning technique used in customer satisfaction analysis is clustering. Clustering involves grouping customers with similar behavior and preferences together, which can help businesses better understand customer segments and tailor their products or services to meet specific customer needs. Overall, machine learning can help businesses gain deeper insights into customer satisfaction levels and improve customer experience by identifying areas for improvement and tailoring products or services to meet specific customer needs.
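To make the clustering idea concrete, here is a minimal k-means sketch in plain Python that groups customers by their rating vectors. It is an illustration under stated assumptions, not the procedure used in this report. The ratings are made-up two-factor examples, `kmeans` is a hypothetical helper, and the farthest-point initialisation is a simple choice made here to keep the result deterministic.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster rating vectors into k groups with Lloyd's k-means algorithm.

    Hypothetical helper for illustration; returns (labels, centroids).
    """
    rng = random.Random(seed)
    # Initialisation: one random point, then repeatedly the point farthest
    # from its nearest existing centroid (keeps well-separated groups apart).
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each customer joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up (E-Commerce Activities, Delivery Speed) ratings for six customers.
ratings = [(2.0, 3.0), (2.5, 3.5), (3.0, 2.0), (8.0, 9.0), (9.0, 8.0), (8.5, 8.5)]
labels, centroids = kmeans(ratings, k=2)
print(labels, centroids)
```

With these invented ratings, the three low-rating customers end up in one segment and the three high-rating customers in the other. The number of segments `k` is the analyst's choice. On real survey data, one would typically standardise the rating columns and compare several values of `k`.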
RESEARCH OBJECTIVES AND QUESTIONS
The research objectives of customer satisfaction analysis by using machine learning may vary depending on the specific business or industry, but some possible objectives could include:
• Predicting customer satisfaction levels: One objective could be to develop machine learning models that can accurately predict customer satisfaction levels based on factors such as customer feedback, behavior, and demographics.
• Identifying key drivers of customer satisfaction: Another objective could be to use machine learning techniques to analyze customer data and identify the key factors that influence customer satisfaction, such as product quality, customer service, or pricing.
• Segmenting customers based on satisfaction levels: Machine learning can be used to segment customers based on their satisfaction levels, allowing businesses to target specific customer groups with personalized marketing messages and offerings.
• Identifying areas for improvement: By analyzing customer feedback and behavior data, machine learning can help businesses identify specific areas where they can improve their products or services to increase customer satisfaction.
• Optimizing customer experience: Machine learning can also be used to optimize customer experience by providing personalized recommendations, improving the user interface, and streamlining the customer journey.
Overall, the objective of customer satisfaction analysis by using machine learning is to gain insights into customer needs and preferences and to use these insights to improve customer experience, increase customer loyalty, and drive business growth.
Some possible research questions for customer satisfaction analysis by using machine learning include:
• What are the most effective machine learning techniques for analyzing customer feedback and behavior data, and how can we use these techniques to improve customer experience?
• What is the relationship between customer satisfaction levels and business outcomes, such as customer retention, repeat purchases, and revenue growth?
• Which variables have a crucial impact on customer satisfaction?
• Which variables have positive and negative impacts on satisfaction?
• Does customer dissatisfaction lead to a loss of customers?
Based on these problem statements, we present this report after analyzing the possible practical applications of machine learning.
RESEARCH METHODOLOGY
Based on our problem statements, the data was collected from the Kaggle website. It includes factors such as product quality, complaint resolution, technical support, salesforce image, and warranty and claims. The collected data was already fully structured and cleaned, so the dataset contains no missing values. The customer satisfaction rating is the dependent variable used to build our machine learning models. The independent variables are Product Quality, E-Commerce Activities, Technical Support, Complaint Resolution, Advertising, Product Line, Salesforce Image, Competitive Pricing, Warranty & Claims, New Products, Order & Billing, Price Flexibility, and Delivery Speed. To interpret the results, we use regression, correlation analysis, clustering, random forest for predictive analysis, a descriptive analysis of the data, and data visualization for model evaluation. Overall, we find a positive relationship between the dependent and independent variables.
LITERATURE REVIEW
This report is based purely on secondary research: books and published studies. It has both theoretical and practical implications. The literature review provided several theoretical frameworks that help us build a consolidated interpretation in this report.

N. Gladson Nwokah and Doris Ngirika (2018) emphasize the importance of customer trust, product quality, delivery, and after-sales service in achieving high levels of customer satisfaction. Yu-Cheng Lee and Yu-Che Wang (2014) report a significant relationship between service performance and customer satisfaction: customers were more satisfied when service performance met or exceeded their expectations. Kim Leng Khoo (2017) finds that service quality and corporate image have a significant impact on customer satisfaction, which in turn influences revisit intention and word-of-mouth recommendations. Satnam Kour Ubeja and D.D. Bedia (2012) argue that customer satisfaction and loyalty lead to increased profitability and sustained business growth.

Petr Suchánek and Maria Králová (2019) recommend that businesses invest in training and development programs to enhance their employees' knowledge and skills, develop effective marketing strategies, and stay ahead of the competition. Ratih Hadiantini, Silalahi, and H. Hendrayati (2020) note that satisfied consumers are more likely to make repeat purchases, recommend the platform to others, and leave positive reviews. Dissatisfied consumers, by contrast, may switch to competitors, leave negative reviews, and spread negative word of mouth. Ismail Razak and Nazief Nirwanto (2016) argue that by focusing on customer value, companies can enhance customer satisfaction and create a competitive advantage in the market. Zahi Hameed (2022) links service quality and customer satisfaction to customer loyalty and retention.
RESEARCH FINDINGS
Overall, the findings of this report describe customer satisfaction in terms of the different factors provided in the dataset. We have grouped them into the findings and interpretations below.
DESCRIPTIVE SUMMARY OF DATA
To support our hypothesis, we use an Amazon customer satisfaction dataset containing feedback from 100 customers across 14 columns. Each column holds the customers' ratings of a different aspect of the company's operations or performance. There are no missing values for any factor.
The aspects being rated are:
• Product Quality: Across the 100 customers, product quality ratings range from a minimum of 5 to a maximum of 10, with an average of 7.8.
• E-Commerce Activities: The average rating across the 100 customers is 3.67, with minimum and maximum ratings of just 2.2 and 5.7, respectively. This indicates that Amazon needs to work on its e-commerce activities to improve its rating.
• Technical Support: Technical support ratings have a wide range, with almost 7 points between the lowest and highest ratings. This shows that different customers hold very different views of Amazon's technical support. The high variance confirms that these ratings are widely dispersed.
• Complaint Resolution: Complaint resolution has an average rating of just 5.44. Weak technical support may also contribute to this low rating for complaint resolution.
• Advertising: Advertising has an average rating of only about 4. The literature review suggests that the company needs to maintain brand loyalty and prioritize its customers, and advertising is a key channel for communicating with them.
• Product Line: Although Amazon faces relatively little competition, its product line rating is still low. This suggests that its product line does not fully meet customer needs.
• Salesforce Image: Amazon's salesforce image rating is 5.12, which suggests that its sales service is of average quality. Most customers rated Amazon's salesforce below 5.
• Competitive Pricing: The average rating of about 7 suggests that Amazon pays close attention to its pricing and pricing policy, and customers rate this factor favorably.
• Warranty & Claims: Warranty & Claims received an average rating of about 6. This suggests that customers find Amazon's warranty claim process reasonably customer-friendly.
• New Products: Amazon appears to be focused on introducing new products, but the descriptive statistics show that these ratings vary widely across customers.
• Order & Billing: Satisfaction with order and billing is below average, with an average rating of 4.3. Customers appear to expect more detailed bills.
• Price Flexibility: The price flexibility rating is similar to the product line rating. Amazon acts as a middleman between manufacturers and buyers, so it has to set the prices of the products it sells.
• Delivery Speed: The delivery speed ratings of the 100 customers show that Amazon's delivery speed is only slightly above average. The low variance means that the ratings lie close to each other. This is concerning: customers consistently rate delivery speed as mediocre, so Amazon needs to take corrective action.
• Satisfaction: The highest customer satisfaction rating is 9.9 (almost 10), and the average is almost 7. The variance and standard deviation are both close to 1.3, which indicates that the ratings are not widely scattered. Customer satisfaction depends on all the other factors discussed above. We therefore model Satisfaction as the dependent variable and each of these factors as an independent variable.
The values in the table represent the ratings given by customers, which appear to be on a scale from 1 to 10, with higher values indicating more positive ratings. Overall, the ratings vary widely across the different aspects of the company's operations, as well as across customers. Some aspects have relatively high average ratings, such as product quality and satisfaction, while others have lower ratings, such as technical support and e-commerce activities. There is also substantial variability in the ratings across customers, indicating that different customers have different experiences and perceptions of the company's operations.
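The descriptive summary above rests on a handful of per-column statistics. A minimal sketch of how they could be computed with Python's standard library follows. The column names mirror the report, but the five ratings per column are invented for illustration and are not the Kaggle data. `describe` and `count_missing` are hypothetical helper names. The variance here is the sample (n - 1) variance, which can differ slightly from the population variance some tools report.

```python
import statistics


def count_missing(columns):
    """Number of missing (None) entries in each column."""
    return {name: sum(v is None for v in values) for name, values in columns.items()}


def describe(values):
    """Minimum, maximum, mean, sample variance and standard deviation of a column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std": statistics.stdev(values),  # square root of the sample variance
    }


# Invented ratings for five customers; the real dataset has 100 rows and 14 columns.
columns = {
    "Product Quality": [8.0, 6.5, 9.0, 7.5, 10.0],
    "E-Commerce Activities": [3.5, 2.5, 4.0, 3.0, 5.0],
    "Satisfaction": [7.5, 6.0, 8.5, 6.5, 9.0],
}

print(count_missing(columns))  # mirrors the "no missing values" check
for name, values in columns.items():
    print(name, {stat: round(x, 2) for stat, x in describe(values).items()})
```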
CORRELATION ANALYSIS
The Pearson correlation coefficient measures the strength of the linear relationship between two variables. It ranges from -1 to 1, with 0 indicating no linear correlation. The matrix shows that some variables are highly positively correlated, such as Complaint Resolution and Advertising, while others are highly negatively correlated, such as Product Line and Competitive Pricing. Some variables have low or no correlation, such as Technical Support and New Products.
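For two rating columns x and y, the Pearson coefficient is r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). The sketch below computes it, and a full correlation matrix, in plain Python. `pearson` and `correlation_matrix` are hypothetical helpers, and the three-column input is invented rather than taken from the dataset. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long columns of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two columns of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Invented ratings for four customers.
columns = {
    "Product Line": [7.0, 5.0, 6.0, 4.0],
    "Competitive Pricing": [5.0, 8.0, 6.0, 9.0],
    "Satisfaction": [8.0, 6.0, 7.0, 5.0],
}
for row, coeffs in correlation_matrix(columns).items():
    print(row, {col: round(r, 2) for col, r in coeffs.items()})
```

The diagonal of the matrix is always 1, and the matrix is symmetric. Only the off-diagonal cells carry information about pairs of factors.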
The correlation matrix helps identify relationships between variables and potential areas of improvement at Amazon. Because there is a strong negative correlation between Product Line and Competitive Pricing, Amazon should consider adjusting its pricing strategy to improve sales in certain product lines. Similarly, because Complaint Resolution and Advertising are strongly positively correlated, Amazon could consider increasing its advertising spend to improve customer satisfaction and resolve complaints more effectively. However, the correlation alone does not show that advertising causes better complaint resolution.
REGRESSION ANALYSIS
The regression of satisfaction on the product quality rating produces an upward-sloping line: as the product quality rating increases, customer satisfaction increases. The regression lines for all the other factors also slope upward. When we regress satisfaction on the combined independent variables, the line is again upward sloping. This indicates that as the independent variable ratings increase, the satisfaction rating increases, and vice versa.
This output comes from a linear regression model in which customer satisfaction is the response variable and "Independent" is the predictor variable. Here is an interpretation of the output:
• The intercept is -0.00300: when the value of "Independent" is zero, the predicted value of customer satisfaction is -0.00300.
• The coefficient for "Independent" is 1.22745: for every unit increase in "Independent", the predicted value of customer satisfaction increases by 1.22745.
• The p-value for the coefficient of "Independent" is 8.13e-12. This is below the 0.05 significance level, so there is strong evidence that "Independent" has a significant effect on customer satisfaction.
• The R-squared value is 0.3838: 38.38% of the variance in customer satisfaction can be explained by "Independent".
• The adjusted R-squared value is 0.3775, slightly lower than R-squared. Adjusted R-squared penalizes R-squared for the number of predictors relative to the sample size, and with one predictor and 100 observations the penalty is small.
• The F-statistic is 60.42 with a p-value of 8.126e-12, indicating that the overall model is significant.
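The quantities interpreted above can be reproduced for a single predictor with ordinary least squares. The sketch below is a minimal plain-Python illustration, not the code behind the reported output. `fit_simple_ols` and `adjusted_r2` are hypothetical helpers, and the toy data is invented. It uses the standard formulas:

- slope = Sxy / Sxx
- intercept = ȳ − slope · x̄
- R² = 1 − SS_res / SS_tot
- adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of predictors

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot


def adjusted_r2(r2, n, k):
    """R-squared penalised for k predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Invented predictor and satisfaction values for five customers.
x = [4.0, 5.0, 6.0, 7.0, 8.0]
y = [5.1, 6.2, 6.8, 8.1, 8.9]
intercept, slope, r2 = fit_simple_ols(x, y)
print(round(intercept, 3), round(slope, 3), round(r2, 4))

# Plugging in the report's R-squared with n = 100 customers and k = 1 predictor:
print(round(adjusted_r2(0.3838, n=100, k=1), 4))  # 0.3775
```

The second print reproduces the reported adjusted R-squared of 0.3775 from R-squared = 0.3838, which is consistent with a model that has a single predictor fitted on 100 observations.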
PREDICTIVE ANALYTICS
In our report, we used a random forest model for the predictive analysis. The model predicts the dependent variable, Satisfaction, from the independent variables: Product Quality, E-Commerce Activities, Technical Support, Complaint Resolution, Advertising, Product Line, Salesforce Image, Competitive Pricing, Warranty & Claims, New Products, Order & Billing, Price Flexibility, and Delivery Speed. We find that the independent variables include both positive and negative influences on the dependent variable. The random forest also lets us measure the model's prediction error and accuracy. The fitted model can then be used to make predictions on new data, and we evaluate it with goodness-of-fit measures.
The R-squared value of 0.6953466 indicates that 69.53% of the variance in the dependent variable is explained by the independent variables in the model. In other words, the model accounts for a substantial proportion of the variability in the data and is a reasonably good fit. However, 30.47% of the variance remains unexplained by the model.
The RMSE (Root Mean Squared Error) on our data is 0.646, which is reasonably low and suggests good predictive performance. However, the RMSE value depends on the scale of the customer satisfaction rating (the dependent variable). It should therefore be interpreted in the context of the problem and the range of the customer satisfaction rating.
The MAE (Mean Absolute Error) is 0.523, which means that, on average, the predicted customer satisfaction ratings are off by 0.523 units from the actual values.
MAE gives a similar estimate of predictive performance to RMSE, but it is less sensitive to outliers because the errors are not squared. A lower MAE indicates that the model predicts the customer satisfaction rating more accurately. Here, an MAE of 0.523 is reasonably low, which suggests good predictive performance. As with RMSE, however, the MAE must be interpreted against the scale and range of the customer satisfaction rating (the dependent variable).
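The sketch below makes the predictive workflow concrete. It is a small from-scratch random forest regressor with the RMSE, MAE and R² measures discussed above:

- Each tree is grown on a bootstrap sample of the rows.
- Each split considers only a random subset of the features.
- The forest's prediction is the average of the trees' predictions.

It is illustrative only and is not the model fitted in this report. Several choices are assumptions made here: the helper names, the tree depth, the number of trees, the invented data, and the choice of p/3 features per split (the regression default of R's randomForest package).

```python
import math
import random
import statistics


def _sse(values):
    """Sum of squared deviations from the mean (the error a leaf would make)."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)


def _build_tree(X, y, depth, n_features, rng):
    """Grow a regression tree; a leaf is the mean target value of its rows."""
    if depth == 0 or len(y) < 2:
        return statistics.fmean(y)
    best, best_sse = None, _sse(y)
    # Random-forest step: only a random subset of features is tried at each split.
    for f in rng.sample(range(len(X[0])), n_features):
        for threshold in sorted({row[f] for row in X})[:-1]:
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            sse = _sse(left) + _sse(right)
            if sse < best_sse:
                best, best_sse = (f, threshold), sse
    if best is None:  # no split reduces the error, so stop here
        return statistics.fmean(y)
    f, threshold = best
    goes_left = [row[f] <= threshold for row in X]
    children = []
    for side in (True, False):
        sub_X = [row for row, g in zip(X, goes_left) if g == side]
        sub_y = [t for t, g in zip(y, goes_left) if g == side]
        children.append(_build_tree(sub_X, sub_y, depth - 1, n_features, rng))
    return (f, threshold, children[0], children[1])


def _predict_tree(node, row):
    """Follow the splits down to a leaf value."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n_features = max(1, len(X[0]) // 3)  # p/3 features per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(_build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return statistics.fmean(_predict_tree(tree, row) for tree in forest)


def rmse(actual, predicted):
    return math.sqrt(statistics.fmean((a - p) ** 2 for a, p in zip(actual, predicted)))


def mae(actual, predicted):
    return statistics.fmean(abs(a - p) for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    mean_a = statistics.fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Invented data: satisfaction driven by the first feature; the second is noise.
rng = random.Random(1)
X = [[rng.uniform(1, 10), rng.uniform(1, 10)] for _ in range(80)]
y = [0.6 * row[0] + 2 + rng.gauss(0, 0.3) for row in X]
train_X, test_X, train_y, test_y = X[:60], X[60:], y[:60], y[60:]

forest = fit_forest(train_X, train_y)
pred = [predict_forest(forest, row) for row in test_X]
print(f"R2={r_squared(test_y, pred):.3f} RMSE={rmse(test_y, pred):.3f} MAE={mae(test_y, pred):.3f}")
```

RMSE squares each error before averaging, so it is always at least as large as MAE and grows faster when a few predictions are badly off. This is the outlier sensitivity the text describes. The metrics are computed on held-out rows, so they estimate performance on new data rather than on the rows the trees were trained on.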
DATA VISUALIZATION
To check the observations made in the descriptive summary, we used box plots of the ratings. The first plot includes only eight of the independent variables. E-Commerce Activities shows very little variation, with most ratings falling between 3 and 4. Product Quality and Competitive Pricing have the highest ratings and are also more dispersed. Technical Support and Advertising are moderately rated, mostly between 3 and 7, and are also highly dispersed. High dispersion means that different customers rate the same factor very differently: some are satisfied with the service and some are not. The second box plot shows that customer satisfaction ratings fall between 6 and 8 most of the time. In conclusion, all the ratings together affect the customer satisfaction rating, and Amazon must work on its lower-rated factors to raise satisfaction above 8.
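A box plot draws the quartiles of each rating column, its whiskers, and any outliers. The sketch below computes those numbers with `statistics.quantiles` (Python 3.8+). `box_plot_stats` is a hypothetical helper, and the 1.5 × IQR whisker rule is the common Tukey convention assumed here. Plotting tools may use slightly different quartile definitions, so their boxes can differ a little from these values.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, whisker ends and outliers as drawn in a Tukey-style box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    iqr = q3 - q1  # interquartile range: the height of the box
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),  # whiskers reach the most extreme non-outliers
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Invented satisfaction ratings for ten customers.
satisfaction = [6.1, 6.8, 7.4, 7.0, 6.5, 8.0, 7.7, 5.9, 9.9, 6.9]
print(box_plot_stats(satisfaction))
```

A wide box (large IQR) corresponds to the "high dispersion" described above: the middle half of customers disagree substantially about that factor.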
SUMMARY AND CONCLUSION
Based on available data, Amazon has consistently ranked high in terms of customer satisfaction. Furthermore, Amazon's customer-centric approach, extensive product selection, and efficient delivery system have contributed to its success in satisfying customers. Amazon's customer service is also highly rated, with various channels for customers to seek assistance and prompt resolution of issues.
The regression analysis suggests that if Amazon improves service quality in the areas rated low (below 5), its customer satisfaction rating should rise.
Amazon must improve its e-commerce activities, price flexibility, delivery speed, and advertising to raise its customer satisfaction level. The box plots show that the ratings for these lower-rated factors fall mostly below 5.
In conclusion, Amazon has a strong reputation for providing exceptional customer service, and its focus on customer satisfaction has helped it become a leading e-commerce platform. To maintain this position, Amazon must improve its ratings on the lower-rated factors.