Multi-objective learning-to-rank in product search
Learning a ranking model in product search involves satisfying multiple requirements, such as maximizing the relevance of retrieved products with respect to the user query while also maximizing their purchase likelihood. Label aggregation is a popular approach to multi-objective optimization (MOO): the multiple labels of a training example, each related to a different objective, are aggregated into a single label. A stochastic label aggregation method randomly selects one label per training example according to a given distribution over the labels [Carmel 20]. A theoretical proof shows that any optimal solution of the MOO problem can be generated by a proper parameter setting of the stochastic aggregation process.
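A minimal sketch of the stochastic aggregation idea may help make it concrete. The function name, array shapes, and toy labels below are illustrative assumptions, not code from [Carmel 20]:

```python
import numpy as np

def stochastic_label_aggregation(labels, weights, rng=None):
    """Collapse multiple per-example labels into one by sampling.

    labels  : array of shape (n_examples, n_objectives); labels[i, j]
              is the label of example i under objective j.
    weights : probability of picking each objective (sums to 1);
              tuning these weights trades the objectives off against
              each other.
    """
    rng = rng or np.random.default_rng()
    labels = np.asarray(labels)
    n_examples, n_objectives = labels.shape
    # For each training example, independently draw which objective's
    # label it carries into the (now single-label) LTR training set.
    choice = rng.choice(n_objectives, size=n_examples, p=weights)
    return labels[np.arange(n_examples), choice]

# Toy example: 3 examples, each with a relevance label and a purchase label.
labels = [[3, 0], [2, 1], [1, 1]]
print(stochastic_label_aggregation(labels, weights=[0.7, 0.3]))
```

The aggregated labels can then be fed to any standard single-label LTR trainer; sweeping the weight vector traces out different trade-off points.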
Query-item relevance labels typically used to train learning-to-rank (LTR) models in information retrieval are often noisy measurements of human behavior, e.g., product ratings in search. Such coarse measurements make the ground-truth ranking non-unique with respect to a single relevance criterion. To resolve this ambiguity, it is desirable to train a model using multiple relevance criteria, giving rise to Multi-Label LTR. This formulation also captures multiple goals that may conflict yet are important to optimize simultaneously, e.g., product quality and purchase likelihood, so as to increase revenue. [Mahapatra 22] employs MOO techniques wherein information from the labels is combined to meaningfully characterize the trade-offs among goals.
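[Mahapatra 22] characterizes trade-offs via MOO rather than a fixed weighted sum, but a linear scalarization of per-label losses is the simplest baseline for combining multiple labels, and is sketched below. All names, the loss choices, and the toy tensors are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def scalarized_multilabel_loss(scores, labels, loss_fns, trade_off):
    """Linear scalarization: a weighted sum of per-objective losses.

    scores    : model scores for a list of items, shape (n_items,)
    labels    : dict mapping objective name -> label tensor (n_items,)
    loss_fns  : dict mapping objective name -> loss callable
    trade_off : dict mapping objective name -> scalar weight; sweeping
                these weights explores different trade-off points.
    """
    total = 0.0
    for name, weight in trade_off.items():
        total = total + weight * loss_fns[name](scores, labels[name])
    return total

# Toy usage: combine a graded-relevance loss and a purchase loss.
scores = torch.randn(5, requires_grad=True)
labels = {"relevance": torch.tensor([3., 2., 0., 1., 0.]),
          "purchase":  torch.tensor([1., 0., 0., 1., 0.])}
loss_fns = {"relevance": F.mse_loss,
            "purchase":  F.binary_cross_entropy_with_logits}
loss = scalarized_multilabel_loss(
    scores, labels, loss_fns, trade_off={"relevance": 0.7, "purchase": 0.3})
loss.backward()
```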
Embedding-based retrieval (EBR) models need to be strong in both relevance estimation and personalized retrieval. Existing models learn to rank the positive item above negatives within each single-positive training sample and do not take into account the relations between multiple positive and negative items in the same page view, which degrades retrieval performance. The Multi-Objective Personalized Product Retrieval (MOPPR) model addresses this with four hierarchical optimization objectives [Zheng 22]: relevance, exposure, click, and purchase. Entire-space multi-positive samples are constructed to train MOPPR, and a modified softmax loss is adopted to optimize the multiple objectives.
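MOPPR's full loss handles the four objective levels hierarchically; the sketch below shows only the multi-positive softmax ingredient over one page view, under assumed tensor shapes and a hypothetical function name:

```python
import torch
import torch.nn.functional as F

def multi_positive_softmax_loss(scores, positive_mask):
    """Softmax loss over an entire page view with several positives.

    scores        : (n_items,) similarity scores for all items in one
                    page view (positives and negatives together).
    positive_mask : (n_items,) bool; True where the item carries a
                    positive signal (e.g. exposed/clicked/purchased).
    Averages -log softmax over every positive, pushing each positive
    above all negatives in the same page view, instead of training on
    one positive per sample.
    """
    log_probs = F.log_softmax(scores, dim=0)
    return -log_probs[positive_mask].mean()

# Hypothetical page view: 2 purchased items among 5 retrieved items.
scores = torch.tensor([2.0, 0.5, 1.5, -0.3, 0.1])
mask = torch.tensor([True, False, True, False, False])
print(multi_positive_softmax_loss(scores, mask))
```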
Ensemble models in e-commerce combine predictions from multiple sub-models to improve ranking and revenue. The point-wise scoring approach, however, disregards relationships between items and leads to homogeneous displayed results, whereas diversified display benefits both user experience and revenue. Moreover, the usual learning paradigm focuses on ranking metrics and does not directly optimize revenue. RAEGO [Wang 22] replaces the ensemble model with a contextual Rank Aggregator (RA) and explores the best weights of the sub-models via Evaluator-Generator Optimization (EGO). A new rank aggregation algorithm, TournamentGreedy, achieves the best average weighted Kendall Tau Distance (KTD) among all considered algorithms with quadratic time complexity. Under the assumption that the best output list should be Pareto-optimal on the KTD metric over the sub-models, the RA algorithm attains higher efficiency and coverage in exploring optimal weights. Combined with Bayesian optimization or gradient descent, optimal sub-model weights are found for a chosen RA model.
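To make the KTD objective concrete, here is a naive quadratic-time computation of (weighted) Kendall Tau Distance between two sub-model rankings. This is the evaluation metric, not the TournamentGreedy aggregator itself, and the function name and default unit pair weights are assumptions:

```python
from itertools import combinations

def weighted_ktd(ranking_a, ranking_b, pair_weight=None):
    """Weighted Kendall Tau Distance between two rankings.

    ranking_a, ranking_b : orderings (lists) of the same item ids.
    pair_weight          : optional callable (i, j) -> weight for a
                           discordant pair; defaults to 1, i.e. the
                           plain KTD counting discordant pairs.
    """
    pos_a = {item: r for r, item in enumerate(ranking_a)}
    pos_b = {item: r for r, item in enumerate(ranking_b)}
    dist = 0.0
    for i, j in combinations(ranking_a, 2):
        # A pair is discordant if the two rankings order it differently.
        if (pos_a[i] - pos_a[j]) * (pos_b[i] - pos_b[j]) < 0:
            dist += pair_weight(i, j) if pair_weight else 1.0
    return dist

# Two sub-model rankings of the same four products.
print(weighted_ktd(["a", "b", "c", "d"], ["b", "a", "d", "c"]))  # -> 2.0
```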