Product categorisation: recent work

Muthusamy Chelliah

Published: January 20, 2023

Earlier work: Automatic categorization in product catalog (state-of-the-art)

Product classification automatically predicts a taxonomy path in a predefined hierarchy from a product's title or textual description. Efficiency requires a suitable representation for the document feature vector and fast prediction algorithms. [Gupta 16] combines a distributional semantics representation with a two-level ensemble approach that uses path-wise, node-wise and depth-wise classifiers (defined with respect to the taxonomy tree) to reduce error in the final task.
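To make the three classifier types concrete, the sketch below derives the training targets each one would see from a single taxonomy path. This is an illustrative reading of the terms, not code from [Gupta 16]; the function name `label_views`, the `" > "` separator and the example path are assumptions.

```python
def label_views(path, sep=" > "):
    """Derive three label views from one root-to-leaf taxonomy path.

    Assumed reading of the terms (not taken from [Gupta 16]):
      path-wise  -> the whole path is a single class
      node-wise  -> every node on the path is a separate binary target
      depth-wise -> one multi-class target per depth of the tree
    """
    nodes = path.split(sep)
    return {
        "path_wise": path,
        "node_wise": set(nodes),
        "depth_wise": dict(enumerate(nodes, start=1)),
    }


# Hypothetical example path.
views = label_views("Electronics > Audio > Headphones")
print(views["path_wise"])
print(sorted(views["node_wise"]))
print(views["depth_wise"])
```

An ensemble can then combine the predictions made under each view, so that an error at one granularity can be corrected by another.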

Data size, category skewness and noisy metadata are challenges in product categorization. DeepCN [Ha 16] is an end-to-end model that uses multiple RNNs, each dedicated to a metadata attribute, to generate features from text metadata, and fully connected layers to classify item categories from the generated features. Categorization errors are propagated back through the fully connected layers to the RNNs to update weights during learning. This allows diverse attributes to be integrated into a common representation, which addresses sparsity and scalability problems.

Fine-grained leaf categories in taxonomies are defined by the most descriptive or specific words of products. Finding them remains challenging due to blurred concepts (i.e. multiple equivalent or synonymous categories), unstable category vocabulary (i.e. emerging new products and evolving language habits), and a lack of labeled data. The NPC model [Chen 19], proposed to address these problems, is equipped with a character-level convolutional embedding layer to learn compositional word representations, and a spiral residual layer to extract word context annotations that capture complex long-range dependencies and structural information. A product is categorized by jointly recognizing categories from its description and predicting categories from predefined category vocabularies. Furthermore, to avoid extensive human labor, NPC can adapt to weak labels generated by mining search logs, where customers' behavior naturally connects products with categories.

Product classification in a multi-level (or hierarchical) setting is not suitable for a dynamic taxonomy. An open-world learning (OWL) model can instead automatically classify a product against the set of categories of an e-commerce platform with various sellers. An emerging product that does not belong to any existing category should be classified as unseen, and this unseen set may keep growing. When the number of products belonging to a new category is large enough, that category should be added to the existing set. An open-world model should accommodate this addition at a low training cost, since it is impractical to retrain the model from scratch every time a new class is added. L2AC [Hu 19] leverages the large volume of product descriptions in public datasets to form the OWL task as follows: each path to a leaf node in the tree-structured category system is treated as a class, and products belonging to multiple classes are removed so that the classes do not overlap.
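The open-world behavior described above (reject as unseen, then add a class without retraining) can be sketched with a toy example-based classifier. This is not L2AC: token-overlap (Jaccard) similarity and a fixed threshold stand in for the learned model, and the class name `OpenWorldClassifier`, the threshold value and the product titles are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Token-overlap similarity between two titles (a stand-in for a learned model)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class OpenWorldClassifier:
    """Toy example-based classifier with rejection; a sketch, not L2AC."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed value, would need tuning
        self.examples = {}          # class (leaf path) -> example titles
        self.unseen = []            # rejected titles, kept for later review

    def add_class(self, name, titles):
        # Adding a class only stores its examples; nothing is retrained.
        self.examples[name] = list(titles)

    def classify(self, title):
        best_class, best_score = None, 0.0
        for name, titles in self.examples.items():
            score = max((jaccard(title, t) for t in titles), default=0.0)
            if score > best_score:
                best_class, best_score = name, score
        if best_class is None or best_score < self.threshold:
            self.unseen.append(title)
            return "unseen"
        return best_class


clf = OpenWorldClassifier(threshold=0.3)
clf.add_class("Electronics > Audio > Headphones",
              ["wireless bluetooth headphones", "noise cancelling headphones"])
clf.add_class("Home > Kitchen > Kettles", ["electric kettle 1.7 litre"])

known = clf.classify("bluetooth headphones black")
rejected = clf.classify("yoga mat non slip")

# Once enough unseen products accumulate, promote them to a new class.
clf.add_class("Sports > Yoga > Mats", clf.unseen)
after_update = clf.classify("non slip yoga mat thick")
print(known, rejected, after_update, sep="\n")
```

Because classification compares a product against stored examples, adding a class is a data update rather than a training run, which is the low-cost property the paragraph asks for.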

Existing algorithms take a title or description as input and classify a product into a leaf category. [Tan 20] instead proposes a paradigm based on machine translation (MT) that converts a product's natural language description into a sequence of tokens representing a root-to-leaf path in a product taxonomy, using the vocabulary of categories. This generation allows new paths to be created from product titles. Although such system-created paths use existing nodes in the product taxonomy tree, the paths themselves (which are permutations of nodes) need not pre-exist in the tree. When these paths are added to the tree, forming new edges between previously unconnected nodes, they transform the tree into a DAG, which offers a richer representation of the product. Such category paths better accommodate users' many conceptualizations of a product, thus promoting user-friendly navigation, and are more adaptable to new products.
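The tree-to-DAG effect can be checked on a small example. The sketch below is not from [Tan 20]; it only builds a graph from root-to-leaf paths and tests two properties: whether every node still has at most one parent (tree) and whether the graph remains acyclic (DAG). The category names and function names are hypothetical.

```python
from collections import defaultdict


def build_graph(paths):
    """Build parent and child maps from a list of root-to-leaf paths."""
    parents = defaultdict(set)
    children = defaultdict(set)
    for path in paths:
        for parent, child in zip(path, path[1:]):
            parents[child].add(parent)
            children[parent].add(child)
    return parents, children


def is_tree(parents):
    """In a tree, no node has more than one parent."""
    return all(len(p) <= 1 for p in parents.values())


def is_acyclic(children):
    """Depth-first search that fails if it meets a node still on the stack."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = defaultdict(int)

    def visit(node):
        state[node] = GREY
        for nxt in children.get(node, ()):
            if state[nxt] == GREY:
                return False
            if state[nxt] == WHITE and not visit(nxt):
                return False
        state[node] = BLACK
        return True

    return all(visit(n) for n in list(children) if state[n] == WHITE)


# Hypothetical existing taxonomy.
existing = [
    ("Root", "Electronics", "Audio", "Headphones"),
    ("Root", "Sports", "Fitness", "Trackers"),
]
# Hypothetical generated path: existing nodes, but a new Fitness -> Headphones edge.
generated = ("Root", "Sports", "Fitness", "Headphones")

parents_before, _ = build_graph(existing)
parents_after, children_after = build_graph(existing + [generated])

print(is_tree(parents_before))
print(is_tree(parents_after), is_acyclic(children_after))
print(sorted(parents_after["Headphones"]))
```

After the generated path is added, "Headphones" is reachable through both "Audio" and "Fitness", which is the richer multi-parent representation the paragraph describes.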

Category trees that reflect users' dynamic interests are built by taxonomists. This manual construction leads to outdated trees, however, as it is hard to keep track of market trends. While taxonomists can identify candidate categories, i.e. sets of items with a shared label, most such categories cannot exist in the tree simultaneously, because platforms set a bound on the number of categories an item may belong to. To address this setting, [Avron 22] formalizes tree construction as the problem of choosing categories that are maximally similar to the desirable candidate categories while satisfying combinatorial requirements, and provides a model that captures practical considerations. Two heuristic algorithms are demonstrated to be effective on datasets from real-life e-commerce platforms, far exceeding the worst-case bounds. A natural special case is identified for which a solution with tight approximation guarantees is devised. The approach facilitates continual updates and maintains consistency with an existing tree. Finally, input candidate categories derived from the result sets of recent search queries are included to reflect dynamic user interests and trends.
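The constraint that makes candidate categories compete can be illustrated with a simple greedy selection. This is neither of the heuristics from [Avron 22], and it ignores the tree structure and the similarity objective; it only shows how a bound on the number of categories per item forces some candidates to be dropped. The function name, the largest-first ordering and the data are assumptions.

```python
from collections import Counter


def greedy_select(candidates, max_per_item):
    """Pick candidate categories, largest first, without exceeding the per-item bound.

    candidates: dict mapping a category label to the set of item ids it covers.
    max_per_item: maximum number of selected categories any item may belong to.
    """
    load = Counter()  # item id -> number of selected categories containing it
    chosen = []
    # Largest categories first; ties broken alphabetically for determinism.
    ordered = sorted(candidates.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for label, items in ordered:
        if all(load[i] < max_per_item for i in items):
            chosen.append(label)
            load.update(items)
    return chosen


# Hypothetical candidate categories over item ids 1..5.
candidates = {
    "gifts": {1, 2, 3, 4},
    "headphones": {1, 2},
    "running": {2, 3},
    "kettles": {5},
}
selected = greedy_select(candidates, max_per_item=2)
print(selected)
```

Here "running" is rejected because item 2 already belongs to two selected categories, which is the kind of conflict the formalization has to resolve.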

[Gupta 16] Product Classification in E-Commerce using Distributional Semantics

[Ha 16] Large-Scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks

[Chen 19] Fine-Grained Product Categorization in E-commerce

[Hu 19] Open-world Learning and Application to Product Classification

[Tan 20] E-Commerce Product Categorization via Machine Translation

[Avron 22] Automated Category Tree Construction in E-Commerce

Stephanie Horbaczewski

2 years ago

We’re launching our fused multimodal foundational model for e-commerce with categorization benchmarks. We’d love to hear more about your categorization work!

