Cost savings by using DeepSeek R1 for Product Taxonomy Classification

Problem statement

We recently encountered a challenge: automating the classification of a large volume of products into a specialized taxonomy without a sufficiently large annotated dataset to train a classification model. Our client operates an online marketplace that aggregates goods from various sellers, each employing different data quality standards and data population methods. On average, we process around 1,000 products daily, and our database consists of over 100,000 products spanning more than 2,000 categories.

Overall solution architecture

To address our classification needs, we developed a reasoning-driven LLM agent designed to map products to the most appropriate categories based on detailed product attributes.

Solution architecture

Our primary steps included:

  1. Data Enrichment: We supplemented each product listing with additional information like manufacturer part number and brand.
  2. Taxonomy Descriptions: We generated descriptive keywords to produce structured taxonomy data for the products.
  3. Closest Category Matching: We compared these product descriptions to predefined taxonomy categories, pinpointing the closest matches.
  4. Category Refinement: Finally, we further refined the selection to ensure accuracy.
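The four steps above can be sketched as a simple sequential pipeline. This is a minimal illustration only: the function names, the stubbed `call_llm` helper, and the placeholder matching logic are assumptions for readability, not our production code.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for the actual model call (GPT-4o in the pilot,
    # DeepSeek R1 via Ollama in production).
    return "stub-response"

def enrich(product: dict) -> dict:
    """Step 1: supplement the listing with manufacturer part number and brand."""
    enriched = dict(product)
    enriched.setdefault("mpn", "unknown")
    enriched.setdefault("brand", "unknown")
    return enriched

def describe(product: dict) -> str:
    """Step 2: generate descriptive keywords for taxonomy matching."""
    return call_llm(f"Describe for taxonomy: {product['title']}")

def match_categories(description: str, taxonomy: list[str]) -> list[str]:
    """Step 3: shortlist the closest taxonomy categories."""
    # Placeholder ranking; the real pipeline compares descriptions
    # against predefined taxonomy categories.
    return taxonomy[:3]

def refine(product: dict, candidates: list[str]) -> str:
    """Step 4: pick the single best category from the shortlist."""
    return candidates[0] if candidates else "uncategorized"

def classify(product: dict, taxonomy: list[str]) -> str:
    enriched = enrich(product)
    description = describe(enriched)
    candidates = match_categories(description, taxonomy)
    return refine(enriched, candidates)
```

Each step is a separate node in the actual graph, which is what later makes the model backend easy to swap.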

For the pilot, with its limited request volume, we relied on GPT-4o. However, after the successful pilot, and ahead of production use, where the system would replace all manual checks, we decided to switch to a self-hosted LLM; after testing, we chose DeepSeek R1 Mini.

Transition to DeepSeek R1 Mini

By leveraging LangGraph, we decoupled our solution from specific model dependencies, enabling a seamless switch from GPT-4o to R1 Mini. We used Ollama's wrapper for model communication, requiring no additional modifications to our existing infrastructure.
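The decoupling works because every node talks to the model through one narrow interface, so only the backend object changes. A sketch of the idea, with stand-in classes (the class names and canned responses here are illustrative, though `invoke` mirrors LangChain's runnable interface):

```python
from typing import Protocol

class ChatModel(Protocol):
    """The only contract a pipeline node depends on."""
    def invoke(self, prompt: str) -> str: ...

class OpenAIBackend:
    """Stand-in for the GPT-4o client used during the pilot."""
    def invoke(self, prompt: str) -> str:
        return "gpt-4o answer"

class OllamaBackend:
    """Stand-in for the Ollama-served DeepSeek R1 client."""
    def invoke(self, prompt: str) -> str:
        # R1 emits a reasoning phase before the answer.
        return "<think>reasoning...</think>r1 answer"

def classify_node(model: ChatModel, product_title: str) -> str:
    # The node never knows which backend it is calling.
    return model.invoke(f"Classify: {product_title}")
```

Swapping GPT-4o for R1 then amounts to constructing a different backend object at graph-build time, with no changes to the nodes themselves.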

Key differences

Switching to the R1 Mini introduced the need for a slightly different output pattern, as it incorporates a built-in reasoning phase not present in GPT-4o. Consequently, we adjusted our LangChain-based pipeline by adding an extra processor to align with the R1 Mini's format specifications. Beyond that minor alteration, our workflow remained largely intact, thanks to the robust abstraction layer provided by LangGraph.
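The extra processor is essentially a post-processing step that removes R1's reasoning block before the answer is parsed. A minimal sketch, assuming the Ollama-served model wraps its reasoning in `<think>…</think>` tags (the sample strings are illustrative):

```python
import re

# R1's reasoning phase arrives inline, wrapped in <think> tags.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_reasoning(raw: str) -> str:
    """Drop the reasoning block so downstream parsing sees only the answer."""
    return THINK_RE.sub("", raw).strip()

raw = "<think>The product looks like a drill...</think>Power Tools > Drills"
print(strip_reasoning(raw))  # Power Tools > Drills
```

With GPT-4o this step is a no-op, so the same pipeline serves both models.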

Results and benefits

  • Consistent Accuracy: Classification accuracy remained on par with the GPT-4o-based process. On a test dataset of 200 products spanning 70 expected categories, 98% of categories were assigned identically to GPT-4o, and the remaining 2% required manual review (no false positives were found).
  • Single-Machine Deployment: The entire solution now runs on a single GPU-based machine, simplifying our infrastructure.
  • Extended Processing Time: The runtime increased by roughly 4x (from 2 hours to 8 hours), but this processing speed was still acceptable to the business.
  • Cost Savings: Initially, we spent around €100 per 1,000 products processed. Now an 8-hour GPU run costs us around €10. Further savings might be achieved by using spot instances.
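At our daily volume of roughly 1,000 products, the per-product arithmetic behind that last bullet works out as follows (illustrative calculation based only on the figures above):

```python
# Figures from the article: ~1,000 products per day.
products_per_run = 1_000

gpt4o_cost = 100.0   # euros per 1,000 products via the API
gpu_run_cost = 10.0  # euros for one 8-hour self-hosted GPU run

per_product_api = gpt4o_cost / products_per_run   # 0.10 euros/product
per_product_gpu = gpu_run_cost / products_per_run  # 0.01 euros/product

print(f"API: {per_product_api:.2f} EUR/product, GPU: {per_product_gpu:.2f} EUR/product")
print(f"Savings factor: {per_product_api / per_product_gpu:.0f}x")
```

In other words, roughly a 10x reduction in per-product cost, before any spot-instance savings.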

Conclusion

Shifting to the DeepSeek R1 Mini provided both financial relief and flexibility, all without sacrificing classification precision. The ability to effortlessly switch models – made possible by LangGraph – proved critical for maintaining a smooth operational flow. We expect this approach to remain sustainable and cost-effective as our product listings continue to expand.


#AI #DeepLearning #ProductClassification #LangGraph #CostOptimization


