Techtonic Waves: Manufacturing Process Optimization through Explainable Machine Learning
Pangea Tech
Pangea Tech is a global product-focused company with leading capabilities in AI, Analytics, and Technology Services
Introduction
Welcome to Techtonic Waves, where we explore how AI solutions create measurable business impact. Each edition examines real-world challenges and their innovative solutions, providing a detailed analysis of how organizations harness AI to drive efficiency and growth.
This week, we delve into how a virtual metrology solution based on Machine Learning and explainable AI revolutionized production processes for a global manufacturing company. Facing significant challenges in measuring critical material properties—with testing limited to once every 12-16 hours—the company was operating largely on educated guesses rather than data-driven decisions. Here's how we transformed their approach to real-time process monitoring and optimization.
The Challenge: Limited Visibility in Critical Manufacturing Processes
In manufacturing environments, certain processes are notoriously difficult to monitor through direct testing. This challenge stems from several factors:
Our client, a leading national metals manufacturer with revenue of close to $700 million, faced precisely this dilemma with a critical process in its manufacturing chain. Physical testing was limited to approximately once every 12-16 hours. In ideal circumstances this meant morning and evening readings, but in practice it often translated to just one measurement per day.
This severe limitation in measurement frequency created a significant blind spot in operations. Process deviations occurring between readings went undetected until the next scheduled test, by which time material quality could be significantly affected. To compensate for this lack of visibility, the engineers relied on their accumulated experience and rule-of-thumb formulas.
As the Operations Director explained: "Static formulas work well in textbooks, but in real-world manufacturing environments, there are too many variables and conditions that make those formulas incomplete or inaccurate."
The engineers thus frequently under- or over-corrected the process, with predictable but costly consequences: the resulting quality deviations reduced yield and lengthened production schedules for all downstream operations.
Solution Design: Machine Learning with Explainability
After thorough analysis of the plant's requirements, Pangea Tech proposed and implemented a solution centered on machine learning with explainability capabilities, supplemented by retrieval-augmented generation (RAG) for operational guidance.
Addressing Data Limitations
The first challenge was the limited historical data available, a common obstacle in manufacturing environments still early in their digitization journey. The plant had digitized only about two years of data, amounting to roughly 1,200 data points (2 readings per day × 600 days). This fell well short of the roughly 10,000 data points typically recommended for robust machine learning models.
To overcome this limitation, our team employed three innovative approaches:
Feature Engineering and Discovery
Initial discussions with plant engineers identified 13 process parameters that theoretically influenced the material property being measured. However, our experience in both analytics and manufacturing environments suggested that real-world processes are affected by many more factors than theoretical models account for.
Through collaborative sessions with operators, engineers, and quality control officers, we expanded the parameter list to 50 distinct variables. This comprehensive approach captured not just the textbook factors but also the plant- and environment-specific variables.
To determine which of these parameters truly impacted the process and to what degree, we conducted over 200 experimental model iterations. This rigorous feature engineering process allowed us to build a model that captured the complex interrelationships affecting the material property in question.
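To make the idea of repeated experimental iterations concrete, here is a minimal sketch of one common way to test which candidate parameters earn their place: greedy forward selection scored by cross-validation, which suits small datasets because every row is used for both training and validation. Everything in it is an assumption for illustration. The data is synthetic, the column indices and function names are hypothetical, and a simple nearest-neighbour predictor stands in for the real model so the example runs with the standard library alone.

```python
import random

def knn_predict(train_X, train_y, row, cols, k=3):
    """Average the targets of the k nearest training rows,
    measuring distance only on the selected columns."""
    scored = sorted(
        (sum((t[c] - row[c]) ** 2 for c in cols), target)
        for t, target in zip(train_X, train_y)
    )
    nearest = scored[:k]
    return sum(target for _, target in nearest) / len(nearest)

def cv_error(X, y, cols, folds=5):
    """Mean absolute error of the stand-in model under k-fold cross-validation."""
    n = len(X)
    total = 0.0
    for f in range(folds):
        train_idx = [i for i in range(n) if i % folds != f]
        test_idx = [i for i in range(n) if i % folds == f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            total += abs(knn_predict(train_X, train_y, X[i], cols) - y[i])
    return total / n

def forward_select(X, y, n_params, min_gain=0.01):
    """Add one parameter at a time, keeping it only if the
    cross-validated error drops by at least min_gain."""
    selected, best = [], float("inf")
    while len(selected) < n_params:
        candidates = [c for c in range(n_params) if c not in selected]
        err, col = min((cv_error(X, y, selected + [c]), c) for c in candidates)
        if best - err < min_gain:
            break
        selected.append(col)
        best = err
    return selected, best

# Synthetic stand-in data: 6 candidate parameters, of which only
# columns 0 and 3 actually drive the target (an assumption for the demo).
random.seed(7)
N_ROWS, N_PARAMS = 120, 6
X = [[random.random() for _ in range(N_PARAMS)] for _ in range(N_ROWS)]
y = [5.0 * r[0] + 2.0 * r[3] + random.gauss(0, 0.05) for r in X]

selected, error = forward_select(X, y, N_PARAMS)
print("selected parameter columns:", selected, "cv MAE:", round(error, 3))
```

Each call to `cv_error` corresponds to one experimental iteration: a candidate parameter set is trained and scored on held-out rows, and the parameter stays only if it measurably helps.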
Model Development and Refinement
The initial model achieved 80-85% accuracy in predicting the material property of interest. By iteratively collecting additional data, retraining the model, and tuning its parameters, we progressively improved performance to 94-95% accuracy by the end of the three-month rollout period.
We ultimately implemented an XGBoost machine learning model that delivers hourly predictions for all production trains. This raised measurement frequency from once every 12-16 hours to once every hour and significantly improved visibility into production operations.
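For a continuously measured property, an accuracy percentage has to be derived from prediction error in some way, and the exact definition is not stated here. One common convention, assumed purely for illustration, is 100% minus the mean absolute percentage error (MAPE) against lab readings. The readings and predictions below are made-up numbers, and the function name is hypothetical.

```python
def percent_accuracy(actual, predicted):
    """Return 100 * (1 - MAPE), one common way to quote accuracy
    for a continuous target. Assumes no lab reading is zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("percentage error is undefined for a zero reading")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * (1.0 - sum(errors) / len(errors))

# Hypothetical lab readings and the model outputs for the same samples.
lab_readings = [50.0, 40.0, 80.0, 100.0]
model_outputs = [47.0, 42.0, 76.0, 95.0]

score = percent_accuracy(lab_readings, model_outputs)
print(f"accuracy: {score:.2f}%")  # about 94.75% for these made-up numbers
```

Whichever definition is used, it is worth fixing it before the rollout so that the figure tracked across retraining cycles stays comparable.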
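XGBoost belongs to the family of gradient-boosted decision trees, where each new tree is fitted to the errors left by the trees before it. The sketch below shows that core idea with one-split trees (stumps) in plain Python. It is a simplified stand-in, not XGBoost itself, which adds regularization, second-order gradient information, and many engineering optimizations. The two input columns and the target formula are hypothetical.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    if best is None:  # every column is constant: fall back to the mean
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def stump_predict(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value

def fit_boosted(X, y, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the remaining error."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical process data: two sensor columns and a target property.
X = [[float(t), float(f)] for t in range(8) for f in range(4)]
y = [2.0 * t + (3.0 if f >= 2 else 0.0) for t, f in X]

model = fit_boosted(X, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [predict(model, row) for row in X])
print("baseline MSE:", round(baseline_mse, 3), "boosted MSE:", round(boosted_mse, 3))
```

In a deployment like the one described, a trained model of this kind would be called once an hour with the latest process readings, which is what turns a twice-daily lab test into an hourly estimate.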
Explainability: Moving Beyond Black-Box Predictions
A critical innovation in our approach was the implementation of explainability features that address a fundamental limitation of conventional machine learning: the "black box" problem. As our senior experts noted, "The great upside of machine learning is that it can detect patterns humans can't, but that also means it sometimes gives answers that seem counterintuitive to experts."
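One widely used way to open up a black-box prediction is Shapley-value attribution, which splits the gap between a prediction and a reference prediction across the input parameters. The specific technique used in this project is not named here, so treat the following as an illustrative sketch only. The toy model, parameter names, and readings are all hypothetical, and the exact calculation shown is practical only for a handful of inputs; production tools rely on faster approximations.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley attribution: average each parameter's marginal effect
    over every order in which parameters can be switched from the
    baseline value to the observed value."""
    n = len(x)
    contrib = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]
            value = model(current)
            contrib[i] += value - previous
            previous = value
    return [c / len(orders) for c in contrib]

def toy_model(features):
    """Hypothetical stand-in for a trained model, with one interaction term."""
    temperature, flow_rate, additive_dose = features
    return 0.5 * temperature + 2.0 * flow_rate + 0.1 * temperature * additive_dose

names = ["temperature", "flow_rate", "additive_dose"]
baseline = [100.0, 10.0, 2.0]   # typical operating point (made up)
reading = [110.0, 12.0, 3.0]    # current hour's readings (made up)

phi = shapley_values(toy_model, reading, baseline)
for name, value in zip(names, phi):
    print(f"{name}: {value:+.2f}")
# Contributions are approximately +7.50, +4.00, +10.50 and add up to the
# gap between the two predictions (112 - 90 = 22).
```

The useful property for an operator is that the contributions always sum to the difference between the current prediction and the reference one, so a counterintuitive prediction can be traced back to the parameters that moved it.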
Our explainability framework provides two key capabilities:
Operational Integration through RAG
To bridge the gap between predictions and actions, we implemented a Retrieval-Augmented Generation (RAG) system built on the company's Standard Operating Procedures (SOPs) and technical documentation. This integration serves two critical functions:
Importantly, these recommendations are grounded in the company's own documentation rather than generated by AI. As our solution architect emphasized, "The system isn't telling operators what to do—it's helping them quickly access the company's established procedures for the specific situation they're facing."
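The retrieval half of a RAG system can be pictured as a search that returns the most relevant passage from the company's own documents, word for word. The sketch below uses simple word-overlap (cosine) scoring from the standard library. Real systems typically use learned text embeddings and a vector index, and the generation step is omitted entirely. The SOP identifiers and passage texts are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, passages, top_k=1):
    """Return the top_k passages most similar to the query, unmodified."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p["text"]))),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical SOP excerpts; a real index would hold the plant's documents.
sop_passages = [
    {"id": "SOP-A", "text": "If the predicted value is above the upper limit, "
                            "reduce furnace temperature in small steps and recheck."},
    {"id": "SOP-B", "text": "If coolant flow drops below the minimum, "
                            "inspect the pump and notify maintenance."},
    {"id": "SOP-C", "text": "Record every lab sample in the quality log "
                            "at the end of the shift."},
]

query = "Predicted value above upper limit, what should the operator do?"
top = retrieve(query, sop_passages)[0]
print(top["id"], "->", top["text"])
```

Because the passage is returned verbatim along with its identifier, the operator can see exactly which procedure the guidance comes from.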
Visualization and Decision Support
The entire solution is delivered through interactive dashboards, providing:
Impact: Transforming Operations Through Data-Driven Decision Making
The implementation has fundamentally transformed how the plant manages this critical process:
The plant now relies on this system daily for operational decision-making. Process variability has decreased significantly, with corresponding improvements in product consistency, yield, and resource utilization.
As summarized by the Plant Manager: "We've moved from reactive to proactive process management. Instead of discovering problems after they've affected our output, we're identifying and addressing issues before they become significant."
Traditional Six Sigma methodologies rely on tools like Multivariate Analysis (MVA), Designed Experiments (such as Factorial Design), and Response Surface Analysis to optimize processes and reduce variability. However, by integrating Machine Learning and Retrieval-Augmented Generation (RAG), the analysis becomes more comprehensive, explainable, and adaptive. These AI-driven approaches not only enhance data-driven decision-making but also seamlessly embed Change Management and Control—the final steps of the DMAIC process—directly into the solution, ensuring sustained improvements from the outset.
Stay Tuned for More AI & Tech Insights!
Techtonic Waves will bring you valuable knowledge every week, empowering you to make informed decisions in the age of AI.
We invite you to join us on this journey. Engage, learn, and lead the way with AI-driven transformation.
Comment, follow Pangea Tech, and subscribe to Techtonic Waves to stay ahead of the curve. Let's embrace the future, together.