Neural network pipeline for QA in IVF laboratory
Serdj Sergeev
Embryology lab Director, Neural network development for IVF data analytics
The term "pipeline" originates from business and engineering, where it refers to a sequence of processes or steps that transform raw inputs into a desired output. In the context of business operations, a pipeline often denotes a structured, repeatable process that ensures consistency, quality, and efficiency from start to finish. The term has been widely adopted in fields like software development and data science, where it signifies a series of automated steps or workflows designed to achieve a specific goal. In our approach, the use of the term "pipeline" is intentional and carefully chosen to reflect the structured, systematic nature of our neural network-based method for predicting pregnancy probabilities in IVF protocols. By drawing a parallel between business pipelines and our IVF process, we emphasize the importance of a continuous, integrated workflow that not only predicts outcomes but also supports ongoing quality assurance and risk management.
The IVF process involves multiple stages, each crucial for the final outcome. From patient preparation and ovarian stimulation to embryo culture and transfer, each step must be meticulously managed to maximize the chances of success. Our neural network pipeline mirrors this process: it takes into account various clinical and laboratory parameters, analyzes them sequentially, and produces a comprehensive assessment of pregnancy probability. By structuring our approach as a pipeline, we ensure that each step in the IVF process is systematically evaluated and optimized.
Our data-driven approach is designed to enhance quality assurance by continuously monitoring and evaluating key performance indicators (KPIs) throughout the entire IVF process. By incorporating a pipeline approach, we can identify potential risks and inefficiencies at various stages of the treatment, allowing for timely interventions and adjustments. This proactive risk management is crucial for improving overall IVF success rates and minimizing the likelihood of adverse outcomes.
The model's predictions are not only used to forecast outcomes but are also compared with actual clinical results. This comparison allows us to refine the pipeline over time, improving its predictive accuracy and making it more robust. The iterative nature of our pipeline ensures that it adapts to new data and evolving clinical practices.
Unlike traditional approaches that may focus solely on real-time decision-making, our pipeline extends its utility into the post-analytical phase of IVF. This is akin to a business pipeline that continues to deliver value even after the initial product or service has been delivered. By analyzing the outcomes of IVF protocols in retrospect, our pipeline provides critical insights into the factors contributing to both successes and failures. This comprehensive approach ensures that lessons learned from past treatments are integrated into future protocols, thereby enhancing the overall quality and effectiveness of the IVF process.
Our neural network pipeline uses an improved deep neural network (DNN) model to predict the probability of clinical pregnancy across different IVF treatment protocols. The pipeline integrates a comprehensive set of laboratory and clinical parameters and also provides a systematic approach to quality assurance (QA) in the IVF laboratory. In our experience, its predictive performance is comparable to, or better than, that of traditional methods.
DNN model performance
The model was trained on our clinic's data, in which the mean clinical pregnancy rate (CPR) per embryo transfer was 61.93%. After fitting, the model achieved the following metrics: test accuracy = 0.72, AUC = 0.79, PRC = 0.69, precision = 0.72, recall = 0.52, F1 score = 0.61, and MCC (Matthews correlation coefficient) = 0.41.
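For readers less familiar with these metrics, the sketch below shows how the threshold-based ones and the ROC AUC are usually computed from per-transfer outcomes and predicted probabilities. It is a minimal, dependency-free illustration, not our training code: the 0.5 decision threshold, the function names, and the toy data are assumptions for demonstration only.

```python
from math import sqrt


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Threshold-based metrics for binary outcomes (1 = clinical pregnancy, 0 = none)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def roc_auc(y_true, y_prob):
    """ROC AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data, not clinic data.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
print(binary_metrics(y_true, y_prob))
print(roc_auc(y_true, y_prob))
```

Note that precision, recall, F1, and MCC depend on the chosen threshold, while the AUC summarizes ranking quality across all thresholds.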
Various studies have employed deep learning models to predict pregnancy outcomes in IVF. Most of them rely on "black-box" time-lapse approaches, such as embryo "scores", which have limited interpretability and generalizability. In contrast, our DNN model's predictions are calibrated with logistic regression and are reported in a clear, interpretable format: the probability of pregnancy, expressed as a percentage. Moreover, traditional approaches often lack detailed comparisons with the predictive performance of clinical embryologists, which is crucial for validating their effectiveness.
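Calibrating a classifier's raw output with logistic regression is commonly known as Platt scaling. The sketch below assumes that this is the kind of calibration meant: it fits p = sigmoid(a * logit(s) + b) to raw DNN scores s by Newton-Raphson, so that the output can be read as a pregnancy probability. The function names and the choice of fitting on the logit of the raw score are illustrative assumptions, not a description of our exact implementation.

```python
from math import exp, log


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def _logit(s, eps=1e-6):
    s = min(max(s, eps), 1.0 - eps)  # keep raw scores away from 0 and 1
    return log(s / (1.0 - s))


def fit_platt(scores, labels, max_iter=50, tol=1e-10):
    """Fit p = sigmoid(a * logit(score) + b) by Newton-Raphson on the log-loss."""
    x = [_logit(s) for s in scores]
    a, b = 1.0, 0.0  # start from the identity mapping
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, labels):
            p = _sigmoid(a * xi + b)
            w = p * (1.0 - p)
            g_a += (p - yi) * xi  # gradient of the log-loss
            g_b += p - yi
            h_aa += w * xi * xi  # Hessian entries
            h_ab += w * xi
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 1e-12:
            break  # Hessian (near-)singular; stop rather than divide by ~0
        # Newton step: subtract H^-1 * gradient (explicit 2x2 inverse).
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a - da, b - db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def calibrate(score, a, b):
    """Calibrated pregnancy probability (0-1) for one raw model score."""
    return _sigmoid(a * _logit(score) + b)
```

A useful property of this fit: because the model has an intercept, the calibrated probabilities sum to the observed number of pregnancies on the calibration set, so the mean predicted CPR matches the actual CPR there. Calibration should be fitted on held-out data, not on the data used to train the DNN.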
Data collection
We applied our neural network data-driven analytical system to data from embryo transfer (ET) cycles performed in 2024 (288 ETs with known implantation outcomes). Predicted clinical pregnancy rates were compared in detail with actual outcomes across various time intervals, including quarterly and monthly analyses.
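The quarterly and monthly comparisons can be sketched as a simple aggregation: the predicted CPR of a period is the mean predicted probability of its transfers, and the actual CPR is the share of those transfers that resulted in clinical pregnancy. The record layout `(transfer_date, predicted_probability, pregnant)` and the helper names are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict
from datetime import date


def period_key(d, by="quarter"):
    """Label a transfer date as 'YYYY-MM' (month) or 'YYYY-Qn' (quarter)."""
    if by == "month":
        return f"{d.year}-{d.month:02d}"
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def aggregate(records, by="quarter"):
    """records: iterable of (transfer_date, predicted_probability, pregnant as 0/1)."""
    groups = defaultdict(list)
    for d, p, y in records:
        groups[period_key(d, by)].append((p, y))
    summary = {}
    for key in sorted(groups):
        rows = groups[key]
        n = len(rows)
        summary[key] = {
            "n": n,
            "predicted_cpr": sum(p for p, _ in rows) / n,  # mean predicted probability
            "actual_cpr": sum(y for _, y in rows) / n,  # observed pregnancy share
        }
    return summary


# Hypothetical records for illustration.
records = [
    (date(2024, 1, 10), 0.60, 1),
    (date(2024, 2, 5), 0.50, 0),
    (date(2024, 4, 20), 0.70, 1),
    (date(2024, 5, 3), 0.55, 1),
]
print(aggregate(records))
print(aggregate(records, by="month"))
```

With only a few transfers per month, monthly rates are noisy, which is one reason quarterly aggregates tend to track each other more closely.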
Results
After the full training process was completed, the predicted CPR was 56.18%, which did not differ significantly (p = 0.1144) from the actual CPR in our clinic. Using this DNN model, we can compare actual and predicted CPR across time intervals to understand the likelihood of achieving pregnancy. With the model, we established a lower threshold for the probability of clinical pregnancy for each year of operation. A significant difference (p < 0.05) was found between patients treated in 2021-2022 and those treated in 2023, with the likelihood of clinical pregnancy decreasing by 10% at the beginning of 2023 and by 20% after the third quarter of 2023. The probabilities calculated with the DNN model agree with the actual pregnancy rates reported for these time intervals. In other words, our KPI calculations and DNN prediction analysis indicate that the decline in clinical pregnancy rates from the third quarter of 2023 was not directly related to the quality of stimulation (patient preparation) or laboratory work, but depended on the patients' initial clinical data.
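The text does not state which test produced p = 0.1144. One plausible way to check whether the actual pregnancy count is consistent with per-transfer predicted probabilities, assumed here purely for illustration, is a normal-approximation z-test: the expected count is the sum of the predicted probabilities and its variance is the sum of p(1 - p) (a Poisson-binomial variance). `observed_vs_expected` is a hypothetical helper.

```python
from math import erfc, sqrt


def observed_vs_expected(probs, outcomes):
    """Compare the observed pregnancy count with the sum of predicted probabilities."""
    n = len(probs)
    expected = sum(probs)  # expected number of pregnancies
    observed = sum(outcomes)  # actual number of pregnancies
    variance = sum(p * (1 - p) for p in probs)  # Poisson-binomial variance
    z = (observed - expected) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value, normal approximation
    return {"actual_cpr": observed / n, "predicted_cpr": expected / n, "z": z, "p": p_value}


# Hypothetical example: 100 transfers, each predicted at 50%, 70 pregnancies.
print(observed_vs_expected([0.5] * 100, [1] * 70 + [0] * 30))
```

A non-significant result, as reported above, means the observed rate is within the range expected from the model's own predictions; it does not prove the predictions are exact.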
Data analysis showed that the model predicts pregnancy occurrence with high accuracy. This is particularly evident in the quarterly and monthly graphs, where predicted and actual pregnancy rates correspond closely. For instance, in Q2 2024, after a comprehensive quality control system was integrated into our work, the predicted pregnancy rate was 58.9% compared with an actual rate of 59.1%, a difference of only 0.2 percentage points.
Temporal Dynamics and Model Improvement
The time series chart of pregnancy probabilities demonstrates significant variability in predictions at the individual case level, underscoring the complexity of predicting IVF success. However, when aggregating data at monthly and quarterly levels, there is a clear trend towards improved prediction accuracy over time. This improvement is particularly noticeable from late 2023 to 2024, where predicted and actual rates almost coincide. For example, the mean absolute error (MAE) between predicted and actual rates decreased from 0.15 in Q1 2022 to 0.03 in Q2 2024, representing a statistically significant improvement (p < 0.05, paired t-test).
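The text does not specify how the quarterly MAE was computed. One plausible reading, assumed in this sketch, is the mean absolute difference between predicted and actual monthly rates within each quarter. `quarterly_mae` and its input format are hypothetical.

```python
from collections import defaultdict


def quarterly_mae(monthly):
    """monthly: {"YYYY-MM": (predicted_rate, actual_rate)} -> {"YYYY-Qn": MAE within that quarter}."""
    buckets = defaultdict(list)
    for key, (pred, act) in monthly.items():
        year, month = key.split("-")
        quarter = f"{year}-Q{(int(month) - 1) // 3 + 1}"
        buckets[quarter].append(abs(pred - act))  # absolute error for one month
    return {q: sum(errs) / len(errs) for q, errs in sorted(buckets.items())}


# Hypothetical monthly rates (fractions, not percentages).
monthly = {
    "2022-01": (0.40, 0.55),
    "2022-02": (0.50, 0.35),
    "2022-03": (0.45, 0.60),
    "2024-04": (0.58, 0.60),
    "2024-05": (0.60, 0.57),
}
print(quarterly_mae(monthly))
```

Using absolute rather than signed errors keeps over- and under-predictions from cancelling out within a quarter.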
This improvement can be explained by several factors:
1. Increase in the volume of training data for the model over time.
2. Iterative improvements to the model algorithm, using better-cleaned data and a combination with a KAN (Kolmogorov-Arnold Network) model.
3. Stabilization of laboratory parameters through the implemented QC program, which increases the predictability of outcomes.
Deviations and Variability
Despite overall high accuracy, there are periods when the model's predictions deviate from actual results. These deviations can be explained by two groups of factors. First, changes in clinical practice: improvements in stimulation protocols or embryo transfer techniques may influence actual results in ways the model cannot immediately account for. Second, external factors: seasonal fluctuations, changes in patient composition, or other medical factors may influence results without being fully captured by the model.
For instance, in Q2 2022 the model predicted a pregnancy rate of 14.3%, while the actual rate was 43.3%, the largest discrepancy observed in our dataset. This was a period of laboratory reconstruction and impaired functioning, so we excluded it from the analysis.
Using a Bayesian method, we forecast the clinical pregnancy rate we can expect to achieve in future periods. This forecast can serve as a threshold for measuring our future performance and as a lower bound for the results we expect.
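The text does not specify the Bayesian model used. A common minimal choice, assumed in this sketch, is a Beta-Binomial model: a Beta prior on the pregnancy rate is updated with observed pregnancies and transfers, and the lower credible bound of the posterior can act as the performance threshold described above. The uniform Beta(1, 1) prior, the 90% level, the Monte Carlo approach, and the name `beta_forecast` are all illustrative assumptions.

```python
import random


def beta_forecast(successes, trials, prior_a=1.0, prior_b=1.0, level=0.90, draws=20000, seed=0):
    """Posterior mean and central credible interval for a pregnancy rate (Beta-Binomial model)."""
    a = prior_a + successes  # posterior alpha: prior + pregnancies
    b = prior_b + trials - successes  # posterior beta: prior + non-pregnancies
    rng = random.Random(seed)  # seeded for reproducibility
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1 - level) / 2
    lower = samples[int(tail * draws)]  # empirical lower quantile
    upper = samples[int((1 - tail) * draws) - 1]  # empirical upper quantile
    return a / (a + b), lower, upper


# Hypothetical counts: 60 pregnancies in 100 transfers.
mean, lower, upper = beta_forecast(60, 100)
print(f"posterior mean {mean:.3f}, 90% interval [{lower:.3f}, {upper:.3f}]")
```

The exact quantiles are available analytically, for example through `scipy.stats.beta.ppf`; sampling is used here only to keep the sketch dependency-free.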
Seasonal Patterns
An interesting observation is the presence of recurring seasonal patterns in pregnancy rates. These patterns can be explained by several hypotheses:
1. Biological factors: Seasonal changes in hormonal background or oocyte quality may affect IVF success.
2. Behavioral factors: Seasonal changes in patients' lifestyle (e.g., diet, physical activity, stress) may influence IVF outcomes.
3. Operational factors: Seasonal changes in clinic operations (e.g., staff workload) may affect results.
Our data show that pregnancy rates tend to be higher in the spring and autumn months. For example, the average pregnancy rate in April-May and September-October was 62.5%, compared with 54.8% in July-August and December-January. This difference was statistically significant (p < 0.05, two-sample t-test).
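As a sketch of the seasonal comparison, assuming the test was applied to monthly pregnancy rates grouped into the two sets of months named above, the code below splits the rates and computes a two-sample t statistic. Welch's unequal-variance variant is an assumption, since the text only says "two-sample t-test"; the helper names and the input format are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

HIGH_MONTHS = {4, 5, 9, 10}  # April-May, September-October
LOW_MONTHS = {7, 8, 12, 1}  # July-August, December-January


def split_by_season(monthly_rates):
    """monthly_rates: {(year, month): rate} -> (high-season rates, low-season rates)."""
    high = [r for (_, m), r in monthly_rates.items() if m in HIGH_MONTHS]
    low = [r for (_, m), r in monthly_rates.items() if m in LOW_MONTHS]
    return high, low


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(group_a) / len(group_a)  # squared standard error of group A mean
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(group_a) - 1) + vb**2 / (len(group_b) - 1))
    return t, df
```

Converting t to a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(high, low, equal_var=False)`; that step is omitted to keep the sketch dependency-free.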
QA in problem cases
Using our pipeline approach, we performed a comprehensive comparison of the distributions of key factors between problem and non-problem IVF cycles. The analysis covers nine parameters, each represented by a separate histogram.
1. Number of COCs (Cumulus-Oocyte Complexes):
The distribution for non-problem cases shows a higher peak and is shifted slightly to the right compared to problem cases. This suggests that non-problem cycles tend to yield a higher number of COCs, which could be indicative of better ovarian response to stimulation.
2. Fertilization rate:
Both distributions appear relatively similar, with a slight right-skew. However, the non-problem cases show a higher peak at the upper end of the distribution, indicating a tendency towards higher fertilization rates in successful cycles.
3. BL rate (Blastocyst rate):
The distributions for both groups are bimodal, but non-problem cases show a more pronounced peak at higher BL rates. This implies that successful cycles more frequently achieve higher blastocyst formation rates.
4. TGBDR (Total Good Blastocyst Development Rate):
The distributions are similar, but non-problem cases show a slightly higher peak and are shifted somewhat to the right, suggesting better development of good blastocysts in successful cycles.
5. Frequency of receiving COCs:
Both distributions are strongly concentrated toward high values, with a sharp peak near 1.0. This indicates that most cycles, regardless of outcome, successfully retrieve COCs.
6. KPIScore:
Non-problem cases show a distribution shifted noticeably to the right compared with problem cases, with a higher peak at higher scores. This suggests that higher KPIScores are associated with more successful outcomes, which is consistent with our previous studies.
7. MII (Mature oocytes):
The distribution for non-problem cases is shifted to the right and has a higher peak, indicating that successful cycles tend to yield more mature oocytes.
8. Number of embryos on day 5:
Non-problem cases show a distribution with higher counts at lower numbers of embryos, while problem cases have a more even distribution across a wider range. This could suggest that quality rather than quantity of embryos by day 5 is more indicative of success.
9. Day 6-7 development:
Both distributions are right-skewed, but non-problem cases show a higher peak at lower values, suggesting that successful cycles may have more embryos developing earlier rather than later.
In summary, successful IVF cycles generally have higher numbers of COCs and mature oocytes, indicating better ovarian response and oocyte quality, as well as higher blastocyst formation rates. The KPIScore appears to be a good predictor of cycle success, with higher scores correlating with better outcomes. Fertilization rates show less pronounced differences, suggesting that other factors may be more critical in determining cycle success.
These findings provide valuable insights into the factors differentiating successful and unsuccessful IVF cycles, potentially guiding improvements in clinical protocols and patient management strategies.
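The histogram comparison above can be complemented by a numeric effect size per feature. The sketch below bins values into fixed-width histograms and computes a standardized mean difference (Cohen's d with a pooled standard deviation) between non-problem and problem cases. The case schema (a dict per cycle with a boolean `problem` flag) and all function names are hypothetical, and the source does not say that effect sizes were computed.

```python
from statistics import mean, stdev


def histogram(values, lo, hi, bins):
    """Counts per fixed-width bin on [lo, hi]; out-of-range values are clamped to the edge bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    return counts


def standardized_difference(non_problem, problem):
    """Cohen's d with pooled SD; positive means non-problem cases have higher values."""
    n1, n2 = len(non_problem), len(problem)
    pooled_var = ((n1 - 1) * stdev(non_problem) ** 2 + (n2 - 1) * stdev(problem) ** 2) / (n1 + n2 - 2)
    return (mean(non_problem) - mean(problem)) / pooled_var**0.5


def compare_features(cases, features):
    """cases: list of dicts with numeric features and a boolean 'problem' flag."""
    result = {}
    for f in features:
        non_problem = [c[f] for c in cases if not c["problem"]]
        problem = [c[f] for c in cases if c["problem"]]
        result[f] = standardized_difference(non_problem, problem)
    return result
```

Ranking features by the absolute value of d gives a quick, scale-free view of which parameters separate the two groups most strongly, which is harder to judge by eye across nine histograms.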
Opportunities for further model improvement:
1. Adaptive learning: Implementing mechanisms allowing the model to adapt more quickly to changes in clinical practice.
2. Accounting for seasonality: Including seasonal factors in the model may increase prediction accuracy.
3. Expanding the feature set: Including additional parameters, such as detailed embryo characteristics or genetic markers, may improve prediction accuracy.
4. Personalization: Developing sub-models for different patient groups can improve prediction accuracy for individual cases.
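As one illustration of point 2, seasonality can be added as model input through a cyclical encoding of the calendar month, so that December and January are treated as neighbours rather than as the two ends of a 1-12 scale. This is a standard feature-engineering technique offered as an assumption; it is not part of the model described in this article, and `month_features` is a hypothetical name.

```python
from math import cos, pi, sin


def month_features(month):
    """Encode calendar month (1-12) as a point on the unit circle: (sin, cos)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    angle = 2 * pi * (month - 1) / 12
    return sin(angle), cos(angle)


for m in (1, 4, 7, 12):
    print(m, month_features(m))
```

The two values would be appended to each transfer's feature vector; a one-hot month encoding is an alternative when enough data per month are available.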
Conclusion
Our neural network model demonstrates high accuracy in predicting the probability of pregnancy occurrence in IVF protocols. The observed deviations and seasonal patterns provide valuable information for further model improvement and optimization of clinical practice. The ongoing refinement of the model, combined with improvements in clinical and laboratory processes, promises to further increase the effectiveness of IVF treatment for infertility.
In this study, we proposed a neural network-based approach to predicting pregnancy probability that offers significant advantages in both predictive accuracy and quality assurance (QA) within the IVF laboratory. Our method differs fundamentally from the traditional time-lapse approach, which focuses primarily on measuring and interpreting the morphokinetics of human embryos during the in vitro culture phase. Although time-lapse excels at real-time analysis, providing interpretable measurements that match or surpass the capabilities of human embryologists, it is designed primarily to optimize embryo selection and offers little when there are not enough embryos to choose from at the end of the embryology work. It is also inherently limited to the period of embryo development and does not extend into the post-analytical phase of IVF treatment.
Our new pipeline, in contrast, integrates a deep neural network (DNN) model that not only predicts clinical pregnancy outcomes but also extends into the post-analytical phase. This allows for comprehensive QA and risk minimization beyond the immediate embryo selection process. By incorporating a wide range of laboratory and clinical parameters into our model, we can evaluate the overall quality of the IVF protocols and identify potential issues in the system that may contribute to unsuccessful outcomes. Our method provides a broader perspective by tracking and analyzing clinical pregnancy probabilities over time. This enables us to not only predict outcomes but also retrospectively assess the efficacy of IVF protocols. By comparing predicted and actual clinical pregnancy rates across different time intervals, our approach aids in identifying trends, anomalies, and potential risks within the IVF process.
Importantly, our method's post-analytical capabilities allow for a deeper investigation into failed IVF treatment cycles. By evaluating KPIs and identifying areas where the system's control mechanisms might have faltered, we can proactively address issues that could negatively impact pregnancy outcomes. This holistic approach ensures that quality assurance is maintained throughout the entire IVF process, from embryo culture to the final clinical outcome.
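The post-analytical review of failed cycles can be sketched as a simple triage rule: transfers that did not result in pregnancy despite a high predicted probability are flagged for KPI review first, since they are the cases where the process, rather than the patient's initial clinical data, is most likely to have underperformed. The 0.6 threshold, the `TransferCase` record, and `flag_for_review` are hypothetical; the article does not define how problem cases are selected.

```python
from dataclasses import dataclass


@dataclass
class TransferCase:
    case_id: str
    predicted_probability: float  # calibrated model output, 0-1
    pregnant: bool  # observed clinical pregnancy


def flag_for_review(cases, threshold=0.6):
    """Failed transfers whose predicted probability met the threshold, highest first."""
    flagged = [c for c in cases if not c.pregnant and c.predicted_probability >= threshold]
    return sorted(flagged, key=lambda c: c.predicted_probability, reverse=True)


# Hypothetical cases for illustration.
cases = [
    TransferCase("A", 0.80, False),
    TransferCase("B", 0.65, False),
    TransferCase("C", 0.90, True),
    TransferCase("D", 0.40, False),
]
print([c.case_id for c in flag_for_review(cases)])
```

Failed transfers with low predicted probability (case D here) are not ignored; they are simply less likely to point to a process problem and can be reviewed with lower priority.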