Bridging the Gap Between AI and Clinical Practice with Cumulative Probability of Pregnancy in IVF Protocols Analytics

The application of artificial intelligence (AI) in reproductive medicine has unlocked transformative opportunities for improving outcomes in IVF protocols. In our latest study, we integrated Transformer-based neural networks and Kolmogorov-Arnold Networks (KAN) to develop an ensemble model for predicting clinical pregnancy rates (CPR). By leveraging advanced calibration methods, we aimed to enhance the reliability and interpretability of our predictions.

To ensure accurate probabilistic predictions, we employed the Conformal Prediction framework (CREPES). This approach allowed us to not only produce calibrated probabilities but also quantify the uncertainty through a robust 95% confidence interval. By integrating CREPES calibration into our ensemble model, we’ve taken a significant step towards making AI predictions in IVF not only accurate but also trustworthy. This approach ensures that every prediction is accompanied by a quantifiable measure of confidence, paving the way for more informed clinical decisions.

In our study, calibration played a crucial role in enhancing the reliability of our ensemble neural network model. To achieve this, we employed conformal prediction with the CREPES framework [Henrik Boström, Proceedings of the Eleventh Symposium on Conformal and Probabilistic Prediction with Applications, PMLR 179:24-41, 2022]. This state-of-the-art approach is designed to ensure accurate probabilistic predictions while providing confidence intervals, addressing one of the most critical challenges in AI-driven clinical decision-making. The package is described as follows: “The package implements standard, normalized and Mondrian conformal regressors and predictive systems, and is completely model-agnostic, using only the residuals for the calibration instances, possibly together with difficulty estimates and Mondrian categories as input, when forming the conformal regressors and predictive systems”.

Neural Network Performance Highlights after CREPES calibration: Mean Squared Error (MSE): 0.1976; Maximum Calibration Error (MCE): 0.099; Expected Calibration Error (ECE): 0.067; Empirical Coverage: 0.94 (target confidence level: 0.95). The calibration performance highlights a high alignment between predicted probabilities and observed outcomes.
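For readers unfamiliar with ECE and MCE, the following is a minimal sketch of how these two metrics are commonly computed for binary outcomes. It is not the code used in our study: the function name `calibration_errors`, the equal-width binning, the number of bins and the toy data are all assumptions made for illustration.

```python
def calibration_errors(y_true, y_prob, n_bins=10):
    """Return (ECE, MCE) for binary outcomes using equal-width probability bins.

    Illustrative helper (hypothetical name), not part of any library.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # Map the probability to a bin index; p == 1.0 goes into the last bin
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))

    n = len(y_true)
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in members) / len(members)   # observed frequency
        predicted = sum(p for _, p in members) / len(members)  # mean predicted probability
        gap = abs(observed - predicted)
        ece += len(members) / n * gap  # gap weighted by bin size
        mce = max(mce, gap)            # worst gap over all bins
    return ece, mce


# Toy data invented for illustration only
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.15, 0.25, 0.35, 0.35, 0.65, 0.75, 0.85, 0.95]
ece, mce = calibration_errors(y_true, y_prob, n_bins=5)
print(f"ECE: {ece:.3f}, MCE: {mce:.3f}")
```

ECE is the bin-size-weighted average gap between predicted and observed frequencies, while MCE is the largest gap in any single bin, so MCE is always at least as large as ECE.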

The calibration process aligns the predicted probabilities with observed frequencies, offering a more interpretable output for clinicians. Unlike traditional methods, CREPES provides prediction sets with explicit uncertainty estimates, making predictions more actionable. The calibrated model demonstrates minimal deviation in empirical coverage while preserving high predictive accuracy. With this pipeline, the results observed in our clinical practice were close to the model predictions: Calibrated CPR prediction: 51.66%; Actual CPR: 56.33%; Confidence Interval (95%): 47.86% – 55.46%. The actual CPR lies slightly above the upper bound of the interval. Our model achieved an empirical coverage of 0.94 at the 0.95 confidence level, demonstrating the robustness of the conformal prediction approach on IVF data.

We conducted this calibration and probability analysis on poor-prognosis IVF treatment cycles with recurrent implantation failure performed in 2024. The data were not easy to analyze: skewness = -1.88, meaning the distribution is negatively skewed with a longer tail on the left side, and kurtosis = 10.34, meaning the distribution has heavy tails and a sharp peak compared with a normal distribution.

Model Calibration with Conformal Prediction (CREPES)

The calibration process was performed using the WrapClassifier class from the CREPES repository. Below is an outline of the steps:

- Wrapping the model: the trained ensemble model (combining Transformer and KAN) was wrapped using the WrapClassifier class to enable conformal prediction.

- Calibration: a dedicated calibration dataset (X_cal, y_cal) was used to refine the predicted probabilities.

- Prediction: once calibrated, predictions on the test dataset (X_test) were obtained with improved probability distributions.

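from crepes import WrapClassifier
from sklearn.metrics import brier_score_loss, log_loss

# Wrap the trained ensemble (Transformer + KAN) so it can be calibrated
wrapped_model = WrapClassifier(ensemble_wrapper)
# Calibrate on the dedicated calibration set
wrapped_model.calibrate(X_cal, y_cal)
# One value per class for each test case, shape (n_samples, n_classes).
# Note: per the crepes documentation, predict_p returns conformal p-values,
# which need not sum to 1 across classes.
calibrated_predictions = wrapped_model.predict_p(X_test)

# Score the positive-class column against the observed outcomes
brier_score = brier_score_loss(y_test, calibrated_predictions[:, 1])
log_loss_score = log_loss(y_test, calibrated_predictions)

print(f"Brier Score: {brier_score}")
print(f"Log Loss: {log_loss_score}")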
        

Predictions were complemented with prediction sets to quantify uncertainty. Using a confidence level of 95%, we checked the empirical coverage on the test set:

import numpy as np
confidence = 0.95
# Binary array (n_samples, n_classes): 1 marks a class included in the set
prediction_sets = wrapped_model.predict_set(X_test, confidence=confidence)
# Share of cases whose true label (assumed coded 0/1) falls in its set
y_true = np.asarray(y_test).astype(int)
coverage = np.mean(prediction_sets[np.arange(len(y_true)), y_true])
print(f"Empirical coverage at {confidence} confidence level: {coverage}")
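To make the idea behind prediction sets more concrete, here is a minimal from-scratch sketch of split conformal classification. It is an illustration under simplifying assumptions (non-smoothed p-values, nonconformity score equal to one minus the predicted probability of a class) and is not the crepes implementation; the function names and the toy numbers are hypothetical.

```python
def conformal_p_values(cal_scores, test_probs):
    """Non-smoothed conformal p-value for each candidate class.

    cal_scores: nonconformity scores of the calibration cases,
                here 1 - predicted probability of the TRUE class.
    test_probs: predicted class probabilities for one test case.
    """
    n = len(cal_scores)
    p_values = []
    for prob in test_probs:
        alpha = 1.0 - prob  # nonconformity if this class were the true one
        # Count calibration cases that look at least as "strange"
        at_least_as_strange = sum(1 for s in cal_scores if s >= alpha)
        p_values.append((at_least_as_strange + 1) / (n + 1))
    return p_values


def prediction_set(p_values, confidence=0.95):
    """Keep every class whose p-value exceeds the significance level."""
    significance = 1.0 - confidence
    return [label for label, p in enumerate(p_values) if p > significance]


# Toy calibration scores and one toy test case, invented for illustration
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
p_vals = conformal_p_values(cal_scores, test_probs=[0.75, 0.25])
print(p_vals)
print(prediction_set(p_vals, confidence=0.95))
print(prediction_set(p_vals, confidence=0.6))
```

A higher confidence level keeps more classes in the set, which is why a 95% set can contain both outcomes for an uncertain case while a lower confidence level yields a single label.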
        

Descriptive Statistics for Calibrated Probability:

- Count: 664 (number of observations)
- Mean: 0.2477, SD = 0.0230 (average calibrated probability)
- Minimum: 0.1167 (lowest calibrated probability)
- 25th Percentile (Q1): 0.2392 (probability below which 25% of observations fall)
- Median (Q2): 0.2507 (middle value, indicating central tendency)
- 75th Percentile (Q3): 0.2607 (probability below which 75% of observations fall)
- Maximum: 0.3789 (highest calibrated probability)

These results suggest that most calibrated probabilities are centered around the mean with a slight left skew. There are a few extreme low values contributing to the skewness and high kurtosis.
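A summary of this kind, together with the skewness and kurtosis quoted earlier, can be reproduced with a few lines of standard-library Python. The sketch below is only an illustration: the helper name `describe` and the toy values are assumptions, and it reports excess kurtosis (normal distribution = 0), which may or may not be the convention behind the kurtosis figure given in this article.

```python
import statistics


def describe(values):
    """Descriptive statistics plus moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments of order 2, 3 and 4
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # Quartiles with linear interpolation between data points
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": n,
        "mean": mean,
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "skewness": m3 / m2 ** 1.5,            # negative means a longer left tail
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # positive means heavier tails
    }


# Toy probabilities invented for illustration: one low outlier on the left
toy = [0.12, 0.24, 0.25, 0.25, 0.26, 0.27]
for name, value in describe(toy).items():
    print(f"{name}: {value:.4f}")
```

Even a single unusually low value is enough to make the skewness negative, which matches the pattern described for the calibrated probabilities.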

Based on our calibrated model predictions, the average probability of achieving pregnancy per embryo transfer in our patient population was approximately 25% (mean = 0.25). This value represents the likelihood of a successful pregnancy resulting from a single embryo transfer cycle.

Cumulative Probability Analysis:

Given the cumulative nature of probabilities over multiple attempts, our analysis indicates the following probabilities for achieving pregnancy. Using the cumulative probability formula

$$P_{\mathrm{cum}} = 1 - (1 - p)^{n},$$

where $p = 0.25$ (mean success probability per transfer from our model) and $n$ is the number of transfers, we observe a significant increase in the likelihood of success with each additional transfer:

- After four embryo transfers: approximately 68%

- After five embryo transfers: approximately 76%
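The figures above can be checked with a short calculation. This is a minimal sketch, assuming that every transfer is independent and has the same success probability, which is what the cumulative formula itself implies; the function name is chosen here for illustration.

```python
def cumulative_probability(p, n):
    """Probability of at least one success in n independent attempts,
    each with the same success probability p."""
    return 1.0 - (1.0 - p) ** n


p = 0.25  # mean calibrated probability per transfer reported in this article
for n in range(1, 6):
    print(f"After {n} transfer(s): {cumulative_probability(p, n):.3f}")
```

For four and five transfers this gives about 0.684 and 0.763, in line with the approximate 68% and 76% quoted above. In practice the per-transfer probability may change from one attempt to the next, so these values should be read as a simplified estimate.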


Implications for Clinical Practice: the importance of model validation and QC of embryo transfer technique:

These findings emphasize the importance of ensuring the availability of at least four to five good-quality blastocysts for transfer to maximize the likelihood of a successful pregnancy in most patients. With a cumulative probability nearing 76% after five transfers, the recommendation underscores the necessity of robust laboratory protocols to achieve and maintain high-quality blastocyst development. Is it good enough or not?

Our conclusions are supported by the calibrated probabilities generated using the CREPES method, which ensures reliable probability estimates and confidence intervals. The method allows for confident predictions and actionable insights to guide individualized IVF treatment strategies. We compared our results with a published paper: Gill et al., 2024, Does recurrent implantation failure exist? [https://doi.org/10.1093/humrep/deae040]. The authors reported: “The fourth and fifth euploid blastocyst transfers resulted in similar live birth rates of 40% and 53.3%, respectively, culminating in a cumulative live birth rate of 98.1% (95% CI = 96.5-99.6%) after five euploid blastocyst transfers”. A subsequent reply regarding that study stated: “Our findings suggest that ‘extraembryonic’ unexplained RIF may occur in <2% of patients, as five consecutive euploid transfers yielded a cumulative live birth rate of over 98% (Gill et al., 2024). If a subsequent study shows that the sixth euploid transfer achieves a similar live birth rate as the first five, the true prevalence of unexplained RIF could be even <1%. If a seventh transfer achieves similar results, RIF prevalence might be <0.5%. Despite screening more than 123 000 patients from 25 clinics, we were unable to find an adequate number of patients who underwent a sixth euploid blastocyst transfer to reliably estimate its success rate. It seems we need even wider collaborations, as suggested by Dr Elzeiny.” [https://doi.org/10.1093/humrep/deae185].

We observed a similar trend in our data, but our results remained below the success level reported in the cited research: a CPR of 76% is not 98%. Our analysis estimates a cumulative probability of 76.3% after five transfers, assuming a 25% success rate per transfer. This lower value may reflect differences in patient selection, with our dataset representing a broader population rather than exclusively euploid transfers. Additionally, our probability estimates are derived from calibrated model predictions rather than observed clinical outcomes. This means that we need a rigorous and precise analysis of the embryo transfer procedures in our clinic to be sure that we are doing our best and not compromising results.

Nevertheless, our neural network model is robust even on small data samples compared with those in the cited papers. Our use of the CREPES calibration method ensures robust probability estimates and provides confidence intervals for predicted outcomes. While Gill et al. rely on observed live birth data, our method enables probabilistic modeling across a diverse patient cohort, emphasizing generalizability but potentially underestimating outcomes in highly selected subpopulations. In our patient population, the calibrated probabilities suggest a higher threshold for guaranteed pregnancy, potentially indicating the presence of additional factors influencing success rates beyond embryo quality alone. This highlights the importance of considering non-embryonic factors, such as uterine receptivity and patient-specific clinical conditions, and the importance of a clinical quality management system beyond the embryology laboratory.

Moreover, the results obtained are close to our expectations for the theoretically calculated numbers of oocytes and blastocysts in cryopreservation programs for fertility preservation, according to consensus benchmark values. Using probability theory to answer the question of how many oocytes are needed for development into high-quality blastocysts by day 5, we multiply the probabilities: 0.75 × 0.35 ≈ 0.26, or roughly 0.25. If we add the frequency of aneuploidies found in oocytes in the 35-39 age group, we need to analyze a minimum of 3-4 blastocysts to ensure the presence of at least one euploid embryo with 90% confidence. With the help of our neural network KAT model, we find that the possible implantation rate for PGT-A single euploid embryo transfer is about 79%, which is very close to our cumulative implantation rate of 76% after the fifth transfer established in this study.
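The "how many blastocysts are needed" reasoning can be sketched with the same cumulative formula, solved for the number of attempts. This is an illustration only: the function name is hypothetical, and the euploidy rates of 0.50 and 0.55 used below are assumed values chosen to show the mechanics, not figures taken from our data or from the consensus benchmarks.

```python
def min_attempts(p_success, confidence):
    """Smallest n such that 1 - (1 - p_success)**n >= confidence,
    assuming independent attempts with equal success probability."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    n = 1
    while 1.0 - (1.0 - p_success) ** n < confidence:
        n += 1
    return n


# Hypothetical per-blastocyst euploidy rates, for illustration only
for euploid_rate in (0.50, 0.55):
    n = min_attempts(euploid_rate, confidence=0.90)
    print(f"euploid rate {euploid_rate:.2f}: at least {n} blastocysts")
```

With these assumed rates the sketch returns 4 and 3 blastocysts respectively, which illustrates how a range such as 3-4 can arise from a 90% confidence requirement.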

Importance of Monitoring Clinical and Procedural Quality:

While our study demonstrates that cumulative pregnancy probabilities align with observed clinical outcomes in the broader IVF population, our findings also highlight the potential for improvement in specific cases. A critical aspect that emerges from this analysis is the necessity of monitoring the quality and consistency of clinical procedures during embryo transfer.

Our model, calibrated and trained on internal data, reflects the interplay between embryology laboratory performance and clinical practices. Despite achieving commendable cumulative pregnancy rates in the general cohort, the outcomes for complex cases remain suboptimal. This disparity suggests that factors beyond embryo quality—such as clinician technique during transfer, uterine preparation protocols, and individualized patient management—play a pivotal role in determining success rates.

Key considerations include:

Clinical Skill and Consistency: Embryo transfer is a delicate procedure requiring precision and consistency. Variability in technique across clinicians or insufficient adherence to best practices may contribute to lower-than-expected success rates in challenging cases.

Integration of KPI Monitoring: While KPI frameworks provide a robust measure of embryology performance, they lack direct insights into the clinical phase. Comprehensive monitoring systems that integrate laboratory KPIs with clinical execution metrics could provide a more holistic understanding of factors affecting outcomes.

Feedback Loops and Continuous Training: Establishing mechanisms for feedback and continuous professional development for clinicians involved in embryo transfer is essential. This ensures alignment with the latest advancements and adherence to standardized protocols.

Our findings suggest that even in clinics with high overall pregnancy rates and benchmarked KPIs, there is room to optimize success in complex cases by focusing on clinical quality control. Without evaluating the impact of clinical procedures, our predictive probabilities risk being interpreted in isolation, potentially overlooking critical opportunities for intervention.

So, we can conclude that we need to pay much more attention to our procedures, because our results differ from those of the published large-scale analysis across 25 IVF centers. However, the results obtained are within the range estimated theoretically with regression methods for our data. Nevertheless, our KAT model helps us to understand this and points to the need for future improvement! This highlights the critical role of individualized protocols and laboratory excellence in IVF success. By integrating advanced probabilistic modeling with clinical outcomes, we aim to enhance individualized treatment strategies and improve predictive accuracy for diverse IVF populations.

