Why We Don't Use Time-Lapse in Our Clinic
Serdj Sergeev
Embryology Lab Director, Neural Network Development for IVF Data Analytics
An important feature of our laboratory's work is the ability to make choices. Sometimes these choices are obvious: for instance, using methods with proven clinical effectiveness and safety, or utilizing only certified equipment and consumables. However, sometimes the choice isn't so clear-cut: whether to perform a biopsy for genetic testing, or which embryo to transfer for the highest probability of implantation and successful cycle completion. These choices aren't always easy to implement, which is why several approaches exist for evaluating an embryo's developmental potential.
For continued ontogenesis, an embryo needs to establish a connection with the mother's body. Our task is to transfer the most promising embryo to achieve pregnancy and the birth of a healthy child. Numerous factors influence whether this will occur, and no single primary factor can be identified.
Good blastocyst morphology includes the presence of high-quality trophectoderm cells on day 5 of development; these cells will form the fetal portion of the placenta. These are the cells that embryologists describe when assessing morphology under the microscope. How an embryo appears during its development is recorded in the embryology protocol and is a necessary, though not always sufficient, condition for selecting it for transfer. Based on appearance, embryos can be divided into those whose development matches the expected stage and those whose development does not, and preference for transfer is given to embryos whose morphology corresponds to normal development. When it comes to the correlation between such morphological assessment and transfer outcomes, reliable data have only been obtained for the SART classification, which distinguishes just three classes: "Good," "Fair," and "Poor." This means that what matters for subsequent implantation and development is not the individual developmental anomalies detected in the laboratory, but the overall quality assessment of the embryo before transfer. Unfortunately, this grading is highly subjective.
Various studies have shown that selecting the best embryo for transfer based on its description at the blastocyst stage has relatively low accuracy: even with expert selection, the chance of correctly identifying the embryo most likely to implant does not exceed 60%.
This is where algorithms that independently assess blastocyst images, including artificial intelligence and computer vision methods, can assist embryologists. However, even these tools improve the accuracy of implantation prediction by no more than 10%, which is why it is attractive to supplement them with both clinical patient data and individual quality indicators obtained during embryo culture. Beyond descriptions of discrete developmental stages, it is often necessary to consider the morphokinetics of embryo cleavage. This selection parameter has two components: morphological assessment and the specific time intervals between cell divisions. Time-lapse imaging makes it possible to track changes in cell distribution within the embryo, evaluate the nature of cell interactions and their spatial organization, and determine the cell cycle length of individual blastomeres. Algorithms currently exist for embryo selection based on developmental kinetics, allowing embryologists to identify embryos recommended for priority transfer according to the durations of key stages. The main time points for such assessment include pronuclear fading, the first, second, and third cleavage divisions, the duration of the synthetic phase of the cell cycle, and the timing of compaction and blastocoel formation. However, returning to evidence-based approaches, only the final timings of blastocoel formation and complete blastocyst development can be used as reliable predictors of subsequent implantation.
Time-lapse technologies are appealing because they allow detailed and complete documentation of embryo development without removing the embryo from the incubator. Furthermore, such time-annotated data on individual development make it possible to build mathematical models that predict the implantation potential of specific embryos.
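To make the interval terminology concrete, here is a minimal Python sketch that derives a few commonly used intervals from manually annotated event times. It assumes the widely used annotation nomenclature (tPNf for pronuclear fading, t2 to t8 for the 2- to 8-cell stages, tSB and tB for the start of blastulation and the full blastocyst), expressed in hours post-insemination. The class, function, and example values are hypothetical and are not taken from any time-lapse vendor's software or from our data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """Annotated event times in hours post-insemination (hypothetical fields)."""
    tPNf: float                  # pronuclear fading
    t2: float                    # 2-cell stage reached
    t3: float                    # 3-cell stage reached
    t4: float                    # 4-cell stage reached
    t5: float                    # 5-cell stage reached
    t8: float                    # 8-cell stage reached
    tSB: Optional[float] = None  # start of blastulation
    tB: Optional[float] = None   # full blastocyst


def derived_intervals(a: Annotation) -> dict:
    """Return commonly used derived morphokinetic intervals, in hours."""
    out = {
        "cc2": a.t3 - a.t2,  # duration of the second cell cycle
        "s2": a.t4 - a.t3,   # synchrony of the second cleavage round
        "s3": a.t8 - a.t5,   # synchrony of the third cleavage round
    }
    # Blastulation timings are only available if the embryo got that far.
    if a.tSB is not None and a.tB is not None:
        out["tB-tSB"] = a.tB - a.tSB  # duration of blastulation
    return out
```

For example, `derived_intervals(Annotation(tPNf=23.5, t2=26.0, t3=37.0, t4=38.5, t5=49.0, t8=56.0, tSB=100.0, tB=110.0))` gives cc2 = 11.0, s2 = 1.5, s3 = 7.0 and tB-tSB = 10.0 hours. In line with the evidence discussed above, only the blastulation timings would be candidates for a reliable predictor; the cleavage intervals are shown for completeness.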
However, practice shows that expensive equipment and strictly logical approaches don't always yield tangible results. Patients are typically shown attractive videos from such systems without any explanation of their practical value. While a time-lapse video sequence is indeed captivating, it should be regarded only as a small, pleasant bonus to the detailed information about the pre-implantation development of a specific embryo. We must remember the purpose for which it was created: obtaining detailed, individual time intervals of developmental events in vitro, not just attractive images. Without a properly annotated dataset of images, even the most promising time-lapse system cannot give the embryologist any additional information for selecting embryos for transfer.
Moreover, artificial intelligence algorithms included in time-lapse systems are often oversimplified for commercial use in various clinics. The idea of incorporating modern computer technologies into image processing and analysis is valuable, but it must be understandable and interpretable. Time-lapse photographs themselves are very noisy compared to static images, which imposes significant limitations on working with them, and modern neural networks trained on them make quite a few errors, misclassifying approximately every fourth embryo. Nevertheless, such errors are simply ignored at international conferences, presentations, and publications, where all attention is focused on the most impressive metrics.
For ease of use and reproducibility of results, time-lapse system developers resort to one small but effective trick: expressing model predictions as a ranked output variable. However, what an embryo's rank means for its developmental prospects remains unclear. This system closely resembles other point-based assessments, such as the grades we received at school. As we all know, straight-A students aren't always the smartest in the class, and C students may turn out to be geniuses. History offers many examples of discrepancies between a person's grades and their subsequent life path. The same applies to this embryo classification system: while universal, it is far from the best option for selection.
The question of embryo selection using time-lapse systems becomes even more ambiguous when we simply don't have enough embryos to choose from. In other words, it's not entirely clear for which patient groups this technology is truly worth using.
On the one hand, when there are several blastocysts to choose from, it would seem logical to increase IVF program effectiveness by offering additional selection of the most viable one for transfer. In such cases, however, it's difficult to demonstrate the advantages of the methodology to patients, since they already have relatively high chances of success without it. In case of failure, the transfer can always be repeated with the remaining cryopreserved embryos, which proves more cost-effective than paying for time-lapse culture. Therefore, such patients typically aren't particularly interested in additional methods of embryo analysis.
In the opposite situation, in complex cases with a low chance of success, time-lapse could genuinely help. However, these protocols usually yield only a few blastocysts; in other words, there's simply not much to choose from. And although "difficult" patients could actually benefit from this methodology, its use in routine practice depends more on the clinic's marketing policy than on the real value of additional selection and ranking of embryos for transfer.
In the vast majority of cases, when analyzing data from IVF protocols, we aim to assess the chances of program success. The success of a specific cycle is typically defined by the occurrence of clinical pregnancy after embryo transfer. Such assessment is valuable not only for predicting transfer outcomes in individual protocols but also as a quality control tool. Published research on AI in IVF focuses primarily on this predictive capability of algorithms based on patients' clinical data or on morphokinetic assessment of individual embryo development. This approach has limitations: it doesn't account for changes in laboratory parameters and can only detect deviations from optimal values over a relatively long period. Transfer outcomes are clearly determined by numerous factors that commercially available models don't consider. Only recently have comprehensive systems been developed that combine patients' initial clinical data with embryo morphology or morphokinetic developmental history. The predictions of such algorithms tend to inflate the expected probability of protocol success, which isn't surprising given that they leave out the laboratory side of the program.
The main challenge with modern machine learning systems in medicine is the interpretability of their results. Models that find practical application, as opposed to those that remain confined to published articles, stand out by providing understandable final prognoses without additional abstract methods. When we consider AI as an assistant in making important decisions, including patient consultations, we need such programs to provide data that we can verify, explain, and visually present to patients, rather than incomprehensible mathematical values that only add confusion to our work. Perhaps the most common question in such consultations concerns the probability of IVF program success. Unfortunately, most currently available algorithms for assessing this probability offer only an abstract output variable: a score assigned to a specific embryo for transfer. Numerous studies have certainly examined how this score correlates with pregnancy probability or with obtaining transferable embryos at certain significance levels. However, this hardly simplifies our task of giving patients information about their chances of protocol success.
Most modern AI systems for predicting protocol success use rank- or score-based final assessments as the foundation of their primary prediction model. In time-lapse (TL) incubators, for example, embryos are ranked according to their overall morphokinetic parameters: KIDScore, IDAScore, and various modifications depending on the system manufacturer.
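One simple way to turn an abstract ranking score into something that can be verified and shown to a patient is to tabulate the observed clinical pregnancy rate for each score band in the clinic's own transfer history. The sketch below assumes only a list of scores and 0/1 outcomes; the function name, bin edges, and numbers are illustrative and are not part of any commercial system.

```python
from collections import defaultdict


def observed_rate_by_bin(scores, outcomes, edges):
    """Observed clinical pregnancy rate per score band (illustrative sketch).

    scores   -- model scores on any scale (e.g. a ranking score)
    outcomes -- 1 if the transfer led to clinical pregnancy, else 0
    edges    -- sorted bin edges; bin i covers [edges[i], edges[i + 1]),
                so scores equal to the last edge are not counted
    Returns a list of ((low, high), n_transfers, rate_or_None).
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [pregnancies, transfers]
    for s, y in zip(scores, outcomes):
        for i in range(len(edges) - 1):
            if edges[i] <= s < edges[i + 1]:
                counts[i][0] += y
                counts[i][1] += 1
                break
    table = []
    for i in range(len(edges) - 1):
        preg, n = counts[i]
        rate = preg / n if n else None  # None: no transfers in this band
        table.append(((edges[i], edges[i + 1]), n, rate))
    return table
```

With scores `[2.1, 3.5, 5.0, 6.2, 7.8, 8.4, 9.1, 4.4]`, outcomes `[0, 0, 1, 0, 1, 1, 1, 0]` and edges `[0, 4, 7, 10]`, the three bands contain 2, 3 and 3 transfers with observed rates of 0.0, 1/3 and 1.0. On real data, bands with few transfers would need confidence intervals before being shown to anyone.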
Similar ranking values can be derived from the KPIs of individual culture protocols. In our case, we used a total KPIScore, the final sum of laboratory and clinical KPIs, validated for the patients in our dataset. This approach not only added another output parameter to the model but also allowed us to verify its predictions against KPIScore distributions across different patient groups.
To obtain accurate probabilistic predictions, we used the conformal prediction framework CREPES. This approach gave us not only calibrated probabilities but also a quantitative assessment of uncertainty at a 95% confidence level. By integrating CREPES calibration into our model, we took a significant step toward making AI predictions in IVF not only accurate but also reliable.
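The idea behind conformal calibration can be illustrated without the CREPES library itself. The sketch below is a from-scratch, NumPy-only version of inductive (split) conformal classification for a binary outcome; it does not reproduce the CREPES API, and it assumes a model that already outputs a probability of clinical pregnancy plus a held-out calibration set that was not used for training. For each new case it returns a p-value per label and the set of labels compatible with the chosen confidence level.

```python
import numpy as np


def conformal_p_values(p_cal, y_cal, p_test):
    """Inductive (split) conformal classification for a binary outcome.

    p_cal  -- predicted P(pregnancy) for a held-out calibration set
    y_cal  -- true labels (0/1) of the calibration set
    p_test -- predicted P(pregnancy) for new cases
    Returns an array of shape (n_test, 2): p-values for labels 0 and 1.
    """
    p_cal = np.asarray(p_cal, dtype=float)
    y_cal = np.asarray(y_cal, dtype=int)
    p_test = np.asarray(p_test, dtype=float)
    # Nonconformity score: 1 - probability the model gave to the true label.
    prob_true = np.where(y_cal == 1, p_cal, 1.0 - p_cal)
    alpha_cal = 1.0 - prob_true
    n = alpha_cal.size
    pvals = np.empty((p_test.size, 2))
    for label in (0, 1):
        prob_label = p_test if label == 1 else 1.0 - p_test
        alpha_test = 1.0 - prob_label
        # Count calibration scores at least as nonconforming as the test
        # score; the +1 terms account for the test case itself.
        ge = (alpha_cal[None, :] >= alpha_test[:, None]).sum(axis=1)
        pvals[:, label] = (ge + 1) / (n + 1)
    return pvals


def prediction_sets(pvals, confidence=0.95):
    """Keep every label whose p-value exceeds 1 - confidence."""
    eps = 1.0 - confidence
    return [[lab for lab in (0, 1) if row[lab] > eps] for row in pvals]
```

In a toy run with eight calibration cases (`p_cal = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.6, 0.4]`, `y_cal = [1, 1, 0, 0, 1, 0, 0, 1]`), a case scored 0.95 yields the prediction set {1} at 80% confidence, while a case scored 0.5 yields {0, 1}: the model openly reports that it cannot decide. With n calibration cases the smallest attainable p-value is 1/(n + 1), so excluding a label at the 95% level requires at least 19 calibration cases.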
This approach ensures that each prediction is accompanied by a measurable confidence indicator, paving the way for more informed decision-making.
As a result, we carried out a detailed comparison of the developed neural network with its analogues and obtained quality metrics that exceed theirs and approach the accuracy of modern time-lapse systems, but without time-lapse images or expensive equipment. Using recurrent neurons in our network structure helped achieve quality metrics similar to those of the high-accuracy convolutional neural networks (CNNs, ImageNet-based architectures) that underlie modern AI time-lapse systems. For comparison, we took literature data for three such CNN architectures: VGG16, ResNet50, and DenseNet121. Of all quality metrics, only sensitivity was higher for these networks than for our deep learning model (DNN).
This is because CNN models use actual blastocyst images: embryos that did not develop to the blastocyst stage were excluded from their data and therefore did not take part in classification.
To verify prediction completeness in the test sample, we analyzed 1,600 protocols in which embryos were selected with PGT-A. This confirmed that the neural network analysis algorithm works (AUC: 0.67–0.75) and allowed us to compare its accuracy with available logistic regression models and other machine learning approaches currently offered as commercial solutions for estimating pregnancy probability (AUC: 0.62–0.64), as well as with other published neural network solutions (AUC: 0.63–0.74).
The model's average error in predicting clinical pregnancy was 18%. Cross-validation demonstrated high accuracy in predicting embryo transfer results (accuracy: 78%–87%). The neural network model also outperformed traditional machine learning models in correctly separating the clinical pregnancy classes (OR = 6.66). Moreover, the developed neural network has an AUC-ROC (0.68–0.73) comparable to that of the time-lapse system models KIDScore and IDAScore V.2 (EmbryoScope; AUC 0.67) and Life Whisperer (Irvine Scientific; AUC 0.65), which select embryos for transfer based on their morphokinetic characteristics, and it has better precision-recall metrics for describing pregnancy chances.
Comparison of the neural network model with the Eeva algorithm (AUC: 0.53–0.61 for individual parameters (Aivf) and 0.64 for combined ones), GERI AI (AUC: 0.61), the MIRI AI model (AUC: 0.69), and STEM (AUC: 0.77) showed comparable accuracy and completeness of class description. Similar results were obtained when comparing the developed model with other neural network solutions used in IVF, including the ALIFE Health artificial intelligence model (ROC-AUC: 0.62–0.64), the Fairtility artificial intelligence model (ROC-AUC: 0.68–0.70), CLOE-EQ (ROC-AUC: 0.63–0.72), and the combined neural network models IVF2.0 (AUC: 0.72–0.78).
It's worth noting an important advantage of our model: the neural network can be further trained on data from any new clinic, producing a tool personalized to that clinic's patient subpopulation and laboratory for evaluating the success of each cycle that ends with a transfer.
A comprehensive analysis of the neural network models developed within our "from in vitro to in silico" project for IVF outcome prediction revealed the following patterns in their performance and applicability. On a balanced dataset, the DNN showed stable, albeit moderate, performance: classification accuracy (CA) of 0.70, AUC = 0.74, and area under the precision-recall curve (PRC) of 0.64. Precision and recall were 0.47 and 0.44 respectively, indicating a certain conservatism in the model's prediction of positive outcomes.
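For readers less familiar with these metrics, the sketch below computes classification accuracy, precision, recall and ROC AUC from true outcomes and predicted probabilities. AUC is computed through its rank interpretation: the probability that a randomly chosen pregnancy case scores higher than a randomly chosen non-pregnancy case, with ties counted as one half. The 0.5 decision threshold and the toy data are assumptions for illustration, and the area under the precision-recall curve is not included.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall (at a threshold) and ROC AUC (threshold-free)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yp in pairs if y == 1 and yp == 1)
    tn = sum(1 for y, yp in pairs if y == 0 and yp == 0)
    fp = sum(1 for y, yp in pairs if y == 0 and yp == 1)
    fn = sum(1 for y, yp in pairs if y == 1 and yp == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # AUC as the fraction of (positive, negative) pairs ranked correctly.
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "auc": auc}
```

For `y_true = [1, 0, 1, 0, 1, 0]` and `y_prob = [0.8, 0.3, 0.4, 0.6, 0.9, 0.2]` this gives accuracy, precision and recall of 2/3 and an AUC of 8/9. Because precision and recall depend on the chosen threshold while AUC does not, a model can combine a reasonable AUC with modest precision and recall, as with the DNN figures above.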
The Kolmogorov-Arnold network (KAN), in turn, showed a somewhat different pattern: with slightly lower CA (0.68), it had better discriminative ability (AUC = 0.76). Particularly noteworthy was the substantial improvement in the KAN model's precision and recall, which reached 0.61 and 0.62 respectively, about twice the values of time-lapse systems. This indicates a significantly better ability of the model to identify and correctly classify successful IVF cycles resulting in pregnancy.
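To show what distinguishes a Kolmogorov-Arnold network from a conventional layer, here is a minimal NumPy sketch of a single KAN layer's forward pass. Instead of a weight matrix followed by a fixed activation, every input-output edge carries its own learnable univariate function, here simplified to a weighted sum of piecewise-linear "hat" basis functions on a fixed grid (the original KAN formulation uses B-splines plus a base activation). The class name, grid settings and initialization are assumptions, training is omitted, and this is not our production architecture.

```python
import numpy as np


class KANLayer:
    """Forward pass of one Kolmogorov-Arnold layer (illustrative sketch).

    Each edge i -> j has its own univariate function phi_ij, written as a
    weighted sum of piecewise-linear hat functions on a uniform grid.
    Output j is the sum over inputs i of phi_ij(x_i).
    """

    def __init__(self, n_in, n_out, grid_size=8, x_min=-1.0, x_max=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = np.linspace(x_min, x_max, grid_size)
        self.h = self.grid[1] - self.grid[0]  # grid spacing
        # coef[i, j, k]: weight of basis function k on edge i -> j
        self.coef = rng.normal(scale=0.1, size=(n_in, n_out, grid_size))

    def basis(self, x):
        """Hat functions at x of shape (batch, n_in) -> (batch, n_in, grid_size)."""
        x = np.clip(x, self.grid[0], self.grid[-1])  # stay inside the grid
        d = np.abs(x[..., None] - self.grid) / self.h
        return np.maximum(0.0, 1.0 - d)

    def __call__(self, x):
        b = self.basis(np.asarray(x, dtype=float))  # (batch, n_in, K)
        # Evaluate every phi_ij(x_i) and sum over inputs i -> (batch, n_out)
        return np.einsum("bik,ijk->bj", b, self.coef)
```

Since each edge function is just a curve over a single input, it can be plotted directly, which is the main source of the interpretability attributed to KAN models. As a sanity check, setting every edge's coefficients to the grid values makes each phi_ij the identity, so each output equals the sum of the inputs.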
Integrating several architectures into the metamodel significantly improved prediction accuracy. Calibrating the model's output with the conformal prediction method proved highly effective, as shown by the resulting odds ratio of 6.01 with a standard deviation of 0.65 (compared with OR = 1.811, CI 1.666–1.976, for TL). A comparison of the ensemble model with transformer neural networks showed equivalent quality metrics, while the DNN-KAN architecture retained the advantage of better interpretability.
Comparison of the resulting metamodel with other neural network models for IVF prediction showed no significant difference in performance (p = 0.471) from other artificial intelligence-based solutions (AUC = 0.62–0.77). Its metrics are also similar to those of time-lapse analytical systems that use additional clinical data for implantation prediction (AUC = 0.72–0.78); convolutional neural network models for static images (AUC = 0.74), including new-generation neural networks based on genetic algorithms (AUC = 0.77); and complex CNN+MLP models based on multimodal blastocyst assessment that take into account maternal age, transfer day, antral follicle count, number of retrieved oocytes, and endometrial thickness (AUC = 0.75–0.79).
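The odds ratios quoted here summarize how strongly a model's positive/negative call is associated with clinical pregnancy. A minimal sketch of the usual calculation from a 2x2 table, with a Woolf (log-scale) 95% confidence interval, is shown below; it assumes counts of predicted-positive and predicted-negative transfers split by outcome. The counts in the example are hypothetical, and the interval comes from a single table, unlike the standard deviation across evaluations reported above.

```python
import math


def odds_ratio_ci(tp, fp, fn, tn, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    tp -- predicted positive, pregnancy      fp -- predicted positive, no pregnancy
    fn -- predicted negative, pregnancy      tn -- predicted negative, no pregnancy
    If any cell is zero, 0.5 is added to every cell (Haldane-Anscombe correction).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    or_ = (tp * tn) / (fp * fn)
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

For a hypothetical table with tp = 60, fp = 40, fn = 20 and tn = 80, the odds ratio is 6.0 with a 95% interval of roughly 3.19 to 11.29, which shows how wide the interval around an OR of this size can be at modest sample sizes.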
In conclusion, our work illustrates the transformative potential of neural networks in IVF beyond time-lapse technologies and demonstrates the feasibility of integrating alternative architectures to improve prediction accuracy, reduce costs, and increase interpretability. By combining clinical, laboratory, and morphokinetic data, these models represent a paradigm shift toward more precise and patient-centric ART outcomes. Future research should continue exploring the synergy between novel neural network designs and emerging technologies to refine embryo selection strategies, ultimately increasing the likelihood of successful pregnancies and healthy births.