The Shortcomings of Regression Analysis in Modern IVF Research
Serdj Sergeev
Embryology lab Director, Neural network development for IVF data analytics
Application of Regression Methods to Predict Key IVF Outcomes
We applied multiple linear regression fitted by ordinary least squares (OLS) to predict key outcomes in assisted reproductive technology (ART), including the total number of good-quality blastocysts (TGBDR) and the number of mature oocytes (MII). The analysis used clinical and stimulation protocol data, such as patient age, anti-Müllerian hormone (AMH) levels, follicle-stimulating hormone (FSH) dosage, and the duration of ovarian stimulation. While the regression models provided initial insights into relationships between predictors and outcomes, their limitations highlighted the need for more advanced approaches that better capture the complexity of the data.
For the prediction of TGBDR, a multiple regression model revealed a weak association between predictors and the target variable. The coefficients indicated minor effects: the cumulative FSH dose had a small negative coefficient (-0.00005), suggesting a negligible decrease in TGBDR with increasing FSH dosage. Similarly, time-related variables, including the start day of the protocol (-0.0056), start day of stimulation (-0.0052), and duration of stimulation (-0.0093), showed minimal negative influences. The only variable with a slight positive association was the day of oocyte retrieval (0.0010). Despite these findings, the coefficient of determination (\(R^2\)) was -0.02. A negative \(R^2\) means the model predicted slightly worse than simply using the mean of the target variable, so it explained none of the variance. Furthermore, the mean squared error (MSE) was 0.0939, and the Pearson correlation coefficient between predicted and actual values was 0.13 (p = 0.4325), confirming the absence of a statistically significant correlation.
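The three metrics reported above (\(R^2\), MSE, and the Pearson correlation between predicted and actual values) can be reproduced on synthetic data. The sketch below is illustrative only: the data are random numbers, not the study's dataset, and the train/test split is an assumption, since the text does not say how the models were evaluated. The function names are hypothetical. It shows why \(R^2\) can fall below zero: when the predictors carry no signal, held-out predictions are often worse than the mean baseline.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients (intercept first) via least squares."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict(beta, X):
    """Apply fitted coefficients to a predictor matrix."""
    return beta[0] + X @ beta[1:]

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; negative when predictions are worse than the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic example (assumption): five predictors unrelated to the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.normal(size=200)

# Hold out the last 50 rows for evaluation (assumed evaluation scheme).
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

beta = fit_ols(X_train, y_train)
y_pred = predict(beta, X_test)

r2 = r_squared(y_test, y_pred)
mse = np.mean((y_test - y_pred) ** 2)
pearson_r = np.corrcoef(y_test, y_pred)[0, 1]
print(f"R^2 = {r2:.3f}, MSE = {mse:.3f}, Pearson r = {pearson_r:.3f}")
```

With no real signal in the predictors, the held-out \(R^2\) typically lands near or below zero while the Pearson correlation stays small, which is the same pattern reported for the TGBDR model.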
The second regression model evaluated stimulation parameters and their association with ART outcomes. Here, the cumulative FSH dose exhibited a slight positive effect (coefficient: 0.00083), while variables such as the duration of stimulation (-0.1427) and the day of oocyte retrieval (-0.1824) had small negative coefficients. These results aligned with the expected trends but failed to establish meaningful relationships. The \(R^2\) value of -0.04 and an MSE of 15.88 further emphasized the model's inability to describe the data adequately. The Pearson correlation coefficient of 0.12 (p = 0.4659) was again statistically insignificant, indicating a weak and unreliable relationship between predictors and outcomes.
To predict the number of mature oocytes (MII), regression analysis incorporated patient characteristics such as age, AMH, FSH, and the number of previous ART attempts. Among these, AMH demonstrated a modest positive effect (coefficient: 0.2468), consistent with its established role as a marker of ovarian reserve. In contrast, FSH levels had a slight negative impact (-0.0334), while age showed no influence (coefficient: 0.0), and the number of ART attempts (-0.0834) showed a weak negative relationship. Despite these trends, the model’s \(R^2\) value was 0.00, indicating no explanatory power. The MSE was 15.20, and the Pearson correlation coefficient remained low at 0.12 (p = 0.4667), reinforcing the model's limited predictive capability.
Oocyte Cohort Analysis
This study explores the predictive factors influencing implantation success in in-vitro fertilization (IVF) protocols. Using logistic regression, we compared two approaches: one employing a binary outcome variable for implantation success and another utilizing probabilities predicted by a neural network. Our findings highlight the role of oocyte cohort size while demonstrating differences in model interpretability and performance.
The regression model using neural network-predicted probabilities demonstrated weaker performance, with no significant predictors and negative pseudo R-squared values. This outcome may reflect:
Studies frequently highlight the role of blastocyst formation and quality in implantation potential. For example, prior work has demonstrated that higher-quality blastocysts correlate with increased success rates. In this study, neither binary logistic regression nor neural network-based analysis replicated this finding, suggesting dataset-specific variability or insufficient sample size to detect smaller effects.
Model Performance: Pseudo R-squared: 0.01295; Log-Likelihood: -449.03; LLR p-value: 0.01906
Significant Predictors (p < 0.05): Oocyte cohort size (OCC): OR = 1.026, p = 0.007
Non-significant Predictors: Patient age (p = 0.213), Blastocyst formation rate (p = 0.687), Good-quality blastocyst formation rate (p = 0.562)
Average implantation probabilities for large and small oocyte cohorts: Large cohort: 57.95%, Small cohort: 48.72%
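The reported fit statistics can be cross-checked with a few lines of arithmetic. The sketch below rests on two assumptions that the text does not state: that the pseudo R-squared is McFadden's (1 minus the ratio of the model log-likelihood to the null log-likelihood), and that the model had four predictors (OCC, age, and the two blastocyst rates), giving a likelihood-ratio test with 4 degrees of freedom. It also converts the odds ratio for OCC into the effect of ten additional oocytes.

```python
import math

# Values reported in the text.
pseudo_r2 = 0.01295       # assumed to be McFadden's pseudo R-squared
log_lik = -449.03         # log-likelihood of the fitted model
odds_ratio_occ = 1.026    # odds ratio per additional oocyte in the cohort

# McFadden: pseudo_r2 = 1 - log_lik / log_lik_null, solved for the null model.
log_lik_null = log_lik / (1.0 - pseudo_r2)

# Likelihood-ratio statistic comparing the fitted model with the null model.
llr_stat = 2.0 * (log_lik - log_lik_null)

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    half = x / 2.0
    return math.exp(-half) * (1.0 + half)

# Assumption: four predictors, hence 4 degrees of freedom.
llr_p_value = chi2_sf_df4(llr_stat)

# Odds ratios multiply, so ten extra oocytes scale the odds by OR ** 10.
beta_occ = math.log(odds_ratio_occ)
odds_ratio_10 = math.exp(10 * beta_occ)

print(f"null log-likelihood = {log_lik_null:.2f}")
print(f"LLR statistic = {llr_stat:.2f}, p = {llr_p_value:.4f}")
print(f"odds ratio for +10 oocytes = {odds_ratio_10:.3f}")
```

Under these assumptions the recomputed p-value comes out at roughly 0.019, in line with the reported LLR p-value, and ten additional oocytes correspond to roughly 29% higher odds of implantation. The model is therefore statistically distinguishable from the null while explaining only about 1.3% of the log-likelihood.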
Comprehensive Evaluation of the Predictive Value of Pronuclear Morphological Patterns
In this study, we explored the use of linear regression, PCA, k-means, t-SNE (t-distributed Stochastic Neighbor Embedding), and Kolmogorov-Arnold networks for analyzing and visualizing high-dimensional data, focusing on their ability to handle complex nonlinear dependencies and high collinearity. By applying t-SNE with three components, we visualized pronuclear morphology patterns in 3D space, which confirmed the accuracy of the PCA classification and the need for a combined approach to zygote evaluation. Linear regression and clustering analysis offered complementary insights into the assessment of zygotes using the Z Score, but most categories showed no significant correlation with embryo implantation, suggesting that this scale has limited prognostic value in IVF contexts.
To improve outcome prediction, we developed an ensemble model consisting of a deep recurrent neural network and a Kolmogorov-Arnold neural network (KAN), integrated with Bayesian methods to create a probabilistic framework for implantation prediction. The model accurately reflected implantation dynamics, yielding an average predicted implantation probability of 0.39, which closely matched the actual pregnancy rate of 42.70%.
Our findings indicate that the standard Z Score system may overestimate implantation potential in zygotes with higher scores (Z1, Z2, Z3) and underestimate it in zygotes with lower scores (Z5). Based on these observations, we propose consolidating the Z Score categories into two groups—one for promising zygotes (Z1, Z2, Z3) and another for less promising ones (Z4, Z5)—which better aligns with both predicted and actual clinical outcomes. This refined classification system improves the accuracy of neural network models and aligns with other research findings, highlighting its potential for enhancing IVF outcome predictions.
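The proposed two-group consolidation amounts to a simple recoding step before modeling. The following is a minimal sketch of that recoding; the label strings, the function name `consolidate`, and the example list are hypothetical and are not taken from the study's code.

```python
from collections import Counter

# Mapping proposed in the text: Z1-Z3 form one group, Z4-Z5 the other.
Z_GROUPS = {
    "Z1": "promising",
    "Z2": "promising",
    "Z3": "promising",
    "Z4": "less promising",
    "Z5": "less promising",
}

def consolidate(z_score: str) -> str:
    """Map an original Z Score category to its consolidated group."""
    key = z_score.strip().upper()
    if key not in Z_GROUPS:
        raise ValueError(f"Unknown Z Score category: {z_score!r}")
    return Z_GROUPS[key]

# Hypothetical example input, not study data.
example_zygotes = ["Z1", "Z3", "Z5", "z2", "Z4", "Z3"]
group_counts = Counter(consolidate(z) for z in example_zygotes)
print(group_counts)
```

Raising an error on unknown labels, rather than silently assigning a default group, keeps data-entry mistakes from leaking into the model inputs.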
The application of a neural network model in conjunction with clustering significantly enhanced the predictive capability of our analysis. Neural networks, by design, can handle complex, non-linear relationships in datasets, making them ideal for modeling multifactorial processes such as embryo development and pregnancy success.
Our neural network model was able to predict pregnancy probabilities for each cluster with notable accuracy, offering deeper insights into the relationship between embryo quality and pregnancy outcomes. We also applied it to the Z Score system. Importantly, the combination of clustering and neural networks allowed for both segmentation and prediction, bridging the gap between understanding patterns in the data and making accurate predictions. The ability of the neural network model to adapt to different clusters provides a robust framework for personalized IVF treatment strategies, in line with the goal of improving clinical decision-making and increasing success rates.
Multifactorial ANOVA revealed significant differences in fertilization rates across the Z Score groups (F-statistic = 107.831, p < 0.00001), but there were no significant differences in blastocyst and high-quality blastocyst formation rates (F-statistic = 1.332 and 1.544, p > 0.05, respectively). Notably, positive correlations between the number of zygotes in the Z2 and Z3 categories and blastocyst formation were identified, with correlation coefficients of 0.4717 for Z2 and 0.620 for Z3. These correlations remained significant for high-quality blastocysts, with coefficients of 0.373 for Z2 and 0.510 for Z3. Categories Z1, Z4, and Z5 showed weaker, statistically insignificant correlations with blastocyst formation (p > 0.05).
Statistical significance of these correlations was confirmed using Pearson's correlation coefficient, with the strongest correlations observed between the number of Z2 zygotes and blastocyst count (r = 0.908, p < 0.0001), as well as between Z3 zygotes and Z3 blastocysts (r = 0.897, p < 0.0001). Interestingly, we found a negative correlation between Z3 zygotes and Z2 blastocysts (r = -0.223, p < 0.0001), and a similar negative trend between age and Z5 (BN/SN) zygote count (r = -0.087, p = 0.017).
Quantitative measures of MII and OCC showed moderate positive correlations with the number of Z2 zygotes (r = 0.494 and r = 0.452, respectively, p < 0.0001), emphasizing the importance of mature oocytes for fertilization competence. Correlations between the number of implanted embryos and zygote categories were weak but statistically significant for Z2 (r = 0.134, p < 0.0001) and Z3 (r = -0.138, p < 0.0001), indicating the limited predictive value of pronuclear morphology alone for implantation outcomes.
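Squaring a Pearson coefficient gives the share of variance two variables have in common, which makes the gap between "statistically significant" and "practically useful" concrete. The short sketch below applies this to the coefficients reported above; the dictionary keys are descriptive labels chosen for illustration.

```python
# Pearson coefficients reported in the text.
reported_r = {
    "Z2 zygotes vs implanted embryos": 0.134,
    "Z3 zygotes vs implanted embryos": -0.138,
    "MII vs Z2 zygotes": 0.494,
    "OCC vs Z2 zygotes": 0.452,
}

# r squared is the proportion of variance shared by the two variables.
shared_variance = {name: r ** 2 for name, r in reported_r.items()}

for name, r2 in shared_variance.items():
    r = reported_r[name]
    print(f"{name}: r = {r:+.3f}, r^2 = {r2:.3f} ({r2:.1%} of variance)")
```

The implantation correlations each account for less than 2% of the variance, despite p < 0.0001, whereas MII and OCC account for roughly 20 to 24% of the variance in Z2 zygote counts.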
A multiple linear regression analysis, which evaluated the relationship between Z Score categories and the number of high-quality blastocysts, revealed a variable degree of explanatory power. The highest \(R^2\) was observed for Z3 (0.264), suggesting that this category explains 26.4% of the variability in the number of high-quality blastocysts. In contrast, Z2 had a lower \(R^2\) of 0.140, while Z4 and Z5 (BN/SN) showed minimal predictive value with \(R^2\) values of 0.019 and 0.036, respectively.
The regression analysis coefficients also highlighted weak to moderate positive relationships between the number of zygotes and the formation of high-quality blastocysts, especially for Z2 and Z3 (0.114 and 0.145, respectively). However, Z4 and Z5 (BN/SN) demonstrated weak negative coefficients (-0.006 and -0.013), suggesting a slight negative correlation, though these results should be interpreted cautiously due to the low \(R^2\) values.
The regression analysis also showed that the Z Score’s morphological assessment had limited predictive value for embryo implantation. This finding highlights the complexity of predicting IVF outcomes based solely on zygote morphology, suggesting that more comprehensive models are required. Future research should explore non-linear relationships between zygote morphology and implantation success, using advanced machine learning models like XGBoost and neural networks to capture these dynamics.
Conclusion
Logistic regression assumes a linear relationship between independent variables and the log odds of the outcome. This assumption may oversimplify the intricate, non-linear relationships inherent in IVF data, such as the multifaceted interactions between embryological parameters, patient demographics, and implantation outcomes. In IVF, many biological processes, such as oocyte maturation, blastocyst development, and implantation, follow non-linear dynamics. Regression models often struggle to generalize well to such phenomena, leading to underperforming predictions or overlooked relationships.
The robustness of logistic regression heavily depends on the quality, distribution, and variability of the input data. In heterogeneous datasets, which are common in IVF due to variations in patient populations, treatment protocols, and laboratory practices, the method may yield biased or unreliable results.
While regression analysis offers interpretable coefficients and insights into variable relationships, its linear nature and sensitivity to multicollinearity make it less suited to the multidimensional and nonlinear datasets commonly encountered in ART research. For instance, multicollinearity among stimulation parameters may obscure the true impact of individual predictors, further limiting the model's utility. Variability in patient characteristics, stimulation protocols, and laboratory conditions also reduces model performance. In addition, linear regression assumes additive relationships, potentially oversimplifying complex interactions among predictors.
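One common way to quantify multicollinearity among predictors is the variance inflation factor (VIF), where values above roughly 5 to 10 are often read as a warning sign. The sketch below computes VIFs with NumPy on synthetic data; the variable names and the assumed link between stimulation duration and retrieval day are illustrative, not measurements from the study.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        ss_res = np.sum(residuals ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        vifs[j] = ss_tot / ss_res  # algebraically equal to 1 / (1 - R^2_j)
    return vifs

# Synthetic data (assumption): retrieval day tracks stimulation duration closely.
rng = np.random.default_rng(42)
n = 300
stim_days = rng.normal(10.0, 1.5, size=n)
retrieval_day = stim_days + 2.0 + rng.normal(0.0, 0.1, size=n)
age = rng.normal(34.0, 4.0, size=n)  # generated independently of the others

X = np.column_stack([stim_days, retrieval_day, age])
vifs = variance_inflation_factors(X)
for name, vif in zip(["stim_days", "retrieval_day", "age"], vifs):
    print(f"{name}: VIF = {vif:.1f}")
```

In this synthetic setup the two timing variables receive very large VIFs while the independent variable stays close to 1, which illustrates how coefficients for closely related stimulation parameters can become unstable and hard to interpret.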
Given these limitations, the use of logistic regression in IVF research should be approached with caution. To address these challenges, future research will explore machine learning approaches such as neural networks, gradient boosting, or transformer-based architectures, which excel at handling nonlinearity and high-dimensional feature spaces. These advanced methods may uncover hidden patterns and provide more accurate predictions, aligning with the increasing complexity of ART datasets. Additionally, incorporating dimensionality reduction techniques, such as principal component analysis (PCA), may help mitigate multicollinearity and improve model performance.
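As an illustration of how PCA can mitigate multicollinearity, the sketch below standardizes a small synthetic predictor matrix and projects it onto its principal components using a singular value decomposition. The data and the function name `pca` are assumptions made for the example; this is not the pipeline used in the work described above.

```python
import numpy as np

def pca(X, n_components):
    """Return component scores and explained-variance ratios for standardized X."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Synthetic data (assumption): two strongly correlated predictors plus one independent.
rng = np.random.default_rng(7)
n = 300
a = rng.normal(size=n)
b = a + rng.normal(scale=0.2, size=n)  # strongly correlated with a
c = rng.normal(size=n)
X = np.column_stack([a, b, c])

scores, explained = pca(X, n_components=2)
component_corr = np.corrcoef(scores, rowvar=False)[0, 1]
print(f"explained variance ratios: {np.round(explained, 3)}")
print(f"correlation between components: {component_corr:.2e}")
```

The resulting components are uncorrelated by construction, so a regression on them avoids the instability caused by collinear inputs. The trade-off is interpretability, since each component is a mixture of the original clinical variables.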