Dissolution Profile Comparisons with Bootstrap BCAF2 - a stats explainer with JMP
Chandramouli R
Global Technical Enablement Engineer at JMP | Driving Innovation in Pharma, Healthcare, and Life Sciences through Advanced Data Solutions
Dissolution testing stands at the core of pharmaceutical product evaluation, bridging formulation characteristics and eventual in vivo performance. One central question in these tests is whether two dissolution profiles, often representing a “test” product and a “reference” product (the reference listed drug, RLD), are sufficiently similar to infer therapeutic equivalence or to waive additional clinical studies.
For several decades, the similarity factor f2 has served as a cornerstone for making this determination. However, f2 in its conventional form lacks a robust inferential framework, raising concerns about its reliability, particularly when data sets are small or highly variable. To address these limitations, researchers have introduced Bootstrap-Based BCAF2 (Bias-Corrected and Accelerated f2), which combines nonparametric statistical resampling with the widely recognized f2 metric. This blog provides a comprehensive technical overview of BCAF2, illustrating how it refines dissolution profile comparison by delivering confidence intervals, controlling error rates, and offering a more reliable basis for determining similarity.
1. Introduction
In the domain of solid oral dosage forms, dissolution testing is a critical quality control tool. Regulatory agencies such as the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA) often allow a formulation to waive clinical bioequivalence (BE) studies, particularly for certain strengths of an approved product, if it can demonstrate comparable dissolution to an established reference (U.S. FDA, 1997; European Medicines Agency [EMA], 2010). As an industry-standard approach, the f2 statistic summarizes the average squared difference between the percentage-dissolved values of test and reference products across multiple time points (Moore & Flanner, 1996).
Conventional f2 is easy to calculate and interpret. An f2 value of 100 indicates perfect agreement between profiles, whereas f2 ≥ 50 implies that the average difference across time points does not exceed roughly 10% (Polli et al., 1997). Despite its regulatory popularity, several shortcomings have been noted: it is a point estimate with no confidence interval or error-rate control, and its performance is inconsistent when samples are small or variability is high (Shah et al., 1998).
A modern consensus recognizes the need for more robust tools that retain f2’s intuitive appeal yet address its vulnerability to bias and variability (Chow & Ki, 2001; O’Hara et al., 1998). Bootstrap-Based BCAF2 emerges as such a method, leveraging nonparametric resampling to generate confidence intervals for the f2 statistic. By introducing bias correction and acceleration, it adjusts for skewness and median bias in the sampling distribution of f2 (Efron & Tibshirani, 1993). This article explores BCAF2’s theoretical basis, its practical applications, and its growing role in the regulatory acceptance of dissolution profile comparisons.
2. Concept of Bootstrap-Based BCAF2
2.1 The Bootstrap Resampling Framework
The bootstrap is a versatile statistical technique particularly useful when the exact distribution of a statistic is unknown or intractable. In dissolution testing, it involves sampling with replacement from the pooled or paired data points that constitute the test and reference profiles. For each bootstrap sample, one recalculates f2, effectively building an empirical distribution of f2 estimates that reflect the variability in the observed data (Noce et al., 2020). Commonly, thousands of bootstrap replicates are generated (e.g., 1,000 to 5,000 or even more), yielding a rich distribution that can then be used to construct confidence intervals.
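The resampling loop can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the procedure of any particular software package: the data are simulated, the helper names (`mean_profile`, `bootstrap_f2`, `simulate_units`) are hypothetical, and it assumes that whole dosage units are resampled with replacement separately within the test and the reference product.

```python
import math
import random

def f2(test_means, ref_means):
    """Similarity factor f2 from two mean profiles (percent dissolved per time point)."""
    n = len(ref_means)
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(ref_means, test_means)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))

def mean_profile(units):
    """Average the per-unit profiles column by column (one column per time point)."""
    return [sum(col) / len(col) for col in zip(*units)]

def bootstrap_f2(test_units, ref_units, n_boot=2000, seed=0):
    """Resample dosage units with replacement within each product and recompute f2."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        t_star = rng.choices(test_units, k=len(test_units))
        r_star = rng.choices(ref_units, k=len(ref_units))
        reps.append(f2(mean_profile(t_star), mean_profile(r_star)))
    return sorted(reps)

def simulate_units(true_means, n_units, sd, seed):
    """Hypothetical data generator: normal noise around a mean profile, clipped to 0-100%."""
    rng = random.Random(seed)
    return [[min(100.0, max(0.0, m + rng.gauss(0.0, sd))) for m in true_means]
            for _ in range(n_units)]

# Simulated example: 12 units per product, 4 sampling times (assumed values).
ref_units = simulate_units([30, 55, 75, 90], n_units=12, sd=3.0, seed=1)
test_units = simulate_units([27, 51, 72, 88], n_units=12, sd=3.0, seed=2)

reps = bootstrap_f2(test_units, ref_units)
# Simple percentile limits for a 90% interval (5th and 95th percentiles).
lower, upper = reps[int(0.05 * len(reps))], reps[int(0.95 * len(reps)) - 1]
print(f"observed f2 = {f2(mean_profile(test_units), mean_profile(ref_units)):.1f}")
print(f"90% percentile interval: [{lower:.1f}, {upper:.1f}]")
```

The sorted list `reps` is the empirical distribution of f2; the percentile limits read from it are the unadjusted starting point that the BCA method refines.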
2.2 The Bias-Corrected and Accelerated (BCA) Method
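While the percentile bootstrap interval is straightforward, it often proves conservative. A more refined approach is the Bias-Corrected and Accelerated (BCA) bootstrap. BCA fine-tunes the percentile boundaries using two parameters: a bias-correction factor (z₀), which accounts for median bias in the bootstrap distribution of f2, and an acceleration constant (a), which accounts for skewness.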
The end result is a confidence interval that more closely matches the nominal level. When applying BCA to f2, the adjustments correct for the small-sample biases and skewed distributions that can arise with complex, bounded dissolution data (Boddu et al., 2024).
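For readers who want the adjustment written out, the textbook BCA construction (Efron & Tibshirani, 1993) can be summarized as follows. The notation is an assumption of this sketch: B bootstrap replicates f2*_b, observed value \hat{f}_2, leave-one-out (jackknife) values \hat{\theta}_{(i)} with mean \hat{\theta}_{(\cdot)}, standard normal CDF \Phi, and standard normal quantile z_{\alpha}.

```latex
% Bias correction: normal quantile of the share of replicates below the observed f2
z_0 = \Phi^{-1}\!\left( \frac{\#\{\, f_{2,b}^{*} < \hat{f}_2 \,\}}{B} \right)

% Acceleration: skewness of the jackknife values
a = \frac{\sum_{i} \left( \hat{\theta}_{(\cdot)} - \hat{\theta}_{(i)} \right)^{3}}
         {6 \left[ \sum_{i} \left( \hat{\theta}_{(\cdot)} - \hat{\theta}_{(i)} \right)^{2} \right]^{3/2}}

% Adjusted percentile levels used to read the limits from the sorted replicates
\alpha_1 = \Phi\!\left( z_0 + \frac{z_0 + z_{\alpha}}{1 - a\,(z_0 + z_{\alpha})} \right),
\qquad
\alpha_2 = \Phi\!\left( z_0 + \frac{z_0 + z_{1-\alpha}}{1 - a\,(z_0 + z_{1-\alpha})} \right)
```

When z_0 = 0 and a = 0, the adjusted levels reduce to \alpha and 1 - \alpha, and the interval coincides with the plain percentile interval.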
2.3 Core Advantages of BCAF2
3. Statistical Foundations of BCAF2
3.1 The f2 Metric
Originally proposed by Moore and Flanner (1996), the similarity factor f2 is a transformation of the root-mean-squared difference between the mean dissolution profiles of a test and reference product across n sampling times:
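For reference, the standard form of the similarity factor is given below. The symbols R_t and T_t are notation assumed here for the mean percent dissolved of the reference and test products at sampling time t.

```latex
f_2 = 50 \cdot \log_{10}\!\left( \frac{100}{\sqrt{\,1 + \dfrac{1}{n} \sum_{t=1}^{n} \left( R_t - T_t \right)^{2}}} \right)
```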
If the profiles match perfectly, the sum of squared differences is 0, yielding an f2 of 100. When the difference grows, f2 decreases, with 50 demarcating a widely accepted similarity threshold (Shah et al., 1998; Polli et al., 1997). Regulators have historically used f2 ≥ 50 to avoid further clinical testing for formulations that satisfy additional criteria (EMA, 2010).
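A short numerical check makes these two reference points concrete. The sketch below assumes the standard Moore and Flanner form of f2 and uses made-up mean profiles; the function name `f2` and the example values are illustrative only.

```python
import math

def f2(test_means, ref_means):
    """Similarity factor f2 from two mean profiles sampled at the same time points."""
    if len(test_means) != len(ref_means):
        raise ValueError("profiles must share the same time points")
    n = len(ref_means)
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(ref_means, test_means)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))

reference = [30.0, 55.0, 75.0, 90.0]           # hypothetical mean % dissolved
identical = f2(reference, reference)            # no difference at any time point
shifted = f2([r - 10.0 for r in reference], reference)  # 10 points lower everywhere

print(round(identical, 2))  # 100.0
print(round(shifted, 2))    # 49.89
```

A uniform 10-point difference lands just under 50, which is why the threshold is usually described as corresponding to an average difference of about 10%.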
3.2 Constructing a Confidence Interval for f2
A key challenge is that f2 is a non-linear function of sample averages, and its distribution does not conform nicely to classic parametric assumptions. The bootstrap circumvents this issue by resampling the observed data with replacement, recalculating f2 for each bootstrap sample, and reading the confidence limits from the resulting empirical distribution rather than from an assumed parametric form.
Mathematically, the lower and upper limits of a 90% confidence interval for f2 come from percentiles of the sorted bootstrap distribution, shifted by z₀ and scaled by a factor incorporating acceleration (Efron & Tibshirani, 1993; Xu et al., 2021). Thus, BCAF2 aims for intervals that are neither overly conservative nor too lenient.
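Putting the pieces together, a BCA interval for f2 can be sketched with the Python standard library alone. Several choices here are assumptions rather than a reproduction of any cited software: units are resampled within each product, the acceleration constant comes from a pooled leave-one-unit-out jackknife (one of several conventions in use), quantiles are taken by simple index lookup, and the data and helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def f2(test_means, ref_means):
    n = len(ref_means)
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(ref_means, test_means)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))

def mean_profile(units):
    return [sum(col) / len(col) for col in zip(*units)]

def f2_units(test_units, ref_units):
    return f2(mean_profile(test_units), mean_profile(ref_units))

def bca_interval(test_units, ref_units, n_boot=2000, alpha=0.05, seed=0):
    """Return (observed f2, lower limit, upper limit) of a 100*(1 - 2*alpha)% BCA interval."""
    rng = random.Random(seed)
    norm = NormalDist()
    observed = f2_units(test_units, ref_units)
    reps = sorted(
        f2_units(rng.choices(test_units, k=len(test_units)),
                 rng.choices(ref_units, k=len(ref_units)))
        for _ in range(n_boot)
    )
    # Bias correction z0: share of replicates below the observed f2, on the normal scale.
    below = sum(r < observed for r in reps)
    p0 = min(max(below / n_boot, 1.0 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(p0)
    # Acceleration a: skewness of leave-one-unit-out f2 values (pooled over both products).
    jack = [f2_units(test_units[:i] + test_units[i + 1:], ref_units)
            for i in range(len(test_units))]
    jack += [f2_units(test_units, ref_units[:i] + ref_units[i + 1:])
             for i in range(len(ref_units))]
    jack_mean = sum(jack) / len(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted_quantile(level):
        z = norm.inv_cdf(level)
        shifted = norm.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))
        index = min(n_boot - 1, max(0, int(shifted * n_boot)))
        return reps[index]

    return observed, adjusted_quantile(alpha), adjusted_quantile(1.0 - alpha)

def simulate_units(true_means, n_units, sd, seed):
    """Hypothetical data generator: normal noise around a mean profile, clipped to 0-100%."""
    rng = random.Random(seed)
    return [[min(100.0, max(0.0, m + rng.gauss(0.0, sd))) for m in true_means]
            for _ in range(n_units)]

ref_units = simulate_units([30, 55, 75, 90], n_units=12, sd=3.0, seed=1)
test_units = simulate_units([27, 51, 72, 88], n_units=12, sd=3.0, seed=2)
observed, lower, upper = bca_interval(test_units, ref_units)
print(f"f2 = {observed:.1f}, 90% BCA interval = [{lower:.1f}, {upper:.1f}]")
```

With `alpha=0.05` the function returns the limits of a 90% interval. Validated software should be used for any real submission; this sketch only shows where z₀ and the acceleration constant enter the calculation.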
3.3 Hypothesis Testing Perspective
Using a 90% confidence interval in BCAF2 effectively implements a one-sided test at the 5% significance level. The null hypothesis is that the true f2 is below 50 (i.e., the profiles are not similar). If the lower confidence limit, and therefore the entire CI, lies above 50, the null is rejected, supporting similarity at 95% confidence (Islam & Begum, 2018). This framework mirrors common bioequivalence testing, where 90% confidence intervals are used to decide equivalence for pharmacokinetic metrics. The analogy improves regulatory comfort and aligns dissolution testing with well-established statistical norms in drug approvals.
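The decision rule itself is simple enough to state as code. The function name, the wording of the outcomes, and the two example intervals below are hypothetical; treating a lower limit of exactly 50 as "not demonstrated" is also an assumption of this sketch.

```python
def similarity_decision(ci_lower, threshold=50.0):
    """One-sided rule: conclude similarity only if the lower 90% limit exceeds the threshold."""
    return "similar" if ci_lower > threshold else "not demonstrated"

# Two hypothetical 90% intervals (lower, upper) for illustration.
examples = {"Product A": (52.3, 68.1), "Product B": (47.9, 66.0)}
for label, (lower, upper) in examples.items():
    print(f"{label}: [{lower}, {upper}] -> {similarity_decision(lower)}")
```

Only the lower limit drives the outcome, which is why a 90% two-sided interval corresponds to a one-sided test at the 5% level.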
3.4 Power and Error Control
Simulation studies have consistently shown that conventional f2 does not reliably maintain a 5% type I error rate. In some cases, the chance of wrongly concluding similarity (“false positive”) can exceed 15–20%, undermining confidence (Liu et al., 2024; Hoffelder, 2019). Conversely, a percentile bootstrap approach might be overly cautious, resulting in too many false negatives. BCAF2’s bias correction and acceleration often strike a superior balance between type I error control and test power (i.e., the ability to detect true similarity), outperforming simpler alternatives (Boddu et al., 2024).
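The idea behind such simulation studies can be illustrated with a small Monte Carlo sketch. It estimates how often the point-estimate rule (sample f2 ≥ 50) passes a pair of products whose true f2 is below 50. Every setting is an assumption chosen for illustration (profile values, 12 units, a standard deviation of 8, independent normal noise at each time point), and the result is not a reproduction of the cited studies.

```python
import math
import random

def f2(test_means, ref_means):
    n = len(ref_means)
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(ref_means, test_means)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))

def sample_mean_profile(true_means, n_units, sd, rng):
    """Mean profile of n_units simulated units with independent normal noise per time point."""
    return [sum(rng.gauss(m, sd) for _ in range(n_units)) / n_units for m in true_means]

def false_positive_rate(true_ref, true_test, n_units=12, sd=8.0, n_trials=1000, seed=0):
    """Share of simulated experiments in which the sample f2 reaches 50."""
    rng = random.Random(seed)
    passes = 0
    for _ in range(n_trials):
        ref_means = sample_mean_profile(true_ref, n_units, sd, rng)
        test_means = sample_mean_profile(true_test, n_units, sd, rng)
        if f2(test_means, ref_means) >= 50.0:
            passes += 1
    return passes / n_trials

true_ref = [30.0, 55.0, 75.0, 90.0]
true_test = [r - 10.5 for r in true_ref]   # true f2 is below 50 by construction
true_f2 = f2(true_test, true_ref)
rate = false_positive_rate(true_ref, true_test)
print(f"true f2 = {true_f2:.2f}, share of trials with sample f2 >= 50: {rate:.3f}")
```

The printed rate depends entirely on the assumed settings. Replacing the point-estimate check with a bootstrap confidence limit inside the loop would turn this into a (much slower) comparison of the methods discussed above.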
4. Comparative Analysis: BCAF2 vs. Other Methods
4.1 Conventional f2
Strengths: Simplicity, historical acceptance, and ease of interpretability. Weaknesses: No confidence interval or error-rate control, inconsistent performance under small samples or high variability (Shah et al., 1998).
In borderline cases, where f2 hovers around 50, traditional methods provide limited insight. For instance, observed f2 values of 49 and 51 might carry overlapping uncertainty, yet the conventional rule would yield opposite regulatory outcomes. BCAF2, however, quantifies that uncertainty, guiding decisions through a confidence interval rather than a binary cutoff (Noce et al., 2020).
4.2 Simple Percentile Bootstrap
Before BCA intervals grew prevalent, a straightforward percentile bootstrap for f2 was investigated. While a percentile interval is undeniably more rigorous than a point estimate criterion, it can be excessively conservative in real-world data scenarios (Liu et al., 2024). As a result, it may demand more samples or lead to the rejection of genuinely similar products. BCAF2 corrects these biases, delivering intervals that align more closely with nominal coverage and thus mitigating over-conservatism (Xu et al., 2021).
4.3 Multivariate ANOVA or General Linear Models
Some researchers have attempted to address profile differences by employing repeated-measures ANOVA or other general linear models (Yüksel et al., 2000). While these methods can test for significant differences across time points, they lack a single, intuitive summary metric akin to f2 (Costa, 2001). They also often rest on assumptions about normality and homoscedasticity that real dissolution data may violate. Consequently, their regulatory acceptance remains limited compared to the widely recognized f2 (Stevens et al., 2015).
4.4 Mahalanobis Distance Approaches
Hotelling’s T2 or Mahalanobis distance methods interpret dissolution profiles as vectors in multivariate space (Hoffelder, 2019). Equivalence testing can be done if one specifies an allowable “distance” threshold, but establishing that threshold to match the intuitive “±10% difference” rule is non-trivial (Collignon et al., 2019). High within-batch variability also complicates setting a stable acceptance region. Regulators have expressed hesitancy, noting that large variance can paradoxically make passing easier by inflating the covariance matrix (EMA, 2018). BCAF2 bypasses these complexities by staying in the f2 framework and penalizing high variance with wider confidence intervals.
4.5 Other Similarity Metrics (e.g., PCA or Euclidean Distance)
Principal component analysis, various distance measures, or kinetic modeling have also been proposed (Paixão et al., 2017; Saranadasa & Krishnamoorthy, 2005). Although theoretically sound, they often lack the regulatory familiarity and straightforward acceptance criteria that f2 provides. BCAF2 is viewed favorably because it refines f2 rather than replacing it, minimizing disruption to existing guidelines while adding statistical rigor (Zhang et al., 2010).
5. Practical Applications and Regulatory Landscape
5.1 Use in Post-Approval Changes and SUPAC
When manufacturers implement post-approval changes (e.g., a shift in manufacturing site or excipient composition), regulators require evidence that the new product’s dissolution profile remains similar to the original. Conventional f2 works fine if data variability is minimal and sample sizes meet the recommended threshold (e.g., 12 units) (FDA, 1997). However, real-world data often violate these assumptions. Applying BCAF2 in such cases replaces the single point estimate with a confidence interval for f2, so the similarity decision reflects the variability actually observed.
Regulators have shown interest in these bootstrap methods precisely because they can handle scenarios outside the rigid scope of conventional f2. Some agencies explicitly reference or allow a bootstrap-based approach for products with elevated variability (Noce et al., 2020).
5.2 Biowaivers for Lower Strengths
For immediate-release or modified-release products with multiple strengths, BCAF2 can confirm that the dissolution profiles of lower strengths match those of higher strengths. Demonstrating profile similarity via BCAF2 can eliminate the need for clinical BE studies on every strength, speeding development timelines without sacrificing confidence in equivalence (European Medicines Agency, 2010).
5.3 Transdermal and Non-Oral Products
Although historically associated with oral products, the concept extends to patches, ointments, or other dosage forms where release profiles can be measured over time. Regulatory bodies sometimes allow in vitro release testing in place of in vivo BE for certain locally acting products. BCAF2’s enhanced reliability suits these circumstances, especially when data sets are limited or noise levels are high (Stevens et al., 2015).
5.4 Regulatory Acceptance and Software
Though not yet universally mandated, acceptance of bootstrap methods is increasing. The EMA has indicated a preference for bootstrap-based confidence intervals for f2 in high-variability contexts (EMA, 2018). The FDA also remains open to alternative statistical methods that bring a sound rationale (LeBlond et al., 2016). Various open-source and commercial software packages support BCAF2, including DDSolver, bootf2, and specialized scripts in R, SAS, and JMP (Zhang et al., 2010; Noce et al., 2020). Sponsors have used these tools for data analyses in regulatory submissions, provided they validate the software and thoroughly document the methodology.
6. Future Directions and Emerging Challenges
6.1 Computational Advancements
While bootstrap analyses are already computationally feasible, rising computational power enables more complex procedures, such as multi-level or stratified bootstrapping across multiple batches. Future expansions might incorporate Bayesian or machine learning elements. For instance, real-time dissolution monitoring could continuously feed data into a streaming bootstrap procedure to decide whether a batch meets similarity criteria early in manufacturing (Kaity et al., 2023).
6.2 Integration with Modeling and Machine Learning
AI-driven pattern recognition could flag when two profiles are likely similar, then refine the final decision via BCAF2. Machine learning might also help optimize how many bootstrap iterations to run or identify outliers in real time (Wang et al., 2016). Nonetheless, any black-box ML solution would need to be transparent enough to satisfy regulatory scrutiny and maintain the clarity that f2 currently offers.
6.3 Method Harmonization
Globally, regulators are not entirely aligned on dissolution requirements (Milanovic et al., 2021). Various thresholds, sampling times, and acceptance cutoffs persist. An ICH-led initiative could one day harmonize these differences, offering a standardized bootstrap-based approach for dissolution profile comparison. This would reduce duplicative testing and conflicting requirements across regions (Chow & Ki, 2001).
6.4 Multi-Product Comparisons
In some settings, multiple test products may need to be compared to a single reference, especially in large-scale generic drug manufacturing. Although conceptually possible, simultaneously applying BCAF2 to multiple profiles calls for careful multiple-comparison adjustments or advanced experimental designs that remain an area of active research (Boddu et al., 2024).
6.5 In Vivo Correlation
Although in vitro/in vivo correlation (IVIVC) lies beyond the immediate scope of f2, robust in vitro comparisons can enhance confidence in or even supplant certain clinical studies. Future research could combine BCAF2-based similarity assessments with physiological or mechanistic modeling to predict if observed in vitro similarity reliably translates to bioequivalence (Rescigno, 1992).
7. Conclusion
Bootstrap-Based BCAF2 has emerged as a highly effective means to address the limitations of the conventional f2 metric. It delivers confidence intervals for f2, controls error rates, and offers a more reliable basis for determining similarity.
Across simulations and case studies, BCAF2 has shown stronger power to declare true similarity while safeguarding against false positives (Liu et al., 2024; Xu et al., 2021). Regulatory agencies have increasingly recognized the value of bootstrap approaches, referencing them in guidance for high-variability data. As computational resources become more abundant and advanced statistical tools gain traction, BCAF2 stands poised to become a mainstay in future dissolution profile comparisons. This evolution aligns seamlessly with the broader shift toward science-based, statistically robust evaluations in pharmaceutical development, ultimately enhancing the confidence in product quality and the safety of patients worldwide.
References
Boddu, R., Kollipara, S., Bhattiprolu, A. K., Parsa, K., Chakilam, S. K., Daka, K. R., Bhatia, A., & Ahmed, T. (2024). Dissolution profiles comparison using conventional and bias corrected and accelerated f2 bootstrap approaches with different softwares: Impact of variability, sample size and number of bootstraps. AAPS PharmSciTech, 25(1), 5. https://doi.org/10.1208/s12249-023-02710-9
Chow, S. C., & Ki, F. (2001). Statistical comparison between dissolution profiles through Hotelling’s T2 statistic. Pharmaceutical Technology, 25(2), 46–54.
Collignon, O., Moellenhoff, K., & Dette, H. (2019). Equivalence analyses of dissolution profiles with the Mahalanobis distance: A regulatory perspective and a comparison with a parametric maximum deviation-based approach. Biometrical Journal, 61(3), 779–782. https://doi.org/10.1002/bimj.201800325
Costa, P. (2001). An alternative method to the evaluation of similarity factor in dissolution testing. International Journal of Pharmaceutics, 220(1-2), 77–83. https://doi.org/10.1016/S0378-5173(01)00651-2
Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171–185. https://doi.org/10.2307/2289144
Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.
European Medicines Agency (EMA). (2010). Guideline on the Investigation of Bioequivalence (CPMP/EWP/QWP/1401/98 Rev.1). London: EMA.
European Medicines Agency (EMA). (2018). Question & Answer: The adequacy of the Mahalanobis distance to assess the comparability of dissolution profiles. EMA/CHMP (London).
FDA. (1997). Guidance for Industry: Dissolution Testing of Immediate Release Solid Oral Dosage Forms. U.S. Department of Health and Human Services.
Hoffelder, T. (2019). Equivalence analyses of dissolution profiles with the Mahalanobis distance. Biometrical Journal, 61(5), 1120–1137. https://doi.org/10.1002/bimj.201700257
Islam, M. M., & Begum, M. (2018). Bootstrap confidence intervals for dissolution similarity factor f2. Biometrics & Biostatistics International Journal, 7(6), 397–403. https://doi.org/10.15406/bbij.2018.07.00237
Kaity, S., Sah, S. K., Karanwad, T., & Banerjee, S. (2023). Bootstrap statistics and its application in disintegration and dissolution data analysis. Molecular Pharmaceutics, 20(8), 3791–3803. https://doi.org/10.1021/acs.molpharmaceut.3c00222
LeBlond, D., Altan, S., Novick, S., Peterson, J., Shen, Y., & Yang, H. (2016). In vitro dissolution curve comparisons: A critique of current practice. Dissolution Technology, 23(1), 14–21. https://doi.org/10.14227/DT230116P14
Liu, S., Zhang, J., Cao, Y., & Chow, S. C. (2024). In vitro dissolution profile comparison using bootstrap bias-corrected similarity factor, f2. Journal of Biopharmaceutical Statistics. Advance online publication. https://doi.org/10.1080/10543406.2023.2168366
Milanovic, I., Medarevic, D., & Ibrić, S. (2021). A critical overview of FDA and EMA statistical methods to compare in vitro drug dissolution profiles of pharmaceutical products. Pharmaceutics, 13(10), 1703. https://doi.org/10.3390/pharmaceutics13101703
Moore, J. W., & Flanner, H. H. (1996). Mathematical comparison of dissolution profiles. Pharmaceutical Technology, 20(6), 64–74.
Noce, L., Gwaza, L., Mangas-Sanjuan, V., & García-Arieta, A. (2020). Comparison of free software platforms for the calculation of the 90% confidence interval of f2 similarity factor by bootstrap analysis. European Journal of Pharmaceutical Sciences, 146, 105259. https://doi.org/10.1016/j.ejps.2020.105259
O’Hara, T., Dunne, A., Butler, J., Devane, J., & Brock, I. (1998). A review of methods used to compare dissolution profile data. Pharmaceutical Science & Technology Today, 1(5), 214–223. https://doi.org/10.1016/S1461-5347(98)00053-4
Paixão, P., Gouveia, L. F., Silva, N., & Morais, J. A. G. (2017). Evaluation of dissolution profile similarity – Comparison between the f2, the multivariate statistical distance and the f2 bootstrapping methods. European Journal of Pharmaceutics and Biopharmaceutics, 112, 67–74. https://doi.org/10.1016/j.ejpb.2016.10.020
Polli, J. E., Rekhi, G. S., Augsburger, L. L., & Shah, V. P. (1997). Methods to compare dissolution profiles and a rationale for wide dissolution specifications for metoprolol tartrate tablets. Journal of Pharmaceutical Sciences, 86(6), 690–700. https://doi.org/10.1021/js960473x
Rescigno, A. (1992). Bioequivalence. Pharmaceutical Research, 9(7), 925–928. https://doi.org/10.1023/A:1015809201503
Saranadasa, H., & Krishnamoorthy, K. (2005). A multivariate test for similarity of two dissolution profiles. Journal of Biopharmaceutical Statistics, 15(2), 265–278. https://doi.org/10.1081/BIP-200049832
Shah, V. P., Tsong, Y., Sathe, P. M., & Liu, J. P. (1998). In vitro dissolution profile comparison – statistics and analysis of the similarity factor, f2. Pharmaceutical Research, 15(6), 889–896. https://doi.org/10.1023/A:1011976615750
Stevens, R. E., Gray, V. A., Dorantes, A., Gold, L., & Pham, L. (2015). Scientific and regulatory standards for assessing product performance using the similarity factor, f2. AAPS Journal, 17(2), 301–306. https://doi.org/10.1208/s12248-015-9723-y
Wang, Y., Snee, R. D., Keyvan, G., & Muzzio, F. J. (2016). Statistical comparison of dissolution profiles. Drug Development and Industrial Pharmacy, 42(5), 796–807. https://doi.org/10.3109/03639045.2015.1078349
Xu, Z., Merino-Sanjuán, M., Mangas-Sanjuan, V., & García-Arieta, A. (2021). Estimators and confidence intervals of f2 using bootstrap methodology for the comparison of dissolution profiles. Computer Methods and Programs in Biomedicine, 212, 106449. https://doi.org/10.1016/j.cmpb.2021.106449
Yüksel, N., Kanık, A. E., & Baykara, T. (2000). Comparison of in vitro dissolution profiles by ANOVA-based, model-dependent and model-independent methods. International Journal of Pharmaceutics, 209(1-2), 57–67. https://doi.org/10.1016/S0378-5173(00)00554-8
Zhang, Y., Huo, M., Zhou, J., Zou, A., Li, W., Yao, C., & Xie, S. (2010). DDSolver: An add-in program for modeling and comparison of drug dissolution profiles. AAPS Journal, 12(3), 263–271. https://doi.org/10.1208/s12248-010-9185-1
Enhanced Dissolution Testing with JMP
Unlock the full potential of your dissolution testing with JMP's advanced Fit Curve platform. Designed for precision and versatility, JMP supports a comprehensive suite of both model-dependent and model-independent methods, including F1, F2, Multivariate Distance, and T2EQ.
JMP goes beyond the basics, incorporating sophisticated analyses like the BCA Bootstrap F2 and an extensive array of model-free comparisons. Dive deep into your data with options such as Higuchi Curves for matrix system release, Hixson-Crowell Curves for particle dissolution analysis, Korsmeyer-Peppas Curves for polymeric system release mechanisms, and Sigmoid Curves for fitting sigmoidal data patterns.
Experience the future of dissolution testing with JMP, where innovation meets accuracy, ensuring your research is always at the cutting edge.
Optimize Your Dissolution Testing with JMP Pro Curve DOE Powered by Generalized Regression
Refine and perfect your dissolution release profiles with JMP's robust Curve DOE feature, powered by Generalized Regression. Elevate the precision and efficiency of your dissolution testing process, saving significant time and reducing development cycles. With JMP Pro's cutting-edge technology, you can optimize your products without the need for repetitive and iterative experiments, bringing your product to market faster.