Practical DFT-D4 calculations by using optimal VDW D4 parameters for functional-basis set pairs (presenting examples of good, bad, and ugly results)
Introduction and motivation
The most recent, fourth-generation DFT dispersion correction (the D4 van der Waals, or VDW, correction) has been implemented in our software, following the article [Caldeweyher E, Ehlert S, Hansen A, Neugebauer H, Spicher S, Bannwarth C, Grimme S. A generally applicable atomic-charge dependent London dispersion correction. The Journal of Chemical Physics. 2019;150(15):154122].
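The two-body dispersion energy is approximated with the equation
$$E_{\mathrm{disp}}^{(6,8)} = -\frac{1}{2}\sum_{A \neq B}\;\sum_{n=6,8} S_n \, \frac{C_n^{AB}}{R_{AB}^{n}} \, f_{\mathrm{damp}}^{(n)}(R_{AB})$$
(written here in the form used in the cited paper, with C_n^{AB} the pairwise dispersion coefficients and R_AB the interatomic distance). We cannot go into every detail here, but we note that the Sn are optimizable parameters: S6 is usually kept at one for most functionals, and S8 is optimized for each supported functional. The most frequently used “damping” function has the form
$$f_{\mathrm{damp}}^{(n)}(R_{AB}) = \frac{R_{AB}^{n}}{R_{AB}^{n} + \left(a_1 R_0^{AB} + a_2\right)^{n}}, \qquad R_0^{AB} = \sqrt{C_8^{AB} / C_6^{AB}},$$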
where a1 and a2 are also optimizable parameters. The damping function equals 1 at asymptotically large atomic separations, where the simple dispersion energy equation above is valid, and goes smoothly to 0 as the interatomic distance decreases, where the DFT functional takes over the description of electron correlation. Thus a1 and a2 are also functional-dependent parameters. Traditionally, all four parameters are determined from DFT calculations for popular functionals with large basis sets close to the basis set limit, fitted to very high quality CCSD(T)-CBS standard intermolecular interaction benchmark data sets.
We discovered, however, that assuming these parameters to be basis set independent is an inaccurate approximation in some cases, and that significant accuracy gains can be obtained for medium-size, very practical basis sets by optimizing the S8, a1 and a2 parameters for each functional/basis set combination. For instance, with the revTPSS functional and the def2-SVPD basis set, the RMSD of the DFT-D4 energies relative to the accurate CCSD(T)-CBS energies for the S66x8 standard intermolecular benchmark set is 1.6 kcal/mol with the published D4 parameters. This error is rather large; even good quality force fields can provide more accurate results. After optimizing S8, a1 and a2 against our private CCSD(T)-CBS data set, we repeated the same DFT-D4 calculations with the S66x8 standard set as the test set. The RMSD of the interaction energies dropped from 1.6 kcal/mol to 0.62 kcal/mol, a very significant improvement in accuracy.
It is easy to understand why such improvements are possible. Medium-size basis sets such as def2-SVPD have significant BSSE (basis set superposition error), which makes intermolecular interactions artificially too strong, whereas with very large basis sets close to the basis set limit the BSSE is negligible.
When the VDW dispersion correction is optimized with very large basis sets, the resulting DFT-D4 interaction energies are close to the CCSD(T)-CBS reference energies. If we add the same D4 correction to DFT energies obtained with a medium-size basis set that has significant BSSE, however, the interactions come out too strong relative to the reference because of the BSSE. It is therefore much more beneficial to use a different VDW correction, one that goes to zero more rapidly with decreasing atom-atom distance than the original correction does. Since medium-size basis sets are the most important ones in practical computational drug design projects, this research and development has a significant impact on the accuracy of practical DFT-D4 calculations, and this is the primary motivation of this work. With the re-parameterized S8, a1 and a2, our QFDFT software can now offer not only exceptional calculation speed but also superior accuracy.
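To make the role of a1 and a2 concrete, here is a minimal Python sketch of the rational (Becke-Johnson type) damping function and the damped two-body energy of a single atom pair. It is an illustration only, not the QFDFT implementation: the function names are hypothetical, the C6 and C8 coefficients are treated as given inputs (a real D4 calculation derives them from atomic charges and coordination numbers), and the numerical values in the demo are made up rather than QF or Grimme parameters.

```python
import math


def bj_damping(r, r0, a1, a2, n):
    """Rational damping: tends to 1 for large r and to 0 as r goes to 0."""
    cutoff = a1 * r0 + a2  # a larger cutoff switches the correction off earlier
    return r**n / (r**n + cutoff**n)


def pair_dispersion_energy(r, c6, c8, s6, s8, a1, a2):
    """Damped two-body dispersion energy of one atom pair (consistent units assumed)."""
    r0 = math.sqrt(c8 / c6)  # pair-specific reference distance
    e6 = s6 * c6 / r**6 * bj_damping(r, r0, a1, a2, 6)
    e8 = s8 * c8 / r**8 * bj_damping(r, r0, a1, a2, 8)
    return -(e6 + e8)


if __name__ == "__main__":
    # Made-up coefficients and two made-up parameter sets, for illustration only.
    c6, c8 = 40.0, 1200.0
    for label, a1, a2 in (("set A", 0.4, 4.0), ("set B", 0.5, 5.0)):
        energies = [pair_dispersion_energy(r, c6, c8, 1.0, 1.0, a1, a2)
                    for r in (4.0, 6.0, 8.0, 12.0)]
        print(label, ["%.3e" % e for e in energies])
```

Increasing a1 or a2 enlarges the cutoff a1*R0 + a2, so the correction is weaker at short and medium range while the long-range limit is unchanged. This is the kind of behaviour that can compensate for BSSE-induced overbinding with medium-size basis sets.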
There are two important notes to highlight before we jump into the details. First, we are not looking for general conclusions for all quantum chemistry problems. We focus on the problems that dominate the computational drug design field: we would like accurate intermolecular interaction energies and accurate geometries, since these are essential for a wide range of problems, from protein-ligand interactions and interactions with solvent molecules, through the important non-bonded interactions in conformational and strain energies, to organic crystal structure and solubility predictions. Second, we do not want to develop a much more sophisticated method that can deal with all deficiencies of a given basis set. If we obtain three new D4 parameters for each supported basis set/functional combination, the calculation cost does not change at all, and implementing three new parameters for each functional-basis set pair requires practically negligible development time (which is ideal for our startup company).
At the moment QFDFT supports the PBE, BP86, TPSS, revTPSS, RGE2, revSCAN, R2SCAN and revM06_L functionals and the 6-311G**, 6-311++G**, 6-311G(df,pd), 6-311++G(df,pd), def2-SVP, def2-SVPD and def2-TZVP basis sets for production calculations, plus the large pc2 and def2-TZVPPD basis sets for special test purposes. We have optimized the D4 parameters for all functional and production basis set pairs. The revM06_L functional does not need VDW corrections, of course. The training set is our private CCSD(T)-CBS dimer set, with nearly 200 dimers having 10+ points each along a given direction of interaction. This training set has some minimal overlap with the S66x8 benchmark set, which we used as a test set. Note that the S66x8 set was part of the training set in the parameter optimizations of Professor Grimme’s research group. We therefore expect Grimme’s parameters to give slightly more accurate results for the S66x8 set with large basis sets, but this does not necessarily mean that they are more accurate in general, since, for instance, the optimum values are different for our training set. Nevertheless, the differences are very small for large basis sets, as we will show below.
The QFDFT program automatically uses the optimal QF VDW D4 parameters for all production calculations. With a simple command line option the user can request Grimme’s parameters instead, if they are available. The revSCAN and RGE2 functionals, for instance, do not have optimized Grimme D4 parameters as far as we know. In addition, our parameterization does not consider the energy values of the dimer data set exclusively: we added two more quantities to the training phase used to obtain the optimized D4 parameters, and we calculate those quantities in the testing phase as well.
We determined the minimum locations and the minimum energies for all dimers, for both the CCSD(T)-CBS and the actual model energy curves, by fitting a simple quadratic function at the minimum, and we included the RMSD of the minimum locations as well as the RMSD of the minimum energies in our parameter optimization process. In this analysis we determined the same quantities for the S66x8 test set. The tables list the RMSD and the MD (mean deviation) of all energy points relative to CCSD(T)-CBS, indicated as (A); the RMSD and MD of the minimum locations, indicated as (ML); and the RMSD and MD of the minimum energies, indicated as (MV), which stands for minimum values. There are some rare cases in which the given model energy curve is repulsive. We simply excluded those dimers from the statistics of the minimum locations and minimum values. All energies are in kcal/mol and the geometry statistics are in Angstrom.
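The minimum analysis described above can be sketched in a few lines of Python. This is a hedged illustration, not the code used for the tables: the function names are hypothetical, the quadratic is fitted through the lowest grid point and its two neighbours (one simple way to fit a parabola at the minimum), and MD is taken as the signed mean of model minus reference; both choices are assumptions.

```python
import math


def parabola_minimum(xs, es):
    """Fit a parabola through the lowest grid point and its two neighbours.

    Returns (x_min, e_min), or None when the lowest point lies on the edge of
    the scan or the fit has no minimum (for example, a purely repulsive curve).
    """
    i = min(range(len(es)), key=es.__getitem__)
    if i == 0 or i == len(es) - 1:
        return None
    x0, x1, x2 = xs[i - 1], xs[i], xs[i + 1]
    y0, y1, y2 = es[i - 1], es[i], es[i + 1]
    # Newton form: e(x) = y0 + d1*(x - x0) + d2*(x - x0)*(x - x1)
    d1 = (y1 - y0) / (x1 - x0)
    d2 = ((y2 - y1) / (x2 - x1) - d1) / (x2 - x0)
    if d2 <= 0.0:
        return None
    x_min = 0.5 * (x0 + x1) - d1 / (2.0 * d2)
    e_min = y0 + d1 * (x_min - x0) + d2 * (x_min - x0) * (x_min - x1)
    return x_min, e_min


def rmsd(model, ref):
    """Root-mean-square deviation between two equally long sequences."""
    return math.sqrt(sum((m - r) ** 2 for m, r in zip(model, ref)) / len(ref))


def mean_deviation(model, ref):
    """Signed mean deviation, model minus reference (sign convention assumed)."""
    return sum(m - r for m, r in zip(model, ref)) / len(ref)


def minimum_stats(curves):
    """curves: iterable of (distances, reference_energies, model_energies).

    Dimers without a minimum on either curve are skipped, as in the text.
    Returns {"ML": (rmsd, md), "MV": (rmsd, md)}.
    """
    loc_m, loc_r, val_m, val_r = [], [], [], []
    for xs, e_ref, e_model in curves:
        ref_min = parabola_minimum(xs, e_ref)
        mod_min = parabola_minimum(xs, e_model)
        if ref_min is None or mod_min is None:
            continue
        loc_r.append(ref_min[0])
        val_r.append(ref_min[1])
        loc_m.append(mod_min[0])
        val_m.append(mod_min[1])
    if not loc_r:
        raise ValueError("no dimer with a minimum on both curves")
    return {"ML": (rmsd(loc_m, loc_r), mean_deviation(loc_m, loc_r)),
            "MV": (rmsd(val_m, val_r), mean_deviation(val_m, val_r))}
```

The (A) statistics correspond to calling rmsd and mean_deviation directly on all energy points of all dimers, without the minimum search.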
The good results
The table below shows our best results, obtained with the revSCAN, revTPSS and R2SCAN functionals and the def2-TZVP, 6-311++G** and 6-311++G(df,pd) basis sets. Any combination of these basis sets and functionals, regardless of whether we use QF or Grimme’s D4 parameters, provides accurate results, with only minor differences here and there. The revSCAN functional with QF parameters seems to be the most accurate one. We have not yet found optimized D4 parameters from Grimme’s group for the revSCAN functional. Both QF and Grimme’s D4 parameters are available for the revTPSS and R2SCAN functionals and can be chosen with a simple command line option in all our QF applications. The default is QF D4.
The bad results
With the def2-SVPD basis set, all results obtained with the D4 parameters from Grimme’s group, which were optimized using a very large basis set, are very inaccurate and basically below force field quality. This statement is true for all functionals that we have tested so far. The QF optimized D4 parameters make the def2-SVPD basis set much more reasonable for DFT-D4 calculations. Having said that, the computational cost with def2-SVPD in QFDFT is very similar to, almost the same as, the cost with def2-TZVP, and the latter basis set appears to be more accurate. Choosing def2-TZVP, or even 6-311++G**, is therefore recommended.
The ugly results
The results above clearly indicate that something could be wrong with Grimme’s D4 parameters for the BP86 functional: regardless of the choice of basis set, we obtained extremely inaccurate results, below force field quality. After triple-checking our implementation, we contacted Professor Grimme’s research group and received the following reply:
“Hi Laszlo,
interesting question. I investigated a bit and found out that the BP functional as implemented in Turbomole is using a different LDA correlation functional than Orca. If I recall correctly I performed the BP calculations with Turbomole back then.
Whether this actually has an impact on the D4 parameters needs checking, I haven't recalculated the BP interactions with Orca yet to redo the fit and see whether this might be the cause. On the other hand it might just be a suboptimal fit for the BP functional with D4.
That's all I have at the moment.”
Based on this reply, it seems to us that the BP86 functional used during the D4 parameterization at Professor Grimme’s research group was perhaps not the correct one. We hope that the situation is not the same for the previous-generation D3 parameterization, because almost countless scientific papers, proposals and reports have used the dispersion-corrected BP86 functional over the last decade or so, and such a blow to the accuracy of all those published results would not look good for the community. Note also that the results could be much more accurate when using the same incorrectly implemented BP86 functional that was used during the parameterization. Our applications do not support Grimme’s D4 parameters for DFT-D4 calculations with the BP86 functional, while the QF optimized D4 parameters provide reasonably accurate results.
This article together with all tabulated raw data is downloadable from our website.
Merry Computing!
Retired from OpenEye Scientific Cadence Molecular Sciences
11 months ago: It seems to me that the basis set dependence of dispersion correction in DFT calculations didn't receive much attention in recent years (correct me if I am wrong). Therefore I am finding results presented here by Laszlo as particularly important in accurate QM predictions of different molecular properties.