I recently commented on the risks of under-characterizing materials in high-throughput experiments (HTE) / self-driving labs (SDLs). Robert Palgrave's X post on a recent pair of Nature papers (1, 2) revived these concerns.
First, a disclaimer: Doing HTE / SDL research is hard. We're talking Peter Thiel "complex coordination" hard — orchestrating robots, data management, computation/ML, materials, humans, and so much more. (This is challenging in environments that reward individual achievement more than "team.") So hats off to any group taking on this really hard problem, especially in academia and labs.
But back to the issue of under-characterizing new materials with SDL workflows. Here are some ideas for how we can raise the bar together. I'd love your feedback.
A proposed draft list of checksums for reporting "novel materials" from SDLs:
- Read the standards for reporting novel crystal structures. (I believe it was Matshona Dhliwayo who said, "Bend the rules only if you have learned them; break the rules only if you have mastered them.")
- Repeat the synthesis & measurements N times. How much is N? John Gregoire mentioned one collaborator required N = 10, ideally 100. Some say that's excessive, others say they learned a lot from this exercise. Let's discuss.
- Perform a credible crystallographic analysis: at minimum, p-XRD refinement with "low" residuals. Palgrave shares some guidance in the aforementioned post.
- Perform a credible elemental analysis. I recall John Perkins advocating for RBS & XRF for low & high Z, respectively. Others prefer depth-dependent XPS. We can negotiate if a solid rationale is presented and measurement artifacts are avoided.
- When reporting a multinary system, one should reject the null hypothesis that the reported mixed-phase material actually decomposed (e.g., via spinodal decomposition). For example, show that the p-XRD pattern cannot be fit using one or more of the likely decomposition sub-phases (Palgrave's post), and/or that the elemental distribution is homogeneous (a harder sell b/c of limited spatial resolution). Crystallographers who work on intermetallics fret over this a lot.
- Facilitate scientific reproduction by open sourcing data & code. Skirting open-source standards by invoking bio-related "dual use" concerns for a materials paper (whose risk profile, admittedly, differs from that of bio research) opens the door to less charitable interpretations.
- Cite relevant literature. Personally, I think Chen & Ong's universal interatomic potentials paper is chronically under-cited. What else comes to mind?
- Disclaimer: I have an experimentalist's bias, but I don't think a material is "discovered" until it's synthesized & characterized, because of the persistent gap between first-principles simulations and experiment in matsci. Yes, there's value in synthetic databases, but they are predictions, not discoveries; let's call them such.
- Let's be precise with benchmarks. The post below mentions "only 28k [new materials] discovered in the last decade." In a recent review, I counted 111,345 new entries into the Inorganic Crystal Structure Database between 2012 and 2022.
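The "repeat N times" item is partly a statistics question. Here is a minimal Python sketch, assuming each replicate synthesis yields one scalar property (for instance, a refined lattice parameter). It reports the mean, the sample standard deviation s, and the standard error of the mean, s/√N. That last quantity is why going from N = 10 to N = 100 shrinks the uncertainty on the mean by roughly 3×. The function name and the replicate values are hypothetical, for illustration only.

```python
import math
import statistics


def replicate_summary(values):
    """Summarize N replicate measurements of one scalar property.

    Returns (N, mean, sample standard deviation, standard error of the mean).
    The standard error is s / sqrt(N), so it shrinks roughly as 1 / sqrt(N).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates to estimate spread")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample (n - 1) standard deviation
    return n, mean, s, s / math.sqrt(n)


# Hypothetical lattice parameters (in angstroms) from 10 repeated syntheses.
a_values = [4.012, 4.015, 4.010, 4.013, 4.011, 4.016, 4.012, 4.014, 4.009, 4.013]
n, mean, s, sem = replicate_summary(a_values)
print(f"N = {n}: a = {mean:.4f} +/- {sem:.4f} A (s = {s:.4f} A)")
```

The standard error describes only the spread between replicates. It does not capture systematic errors shared by every run, such as an instrument miscalibration.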
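"Low" residuals in a p-XRD refinement usually refers to the Rietveld agreement factors. Here is a minimal sketch of the common definitions. It assumes Poisson counting statistics (weight w_i = 1/y_obs,i), no separate background treatment, and no restraints or constraints in the N − P degrees of freedom:

- R_wp = sqrt(Σ w_i (y_obs,i − y_calc,i)² / Σ w_i y_obs,i²)
- R_exp = sqrt((N − P) / Σ w_i y_obs,i²)
- χ² = (R_wp/R_exp)²

Refinement programs report these directly. The function name and the toy pattern here are hypothetical, and the sketch does not decide what threshold counts as "low."

```python
import math


def rietveld_residuals(y_obs, y_calc, n_params):
    """Return (R_wp, R_exp, chi2) for an observed vs. calculated powder pattern.

    R_wp  = sqrt(sum w (y_obs - y_calc)^2 / sum w y_obs^2)
    R_exp = sqrt((N - P) / sum w y_obs^2)
    chi2  = (R_wp / R_exp)^2
    Assumes Poisson counting statistics, so w = 1 / y_obs; points with zero
    counts are skipped to avoid dividing by zero.
    """
    num = 0.0  # sum of w * (y_obs - y_calc)^2
    den = 0.0  # sum of w * y_obs^2
    n_points = 0
    for yo, yc in zip(y_obs, y_calc):
        if yo <= 0:
            continue
        w = 1.0 / yo
        num += w * (yo - yc) ** 2
        den += w * yo ** 2
        n_points += 1
    dof = n_points - n_params
    if dof <= 0:
        raise ValueError("need more data points than refined parameters")
    r_wp = math.sqrt(num / den)
    r_exp = math.sqrt(dof / den)
    chi2 = (r_wp / r_exp) ** 2  # equivalently num / dof
    return r_wp, r_exp, chi2


# Hypothetical five-point pattern refined with two parameters.
r_wp, r_exp, chi2 = rietveld_residuals(
    [100, 400, 900, 400, 100], [110, 390, 880, 410, 95], n_params=2)
print(f"R_wp = {100 * r_wp:.1f}%, R_exp = {100 * r_exp:.1f}%, chi2 = {chi2:.2f}")
```

A small R_wp on its own does not establish phase purity. It is usually read together with χ² and a visual check of the difference curve.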
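The decomposition null hypothesis can be framed as a model comparison: does a mixture of the likely decomposition phases fit the p-XRD pattern at least as well as the claimed single phase? The sketch below is deliberately crude. It assumes all reference patterns are already simulated on the same 2θ grid with identical peak shapes and no peak shifts. It also substitutes a non-negative weighted least-squares fit of phase scale factors for a proper multi-phase Rietveld or Le Bail refinement. The function names and toy intensities are hypothetical.

```python
import math


def weighted_rwp(y_obs, y_model):
    """R_wp with Poisson weights w = 1 / y_obs (points with y_obs <= 0 are skipped)."""
    num = sum((yo - ym) ** 2 / yo for yo, ym in zip(y_obs, y_model) if yo > 0)
    den = sum(yo for yo in y_obs if yo > 0)  # sum of w * y_obs^2 reduces to sum of y_obs
    return math.sqrt(num / den)


def single_phase_scale(y_obs, ref):
    """Best non-negative scale s for y_obs ~ s * ref under the same weights."""
    num = sum(r for yo, r in zip(y_obs, ref) if yo > 0)          # sum of w * ref * y_obs
    den = sum(r * r / yo for yo, r in zip(y_obs, ref) if yo > 0)  # sum of w * ref^2
    return max(num / den, 0.0)


def fit_two_phase_mixture(y_obs, ref_a, ref_b):
    """Non-negative weighted least squares for y_obs ~ ca * ref_a + cb * ref_b.

    Solves the 2x2 normal equations; if that solution has a negative coefficient
    (or the references are collinear), the constrained optimum lies on a boundary,
    so the better of the two single-phase fits is returned instead.
    """
    saa = sab = sbb = say = sby = 0.0
    for yo, a, b in zip(y_obs, ref_a, ref_b):
        if yo <= 0:
            continue
        w = 1.0 / yo
        saa += w * a * a
        sab += w * a * b
        sbb += w * b * b
        say += w * a * yo
        sby += w * b * yo
    det = saa * sbb - sab * sab
    if det > 0:
        ca = (say * sbb - sab * sby) / det
        cb = (saa * sby - sab * say) / det
        if ca >= 0 and cb >= 0:
            return ca, cb

    def weighted_sq_error(coeffs):
        ca, cb = coeffs
        return sum((yo - ca * a - cb * b) ** 2 / yo
                   for yo, a, b in zip(y_obs, ref_a, ref_b) if yo > 0)

    candidates = [(single_phase_scale(y_obs, ref_a), 0.0),
                  (0.0, single_phase_scale(y_obs, ref_b))]
    return min(candidates, key=weighted_sq_error)


# Made-up toy patterns on a shared 2-theta grid, each with a flat background of 1.
ref_a = [1, 1, 50, 1, 1, 1, 20, 1, 1, 1]        # likely decomposition phase A
ref_b = [1, 1, 1, 1, 40, 1, 1, 1, 30, 1]        # likely decomposition phase B
ref_claimed = [1, 1, 1, 45, 1, 1, 1, 25, 1, 1]  # claimed new single phase
y_obs = [0.6 * a + 0.4 * b for a, b in zip(ref_a, ref_b)]  # secretly a mixture

ca, cb = fit_two_phase_mixture(y_obs, ref_a, ref_b)
rwp_mix = weighted_rwp(y_obs, [ca * a + cb * b for a, b in zip(ref_a, ref_b)])
s = single_phase_scale(y_obs, ref_claimed)
rwp_single = weighted_rwp(y_obs, [s * c for c in ref_claimed])
print(f"A/B mixture: ca={ca:.3f}, cb={cb:.3f}, R_wp={rwp_mix:.3f}")
print(f"claimed single phase: R_wp={rwp_single:.3f}")
```

In this toy case, the observed pattern was built as a 60:40 mixture of phases A and B. The mixture therefore fits essentially perfectly, and the decomposition hypothesis cannot be rejected. A real test would also have to account for peak overlap, preferred orientation, and the detection limit for minor phases.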
We can all do better, and that includes me and my own group. SDL is hard, really hard. It's pushed me to my limits as a scientist and manager. And we can all improve by working together. The recent advances are exciting, and I'm happy they're getting good press and helping to bring this community into the mainstream. I welcome feedback on how we can do better as a community as we forge ahead together into an exciting era of SDL-enabled science. Thanks for reading, for sharing your ideas, for supporting the authors who pulled off a herculean feat of complex coordination and put their best work out there, and for striving to constantly improve.
Thanks to Rob Palgrave for providing valuable feedback on an earlier draft!
Cofounder and Chief Science Officer @Optigon. Passionate about cheap solar and quantum tech for energy applications. @MIT @UW and @Pepperdine #firstgen
I think moving beyond XRD-only characterization could give more confidence in claiming a new material discovery, especially if you can pair it with optical, elemental, and structural characterization in a high-throughput manner.
Program Director @ ARPA-E | Entrepreneur | PhD in Physical Chemistry
I am interested in experts' priority list for filling the gaps one at a time instead of making big claims. Let's be a bit more practical and realistic. What do funding agencies need to focus on?
i^4 Making it Real and Positive (i^2)^2 Creative & Analytical Solutions Provider / Strategy & Business Development / Cross-Fertilization
Could one of the core FAIR challenges with AI-generated results be the question of who will own the findings and priority rights that emerge from such massive output?
Research Scientist at Google DeepMind
The 111,345 number from ICSD is for stable plus metastable materials. In the paper, we count and report only stable ones (those on the computational convex hull). That the 28k refers to stable materials is pretty clear in the paper, the blog post, and the Twitter post (https://x.com/GoogleDeepMind/status/1729895680959811781?s=20). I couldn't find the full text from the included screenshot; maybe it's ambiguous there?
Professor of Materials Science and Engineering University of Toronto and Research Scientist at Natural Resources Canada | Ressources naturelles Canada
I would add that if you are truly performing detailed and careful diffraction analysis, you can almost never rule out the presence of secondary phases. Microstructure, strain, and disorder confound our ability to draw definitive conclusions. At best, one can definitively state that there is no more than x% of secondary phase within a possible envelope of microstructural states.