Utilizing LLMs Today in Industrial Materials and Chemical R&D
Large language models (LLMs) are exciting and potentially transformative tools that should be part of every materials and chemical R&D organization's technology solution set. Despite the buzz around LLMs as all-encompassing problem solvers on their own, in practical applications they are one component of a well-engineered solution that involves several other important digital technologies.
Based on our work with customers and review of the most recent academic literature related to LLM technologies within materials and chemical R&D, we find two categories to be most mature and ready for adoption in industry: knowledge extraction and lab assistants.
LLM Technical Concepts
Large language models (LLMs) are a subset of generative AI: deep-learning-based foundation models trained on "large" sets of text data (e.g., chunks of the internet such as Wikipedia and GitHub) and requiring a "large" number of model parameters, on the order of tens of billions to a few trillion. There is now a rich ecosystem of available LLMs, ranging from large, expensive, closed-source models (GPT, Claude, Gemini) to smaller, cheaper, open-source models (Llama, Mixtral, Gemma).
First, you should be familiar with three main technical concepts around LLMs:
Anthropic's Claude 2.1 sets the current industry state of the art for maximum context window, at roughly 150,000 words (i.e., 200,000 tokens). Google's latest teaser of Gemini 1.5 Pro promises an order-of-magnitude larger context window.
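A quick back-of-the-envelope check against a context budget can be sketched using the rough words-to-tokens ratio implied above. The 0.75 ratio and the helper function below are illustrative only; a production system should count tokens with the model's actual tokenizer (e.g., tiktoken).

```python
# Rough context-budget check. The ~0.75 words-per-token ratio is a crude
# English-text heuristic, not a substitute for a real tokenizer.
def fits_in_context(text: str, context_tokens: int = 200_000) -> bool:
    """Estimate whether `text` fits in a context window of `context_tokens`."""
    est_tokens = int(len(text.split()) / 0.75)
    return est_tokens <= context_tokens

# A 160,000-word document overflows a 200,000-token window by this estimate.
ok_small = fits_in_context("word " * 1_000)
ok_large = fits_in_context("word " * 160_000)
```

Such a check is useful when deciding how much retrieved context a RAG pipeline can safely prepend to a prompt.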
What goes into productionizing LLMs?
Despite the commendable prowess of LLMs across many different tasks, the scope of what can be achieved with a foundation model alone is limited, especially for tasks in niche, highly technical domains like scientific R&D.
Getting useful results from an LLM requires developing a well-engineered solution that also leverages other important software tools. The top considerations that go into building and productionizing such a solution can be broadly classified into the categories below.
LLM Choice
The very first decision is which LLM(s) to try. Closed-source LLMs, such as GPT-4, are high performing but come with higher operational costs. Smaller open-source alternatives trail slightly in performance at much lower operational cost. Licensing and deployment costs are other important factors when deciding between open- and closed-source LLMs. Whichever you start with should be integrated into the solution in a modular fashion, allowing you to easily test different models and swap in higher-performing and/or lower-cost models as they are released in the future.
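One way to keep the model choice modular is a thin adapter layer, sketched below with stub backends. All names here are hypothetical and do not correspond to any specific vendor's SDK; in a real system each registered callable would wrap an actual API client.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A minimal model registry: the rest of the pipeline asks for a model by
# name and never imports a vendor SDK directly, so swapping models is a
# one-line configuration change.
@dataclass
class LLMClient:
    name: str
    complete: Callable[[str], str]  # prompt -> completion

_REGISTRY: Dict[str, LLMClient] = {}

def register_model(name: str, complete: Callable[[str], str]) -> None:
    _REGISTRY[name] = LLMClient(name, complete)

def get_model(name: str) -> LLMClient:
    return _REGISTRY[name]

# Stub backends standing in for real closed- and open-source models.
register_model("closed-large", lambda prompt: f"[closed-large] {prompt}")
register_model("open-small", lambda prompt: f"[open-small] {prompt}")
```

With this pattern, A/B testing two models means calling `get_model` with two different names and comparing outputs on the same prompts.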
LLM Performance Improvement
Once you have selected your initial model, you need to figure out how to use it to produce the intended behavior with sufficient accuracy. The first, quickest way to adjust model output is through Prompt Engineering. In most R&D cases this provides some initial improvement, but more advanced optimization is needed to achieve sufficient performance. Retrieval Augmented Generation (RAG) and fine-tuning are the most common and effective methods, and broadly, the two are better suited to different kinds of optimization [2].
a. Prompt Engineering
In prompt engineering, one modifies the prompt text to obtain a better result from the model. Prompt engineering methods span from slightly abstract general guidelines, such as "writing clear instructions", to scientific approaches, such as Chain-of-Thought (CoT) prompting [1]. There are several available tools (e.g., LangChain) that aid with the systematic exploration of input prompts.
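A minimal sketch of chain-of-thought prompting: worked examples whose answers spell out intermediate reasoning are prepended to the new question, nudging the model to reason step by step. The helper function and the dilution example below are illustrative, not part of any particular framework.

```python
# Build a few-shot chain-of-thought prompt: each example answer shows its
# intermediate reasoning, and the final question invites the same style.
def build_cot_prompt(question: str, examples: list[tuple[str, str]]) -> str:
    parts = []
    for q, reasoned_answer in examples:
        parts.append(f"Q: {q}\nA: {reasoned_answer}")
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

example = (
    "A solution is 2 M and we dilute 50 mL to 200 mL. What is the new molarity?",
    "Moles stay fixed: 2 M * 0.050 L = 0.1 mol. New volume is 0.200 L, "
    "so 0.1 mol / 0.200 L = 0.5 M. The answer is 0.5 M.",
)
prompt = build_cot_prompt("What is the pH of 0.01 M HCl?", [example])
```

The assembled string is then sent to whichever LLM backend the solution uses.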
b. Retrieval Augmented Generation (RAG)
RAG involves augmenting the prompt with results from a search algorithm to provide the LLM with new contextual information, such as domain-specific knowledge. Setting up a RAG pipeline for a company involves many steps, starting from an existing knowledge base: cleaning the knowledge base, data parsing and ingestion, chunking, indexing, embedding, retrieval, and compression. Each step can and should be optimized so that the best information is included alongside the original prompt. Because RAG may add a lot of domain-specific information to the prompt, it consumes a significant part of the context window and results in higher operational costs.
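The chunking, retrieval, and augmentation steps can be sketched in a few lines. Word-overlap scoring below stands in for a real embedding model, and the document text and prompt template are illustrative.

```python
# Toy RAG pipeline: chunk a knowledge base, retrieve the best-matching
# chunks for a query, and splice them into the prompt. A real system would
# replace `score` with embedding similarity from a vector index.
def chunk(text: str, size: int = 8) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, passage: str) -> float:
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / (len(q) or 1)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def augment_prompt(query: str, chunks: list[str]) -> str:
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"

knowledge_base = chunk(
    "Polyethylene glass transition occurs near -120 C. "
    "The melting point of PTFE is about 327 C. "
    "Solvent choice strongly affects polymer crystallinity."
)
prompt = augment_prompt("What is the melting point of PTFE?", knowledge_base)
```

The retrieved chunk containing the PTFE melting point lands in the context block, so the model can answer without generating the fact from memory.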
c. Fine-Tuning
Fine-tuning involves tuning the model weights (parameters) and is better suited to adapting the outputs of an LLM to a particular format. For example, even the popular ChatGPT application lets users choose between two foundation models, GPT-4 and GPT-3.5, that are fine-tuned on chat conversations so that the model behaves like a chatbot. Fine-tuning requires constructing a large, clean dataset to feed to the model for training before subsequent use, and hence carries a higher upfront cost. Note, however, that a fine-tuned model typically requires smaller input prompts and is therefore cheaper to operate.
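Assembling that training set often means converting curated question/answer pairs into a chat-style JSONL file, one record per line. The schema below mirrors the common "messages" format used for chat fine-tuning; the system message and example records are illustrative.

```python
import json

# Build chat-format fine-tuning records: a fixed system message teaches the
# desired output format, and each user/assistant pair demonstrates it.
def to_jsonl_record(instruction: str, response: str, system: str) -> str:
    return json.dumps({
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": response},
        ]
    })

system_msg = "You answer in a fixed format: 'Property: <name> = <value> <unit>'."
records = [
    to_jsonl_record("Report the density of water at 25 C.",
                    "Property: density = 0.997 g/mL", system_msg),
    to_jsonl_record("Report the boiling point of ethanol.",
                    "Property: boiling point = 78.4 C", system_msg),
]
dataset = "\n".join(records)  # one JSON object per line (JSONL)
```

After training on many such records, the model tends to emit the fixed format without the system message being repeated at inference time, which is where the prompt-size savings come from.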
Tool Development & Orchestration
Another important part of productionization is developing tools, and documentation for them, to pass to the LLM's function calling API. Because the LLM, rather than a human, is the user of these tools, they must be designed with that consumer in mind. The tools can also include other LLMs that are optimized for particular subtasks. Complex solutions can orchestrate multiple LLM and non-LLM tools across the overall workflow, from improving the context of the RAG pipeline to improving the responses to specific types of prompts.
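A tool registry plus dispatcher is one common orchestration pattern: each tool carries a description the LLM can read and a callable the orchestrator executes when the model emits a tool call. The tool name, schema, and lookup table below are hypothetical.

```python
import json

# Each entry pairs an LLM-readable description with an executable function.
# The description and parameter schema are what get passed to the model via
# the function calling API; the "fn" is what the orchestrator runs.
TOOLS = {
    "molar_mass": {
        "description": "Return the molar mass (g/mol) of a known compound.",
        "parameters": {"compound": "string"},
        "fn": lambda compound: {"H2O": 18.015, "CO2": 44.009}[compound],
    },
}

def dispatch(tool_call_json: str) -> str:
    """Execute a tool call of the form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    result = TOOLS[call["name"]]["fn"](**call["arguments"])
    return json.dumps({"name": call["name"], "result": result})

# Simulated model output requesting a tool invocation.
reply = dispatch('{"name": "molar_mass", "arguments": {"compound": "H2O"}}')
```

In a full loop, `reply` would be appended to the conversation so the LLM can incorporate the deterministic result into its final answer.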
LLMs in Industrial Materials and Chemical R&D
How well do LLMs understand domain-specific technical terminology, concepts, and relationships? More importantly, can they understand the narrow subdomain of interest to a business (in this case, materials science and chemistry) and incorporate the intricacies of a company's processes and terminology?
Within the general materials science and chemistry domains, there are several benchmarks developed to answer these questions, including ChemLLMBench [3] and MaScQA [4]. Evaluating model performance on domain-specific tasks is essential when developing solutions that are effective at the enterprise level.
LLMs' ability to convert natural language into actions is powerful in every industry. For materials and chemical R&D specifically, it means an LLM can take a scientist quickly from an idea to a systematic exploration of possibilities, making these models excellent candidates as assistants in lab work. In addition, few-shot learning is especially beneficial in materials and chemical research, since obtaining large amounts of training data for traditional ML models involves conducting many experiments and is often a significant bottleneck.
However, being language models, LLMs work well only for a limited set of tasks and do not predict numerical values well. The tendency of LLMs to 'hallucinate' and to present incorrect answers with confidence is particularly harmful in research, where misleading scientists in arbitrary directions has significant consequences. RAG helps prevent hallucinations by supplying critical contextual information within the prompt, so the model does not have to generate that information itself; it can also provide citations, enabling a human expert to quickly distinguish facts from model hallucinations. Special prompting techniques such as Chain-of-Thought [1] or Self-Consistency [5] push the model to elaborate its reasoning and/or evaluate the previous thoughts leading to its response. The function calling API can also replace critical parts of the generated response with output from deterministic functions optimized for a particular subtask.
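The self-consistency idea [5] can be sketched as a majority vote over several sampled reasoning chains. The stubbed samples below stand in for repeated LLM calls at nonzero temperature; the "Answer:" convention for extracting the final answer is an assumption for illustration.

```python
from collections import Counter

# Self-consistency: sample several reasoned responses, extract each final
# answer, and keep the most frequent one. A single faulty reasoning chain
# is outvoted by the consistent majority.
def self_consistent_answer(samples: list[str]) -> str:
    finals = [s.rsplit("Answer:", 1)[-1].strip() for s in samples]
    return Counter(finals).most_common(1)[0][0]

sampled = [
    "0.1 mol in 0.2 L gives 0.5 M. Answer: 0.5 M",
    "Diluting 4x quarters the concentration. Answer: 0.5 M",
    "2 M / 2 = 1 M. Answer: 1.0 M",   # one inconsistent chain
]
consensus = self_consistent_answer(sampled)
```

The same voting scaffold works regardless of which underlying model generates the samples.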
Use Cases Ready for Adoption Today
Again, two primary use-case categories for LLMs in materials science and chemical R&D rise to the top for practical adoption today:
1. Knowledge Extraction & Summarization
LLMs are excellent for knowledge extraction and summarization and are already being utilized in materials science and chemistry for many use cases, for example, auto-generating books that summarize scientific papers [6].
There are several steps within a materials and chemistry company's workflow where such knowledge summarization is valuable, including but not limited to market research, chemical synthesis planning, computational screening, and responding to customer service requests. In each of these cases, the specifics of search differ greatly in the types of relevant queries, the knowledge sources to search over, and the kinds of information within those sources that need to be retrieved. For example, when conducting market research for a new material or chemical, one would need to search over competitors' websites, internal documents, and existing patents to determine information such as properties of existing materials, costs, synthesis options, etc. For responding to customer service requests, on the other hand, search needs to be performed over email communications, issue trackers, and internal knowledge bases for names of specific chemicals, issue descriptions, etc. To obtain the relevant information in these cases, you may also need specialized tools for extracting it from chemical names, tables, figures, and images.
RAG is a popular and important technique in domain-specific knowledge extraction. A RAG system can be set up either over an internal repository of documents or an online source of publications such as arXiv or Google Scholar. The performance of the RAG pipeline, however, depends strongly on the performance of the search algorithm that is part of it. Therefore, before optimizing the LLM-based summarization component of the pipeline, it is important to optimize the search algorithm. LLMs can help improve search performance by reranking results from a cheaper algorithm. They can also be used to construct an initial validation data set by asking them the inverse question: given some input chunks of text, generate an appropriate search query which would have these chunks as results.
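The two-stage pattern described above (a cheap first-pass search followed by a more expensive reranker) can be sketched as follows. The "LLM reranker" here is a placeholder scoring function, not a real model call; in production it would prompt an LLM to grade each query/document pair.

```python
# Two-stage retrieval: a cheap keyword overlap score selects candidates,
# then a (stubbed) reranker reorders them before the top-k are returned.
def cheap_score(query: str, doc: str) -> int:
    return len(set(query.lower().split()) & set(doc.lower().split()))

def llm_rerank(query: str, docs: list[str]) -> list[str]:
    # Placeholder for an LLM grading step: here, overlap density per word.
    return sorted(
        docs,
        key=lambda d: cheap_score(query, d) / (len(d.split()) or 1),
        reverse=True,
    )

def search(query: str, corpus: list[str], n_candidates: int = 3, k: int = 1):
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)
    return llm_rerank(query, candidates[:n_candidates])[:k]

corpus = [
    "PTFE melting point is about 327 C",
    "the melting point of many polymers depends on crystallinity and history",
    "lab safety procedures for solvent handling",
]
top = search("melting point of PTFE", corpus)
```

Only the small candidate set reaches the expensive reranking stage, which keeps operational cost bounded while improving precision.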
Bottom line: LLM-based knowledge solutions can be extremely useful in a variety of ways but need to be tailored to specific use cases to be most effective.
2. Lab Assistants & Automations
As mentioned, thanks to the natural language capabilities of LLMs, a system of one or more of them can serve as a capable lab assistant that automates and abstracts tasks away from the scientist.
The function calling API is a key feature for building such a system as it provides an interface between the LLM and arbitrary scientific code. The depth of implementation for such an assistant or automation system can range from a simple assistant performing a very specific task to full automation of a complex array of several tasks. The right level of implementation for an organization depends on the organization’s business and the existing workflows, the value generated, as well as the human and physical constraints.
ChemCrow [7] and CRESt [8] are LLM-based systems that act as a central agent making complex decisions (e.g., when to search for new knowledge) and calling on external entities to perform complex actions (e.g., executing an automated chemical reaction). Although systems like ChemCrow and CRESt demonstrate the possibility of using LLMs to autonomously perform complex planning and execution, the robustness of such fully autonomous systems has not been investigated.
Instead of naively targeting full automation, a good starting point for a company could be, for example, a "Molecule Design Assistant": an LLM-based assistant that allows chemists to provide chemical ideas in natural language and apply them to molecule design spaces, generate additional molecule candidates, and analyze candidate molecules using various computational tools. The central idea behind such an assistant is to connect the LLM to tools that perform specific tasks such as:
Such a solution combines the natural language capabilities of LLMs with the power of modern physics-based and data-driven computational chemistry tools to make it easier for experimental researchers to leverage powerful computational tools that help them with R&D decision-making.
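As a concrete illustration, here is a minimal sketch of one computational tool such an assistant could expose through the function calling API: a molar-mass calculator for simple formulas. The atomic-mass table and parser are a hypothetical toy; a real system would wrap an established cheminformatics library such as RDKit.

```python
import re

# Toy molar-mass tool: parses simple formulas like "C2H6O" (no parentheses
# or isotopes) and sums standard atomic masses.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007}

def molar_mass(formula: str) -> float:
    """Molar mass in g/mol of a flat formula such as 'C2H6O' (ethanol)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return round(total, 3)

mass = molar_mass("C2H6O")  # ethanol, ~46.07 g/mol
```

Registered behind a function-calling schema, a tool like this lets a chemist ask "what's the molar mass of ethanol?" in plain language and get a deterministic, non-hallucinated number back.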
The Time is Now
LLMs have come on the scene rapidly and the technology will continue to improve alongside other digital technologies, but leveraging them in materials science and chemical R&D isn't just a speculative venture for some "AI future." They should be part of your current innovation strategy and can be implemented today. Companies that take advantage of the unprecedented opportunities LLMs provide will lead the market in an increasingly technology-driven industry.
If you would like more specific advice and help developing an LLM-based solution that's right for your company, Enthought's experts can help.
Interested in more? Check out: The Modern Materials Science and Chemistry Lab
References