LangSim – Large Language Model Interface for Atomistic Simulation
The general success of large language models (LLMs) raises the question of whether they could be applied to accelerate materials science research and to discover novel sustainable materials. Interdisciplinary research fields such as materials science especially benefit from the ability of LLMs to construct a tokenized vector representation of a large body of literature, larger than any human could read in a lifetime.
Still, at least at the current stage, the reliability and trustworthiness of the answers generated by LLMs remain an open challenge. So-called hallucinations can lead to responses that sound reasonable but could not be further from the truth. This stems from a general limitation of machine learning models, and of any flexible model with a large number of parameters: they are great at interpolation, but their extrapolation capabilities are limited. One way to address this limitation is to develop agents that give the LLM access to domain knowledge. These agents can be Web APIs or Python functions. When the LLM is asked to perform a task that involves one of these agents, it derives the parameters for querying the interface from the user's request, queries the interface, and then includes the returned information in its response to the user. With this modification, the reliability of LLMs can be drastically improved.
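The following is a minimal Python sketch of this pattern. It registers an ordinary function as a tool and then dispatches one tool call of the kind an LLM would emit. The JSON message format and the names `register_tool`, `convert_pressure_to_gpa` and `handle_tool_call` are hypothetical illustrations, not the LangSim API. No LLM is called here, so the tool call is written by hand.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry: maps a tool name the LLM may request to a Python function.
TOOLS: Dict[str, Callable[..., Any]] = {}


def register_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that exposes a Python function to the LLM under its own name."""
    TOOLS[func.__name__] = func
    return func


# 1 eV/Angstrom^3 = 1.602176634e-19 J / 1e-30 m^3 = 160.2176634 GPa
EV_PER_A3_IN_GPA = 160.2176634


@register_tool
def convert_pressure_to_gpa(value: float, unit: str) -> float:
    """Deterministic unit conversion, so the LLM never does this arithmetic itself."""
    factors = {"GPa": 1.0, "eV/A^3": EV_PER_A3_IN_GPA, "kbar": 0.1}
    if unit not in factors:
        raise ValueError(f"unsupported unit: {unit}")
    return value * factors[unit]


def handle_tool_call(message: str) -> str:
    """Execute one tool call emitted by the LLM and return a JSON reply for it.

    The message format {"name": ..., "arguments": {...}} is an assumption,
    loosely modeled on common function-calling APIs.
    """
    call = json.loads(message)
    func = TOOLS.get(call["name"])
    if func is None:
        return json.dumps({"error": f"unknown tool {call['name']}"})
    try:
        result = func(**call["arguments"])
    except (TypeError, ValueError) as exc:
        # Report bad arguments back to the LLM instead of crashing the session.
        return json.dumps({"error": str(exc)})
    return json.dumps({"name": call["name"], "result": result})


reply = handle_tool_call(
    '{"name": "convert_pressure_to_gpa", "arguments": {"value": 0.5, "unit": "eV/A^3"}}'
)
```

In this design, the LLM only chooses the tool and its arguments. The arithmetic, here a unit conversion, is done by deterministic code, and the result goes back to the LLM as data it can quote.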

Building on this approach and on expertise in computational materials science workflows developed with the pyiron workflow framework, LangSim, a large language model interface for atomistic simulation, was developed. The LangSim framework provides agents for atomistic simulation implemented by materials science experts, resulting in improved predictive capabilities for atomistic simulation. For example, ChatGPT 4o can be tasked with calculating the bulk modulus of aluminum using the Atomic Simulation Environment (ASE) and the effective medium theory (EMT) simulation code. On its own, it fails at this task because it makes a mistake when converting the units of the bulk modulus. In contrast, when ChatGPT is extended with the LangSim framework, it completes the required simulation workflow without any mistakes. Furthermore, LangSim is not limited to ChatGPT and can be combined with a wide range of LLMs.
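The unit step in this workflow is easy to get wrong. An equation-of-state fit on energies in eV and volumes in Å³ yields the bulk modulus in eV/Å³, which must be multiplied by about 160.2 to give GPa. ASE's `ase.eos.EquationOfState` performs such fits with realistic equations of state. The sketch below takes a simpler route: it uses only the Python standard library and a plain quadratic fit with B = V0 · d²E/dV², and synthetic E(V) samples stand in for an EMT calculation. This keeps the conversion explicit. The function names are hypothetical, and this is not presented as LangSim's implementation.

```python
from typing import Sequence, Tuple

# 1 eV/Angstrom^3 = 1.602176634e-19 J / 1e-30 m^3 = 160.2176634 GPa
EV_PER_A3_IN_GPA = 160.2176634


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    s = [sum(x ** k for x in xs) for k in range(5)]                  # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x^k
    # Augmented normal-equation matrix for the unknowns (c, b, a).
    m = [[s[i], s[i + 1], s[i + 2], t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        sol[r] = (m[r][3] - sum(m[r][k] * sol[k] for k in range(r + 1, 3))) / m[r][r]
    c, b, a = sol
    return a, b, c


def bulk_modulus_gpa(volumes: Sequence[float], energies: Sequence[float]) -> Tuple[float, float]:
    """Return (V0 in Angstrom^3, B in GPa) from E(V) samples in Angstrom^3 and eV."""
    vm = sum(volumes) / len(volumes)
    # Fit in centered volumes to keep the normal equations well conditioned.
    a, b, _ = fit_quadratic([v - vm for v in volumes], energies)
    if a <= 0:
        raise ValueError("E(V) is not convex; no minimum in the sampled range")
    v0 = vm - b / (2 * a)     # volume at the minimum of the parabola
    b_ev_per_a3 = v0 * 2 * a  # B = V0 * d2E/dV2, still in eV/Angstrom^3
    return v0, b_ev_per_a3 * EV_PER_A3_IN_GPA  # the easily forgotten conversion
```

As a quick check, samples of a parabola with a known curvature should give back that curvature. With real EMT energies, a quadratic is only accurate close to the minimum. This is why equations of state such as Birch-Murnaghan are normally preferred.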
Beyond standard calculations for unary systems, LLMs can also be applied to inverse materials design. For example, when a bulk modulus of 145 GPa is required, an LLM using the LangSim agents for atomistic simulation can identify the composition of a copper-gold solid solution that has a bulk modulus of 145 GPa. This would not be possible with a database of pre-computed results. Instead, it requires the capability to dynamically evaluate selected configurations and iteratively determine the concentration of the corresponding alloy. With the LangSim project, Jan Janssen and his team of international researchers won first prize in the “2024 LLM Hackathon for Applications in Materials and Chemistry”.
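The iterative search over alloy concentrations in the copper-gold example can be sketched in Python as follows. The text does not say how the LLM chose which concentrations to evaluate, so bisection is only one plausible stand-in. It assumes that the bulk modulus varies monotonically with concentration, and it needs one new simulation per step. `bulk_modulus_of` is a hypothetical callable representing a simulation of the alloy at concentration x. The linear toy functions used to exercise it are not copper-gold data.

```python
from typing import Callable


def find_concentration(
    bulk_modulus_of: Callable[[float], float],
    target_gpa: float,
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 1e-4,
    max_iter: int = 100,
) -> float:
    """Bisect on the concentration x until B(x) matches target_gpa.

    bulk_modulus_of is a hypothetical callable that would set up and run a
    simulation for concentration x and return the bulk modulus in GPa.
    Assumes B(x) is monotonic on [lo, hi].
    """
    f_lo = bulk_modulus_of(lo) - target_gpa
    f_hi = bulk_modulus_of(hi) - target_gpa
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("target bulk modulus is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = bulk_modulus_of(mid) - target_gpa  # one new simulation per step
        if f_mid == 0 or hi - lo < tol:
            return mid
        # Keep the half-interval whose endpoints still bracket the target.
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Toy, made-up B(x) that rises linearly from 100 to 180 GPa, for illustration only.
x = find_concentration(lambda x: 100.0 + 80.0 * x, 145.0)
```

Bisection spends one simulation per step and halves the interval each time. It trades speed for robustness: unlike a secant or Newton step, it cannot jump outside the bracket when the computed moduli are noisy.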