Purdue University engineers have conducted a groundbreaking study exploring the integration of large language models, such as ChatGPT, in autonomous vehicles (AVs). Led by Ziran Wang, an assistant professor in Purdue’s Lyles School of Civil and Construction Engineering, the study aimed to determine how well AVs can interpret passenger commands and drive accordingly using these advanced artificial intelligence algorithms.
The research, set to be presented at a conference on September 25, may be one of the first experiments to test how well a real AV can use large language models to understand and act on passenger instructions. Wang believes that for AVs to achieve full autonomy in the future, they must be able to comprehend and act upon all passenger commands, even implicit ones.
Current AVs require explicit and precise instructions from passengers, delivered either through button presses or narrowly scripted voice commands. In contrast, large language models have the potential to interpret and respond to a far wider range of commands in a more human-like manner, because they draw relationships from vast amounts of text data and continue to learn over time.
In the study, the large language models did not drive the AVs directly; instead, they assisted the vehicles' existing driving features. Wang and his team trained ChatGPT with various prompts, ranging from direct commands to more indirect requests. These models were then integrated into the AVs, which also took into account traffic rules, weather conditions, and data from cameras and light detection and ranging (lidar) sensors.
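For a sense of how such prompts might differ, the sketch below pairs a few hypothetical passenger utterances with the kind of vehicle context that could accompany them. These examples and field names are illustrative assumptions; the study's actual prompt set is not reproduced here.

```python
# Hypothetical examples of direct versus indirect passenger prompts, plus the
# sort of context the vehicle might attach; not the Purdue team's actual prompts.
EXAMPLE_PROMPTS = [
    {"style": "direct",   "utterance": "Please drive a little faster."},
    {"style": "direct",   "utterance": "Turn right at the next intersection."},
    {"style": "indirect", "utterance": "I'm feeling a bit carsick."},        # implies: drive more smoothly
    {"style": "indirect", "utterance": "We're running late for the game."},  # implies: speed up within limits
]

EXAMPLE_CONTEXT = {
    "speed_limit_mps": 25.0,      # from traffic rules
    "weather": "light rain",      # from weather data
    "lead_vehicle_gap_m": 32.5,   # from camera / lidar perception
}
```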
When the vehicle's speech recognition system detected a passenger command, the large language models reasoned about it within parameters predefined by the researchers. The models then generated instructions for the AV's drive-by-wire system, which controls the throttle, brakes, gears, and steering, so that the vehicle could drive according to the command.
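A minimal sketch of that command-to-control pipeline is shown below. The function and class names (interpret_command, DriveByWireCommand, the llm_client helper) and the specific limits are assumptions for illustration, not the researchers' actual implementation.

```python
# Hypothetical sketch: spoken request -> LLM-interpreted intent -> drive-by-wire targets.
from dataclasses import dataclass

@dataclass
class DriveByWireCommand:
    throttle: float        # 0.0 to 1.0
    brake: float           # 0.0 to 1.0
    steering_angle: float  # radians, positive = left
    gear: str              # "drive", "reverse", "park"

SYSTEM_PROMPT = (
    "You are an in-vehicle assistant. Given a passenger request plus traffic rules, "
    "weather, and sensor context, reply with JSON containing target_speed_mps, "
    "lane_action, and stop_requested."
)

def interpret_command(utterance: str, context: dict, llm_client) -> dict:
    """Ask the language model to turn a spoken request and vehicle context
    into a constrained, machine-readable driving intent."""
    prompt = f"{SYSTEM_PROMPT}\nContext: {context}\nPassenger: {utterance}"
    return llm_client.complete_json(prompt)  # assumed helper that returns parsed JSON

def to_drive_by_wire(intent: dict, current_speed_mps: float) -> DriveByWireCommand:
    """Map the model's high-level intent onto actuation targets, clamped by
    predefined safety parameters (the cap below is an illustrative value)."""
    if intent.get("stop_requested"):
        return DriveByWireCommand(throttle=0.0, brake=0.6, steering_angle=0.0, gear="drive")
    target = min(intent.get("target_speed_mps", current_speed_mps), 31.0)  # ~70 mph cap
    throttle = 0.3 if target > current_speed_mps else 0.0
    brake = 0.2 if target < current_speed_mps else 0.0
    return DriveByWireCommand(throttle=throttle, brake=brake, steering_angle=0.0, gear="drive")
```

In a layout like this, the language model only proposes a bounded intent; the mapping to throttle, brake, and steering stays inside fixed limits, which is one way to keep an unpredictable model from issuing unsafe actuation commands.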
The researchers conducted most of their experiments at a proving ground in Columbus, Indiana, which provided a safe environment for testing the AVs' responses to passenger commands at highway speeds and at two-way intersections. They also evaluated how well the AVs parked in response to passenger instructions in the parking lot of Purdue's Ross-Ade Stadium.
Based on survey responses from the study participants, the AVs equipped with large language models produced lower rates of discomfort than baseline data on how people typically feel riding in level four AVs without such assistance. The vehicles also outperformed baseline values on safety and comfort metrics, including reaction time to avoid a collision and rates of acceleration and deceleration.
However, the study highlighted the need for improvement in response time, as the large language models took an average of 1.6 seconds to process a passenger’s command. Additionally, the issue of “hallucination” in large language models, where they may misinterpret learned information and respond incorrectly, remains a challenge that must be addressed before integrating these models into AVs.
Before vehicle manufacturers can consider implementing large language models in AVs, further testing, regulatory approval, and a solution to the hallucination issue will be needed. Wang and his team continue to run experiments with various chatbots based on large language models, with ChatGPT showing the most promising results so far.
Future directions include exploring communication between large language models in different AVs to aid decision-making at intersections and studying the use of large vision models to enhance AVs’ performance in extreme winter weather conditions prevalent in the Midwest.