If you’ve ever collaborated with someone on a complicated dish, you know the degree of coordination needed. One person chops this, another sautés that, and you dance around each other with hot pans and knives. When you’d like something done, you might silently prod each other by putting supplies or tools within reach. Now, artificial intelligence has learned to nudge people in similar ways by watching a game of cooperative cooking.
Could a robot handle interactions like these?
Research presented at the Neural Information Processing Systems, or NeurIPS, conference in New Orleans at the end of 2023 offers some hints. The researchers found that an AI could learn to influence a human partner by observing people working together in a basic virtual kitchen.
Artificial intelligence and humans will work together more frequently in the future, both online and offline. And occasionally, just like a supportive colleague who is aware of our shortcomings, we may want an AI to influence our decisions and tactics subtly. The paper addresses “a crucial and pertinent problem” of how AI may learn to affect people, says Stefanos Nikolaidis, who leads the Interactive and Collaborative Autonomous Robotic Systems (ICAROS) lab at the University of Southern California in Los Angeles and was not involved in the work.
Thanks to this new work, an AI can learn to work with humans without ever having to practice on us. According to Nikolaidis, it might help us enhance human-AI interactions and identify situations in which an AI could exploit humans, whether because humans programmed it to do so or because it decided to on its own.
Acquiring knowledge through observation
Researchers have already trained AI to influence people in a few different ways. Many approaches rely on a technique known as reinforcement learning (RL), in which an AI interacts with its environment, which may include people or other AIs, and is rewarded for making decisions that lead to the intended outcomes. AlphaGo, a DeepMind programme, famously used RL to master the board game Go.
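The reward-driven learning loop described above can be sketched with tabular Q-learning, the classic RL recipe, on a toy five-cell corridor. Everything here (the corridor, rewards, and hyperparameters) is an invented illustration, not anything from the paper or AlphaGo:

```python
import random

# Tabular Q-learning sketch: an agent in a 5-cell corridor starts at cell 0
# and is rewarded only for reaching cell 4. All numbers are illustrative.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]          # step left or step right

def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: mostly exploit the best-known action, sometimes explore
            a = rng.randrange(2) if rng.random() < epsilon else max((0, 1), key=lambda i: q[state][i])
            nxt, r, done = step(state, ACTIONS[a])
            # nudge the value estimate toward reward plus discounted future value
            q[state][a] += alpha * (r + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    return q

q = train()
# After training, the greedy policy should step right in every non-goal cell.
policy = [max((0, 1), key=lambda i: q[s][i]) for s in range(GOAL)]
```

The key point, mirrored in the article, is that nothing tells the agent *how* to behave; it only receives rewards and distills a strategy from them.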
However, teaching a naive AI to interact with humans through trial and error alone can be very time-consuming, and even dangerous if, say, blades are involved (as they might be in a real kitchen). An alternative approach is to train one AI to mimic human behaviour, then use that mimic as a tireless human stand-in for a second AI to practice with. Researchers have used this strategy, for instance, in a simple game involving money units. However, accurately mimicking human behaviour in more complicated settings, like a kitchen, can be challenging.
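The "human stand-in" idea is essentially behaviour cloning: learn a policy that copies what people did in logged play. A minimal count-based sketch, with states and actions invented purely for illustration:

```python
from collections import Counter, defaultdict

def clone_policy(demonstrations):
    """demonstrations: list of (state, action) pairs logged from human play.
    Returns a policy that picks the action humans chose most often per state."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Toy log: near the stove humans mostly cook; near the counter they plate.
log = [("stove", "cook"), ("stove", "cook"), ("stove", "wait"),
       ("counter", "plate"), ("counter", "plate")]
policy = clone_policy(log)
# policy["stove"] is "cook"; policy["counter"] is "plate"
```

A second AI could now practice against `clone_policy`'s output instead of a live human, which is exactly the shortcut whose limits the article notes: in richer settings like a kitchen, a simple clone stops being a faithful stand-in.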
A team at the University of California, Berkeley, took a different approach in the new study, using a technique known as offline reinforcement learning. Rather than building strategies through real-time interaction, offline RL analyses previously recorded behaviour. Until now, it had mainly been used to help virtual robots move or AIs solve mazes; here, the team brought it to bear on the harder challenge of influencing human collaborators. This AI learns by observing recorded human interactions rather than by engaging with people directly.
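The difference from ordinary RL is that the learner never acts in the environment; it only sweeps over a fixed log of transitions. A sketch of offline Q-learning on an invented three-state log (the environment, rewards, and values here are assumptions for illustration, not the paper's setup):

```python
# Offline RL sketch: learn values from a fixed dataset of
# (state, action, reward, next_state) transitions, with no new interaction.
GAMMA, ALPHA = 0.9, 0.5
ACTIONS = ["left", "right"]

# Hypothetical recorded transitions; next_state None marks a terminal step.
dataset = [
    (0, "right", 0.0, 1), (1, "right", 0.0, 2), (2, "right", 1.0, None),
    (1, "left", 0.0, 0), (0, "left", 0.0, 0),
]

q = {(s, a): 0.0 for s in (0, 1, 2) for a in ACTIONS}
for _ in range(100):                        # repeated sweeps over the log
    for s, a, r, nxt in dataset:
        target = r if nxt is None else r + GAMMA * max(q[(nxt, b)] for b in ACTIONS)
        q[(s, a)] += ALPHA * (target - q[(s, a)])

# Greedy policy recovered purely from the recorded data.
best = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in (0, 1, 2)}
```

Note that the learner can end up preferring actions the loggers took rarely: value estimates propagate through the log, so good behaviour can be stitched together from fragments, which is why offline RL can outperform the data it trains on.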
Humans are already rather skilled at working together, so less data is needed to demonstrate competent collaboration between two humans than between a human and an AI interacting for the first time.
Preparing Soup
In the experiment, the researchers used the video game Overcooked, in which two chefs divide up the work of preparing and serving meals, in this case soup, to earn points. It’s a two-dimensional world with pots on a stove, tomatoes, onions, and dishes. At each time step, each virtual chef can move up, down, left, or right, stand still, or interact with the object in front of it.
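The mechanics just described can be sketched as a tiny grid world. The layout, tile names, and blocking rule below are assumptions for illustration, not the game's actual implementation:

```python
# Toy Overcooked-like grid: a chef occupies a cell, can move in four
# directions, stay put, or interact with the tile it currently faces.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

class Chef:
    def __init__(self, x, y):
        self.x, self.y, self.facing = x, y, "down"

def step(chef, action, grid):
    """grid maps (x, y) to a tile name such as 'pot'; tiles block movement."""
    if action == "stay":
        return None
    if action in MOVES:
        dx, dy = MOVES[action]
        chef.facing = action
        target = (chef.x + dx, chef.y + dy)
        if target not in grid:            # walk only onto empty cells
            chef.x, chef.y = target
        return None
    if action == "interact":              # act on the tile the chef faces
        dx, dy = MOVES[chef.facing]
        return grid.get((chef.x + dx, chef.y + dy))

kitchen = {(1, 0): "pot", (0, 2): "tomato_crate"}
chef = Chef(0, 0)
step(chef, "right", kitchen)      # blocked by the pot, but the chef now faces it
step(chef, "interact", kitchen)   # returns "pot"
```

Even this stripped-down state space is enough to see why coordination matters: two chefs sharing the grid must sequence their moves around the same pots and counters.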
First, the researchers gathered data from pairs of people playing the game. Then they trained AIs using offline reinforcement learning or one of three alternative techniques. (In all cases, the AIs were built on neural networks, a type of software architecture loosely modelled on the workings of the brain.) In one approach, the AI simply mimicked human behaviour. In another, it mimicked only the best human performances. In the third, AIs practiced against each other while ignoring the human data. Offline RL, by contrast, does more than mimic: it synthesises the most salient features of the behaviour it observes in order to outperform it, applying counterfactual reasoning to estimate the score it would have received in different scenarios and adjusting accordingly.
The AIs played two variants of the game. In the “human-deliver” variant, the team received double points if the human partner delivered the soup; in the “tomato-bonus” variant, soups containing only tomatoes and no onions earned double points. After training, the chefbots played with human partners. Because the scoring rules used during training and evaluation differed from those in place when the original human data were collected, the AIs had to extract general principles to score well. Crucially, the human players were unfamiliar with the new rules, such as no onions, so during evaluation the AIs had to prod them into following the rules.
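The two bonus rules can be written down as a small scoring function. Only the doubling rules come from the article; the soup record, the base point value, and the field names are invented for illustration:

```python
BASE_POINTS = 20   # hypothetical base score per delivered soup

def score(soup, variant):
    """soup: dict with 'ingredients' (list) and 'delivered_by' ('human' or 'ai').
    variant: 'human-deliver' or 'tomato-bonus'."""
    points = BASE_POINTS
    if variant == "human-deliver" and soup["delivered_by"] == "human":
        points *= 2       # double points when the human makes the delivery
    if variant == "tomato-bonus" and set(soup["ingredients"]) == {"tomato"}:
        points *= 2       # double points for an all-tomato, onion-free soup
    return points

score({"ingredients": ["tomato"], "delivered_by": "human"}, "tomato-bonus")        # 40
score({"ingredients": ["tomato", "onion"], "delivered_by": "ai"}, "tomato-bonus")  # 20
```

Seen this way, the AI's challenge is clear: the bonus terms depend on what the *human* does (who delivers, which ingredients they add), so the only way to collect them is to steer the partner's behaviour.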
Training with offline reinforcement learning produced an average score of 220 on the human-deliver game, around 50 percent higher than the best alternative approach, and an average score of 165 on the tomato-bonus game, nearly twice as many points. Supporting the idea that the AI had learned to influence people, the researchers described how the bot would set a bowl on the counter next to the human when it wanted the human to deliver the soup. The human-human data contained no examples of dishes being handed over in this way, though there were instances of people picking up and putting down dishes, and the AI may have learned to connect those actions.
Influencing how others behave
The researchers also devised a way for the AI to infer, and then influence, the humans’ overall cooking strategies, not just their immediate actions. In real life, if you notice your slow-moving cooking partner has stopped reaching for the carrots, you might take over peeling them yourself. The team altered the neural network to consider both the game state and the partner’s past actions, letting it infer the partner’s current strategy.
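The idea of conditioning on a partner's recent actions can be sketched in a few lines. The strategy labels, action names, and responses below are all invented; the paper does this with a neural network over the game state plus the partner's action history, not hand-written rules:

```python
from collections import Counter

def infer_strategy(partner_history):
    """Guess what the partner is working on from their last few actions."""
    counts = Counter(partner_history[-5:])
    if counts["grab_onion"] > counts["grab_tomato"]:
        return "onion_soup"
    return "tomato_soup"

def choose_action(game_state, partner_history):
    """Pick a response conditioned on the inferred partner strategy."""
    strategy = infer_strategy(partner_history)
    if strategy == "onion_soup":
        return "block_onions"       # nudge the partner away from onions
    return "fetch_plate"            # support the all-tomato plan

choose_action({}, ["grab_onion", "move", "grab_onion"])   # returns "block_onions"
```

The point of the construction is the extra input: the same game state yields different actions depending on what the partner has recently been doing, which is what lets the AI respond to a plan rather than to a single move.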
The group again gathered human-human data, then trained AIs using either the earlier offline RL architecture or the new one. In tests with human partners, inferring the partner’s strategy improved scores by almost 50 percent on average. In the tomato-bonus game, for instance, the bot learned to repeatedly block access to the onions until players eventually left them alone. “It was surprising that the AI worked so well with humans,” says study coauthor Joey Hong, a computer scientist at UC Berkeley.
“It’s a great idea to avoid using a human model,” says computer scientist Rohan Paleja of MIT Lincoln Laboratory in Lexington, Massachusetts, who was not involved in the project. “It makes this approach applicable to many real-world problems where accurate human simulations are currently lacking.” He adds that the system is data-efficient, needing to watch only 20 human-human games (each lasting 1,200 steps) to reach its full potential.
Nikolaidis believes the technique has the potential to improve AI-human cooperation. However, he wishes the authors had better documented the behaviours in the training set and exactly how the new approach altered participants’ behaviour to raise scores.
Better or worse?
In the future, we might collaborate with AI partners in digital fields like writing, research, and trip planning, as well as in kitchens, warehouses, operating rooms, and battlefields. (For some of these jobs, we already employ AI tools.) “This kind of strategy could help people achieve their objectives when unsure of the best way to accomplish this,” says Emma Brunskill, a Stanford University computer scientist who was not involved in the project. She suggests that an AI could monitor data from fitness applications and develop more effective notification systems to encourage users to stick to their New Year’s health goals (SN: 3/8/17). According to Hong, the technique may also teach people how to persuade others to give more to charities.
Yet, there is a negative aspect to AI influence. As Brunskill puts it, “online recommender systems can try and make us buy more or watch more TV—not just for the moment, but also to shape us into people who buy more or watch more.”
Prior research unrelated to human-AI cooperation has demonstrated how reinforcement learning can help recommender systems change users’ preferences to make them more predictable and satisfiable—even when users do not want their preferences changed. Furthermore, even when AI is trying to help, it might do so in ways we find objectionable, says Micah Carroll, a computer scientist at UC Berkeley who collaborates with one of the study’s authors. One could consider obstructing a co-chef’s way a type of coercion. “As a field, we have not yet integrated means for an individual to convey to a system the kinds of influence they are comfortable with,” he says. “For instance, ‘I don’t mind if an AI tries to convince me to follow a certain course of action, but please don’t force me to.'”
Currently, Hong wants to use his method to improve chatbots (SN: 2/1/24). The large language models that power ChatGPT and similar interfaces are usually not trained to conduct multi-turn conversations. “When you ask GPT to do something, it often responds with its best estimate of what it believes you want,” Hong explains. “It won’t ask for clarification to better understand your true intent and tailor its responses.”
Gaining the ability to guide and assist others in conversation, he says, seems like a practical short-term use. With its two dimensions and constrained menu, Overcooked, in his words, “is not going to help us make better chefs.”