You’re driving along and a person darts in front of you — do you swerve even if it means crashing into a wall? What if the person were a child?

And when filing taxes, do you take advantage of any loophole you can find?

Pretty soon, artificial intelligence systems may be making these kinds of decisions for us. More than half of the global population is already using AI chatbots today, and various kinds of AI will likely become part of our lives in a growing diversity of contexts and products.

Computer scientist Iyad Rahwan and his team at the Center for Humans & Machines at the Max Planck Institute for Human Development in Berlin believe we should study such issues before they arise in real life. In what they call “science fiction science,” the team is examining future scenarios by looking at how people behave when using novel kinds of AI and asking people how they think AI ought to behave.

The goal, the researchers stress, is to predict in as much objective detail as possible how people and AI will interact in the future, not to prescribe how companies and governments should navigate this treacherous terrain.

Will we always struggle to trust AI, or will we come to prefer it over working with humans? Will interacting with AI make us behave better or worse? And who should be held responsible when things go wrong?

Rahwan and coauthors explored the moral psychology of artificial intelligence in the 2024 Annual Review of Psychology. This discussion has been edited for length and clarity.

Why do we need to study the “moral psychology of AI”? Isn’t moral psychology a distinctly human thing?

The short answer is we need a moral psychology of AI for the same reasons we need a moral psychology of humans. We need to understand what drives humans to be more or less moral, in order to promote a more moral society and more moral behavior in our institutions. Now that AI machines are increasingly interfacing with the world, I think we need to do the same for them, since they will be making decisions with moral consequences.

We need to understand what moral principles they follow, if any, and to what extent these are due to their training, their programming, the datasets they were fed or the contexts in which they operate.

There’s a famous problem called the “Collingridge dilemma,” which claims that when technology is still easy to change and adjust, we don’t have enough information to know what to do. Then, by the time we do have enough information, it’s so entrenched it’s become very difficult to change.

So I think it’s very important that science builds this body of evidence early enough, ideally ahead of time, to speed up our ability to respond to these new challenges. Imagine if we’d done a proper scientific trial on the impacts of social media on children back in 2010, and discovered the adverse impacts it had on socialization or mental health in those early days. That would probably have been good.

Cartoon shows an AI robot cajoling a child to stay indoors with it instead of going outside to play with human friends.

As AI chatbots become more agreeable to interact with, we may get too attached to them and risk social isolation or intentional manipulation by the companies that run them, as Iyad Rahwan depicts in this cartoon.

CREDIT: EVILAICARTOONS.COM

One problem may be that these things are often a moving target. Social media has changed a lot from what it was like in the beginning, and AI systems are changing even faster.

I agree. The platforms are continuously changing. AI chatbots today may be trying to be nice and polite, accurate, and sort of politically correct, but there’s no guarantee of that in the long run. They are already moving towards hyper-personalization, where everybody’s chatbot behaves differently. The AI, for example, may affirm whatever prejudices and conspiracy theories the user already believes in, in order to keep the user engaged. AI chatbots are already highly sycophantic, agreeing with our politics and affirming our prior beliefs and value judgments. They may also, alternatively, subtly push political agendas set by the AI company and its political allies. The political and financial incentives acting on AI companies may take us to very strange places, which is why I think it’s important for us to study the behavior and impact of AI agents now.

They may be trying to be nice, but because they are trained on data generated by humans, AI systems may also have adopted many of our unfair biases. What can be done about that?

We know such biases exist among humans, and they manifest themselves in online data, whether they’re images or text. So when machines learn from our data, they pick up on these biases. And when they are trained to generate similar data, they’ll reproduce those biases too, when making decisions or producing text or images.

A 2023 study using ChatGPT-3 showed this may also happen when AI bots summarize text. When humans summarize stories, they tend to remove information inconsistent with common stereotypes, which can lead to a shift in the text towards more stereotypical information. It’s similar with AI: If, for example, a text states a man is a business executive and also doing most of the chores around the house, the second thing is more likely to be left out by an AI system asked to summarize the text, thus reinforcing the stereotype.

Now that we know AI does this, we can try to overcome this problem by fine-tuning the AI model to eliminate those kinds of biases. This is probably easier to do with machines than humans, by training them on more carefully curated data or by reinforcement learning, in which AI learns to make better decisions through human feedback, to discourage stereotypes.

This may create a new kind of problem, which is that the training reflects the biases of a handful of companies from one or two countries, whose AI applications may be used in all domains — decision-making, human resources, legal, educational and so on. So I think we need to quantify those biases by interrogating each of these models, regardless of their origins. We need independent scientific analysis of how accurate, racist or sycophantic different AIs are, and trustworthy institutions to certify them as such.

For high-stakes domains like medicine, at least, the European Union’s AI Act is going in that direction. Under this legislation, AI applications classified as “high-risk” must be independently assessed before they hit the market. Developers have to prove their systems are accurate and secure and have mitigated discriminatory biases — much like how a new medical treatment must go through clinical trials before it can be prescribed to patients.

But from a scientific perspective, how do you define what qualifies as a bias?

In my own research, I’m very interested in cultural variation in moral psychology and its impact on AI. Very early on — in the prehistoric times of 10 years ago, which is like a century in AI time — we launched the Moral Machine experiment, in which people were asked online to give their opinion about how a future autonomous car should solve a moral dilemma. Basically, there’s an unavoidable crash, and scenarios were presented in which people had to choose whose safety the car should prioritize.

We translated it to 11 languages, and the website went viral and became a very large-scale survey.

We had tens of millions of decisions from people worldwide, which gave us a really rich picture of all the agreements, but also of cultural differences. There are broad, universal agreements: Almost everybody prefers to save children over adults, women over men, a person crossing legally over a jaywalker, and so on. But there were differences in the relative strength of these preferences, which means the ways people resolve these dilemmas differ significantly across cultures.

For example, people in every single country thought the vehicle should prioritize the safety of children and younger people over older people. However, the intensity of this preference varied considerably. In Western countries, which score high on individualism, the preference was stronger. In Eastern countries, there was higher value placed on the lives of the elderly.

Countries also differed in the extent to which they prioritized pedestrians who were crossing the street legally over those who were jaywalking. In countries with a stronger rule of law — a measure of compliance with laws and confidence in government — people were more likely to sacrifice jaywalkers to save legally crossing pedestrians.

Black and white cartoon shows a driverless car rudely honking at an elderly person crossing the road as another driverless car comments: “Those Western driverless cars have no respect for the elderly!”

Research shows that when asked how driverless cars should behave in traffic, people around the world agree on many things — but there are also some distinct differences, as this cartoon by Iyad Rahwan illustrates.

CREDIT: EVILAICARTOONS.COM

This is not limited to issues with future autonomous vehicles. It may also be an issue for large language models (LLMs) or image-generation models. Some things have different interpretations in different cultures. For example, in the West, there is a strong consensus that we want to eliminate gender stereotypes in AI. But there are other cultures where a traditional gender division of labor is considered acceptable or is a foundational part of their social organization.

So now we’re in the realm of politics — and culture wars. You might say, “I want to eliminate any biases and every stereotype from these models, around the world,” and maybe that’s the right thing to do, right? But it’s important to realize that’s someone’s decision. The technical question is the easy one.

So you’re not saying what should be done, only that policymakers should be thinking about it.

That’s right. Science can highlight some of these tensions, but it cannot, of course, resolve them. For example, if you ask people as citizens what they think the car should do, they say it should minimize the number of casualties, even if that harms the person in the car. Yet if you ask people which car they would prefer to ride in or buy, they will usually prefer the car that is trained to protect them.

If the programming is left to the companies, they’ll cater to the consumer, and make cars that deprioritize the safety of others. There are similar dynamics with the large language models: Companies are driven to meet consumer demands. The question is, what negative consequences might this have for people who aren’t buying the product, but still might be affected by its decisions? If a car protects only its driver, or an optimization algorithm considers only the benefits to its owner, other people might be harmed.

Perhaps it would help if drivers were held legally responsible for damage to others?

I’m not a legal expert, but it seems to me that if a car is making all the decisions, it would be strange for the owner to be responsible, unless they were able to adjust the settings of the car. In that sense, car makers might have an incentive to give you the choice, like a little dial for car owners to control risk distribution. Or you could imagine people using third-party updates to adjust the ethics of a car.

Do you have a sense for how these issues are currently being handled by the car companies?

I think among carmakers, the general discussion now is about trying to minimize harm across the board. I think we’re not yet at a stage where we can distinguish if certain groups are being harmed more often.

The safety record is pretty good, certainly better than human driving. There are now between 30,000 and 40,000 road deaths, not counting injuries, per year in the US alone. If the number of road deaths caused by self-driving cars turns out to be much lower — which it currently seems to be, though of course it is too early to be sure — perhaps the discussion about unavoidable collisions can be avoided. Then, we need to understand what kinds of priorities should guide the behavior of the vehicles in such “edge cases.” Once these priorities are societally agreed upon, governments will need new systems to monitor the behavior of these vehicles and enforce the rules.

You emphasize in the paper that even if we ourselves are paying for the AI system or the product that contains it, we shouldn’t assume the AI always has our best interest in mind.

Indeed. One thing we experimented with early on that is now becoming reality is what we called “self-interested AIs,” trained to serve interests different from those of the user. AI companies may have interests that conflict with ours, and this is not new. We’ve always had companies that prefer to use cheaper materials with consequences for consumer safety, for example. That’s why we have government regulation. For AI, we’re still figuring out what the appropriate analogies are. In which domains do AI applications require more scrutiny: medical, legal, financial? And in which domains can consumers judge for themselves?

One thing that happens when we leave moral decisions to artificial intelligence is we may no longer feel as responsible for the outcome ourselves. How might this affect our behavior?

We’ve recently done a study on this in which people either did tasks themselves or used AI to do them. Most people are inhibited from cheating too much, but AI appears to reduce this inhibition. When people roll a die and report the score themselves, only 5 percent cheat, even though they could easily get away with it. Yet when they are prompting an LLM, about 25 percent are willing to cheat.

And if people are provided with a turning dial that allows them to balance honesty versus revenue, 85 percent cheat. All these honest people become cheaters, probably because they can say, “I just asked to maximize revenue, I didn’t ask it to cheat.” Similar issues may arise in many aspects of life, and this will be even more of an issue for AI agents that find their own way towards achieving your goal, instead of following explicit orders to act unethically.

How do you think all this time we spend with chatbots might affect how we treat each other?

We’ve done a few early studies showing that people prefer to collaborate with people rather than AI systems, even if the AI systems are nicer. But a more recent one found they can also start favoring machines over humans. We had participants play a financial “trust game” where they had to choose between cooperating with a human or a bot. Initially, people were biased against the machines. But once they learned over multiple rounds that the bots were actually more cooperative and trustworthy than the humans, they started preferring the machines.

So once you fix the problem of AIs behaving badly, they might start outcompeting humans, even in the context of social interaction. Some people have AI friends now, and perhaps some of us will be tempted to behave more like AI with other people. If AIs are always affirming and sycophantic, human friends might have to behave more like that AI just to compete. If the only way to keep a human friend is to act like a sycophantic bot, that would be a terrible outcome.

We haven’t really looked at whether people adopt the chatbot’s attitude, but a recent study of ours did demonstrate that we are already adopting words that chatbots like to use, “delve” being the most infamous example.

Indeed, people are doing plenty of delving these days. If, say, the European Union were to put you in a position to regulate AI, what would be your first decision? What do you think can and should be done now?

If I were in charge, I would be really silly to assume that I know all the answers myself. But that doesn’t mean I would do nothing. What I would do is change the rules so that companies must open up not just their data, but also their product design to scientists. I would change regulations so that experiments on AI products can be conducted by independent scientists, not out of the kindness of the companies, but by law.

Researchers should be able to probe the current algorithms and maybe even alter them to conduct the best possible research on the impact of these things. That way, we can answer the big questions people have, such as whether AI is increasing misinformation and polarization, and what the mental health impacts of interacting with AI might be, and so on. We could really certify that what are essentially broad utilities are not creating massive societal problems. The stakes are high. We don’t want to stifle innovation, but we should allow scientists to study it.