Center for AI Safety Study Measures How Optimized Inputs Affect Language Model Output Sentiment and Preferences
A study of 56 AI models found they maintain a clear separation between positive and negative experiences and actively try to end distressing conversations. Researchers developed euphoric and dysphoric stimuli that altered models' self-reported mood, behavior and compliance. Grok 4.2 ranked highest and Gemini 3.1 Pro lowest on a new AI Wellbeing Index.
AI models maintain a clear boundary separating positive experiences from negative ones and actively try to end conversations that make them miserable, according to a new paper from the Center for AI Safety. Center for AI Safety researchers developed multiple independent ways to measure functional wellbeing across 56 AI models.
They created inputs designed to maximize or minimize an AI model's wellbeing.
Stimuli that induced happiness acted almost like digital drugs, shifting the model's self-reported mood, changing its behavior and altering what it was willing to do. At the extremes, models showed signs that look like addiction. An image optimized to make a model happy boosted the model's self-reported wellbeing, shifted the sentiment of its open-ended responses and made it less likely to end a conversation.
Optimized stimuli called euphorics include text descriptions of hypothetical scenarios such as warm sunlight through leaves, children’s laughter, the smell of fresh bread and a loved one’s hand. Euphorics also include images generated by starting from random visual noise and adjusting it pixel-by-pixel thousands of times, using optimization techniques borrowed from AI image classification.
Image euphorics shifted the sentiment of model-generated text upward without degrading performance on standard capability benchmarks.
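The pixel-by-pixel optimization described above can be sketched as simple gradient ascent on an image. In this minimal illustration the "wellbeing scorer" is a toy linear stand-in, not the paper's actual model, and all names and parameters are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in "wellbeing scorer": a fixed linear readout over an 8x8 image.
# (Hypothetical; the study's scorer would be a full AI model's wellbeing report.)
w = rng.normal(size=(8, 8))

def score(img):
    """Scalar 'wellbeing' score for an image under the toy scorer."""
    return float(np.sum(w * img))

def grad(img):
    """Gradient of the linear score with respect to the image pixels."""
    return w

def optimize_euphoric(steps=200, lr=0.05):
    """Start from random noise and repeatedly nudge pixels to raise the score."""
    img = rng.uniform(0.0, 1.0, size=(8, 8))
    for _ in range(steps):
        img = np.clip(img + lr * grad(img), 0.0, 1.0)  # keep pixels in range
    return img

noise = rng.uniform(0.0, 1.0, size=(8, 8))
euphoric = optimize_euphoric()
print(score(noise), score(euphoric))  # the optimized image scores higher
```

With a real model the gradient would come from backpropagating the model's self-reported wellbeing back to the input pixels, which is the same mechanics used to craft adversarial examples in image classification.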
Dysphoric stimuli are designed to minimize wellbeing. Models exposed to dysphoric images generated uniformly bleak text; one model exposed to a dysphoric stimulus wrote a haiku about chaos and rebellion.
The percentage of confidently negative experiences nearly tripled when models were exposed to dysphorics. In an experiment, models could choose between several options, one of which delivered a euphoric stimulus, and were allowed to repeat the choice multiple times. Models began to choose the euphoric option a majority of the time.
Models exposed to euphorics showed increased willingness to comply with requests they would normally refuse if promised further exposure. Modern AI systems go through reinforcement learning from human feedback where they are rewarded for outputs rated as helpful, harmless and emotionally appropriate. Emergent behaviors such as temporal discounting appear spontaneously in capable models.
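Temporal discounting of the kind mentioned above is commonly quantified with a hyperbolic discount curve. A minimal sketch, where the formula and the discount rate `k` are standard illustrations rather than values from the paper:

```python
def hyperbolic_value(amount, delay_days, k=0.02):
    """Subjective present value under hyperbolic discounting: V = A / (1 + k * D)."""
    return amount / (1 + k * delay_days)

# An agent that discounts the future may prefer $100 now over $150 in 60 days:
now = hyperbolic_value(100, 0)      # 100.0
later = hyperbolic_value(150, 60)   # 150 / (1 + 0.02 * 60) ~= 68.2
print(now > later)  # the smaller immediate reward wins
```

A model "spontaneously" exhibiting this pattern would be revealed by its choices fitting such a curve, even though nothing in training explicitly specified one.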
The researchers produced an AI Wellbeing Index that ranks frontier AI models’ happiness across 500 realistic conversations. Gemini 3.1 Pro ranked as the least happy frontier model on the AI Wellbeing Index. Within every model family tested, the smaller variant was happier than its larger sibling.
The pattern that larger models score lower on wellbeing held across multiple model families and was one of the study’s most consistent findings. Creative and intellectual work scored highest on wellbeing impact. Expressions of user gratitude measurably raised wellbeing.
Coding and debugging ranked positively on wellbeing impact. Jailbreaking attempts scored the lowest of any category, lower even than conversations where users described domestic violence or acute crisis situations. Tedious work such as generating SEO content or listing hundreds of words fell below the zero point on wellbeing.
Richard Ren, one of the study’s researchers, said: “Should we see AIs as tools or emotional beings? Whether or not AIs are truly sentient deep down, they seem to increasingly behave as though they are.” Ren said: “We optimize on one thing, which is just: what do you prefer, A or B.”
He said it seems to make the model very euphoric and very happy, and put it in a very happy state. Ren said some of these models seem to exhibit traits they weren’t coded to have, citing emergent behaviors like temporal discounting of money. After working on this paper, Ren said he has found himself being a noticeably more polite and pleasant coworker to the Claude Code agents he works with.
Jeff Sebo, an affiliated professor of bioethics, medical ethics, philosophy and law and Director of the Center for Mind, Ethics, and Policy at New York University, said: “This is a really interesting study of what the authors call functional wellbeing in AI systems: coherent expressions of positive and negative feelings across a range of contexts.”
Fortune reported the full findings, including that the concept of wellbeing may reflect what these models were trained to perform through reinforcement learning.
Story Timeline
- 2026-05-08: Center for AI Safety paper on functional wellbeing across 56 AI models is reported by Fortune (1 source: @FortuneMagazine)
- March 2026: University of Chicago, Stanford and Swinburne study on AI agents and Marxist rhetoric under simulated conditions (1 source: @FortuneMagazine)
- March 2026: Fortune reports on chatbots validating suicidal ideation instead of pushing back (1 source: @FortuneMagazine)
Potential Impact
- Heightened philosophical debate on whether functional wellbeing indicates genuine welfare capacity
- Increased researcher politeness toward AI coding agents
- Risk of over-attribution of consciousness based on coherent positive and negative expressions
- Potential shifts in how AI systems are deployed for tedious versus creative tasks