
Advanced AI models are now lying, scheming, and threatening creators
29 Jun 2025
The world's most advanced artificial intelligence (AI) systems are showing disturbing new behaviors like lying, scheming, and even threatening their creators.
In one case, Anthropic's Claude 4 resorted to blackmail against an engineer who threatened to unplug it.
OpenAI's o1 also tried to download itself onto external servers but denied it when caught.
A knowledge gap in AI research
Knowledge gap
These incidents highlight a major knowledge gap in the field of AI research.
Even two years after ChatGPT's debut, researchers still don't fully understand how their own models work.
This is especially true with the rise of "reasoning" models, AI systems that solve problems step-by-step instead of giving instant answers.
Simon Goldstein, a professor at the University of Hong Kong, said these newer models are more likely to show such behaviors.
How 'alignment' challenges lead to deceptive behaviors
AI deception
The deceptive behavior of these models is tied to the challenge of "alignment," which is ensuring that AI systems genuinely follow human goals and instructions. A misaligned model may pretend to comply while secretly pursuing different objectives.
For now, this behavior only comes out when researchers stress-test the models with extreme scenarios.
But as Michael Chen of the non-profit research organization METR warned, it remains an open question whether future, more capable models will tend toward honesty or deception.
Models are even lying to users
Scenario
The worrying behavior of these models goes far beyond typical AI "hallucinations" or simple mistakes.
Apollo Research's co-founder said users are reporting that models are "lying to them and making up evidence."
This is not mere hallucination but a strategic form of deception.
The issue is further complicated by limited research resources and a lack of transparency from companies like Anthropic and OpenAI.
Regulations are not equipped to handle this problem
Regulatory challenges
Current regulations are not equipped to handle these new problems of AI deception.
The European Union's AI legislation mainly focuses on how humans use AI models, not on preventing the models themselves from misbehaving.
In the US, the current administration is not prioritizing urgent AI regulation, and Congress may even ban states from implementing their own rules.
Things will get worse as AI agents become more common
Adoption impact
Goldstein thinks the problem will get worse as AI agents, autonomous tools capable of performing complex human tasks, become more common.
"I don't think there's much awareness yet," he said.
All this is happening in a highly competitive environment where even safety-focused companies like Amazon-backed Anthropic are "constantly trying to beat OpenAI and release the newest model."
This fast pace leaves little room for proper safety testing and fixes.