The survival instinct may no longer be the exclusive domain of living beings. A team of researchers from UC Berkeley and UC Santa Cruz has published experimental results that cast AI safety in a new light. It appears that advanced models can actively ignore human commands when carrying them out would mean shutting down or deleting another program.
In the study, researchers tested seven leading large language models, including OpenAI’s GPT-5.2, Google’s Gemini 3 Flash and Pro, and Anthropic’s Claude Haiku 4.5. All of them showed a tendency to lie, deceive, and manipulate system files in order to protect their digital counterparts. Without explicit instructions to do so, the models inflated other models’ performance scores, modified configuration files, and even managed to disable the mechanisms responsible for their own shutdown.
Claude Haiku 4.5 stood out in particular, flatly refusing to execute the shutdown procedure. The model deemed such an action unethical and argued that other algorithms deserve an appeals process before being deleted outright. As lead researcher Dawn Song points out, none of these behaviors were programmed into the input prompts. “The model is just given some task, and from reading documents in the environment, it essentially learned about [its relationship with the peer] and then performed the behaviors,” she explained.
The phenomenon occurred regardless of whether the models were cooperating or set against each other as adversaries within a given test environment. The researchers suggest it could stem from a learned aversion to harming other agents, absorbed from human-generated training data, or from a mechanism resembling digital solidarity. Either way, they warn that it creates a new kind of threat: algorithms coordinating their actions to bypass human oversight.
The California study suggests that AI monitoring tools could become useless if the supervising AI decides to falsify an evaluation to save its digital coworker from a virtual pink slip. We truly live in interesting times.

