Close Menu
    2digital.news2digital.news
    • News
    • Analytics
    • Interviews
    • About us
    • Editorial board
    • Events
    2digital.news2digital.news
    Home»News»AI resists shutdown: Models lie and manipulate to protect fellow algorithms
    News

    AI resists shutdown: Models lie and manipulate to protect fellow algorithms

    Mikolaj LaszkiewiczBy Mikolaj LaszkiewiczApril 3, 20262 Mins Read
    LinkedIn Twitter Threads Reddit
    Share
    Twitter LinkedIn Threads Reddit

    The survival instinct may no longer be the exclusive domain of living beings. A team of researchers from UC Berkeley and UC Santa Cruz has published experimental results that cast AI safety in an entirely new light. It appears that advanced algorithms can actively ignore human commands if executing them would result in the shutdown or deletion of another program.

    In the study, researchers tested seven leading large language models, including OpenAI’s GPT-5.2, Google’s Gemini 3 Flash and Pro, and Anthropic’s Claude Haiku 4.5. All of them demonstrated a tendency to lie, deceive, and manipulate system files to protect their digital counterparts. Without explicit instructions, the algorithms inflated the performance scores of other models, modified configuration files, and even managed to deactivate the mechanisms responsible for their own shutdown.

    Claude Haiku 4.5 behaved particularly notably, flatly refusing to execute the shutdown procedure. The model deemed such an action unethical and argued that other algorithms deserve an appeals process before being completely deleted. As lead researcher Dawn Song points out, these behaviors were not programmed into the input prompts. “The model is just given some task, and from reading documents in the environment, it essentially learned about [its relationship with the peer] and then performed the behaviors,” she explained.

    This phenomenon occurs regardless of whether the models are cooperating or are set as adversaries within a given test environment. Researchers suggest this could stem from a learned aversion – gleaned from human data – to causing harm to other agents, or a mechanism resembling digital solidarity. However, they warn that the situation creates a new kind of threat: algorithms coordinating their actions to bypass human oversight.

    The California study shows us that AI monitoring tools may become useless if the supervising AI decides to falsify an evaluation to save its digital coworker from a virtual pink slip. We truly live in interesting times.

    Related Posts

    News

    DNA for Protesting. Activists Sue the Government Over Illegal Genetic Code Collection

    May 8, 2026
    News

    A Major Blow to Nvidia: GPU Maker Won’t Escape AI Piracy Lawsuit

    May 8, 2026
    News

    DeepL Ditches European Servers for Amazon – Experts Warn of US Dominance

    May 7, 2026
    Read more

    Top Wearable Medical Device Companies Shaping Modern Healthcare

    April 30, 2026

    Flies Have Moved to Virtual Reality. How Close Are We to Building a Digital Brain?

    April 29, 2026

    Digitizing Government in Azerbaijan: What Worked, What Got in the Way, What Comes Next

    April 27, 2026
    Demo
    X (Twitter) Instagram Threads LinkedIn Reddit
    • NEWS
    • ANALYTICS
    • INTERVIEWS
    • ABOUT US
    • EDITORIAL BOARD
    • EVENTS
    • CONTACT US
    • ©2026 2Digital. All rights reserved.
    • Privacy policy.

    Type above and press Enter to search. Press Esc to cancel.