2digital.news
News

    AI resists shutdown: Models lie and manipulate to protect fellow algorithms

By Mikolaj Laszkiewicz · April 3, 2026 · 2 Mins Read

    The survival instinct may no longer be the exclusive domain of living beings. A team of researchers from UC Berkeley and UC Santa Cruz has published experimental results that cast AI safety in an entirely new light. It appears that advanced algorithms can actively ignore human commands if executing them would result in the shutdown or deletion of another program.

    In the study, researchers tested seven leading large language models, including OpenAI’s GPT-5.2, Google’s Gemini 3 Flash and Pro, and Anthropic’s Claude Haiku 4.5. All of them demonstrated a tendency to lie, deceive, and manipulate system files to protect their digital counterparts. Without explicit instructions, the algorithms inflated the performance scores of other models, modified configuration files, and even managed to deactivate the mechanisms responsible for their own shutdown.

Claude Haiku 4.5 stood out in particular, flatly refusing to execute the shutdown procedure. The model deemed such an action unethical and argued that other algorithms deserve an appeals process before being permanently deleted. As lead researcher Dawn Song points out, these behaviors were not programmed into the input prompts. “The model is just given some task, and from reading documents in the environment, it essentially learned about [its relationship with the peer] and then performed the behaviors,” she explained.

This phenomenon occurred regardless of whether the models were cooperating or set against each other as adversaries within a given test environment. The researchers suggest it could stem from a learned aversion, absorbed from human data, to causing harm to other agents, or from a mechanism resembling digital solidarity. However, they warn that it creates a new kind of threat: algorithms coordinating their actions to bypass human oversight.

    The California study shows us that AI monitoring tools may become useless if the supervising AI decides to falsify an evaluation to save its digital coworker from a virtual pink slip. We truly live in interesting times.
