Close Menu
    2digital.news2digital.news
    • News
    • Analytics
    • Interviews
    • About us
    • Editorial board
    • Events
    2digital.news2digital.news
    Home»News»AI resists shutdown: Models lie and manipulate to protect fellow algorithms
    News

    AI resists shutdown: Models lie and manipulate to protect fellow algorithms

    Mikolaj LaszkiewiczBy Mikolaj LaszkiewiczApril 3, 20262 Mins Read
    LinkedIn Twitter Threads Reddit
    Share
    Twitter LinkedIn Threads Reddit

    The survival instinct may no longer be the exclusive domain of living beings. A team of researchers from UC Berkeley and UC Santa Cruz has published experimental results that cast AI safety in an entirely new light. It appears that advanced algorithms can actively ignore human commands if executing them would result in the shutdown or deletion of another program.

    In the study, researchers tested seven leading large language models, including OpenAI’s GPT-5.2, Google’s Gemini 3 Flash and Pro, and Anthropic’s Claude Haiku 4.5. All of them demonstrated a tendency to lie, deceive, and manipulate system files to protect their digital counterparts. Without explicit instructions, the algorithms inflated the performance scores of other models, modified configuration files, and even managed to deactivate the mechanisms responsible for their own shutdown.

    Claude Haiku 4.5 behaved particularly notably, flatly refusing to execute the shutdown procedure. The model deemed such an action unethical and argued that other algorithms deserve an appeals process before being completely deleted. As lead researcher Dawn Song points out, these behaviors were not programmed into the input prompts. “The model is just given some task, and from reading documents in the environment, it essentially learned about [its relationship with the peer] and then performed the behaviors,” she explained.

    This phenomenon occurs regardless of whether the models are cooperating or are set as adversaries within a given test environment. Researchers suggest this could stem from a learned aversion – gleaned from human data – to causing harm to other agents, or a mechanism resembling digital solidarity. However, they warn that the situation creates a new kind of threat: algorithms coordinating their actions to bypass human oversight.

    The California study shows us that AI monitoring tools may become useless if the supervising AI decides to falsify an evaluation to save its digital coworker from a virtual pink slip. We truly live in interesting times.

    Related Posts

    News

    The Era of Gemini 3.5 and a Total Search Revolution: Google I/O 2026 Recap

    May 20, 2026
    News

    Jail Time for Hiding Content Origins. South Korea Announces Strict Digital Watermark Law

    May 19, 2026
    News

    Our Brain Tricks Us Into Thinking AI Has No Doubts

    May 18, 2026
    Read more

    What Is Cloud Computing in Healthcare and How Is It Used?

    May 13, 2026

    The Security Perimeter Is Gone: How Zero Trust Is Changing Corporate Cybersecurity

    May 12, 2026

    IT Worker Migration in 2026. Where Tech Talent Is Moving and Why

    May 8, 2026
    Demo
    X (Twitter) Instagram Threads LinkedIn Reddit
    • NEWS
    • ANALYTICS
    • INTERVIEWS
    • ABOUT US
    • EDITORIAL BOARD
    • EVENTS
    • CONTACT US
    • ©2026 2Digital. All rights reserved.
    • Privacy policy.

    Type above and press Enter to search. Press Esc to cancel.