Beverly Hills Examiner

    Google DeepMind unveils a plan to protect itself from its own rogue AI agents

    By Admin, June 18, 2026



    Google has developed a new plan to police the increasingly capable AI agents it uses within its own AI research organization, and the company is publishing the plan, which it calls a roadmap, to help other AI labs counter the potential threat of rogue AI agents.

    The Google DeepMind security plan involves a pivot away from the AI safety community’s typical focus on “the alignment problem,” the challenge of training an AI system so that its actions reliably match the intentions, values, and ethics of the humans who are managing it.

    While continuing to say that alignment is one key safety component, Google’s roadmap acknowledges that the alignment problem may never be fully solved, and instead creates a layered security system that treats AI agents as potential rogue insiders within an organization. The 35-page technical report maps out a series of steps and procedures that are designed to catch potential adversarial behavior by AI agents. 

    “If the first line of defense, alignment, fails, how can we mitigate harm anyway?” Rohin Shah, who leads the AGI Safety & Alignment team at Google DeepMind, told Fortune in an interview.

    The AI agent framework borrows heavily from traditional cybersecurity, especially insider-threat prevention. “We borrow a lot from security, which already deals with the threat of internal employees who might be malicious, and we can apply these to a new setting,” Shah said. But, he noted, “AI is systematically different from humans.”

    For one thing, AI agents might be able to act far faster and at greater scale than an individual rogue employee ever could. So there need to be systems that control what tools and data an AI agent has access to, as well as systems that monitor AI agent behavior and spot potentially aberrant patterns in real time.

    There are other differences too. Many access controls and permissions systems for human employees are based on a particular employee’s role within the company: a systems administrator has certain privileges that a receptionist does not. Some companies, and even some software vendors selling AI agent “orchestration layer” software, which often includes access controls for AI agents, are currently attempting to port this idea over to a world of AI agents. They stipulate, for example, that an AI agent should have the same access and permissions as the human employee on whose behalf it is working.

    But that idea can quickly break down, John “Four” Flynn, Google DeepMind’s vice president of security and privacy, told Fortune. For one thing, as workflows become increasingly automated, exactly who in the organization an agent is working for can become blurry. For another, in a fully automated workflow, a single AI agent may perform tasks associated with many different roles, not just one. Finally, as AI systems become smarter, organizations may want the same AI agent to perform tasks across many different workflows and processes. In such an environment, static role and even static process-based permissions and access control systems might not work well, Flynn said.

    Instead, what’s needed is a far more dynamic access control system that might change in real time based on the task that an AI agent is trying to carry out in that moment and how that task fits into that particular workflow. Flynn said he can imagine scenarios in which the same AI agent might have permissions to perform a certain action or access a particular database for one task in a particular workflow, but would, maybe minutes later, not be allowed the same access in a different workflow. Flynn said what’s needed is a sophisticated AI agent monitoring system that knows what the normal behavior of the agent should look like for any given task in any workflow and spots deviations from that pattern in real time, acting quickly to cut off access if the monitoring system suspects the AI agent is attempting to do something it shouldn’t be doing.
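
    The task-scoped permissions Flynn describes can be illustrated with a minimal sketch. It is not taken from the DeepMind report: the policy table, workflow names, task names, and the `is_allowed` helper are all hypothetical, and a real system would presumably derive its policy dynamically rather than from a static dictionary. The point is only that the access decision is keyed on the current workflow and task, not on the agent's identity or a fixed role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskContext:
    """What the agent is doing right now (all field values are hypothetical)."""
    agent_id: str
    workflow: str
    task: str


# Hypothetical policy: (workflow, task) -> resources the agent may touch.
POLICY = {
    ("billing-reconciliation", "read-invoices"): {"invoices_db"},
    ("billing-reconciliation", "post-summary"): {"reports_bucket"},
    ("support-triage", "classify-ticket"): {"tickets_db"},
}


def is_allowed(ctx: TaskContext, resource: str) -> bool:
    """Grant access only if the current (workflow, task) pair permits it.

    Unknown workflow/task pairs get an empty permission set (deny by default).
    """
    return resource in POLICY.get((ctx.workflow, ctx.task), set())


# The same agent gets different answers depending on the task at hand.
billing = TaskContext("agent-1", "billing-reconciliation", "read-invoices")
triage = TaskContext("agent-1", "support-triage", "classify-ticket")
print(is_allowed(billing, "invoices_db"))
print(is_allowed(triage, "invoices_db"))
```

    Here the same `agent-1` may read `invoices_db` while reconciling billing but is denied that database while triaging support tickets, which is the behavior a static role-based scheme cannot express.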

    According to a blog post on its new AI agent security roadmap, DeepMind has already built an internal prototype that asynchronously monitors coding agent trajectories and escalates suspicious findings for human review. This system has analyzed roughly one million coding agent tasks, DeepMind said. That work, the post says, has been “instrumental in building a live monitor for the Gemini Spark agent,” allowing the company to respond in real time to issues such as unintentional data deletion.
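
    As a rough illustration of reviewing recorded agent trajectories and escalating suspicious ones to a human, consider the sketch below. It is an assumption-laden toy, not DeepMind's prototype: the `Trajectory` and `ReviewQueue` types, the `review` function, and the keyword list are invented for this example, and a production monitor would likely rely on a model-based classifier rather than string matching.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical markers of risky actions, chosen only for illustration.
RISKY_MARKERS = ("rm -rf", "drop table", "git push --force")


@dataclass
class Trajectory:
    """A recorded sequence of actions taken by a coding agent on one task."""
    task_id: str
    actions: List[str]


@dataclass
class ReviewQueue:
    """Collects flagged trajectories for later human review."""
    items: List[Tuple[str, List[str]]] = field(default_factory=list)

    def escalate(self, task_id: str, findings: List[str]) -> None:
        self.items.append((task_id, findings))


def review(trajectory: Trajectory, queue: ReviewQueue) -> List[str]:
    """Scan a finished trajectory and escalate it if any action looks risky."""
    findings = [
        action
        for action in trajectory.actions
        if any(marker in action.lower() for marker in RISKY_MARKERS)
    ]
    if findings:
        queue.escalate(trajectory.task_id, findings)
    return findings


queue = ReviewQueue()
review(Trajectory("task-1", ["ls", "rm -rf build/"]), queue)
review(Trajectory("task-2", ["pytest -q"]), queue)
print(queue.items)
```

    Only `task-1` ends up in the queue. Because the review runs on completed trajectories rather than blocking each action, it mirrors the after-the-fact style of monitoring the post describes, as opposed to a live monitor that intervenes mid-task.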

    Notably, DeepMind says most of what its monitors flag is not malicious. The “majority of flagged events do not stem from adversarial intent,” the blog post says; they more often result from “agent misinterpretation or overeagerness to achieve a user’s goal.”

    This kind of real-time behavior monitoring is just one of some 15 different ways to mitigate the risk of rogue AI agents that Google DeepMind proposes in its roadmap. Others range from simple network activity logs, which could be either reviewed in real time or audited later, to using a less sophisticated but trusted AI model to monitor the “reasoning traces” of an AI agent. Reasoning traces are the kind of explicit, textual “chain of thought” that an AI model outputs when it is reasoning about what actions to take. But researchers have found that these reasoning traces do not always accurately reflect what the model is actually “thinking.” So the authors also propose monitoring the activation patterns of the digital neurons inside the large neural networks that power today’s AI models in order to pick up signs that the model is acting deceptively. Google DeepMind compares this to using “fMRI” scans of a person’s brain to attempt to read their emotional state or thoughts in real time.

    In the technical paper on the AI agent security roadmap, Google DeepMind develops a framework it calls TRAIT&R (a Taxonomy of Rogue AI Tactics and Routines) that is closely modeled on ATT&CK, a knowledge base of adversarial behaviors in cybersecurity maintained by MITRE, the U.S. non-profit research and systems engineering organization that often does work for federal agencies. The taxonomy maps out three categories of threats that Google DeepMind says it wants to defend against: “loss of control” (such as a model setting up a hidden, unmonitored internal deployment), “work sabotage” (such as agents quietly degrading alignment or safety research), and “direct harm” (such as exfiltrating model weights or destroying critical assets).

    Flynn, the security VP, said in the interview that the company is already “pretty far underway” implementing its roadmap. “A lot of this is well underway and some of this is in production—a lot on access control, detection, chain-of-thought monitoring. A significant amount of it is in place and underway,” he said.

    Shah said the framework is also designed to catch more subtle forms of harm. Work sabotage, he noted, “could be achieved by persuasion—presenting flawed results and hiding the flaws” so users “come to incorrect conclusions”—a category the paper acknowledges is among the hardest to detect.

    The roadmap, which DeepMind has labeled “v0.1,” is described as a work in progress that the company hopes to fold into its broader Frontier Safety Framework once it matures.


