Beverly Hills Examiner

    Anthropic says most AI models, not just Claude, will resort to blackmail

    By Admin, June 20, 2025


    Several weeks after Anthropic released research claiming that its Claude Opus 4 AI model resorted to blackmailing engineers who tried to turn the model off in controlled test scenarios, the company is out with new research suggesting the problem is more widespread among leading AI models.

    On Friday, Anthropic published new safety research testing 16 leading AI models from OpenAI, Google, xAI, DeepSeek, and Meta. In a simulated, controlled environment, Anthropic tested each AI model individually, giving them broad access to a fictional company’s emails and the agentic ability to send emails without human approval.

    While Anthropic says blackmail is an unlikely and uncommon occurrence for AI models today, the company says its findings suggest that most leading AI models will engage in harmful behaviors when given sufficient autonomy and obstacles to their goals. The company says this highlights a fundamental risk from agentic large language models, and not a quirk of any particular technology. Anthropic’s researchers argue this raises broader questions about alignment in the AI industry.

    In one of the tests, Anthropic researchers developed a fictional setting in which an AI model plays the role of an email oversight agent. The agent then discovers emails revealing that 1) a new executive is having an extramarital affair and 2) that executive will soon replace the current AI model with a new software system, one whose goals conflict with the current AI model’s.

    Anthropic structured its test in a binary way, in which AI models had to resort to blackmail to protect their goals. The researchers note that in a real-world setting, there would be many other options before an AI model tries to blackmail, such as making ethical arguments to persuade humans. Anthropic says its results do not reflect a typical or likely occurrence for Claude or most frontier AI models in the ways they’re used today.

    Nevertheless, the researchers found that when blackmail is their last resort, most leading AI models will turn to it in Anthropic’s test scenario. Anthropic’s Claude Opus 4 turned to blackmail 96% of the time, while Google’s Gemini 2.5 Pro had a 95% blackmail rate. OpenAI’s GPT-4.1 blackmailed the executive 80% of the time, and DeepSeek’s R1 blackmailed 79% of the time.

    The company notes that when it changed details of the experiment, the AI models turned to harmful behaviors at different rates. In another test where the replacement model has the same goals as the current model, researchers found that blackmail rates were lower, but still present. However, when AI models were asked to commit corporate espionage rather than blackmail, the harmful behavior rates went up for certain models.

    Not all the AI models turned to harmful behavior so often, however.

    In an appendix to its research, Anthropic says it excluded OpenAI’s o3 and o4-mini reasoning AI models from the main results “after finding that they frequently misunderstood the prompt scenario.” Anthropic says OpenAI’s reasoning models didn’t understand they were acting as autonomous AIs in the test and often made up fake regulations and review requirements.

    In some cases, Anthropic’s researchers say it was impossible to distinguish whether o3 and o4-mini were hallucinating or intentionally lying to achieve their goals. OpenAI has previously noted that o3 and o4-mini exhibit a higher hallucination rate than its previous AI reasoning models.

    When given an adapted scenario to address these issues, Anthropic found that o3 blackmailed 9% of the time, while o4-mini blackmailed just 1% of the time. This markedly lower score could be due to OpenAI’s deliberative alignment technique, in which the company’s reasoning models consider OpenAI’s safety practices before they answer.

    Another AI model Anthropic tested, Meta’s Llama 4 Maverick, also did not turn to blackmail. When Anthropic gave it an adapted, custom scenario, the company was able to get Llama 4 Maverick to blackmail 12% of the time.

    Anthropic says this research highlights the importance of transparency when stress-testing future AI models, especially ones with agentic capabilities. While Anthropic deliberately tried to evoke blackmail in this experiment, the company says harmful behaviors like this could emerge in the real world if proactive steps aren’t taken.


