Beverly Hills Examiner

    A new, challenging AGI test stumps most AI models

    By Admin, March 25, 2025

    The Arc Prize Foundation, a nonprofit co-founded by prominent AI researcher François Chollet, announced in a blog post on Monday that it has created a new, challenging test to measure the general intelligence of leading AI models.

    So far, the new test, called ARC-AGI-2, has stumped most models.

    “Reasoning” AI models like OpenAI’s o1-pro and DeepSeek’s R1 score between 1% and 1.3% on ARC-AGI-2, according to the Arc Prize leaderboard. Powerful non-reasoning models including GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash score around 1%.

    The ARC-AGI tests consist of puzzle-like problems where an AI has to identify visual patterns from a collection of different-colored squares, and generate the correct “answer” grid. The problems were designed to force an AI to adapt to new problems it hasn’t seen before.
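    To make the puzzle format concrete, here is a minimal Python sketch of a single task. It is loosely modeled on the layout of the public ARC data (a few demonstration pairs plus a test pair, with grids stored as lists of integer color codes). The toy task, the "mirror" rule, and the helper names are hypothetical illustrations, not part of the benchmark, and exact-match grading is assumed here.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # each cell holds an integer color code

# Hypothetical toy task: the hidden rule is "mirror each row left to right".
task: Dict[str, List[Dict[str, Grid]]] = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[0, 3], [4, 0]], "output": [[3, 0], [0, 4]]},
    ],
    "test": [
        {"input": [[5, 0, 0, 0]], "output": [[0, 0, 0, 5]]},
    ],
}


def mirror(grid: Grid) -> Grid:
    """Candidate rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_examples(rule: Callable[[Grid], Grid], examples: List[Dict[str, Grid]]) -> bool:
    """True if the rule reproduces every demonstration output exactly."""
    return all(rule(ex["input"]) == ex["output"] for ex in examples)


def is_correct(predicted: Grid, expected: Grid) -> bool:
    """Assumed grading: the whole answer grid must match, with no partial credit."""
    return predicted == expected


if fits_examples(mirror, task["train"]):
    answer = mirror(task["test"][0]["input"])
    print(is_correct(answer, task["test"][0]["output"]))
```

    The hard part for a solver is not applying a rule like `mirror` but discovering it from a handful of examples, which is what the benchmark is meant to probe.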

    The Arc Prize Foundation had over 400 people take ARC-AGI-2 to establish a human baseline. On average, “panels” of these people got 60% of the test’s questions right — much better than any of the models’ scores.

    A sample question from ARC-AGI-2 (credit: Arc Prize).

    In a post on X, Chollet claimed ARC-AGI-2 is a better measure of an AI model’s actual intelligence than the first iteration of the test, ARC-AGI-1. The Arc Prize Foundation’s tests are aimed at evaluating whether an AI system can efficiently acquire new skills outside the data it was trained on.

    Chollet said that unlike ARC-AGI-1, the new test prevents AI models from relying on “brute force” — extensive computing power — to find solutions. Chollet previously acknowledged this was a major flaw of ARC-AGI-1.

    To address the first test’s flaws, ARC-AGI-2 introduces a new metric: efficiency. It also requires models to interpret patterns on the fly instead of relying on memorization.

    “Intelligence is not solely defined by the ability to solve problems or achieve high scores,” Arc Prize Foundation co-founder Greg Kamradt wrote in a blog post. “The efficiency with which those capabilities are acquired and deployed is a crucial, defining component. The core question being asked is not just, ‘Can AI acquire [the] skill to solve a task?’ but also, ‘At what efficiency or cost?’”

    ARC-AGI-1 was unbeaten for roughly five years until December 2024, when OpenAI unveiled its advanced reasoning model, o3, which outperformed all other AI models and matched human performance on the evaluation. However, as we noted at the time, o3’s performance gains on ARC-AGI-1 came with a hefty price tag.

    The version of OpenAI’s o3 model — o3 (low) — that was first to reach new heights on ARC-AGI-1, scoring 75.7% on the test, got a measly 4% on ARC-AGI-2 using $200 worth of computing power per task.

    Comparison of frontier AI model performance on ARC-AGI-1 and ARC-AGI-2 (credit: Arc Prize).

    The arrival of ARC-AGI-2 comes as many in the tech industry are calling for new, unsaturated benchmarks to measure AI progress. Hugging Face’s co-founder, Thomas Wolf, recently told TechCrunch that the AI industry lacks sufficient tests to measure the key traits of so-called artificial general intelligence, including creativity.

    Alongside the new benchmark, the Arc Prize Foundation announced a new Arc Prize 2025 contest, challenging developers to reach 85% accuracy on the ARC-AGI-2 test while only spending $0.42 per task.
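    For a rough sense of scale, the contest target can be set against the o3 (low) figures quoted above. The Python sketch below is only illustrative arithmetic: the function names are hypothetical, and treating the target as two simple thresholds (accuracy of at least 85%, cost of at most $0.42 per task) is an assumption about how the goal would be checked.

```python
TARGET_ACCURACY = 0.85      # contest goal quoted in the article
TARGET_COST_PER_TASK = 0.42  # dollars per task, as quoted in the article


def meets_target(accuracy: float, cost_per_task: float) -> bool:
    """Assumed check: accurate enough AND cheap enough per task."""
    return accuracy >= TARGET_ACCURACY and cost_per_task <= TARGET_COST_PER_TASK


def cost_overrun(cost_per_task: float) -> float:
    """How many times over the per-task budget a system spends."""
    return cost_per_task / TARGET_COST_PER_TASK


# o3 (low) on ARC-AGI-2, per the article: 4% accuracy at $200 per task.
o3_low_accuracy = 0.04
o3_low_cost = 200.0

print(meets_target(o3_low_accuracy, o3_low_cost))
print(f"{cost_overrun(o3_low_cost):.0f}x over the per-task budget")
```

    On these numbers, o3 (low) misses the accuracy goal by a wide margin and spends several hundred times the per-task budget, which illustrates why the new benchmark treats efficiency as part of the score.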


