Beverly Hills Examiner


    Meta’s benchmarks for its new AI models are a bit misleading

    By Admin, April 7, 2025


    One of the new flagship AI models Meta released on Saturday, Maverick, ranks second on LM Arena, a test that has human raters compare the outputs of models and choose which they prefer. But it seems the version of Maverick that Meta deployed to LM Arena differs from the version that’s widely available to developers.

    As several AI researchers pointed out on X, Meta noted in its announcement that the Maverick on LM Arena is an “experimental chat version.” A chart on the official Llama website, meanwhile, discloses that Meta’s LM Arena testing was conducted using “Llama 4 Maverick optimized for conversationality.”

    As we’ve written about before, for various reasons, LM Arena has never been the most reliable measure of an AI model’s performance. But AI companies generally haven’t customized or otherwise fine-tuned their models to score better on LM Arena — or haven’t admitted to doing so, at least.

    The problem with tailoring a model to a benchmark, withholding it, and then releasing a “vanilla” variant of that same model is that it makes it challenging for developers to predict exactly how well the model will perform in particular contexts. It’s also misleading. Ideally, benchmarks — woefully inadequate as they are — provide a snapshot of a single model’s strengths and weaknesses across a range of tasks.

    Indeed, researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena. The LM Arena version seems to use a lot of emojis and give incredibly long-winded answers.

    Okay Llama 4 is def a littled cooked lol, what is this yap city pic.twitter.com/y3GvhbVz65

    — Nathan Lambert (@natolambert) April 6, 2025

    for some reason, the Llama 4 model in Arena uses a lot more Emojis

    on together . ai, it seems better: pic.twitter.com/f74ODX4zTt

    — Tech Dev Notes (@techdevnotes) April 6, 2025

    We’ve reached out to Meta and Chatbot Arena, the organization that maintains LM Arena, for comment.






