Beverly Hills Examiner

    Researchers suggest OpenAI trained AI models on paywalled O’Reilly books

    By Admin, April 2, 2025

    OpenAI has been accused by many parties of training its AI on copyrighted content sans permission. Now a new paper by an AI watchdog organization makes the serious accusation that the company increasingly relied on nonpublic books it didn’t license to train more sophisticated AI models.

    AI models are essentially complex prediction engines. Trained on a lot of data — books, movies, TV shows, and so on — they learn patterns and novel ways to extrapolate from a simple prompt. When a model “writes” an essay on a Greek tragedy or “draws” Ghibli-style images, it’s simply pulling from its vast knowledge to approximate. It isn’t arriving at anything new.

    While a number of AI labs, including OpenAI, have begun embracing AI-generated data to train AI as they exhaust real-world sources (mainly the public web), few have eschewed real-world data entirely. That’s likely because training on purely synthetic data comes with risks, like worsening a model’s performance.

    The new paper, out of the AI Disclosures Project, a nonprofit co-founded in 2024 by media mogul Tim O’Reilly and economist Ilan Strauss, draws the conclusion that OpenAI likely trained its GPT-4o model on paywalled books from O’Reilly Media. (O’Reilly is the CEO of O’Reilly Media.)

    In ChatGPT, GPT-4o is the default model. O’Reilly doesn’t have a licensing agreement with OpenAI, the paper says.

    “GPT-4o, OpenAI’s more recent and capable model, demonstrates strong recognition of paywalled O’Reilly book content … compared to OpenAI’s earlier model GPT-3.5 Turbo,” wrote the co-authors of the paper. “In contrast, GPT-3.5 Turbo shows greater relative recognition of publicly accessible O’Reilly book samples.”

    The paper used a method called DE-COP, first introduced in an academic paper in 2024 and designed to detect copyrighted content in language models’ training data. A form of “membership inference attack,” the method tests whether a model can reliably distinguish human-authored texts from paraphrased, AI-generated versions of the same text. If it can, that suggests the model might have prior knowledge of the text from its training data.
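The test described above can be pictured as a multiple-choice quiz: the verbatim passage is shuffled among paraphrases, and the model is asked to pick the original. The Python sketch below illustrates only that general idea and is not the paper's code. Every name in it (`build_quiz`, `guess_rate`, `make_memorizing_chooser`) is hypothetical, the number of paraphrases per passage is an assumption, and a toy "chooser" function stands in for a real model call.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence

# Hypothetical interface: given the shuffled options, return the index
# the "model" believes is the verbatim, human-authored passage.
Chooser = Callable[[Sequence[str]], int]


def build_quiz(original: str, paraphrases: Sequence[str],
               rng: random.Random) -> tuple[list[str], int]:
    """Shuffle the original among its paraphrases; return options and answer index."""
    options = [original, *paraphrases]
    rng.shuffle(options)
    return options, options.index(original)


def guess_rate(items: Sequence[tuple[str, Sequence[str]]],
               choose: Chooser, seed: int = 0) -> float:
    """Fraction of quizzes in which the chooser picks the verbatim passage."""
    rng = random.Random(seed)
    hits = 0
    for original, paraphrases in items:
        options, answer = build_quiz(original, paraphrases, rng)
        if choose(options) == answer:
            hits += 1
    return hits / len(items)


def chance_level(num_paraphrases: int) -> float:
    """Expected guess rate for a chooser with no knowledge of the text."""
    return 1.0 / (num_paraphrases + 1)


def make_memorizing_chooser(memorized: set[str]) -> Chooser:
    """Toy stand-in for a model: recognizes only passages it has 'seen'."""
    def choose(options: Sequence[str]) -> int:
        for i, text in enumerate(options):
            if text in memorized:
                return i
        return 0  # no recognition: fall back to the first option
    return choose


# Illustrative data only: one invented passage with three invented paraphrases.
items = [
    ("The quick brown fox jumps over the lazy dog.",
     ["A fast brown fox leaps over a sleepy dog.",
      "Over the lazy dog jumps a quick brown fox.",
      "A speedy fox hops over the idle dog."]),
]

seen_everything = make_memorizing_chooser({original for original, _ in items})
print(guess_rate(items, seen_everything), "vs chance", chance_level(3))
```

A guess rate well above the chance level is the signal of interest. On its own it does not prove that a passage was in the training data, which is consistent with the caveats the co-authors raise.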

    The co-authors of the paper — O’Reilly, Strauss, and AI researcher Sruly Rosenblat — say that they probed GPT-4o, GPT-3.5 Turbo, and other OpenAI models’ knowledge of O’Reilly Media books published before and after their training cutoff dates. They used 13,962 paragraph excerpts from 34 O’Reilly books to estimate the probability that a particular excerpt had been included in a model’s training dataset.
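One common way to summarize a before-versus-after-cutoff comparison like this is the area under the ROC curve (AUROC): the probability that a randomly chosen excerpt from a book published before the cutoff receives a higher recognition score than one from a book published after it. The article does not say which statistic the authors computed, so the sketch below is an assumption about how such a comparison could be scored. The function name and the scores are invented for illustration.

```python
from typing import Sequence


def auroc(before_cutoff: Sequence[float], after_cutoff: Sequence[float]) -> float:
    """Pairwise AUROC: the share of (before, after) pairs in which the
    before-cutoff excerpt scores higher; ties count as half a win."""
    wins = 0.0
    for b in before_cutoff:
        for a in after_cutoff:
            if b > a:
                wins += 1.0
            elif b == a:
                wins += 0.5
    return wins / (len(before_cutoff) * len(after_cutoff))


# Invented recognition scores, for illustration only.
before = [0.9, 0.7, 0.6]   # excerpts the model could have seen in training
after = [0.3, 0.7, 0.2]    # excerpts published after the training cutoff

print(round(auroc(before, after), 3))
```

A value near 0.5 means the model treats both groups alike. A value well above 0.5 means it recognizes pre-cutoff text more readily, which is the kind of pattern the paper reports for GPT-4o on paywalled content.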

    According to the results of the paper, GPT-4o “recognized” far more paywalled O’Reilly book content than OpenAI’s older models, including GPT-3.5 Turbo. That’s even after accounting for potential confounding factors, the authors said, like improvements in newer models’ ability to figure out whether text was human-authored.

    “GPT-4o [likely] recognizes, and so has prior knowledge of, many non-public O’Reilly books published prior to its training cutoff date,” wrote the co-authors.

    It isn’t a smoking gun, the co-authors are careful to note. They acknowledge that their experimental method isn’t foolproof and that OpenAI might’ve collected the paywalled book excerpts from users copying and pasting them into ChatGPT.

    Muddying the waters further, the co-authors didn’t evaluate OpenAI’s most recent collection of models, which includes GPT-4.5 and “reasoning” models such as o3-mini and o1. It’s possible that these models weren’t trained on paywalled O’Reilly book data or were trained on a lesser amount than GPT-4o.

    That being said, it’s no secret that OpenAI, which has advocated for looser restrictions around developing models using copyrighted data, has been seeking higher-quality training data for some time. The company has gone so far as to hire journalists to help fine-tune its models’ outputs. That’s a trend across the broader industry: AI companies recruiting experts in domains like science and physics to effectively have these experts feed their knowledge into AI systems.

    It should be noted that OpenAI pays for at least some of its training data. The company has licensing deals in place with news publishers, social networks, stock media libraries, and others. OpenAI also offers opt-out mechanisms — albeit imperfect ones — that allow copyright owners to flag content they’d prefer the company not use for training purposes.

    Still, as OpenAI battles several suits over its training data practices and treatment of copyright law in U.S. courts, the O’Reilly paper isn’t the most flattering look.

    OpenAI didn’t respond to a request for comment.


