    Distillation Can Make AI Models Smaller and Cheaper

By Admin, September 21, 2025


    The original version of this story appeared in Quanta Magazine.

    The Chinese AI company DeepSeek released a chatbot earlier this year called R1, which drew a huge amount of attention. Most of it focused on the fact that a relatively small and unknown company said it had built a chatbot that rivaled the performance of those from the world’s most famous AI companies, but using a fraction of the computer power and cost. As a result, the stocks of many Western tech companies plummeted; Nvidia, which sells the chips that run leading AI models, lost more stock value in a single day than any company in history.

    Some of that attention involved an element of accusation. Sources alleged that DeepSeek had obtained, without permission, knowledge from OpenAI’s proprietary o1 model by using a technique known as distillation. Much of the news coverage framed this possibility as a shock to the AI industry, implying that DeepSeek had discovered a new, more efficient way to build AI.

    But distillation, also called knowledge distillation, is a widely used tool in AI, a subject of computer science research going back a decade and a tool that big tech companies use on their own models. “Distillation is one of the most important tools that companies have today to make models more efficient,” said Enric Boix-Adsera, a researcher who studies distillation at the University of Pennsylvania’s Wharton School.

    Dark Knowledge

    The idea for distillation began with a 2015 paper by three researchers at Google, including Geoffrey Hinton, the so-called godfather of AI and a 2024 Nobel laureate. At the time, researchers often ran ensembles of models—“many models glued together,” said Oriol Vinyals, a principal scientist at Google DeepMind and one of the paper’s authors—to improve their performance. “But it was incredibly cumbersome and expensive to run all the models in parallel,” Vinyals said. “We were intrigued with the idea of distilling that onto a single model.”

    The researchers thought they might make progress by addressing a notable weak point in machine-learning algorithms: Wrong answers were all considered equally bad, regardless of how wrong they might be. In an image-classification model, for instance, “confusing a dog with a fox was penalized the same way as confusing a dog with a pizza,” Vinyals said. The researchers suspected that the ensemble models did contain information about which wrong answers were less bad than others. Perhaps a smaller “student” model could use the information from the large “teacher” model to more quickly grasp the categories it was supposed to sort pictures into. Hinton called this “dark knowledge,” invoking an analogy with cosmological dark matter.

    After discussing this possibility with Hinton, Vinyals developed a way to get the large teacher model to pass more information about the image categories to a smaller student model. The key was homing in on “soft targets” in the teacher model—where it assigns probabilities to each possibility, rather than firm this-or-that answers. One model, for example, calculated that there was a 30 percent chance that an image showed a dog, 20 percent that it showed a cat, 5 percent that it showed a cow, and 0.5 percent that it showed a car. By using these probabilities, the teacher model effectively revealed to the student that dogs are quite similar to cats, not so different from cows, and quite distinct from cars. The researchers found that this information would help the student learn how to identify images of dogs, cats, cows, and cars more efficiently. A big, complicated model could be reduced to a leaner one with barely any loss of accuracy.
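In code, that soft-target recipe is simple: the student is trained to match the teacher's full probability distribution, usually blended with the ordinary hard-label loss. Below is a minimal sketch, assuming PyTorch; the temperature and mixing weight are illustrative choices, not values prescribed by the paper.

```python
# A minimal sketch of the soft-target loss from Hinton, Vinyals & Dean (2015),
# assuming PyTorch. The temperature and mixing weight are illustrative.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    # Soften both distributions with the same temperature T, so that
    # small probabilities (the "dark knowledge") carry more weight.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # KL divergence pulls the student's distribution toward the teacher's.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(log_p_student, p_teacher,
                         reduction="batchmean") * temperature ** 2

    # Ordinary cross-entropy on the true ("hard") labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1 - alpha) * hard_loss
```

In training, the teacher's logits are computed with gradients disabled, so only the student's weights are updated.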

    Explosive Growth

    The idea was not an immediate hit. The paper was rejected from a conference, and Vinyals, discouraged, turned to other topics. But distillation arrived at an important moment. Around this time, engineers were discovering that the more training data they fed into neural networks, the more effective those networks became. The size of models soon exploded, as did their capabilities, but the costs of running them climbed in step with their size.

    Many researchers turned to distillation as a way to make smaller models. In 2018, for instance, Google researchers unveiled a powerful language model called BERT, which the company soon began using to help parse billions of web searches. But BERT was big and costly to run, so the next year, other developers distilled a smaller version sensibly named DistilBERT, which became widely used in business and research. Distillation gradually became ubiquitous, and it’s now offered as a service by companies such as Google, OpenAI, and Amazon. The original distillation paper, still published only on the arxiv.org preprint server, has now been cited more than 25,000 times.

Because distillation requires access to the innards of the teacher model, it's not possible for a third party to sneakily distill data from a closed-source model like OpenAI's o1, as DeepSeek was thought to have done. That said, a student model could still learn quite a bit from a teacher just by prompting it with certain questions and using the answers to train its own model, an almost Socratic approach to distillation.
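In that setting, distillation looks less like matching probability distributions and more like building a training set out of the teacher's answers. A rough sketch of what that could look like, where query_teacher and fine_tune are hypothetical stand-ins for a real chat API call and a real training loop:

```python
# A sketch of prompt-based ("Socratic") distillation: the student never sees
# the teacher's weights or logits, only its text answers.
# query_teacher and fine_tune are hypothetical stand-ins, not real APIs.

def query_teacher(prompt: str) -> str:
    # Hypothetical: send the prompt to the teacher model's public API
    # and return its text answer.
    raise NotImplementedError("stand-in for a real chat-completion call")

def fine_tune(student, pairs: list[tuple[str, str]]) -> None:
    # Hypothetical: ordinary supervised fine-tuning of the student
    # on (prompt, answer) pairs.
    raise NotImplementedError("stand-in for a real training loop")

def distill_via_prompting(student, prompts: list[str]) -> None:
    # Collect the teacher's answers, then treat them as labeled data.
    pairs = [(p, query_teacher(p)) for p in prompts]
    fine_tune(student, pairs)
```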

    Meanwhile, other researchers continue to find new applications. In January, the NovaSky lab at UC Berkeley showed that distillation works well for training chain-of-thought reasoning models, which use multistep “thinking” to better answer complicated questions. The lab says its fully open source Sky-T1 model cost less than $450 to train, and it achieved similar results to a much larger open source model. “We were genuinely surprised by how well distillation worked in this setting,” said Dacheng Li, a Berkeley doctoral student and co-student lead of the NovaSky team. “Distillation is a fundamental technique in AI.”
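To distill a reasoning model this way, the training examples would presumably pair each question with the teacher's intermediate reasoning as well as its final answer, so the student learns the multistep "thinking" itself. The format below is an illustrative assumption, not the NovaSky lab's actual data schema.

```python
# Illustrative only: a chain-of-thought distillation example might keep the
# teacher's full reasoning trace in the completion, not just the final answer.
example = {
    "prompt": "A train travels 120 km in 1.5 hours. What is its average speed?",
    "completion": (
        "Step 1: Average speed is distance divided by time.\n"
        "Step 2: 120 km / 1.5 h = 80 km/h.\n"
        "Answer: 80 km/h."
    ),
}
```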


    Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.



