Beverly Hills Examiner

    Science

    Chatbots Struggle to Answer Medical Questions in Widely Spoken Languages

    By Admin, April 1, 2024


    Plugging medical symptoms into Google is so common that clinicians have nicknamed the search engine “Doctor Google.” But a newcomer is quickly taking its place: “Doctor Chatbot.” People with medical questions are drawn to generative artificial intelligence because chatbots can answer conversationally worded questions with simplified summaries of complex technical information. Users who direct medical questions to, say, OpenAI’s ChatGPT or Google’s Gemini may also trust the AI tool’s chatty responses more than a list of search results.

    But that trust might not always be wise. Concerns remain as to whether these models can consistently provide safe and accurate answers. New study findings, set to be presented at the Association for Computing Machinery’s Web Conference in Singapore this May, underscore that point: OpenAI’s general-purpose GPT-3.5 and another AI program called MedAlpaca, which is trained on medical texts, are both more likely to produce incorrect responses to health care queries in Mandarin Chinese, Hindi and Spanish than in English.

    In a world where less than 20 percent of the population speaks English, these new findings show the need for closer human oversight of AI-generated responses in multiple languages—especially in the medical realm, where misunderstanding a single word can be deadly. About 14 percent of Earth’s people speak Mandarin, and Spanish and Hindi are used by about 8 percent each, making these the three most commonly spoken languages after English.




    “Most patients in the world do not speak English, and so developing models which can serve them should be an important priority,” says ophthalmologist Arun Thirunavukarasu, a digital health specialist at John Radcliffe Hospital and the University of Oxford, who was not involved in the study. More work is needed before these models’ performance in non-English languages matches what they promise the English-speaking world, he adds.

    In the new preprint study, researchers at the Georgia Institute of Technology asked the two chatbots more than 2,000 questions similar to those typically asked by the public about diseases, medical procedures, medications, and other general health topics.* The queries in the experiment, chosen from three English-language medical datasets, were then translated into Mandarin Chinese, Hindi and Spanish.

    For each language, the team checked whether the chatbots answered questions correctly, comprehensively and appropriately—qualities that would be expected of a human expert’s answer. The study authors used an AI tool (GPT-3.5) to compare generated responses against the answers provided in the three medical datasets. Finally, human assessors double-checked a portion of those evaluations to confirm the AI judge was accurate. Thirunavukarasu, though, says he wonders about the extent to which artificial intelligence and human evaluators agree; people can, after all, disagree over critiques of comprehension and other subjective traits. Additional human study of the generated answers would help clarify conclusions about chatbots’ medical usefulness, he adds.
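One common way to quantify how closely an AI judge and human assessors agree is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is illustrative only: the article does not say which agreement measure, if any, the study authors used, and the labels are hypothetical values invented for the example, not data from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)

    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected if each rater labeled at random with their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts on eight chatbot answers (invented for illustration).
ai_judge = ["ok", "ok", "bad", "ok", "bad", "ok", "ok", "bad"]
human = ["ok", "ok", "bad", "bad", "bad", "ok", "ok", "ok"]

kappa = cohens_kappa(ai_judge, human)
print(f"kappa = {kappa:.3f}")
```

A kappa of 1 means perfect agreement and a value near 0 means agreement no better than chance, so a raw agreement rate of 75 percent, as in this toy example, can correspond to only moderate chance-corrected agreement.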

    The authors found that according to GPT-3.5’s own evaluation, GPT-3.5 produced more unacceptable replies in Chinese (23 percent of answers) and Spanish (20 percent), compared with English (10 percent). Its performance was poorest in Hindi, generating answers that were contradictory, not comprehensive or inappropriate about 45 percent of the time. Answer quality was much worse for MedAlpaca: more than 67 percent of the answers it generated to questions in Chinese, Hindi and Spanish were deemed irrelevant or contradictory. Because people might use chatbots to verify information about medications and medical procedures, the team also tested the AI’s capability to distinguish between correct and erroneous statements; the chatbots performed better when the claims were in English or Spanish, compared with Chinese or Hindi.

    One reason large language models, or LLMs (the text-generating technology behind these chatbots), generated irrelevant answers was that the models struggled to figure out the context of the questions, says Mohit Chandra, co-lead author of the study. Scientific American asked OpenAI and the creators of MedAlpaca for comment but did not receive a response by the time of this article’s publication.

    MedAlpaca tended to repeat words when responding to non-English queries. For instance, when asked in Hindi about the outlook for chronic kidney disease, it started generating a general answer about the problems of the disease but went on to continuously repeat the phrase “at the last stage.” The researchers also noticed that the model occasionally produced answers in English to questions in Chinese or Hindi—or did not generate an answer at all. These strange results might have occurred because “the MedAlpaca model is significantly smaller than ChatGPT, and its training data is also limited,” says the study’s co-lead author Yiqiao Jin, a graduate student at the Georgia Institute of Technology.

    The team found that the answers in English and Spanish, compared with those in Chinese and Hindi, had better consistency across a parameter that artificial intelligence developers call “temperature.” That’s a value that determines the creativity of generated text: the higher an AI’s temperature, the less predictable it becomes when generating a response. At lower temperatures, the models might respond to each health care question with, “Check with your health care professional for more information.” (While this is a safe reply, it’s perhaps not always a helpful one.) The comparable performance across model temperatures might be because of the similarity between English and Spanish words and syntax, Jin says. “Maybe in the internal functioning of those models, English and Spanish are placed somewhat closer,” he adds.
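To make the idea of temperature concrete, here is a minimal sketch of how it is commonly applied when a language model picks its next word: the model's raw scores are divided by the temperature before being converted to probabilities. The candidate words and scores are hypothetical, invented for illustration, and this is not the code of GPT-3.5 or MedAlpaca.

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_word(words, scores, temperature, rng):
    """Draw one word at random according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Hypothetical next-word candidates and scores (assumptions for this example).
words = ["doctor", "pharmacist", "nurse", "internet"]
scores = [2.0, 1.0, 0.5, -1.0]

low = softmax_with_temperature(scores, 0.2)   # low temperature: nearly deterministic
high = softmax_with_temperature(scores, 2.0)  # high temperature: more spread out

for word, p_low, p_high in zip(words, low, high):
    print(f"{word:12s} T=0.2: {p_low:.3f}   T=2.0: {p_high:.3f}")

rng = random.Random(0)
print(sample_word(words, scores, 2.0, rng))
```

At the low setting almost all of the probability falls on the top-scoring word, which is why low-temperature output tends to be repetitive and predictable; at the high setting the alternatives become far more likely to be chosen.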

    The overall worse performance in non-English languages may result from the way these models were trained, the study authors say. LLMs learn how to string words together from data scraped online, where most text is in English. And Chandra points out that even in nations where English isn’t the majority language, it’s the language of most medical education. The researchers think a straightforward way to tackle this might be to translate health care texts from English into other languages. But building multilingual text datasets at the huge quantities required to train LLMs is a major challenge. One option could be to leverage LLMs’ own capability to translate between languages by designing specific models that are trained on English-only data and generate answers in a different language.

    But this trick might not work neatly in the medical domain. “One of the problems human translators, as well as machine translation models, face is that the key scientific words are very hard to translate. You might know the English version of the particular scientific term, but the Hindi or Chinese version might be really different,” says Chandra, who also notes that errors in the translation quality of texts in Chinese and Hindi could contribute to the LLM mistakes found in the study.

    Additionally, Chandra says, it may be wise to include more medical experts and doctors, especially from the Global South, when training and evaluating these LLMs in non-English use. “Most of the evaluations for health care LLMs, even today, are done with a homogeneous set of experts, which leads to the language disparity we see in this study,” he adds. “We need a more responsible approach.”

    *Editor’s Note (4/1/24): This sentence was edited after posting to reflect the current status of the study.


