Issue 7. Where ChatGPT and AI fail.
MUNICH DIGITAL INSTITUTE
We turn Data into Intelligence: Strategy | Visualization | Science | Engineering
“Trust me, I’m ChatGPT!” Is artificial intelligence taking over everything and doing everything better? We say: be cautious!
These days, advanced AI models like ChatGPT are finding more and more applications in everyday work life. ChatGPT is a so-called Large Language Model (LLM): a language model trained on a very large dataset that automatically recognizes patterns and generates responses. Ultimately, it is based on statistics. Given a query (prompt), an LLM produces the response that is most probable in light of its training data. However, it does so in a self-learning manner, which is where the AI component comes in.
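To illustrate the underlying principle, here is a deliberately tiny, purely hypothetical Python sketch: a bigram model that, like an LLM (only vastly simplified), answers with whatever continuation was most frequent in its training text. All data in it is made up.

```python
# Toy bigram "language model": answer with the continuation that was
# most frequent in the training text (all data here is made up).
from collections import Counter, defaultdict

training_text = (
    "the model predicts the next word "
    "the model learns patterns "
    "the model answers questions"
).split()

# Count, for every word, which word follows it most often in the training data.
next_word_counts = defaultdict(Counter)
for current, following in zip(training_text, training_text[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the statistically most likely next word seen during training."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "model", because "the model" dominates the training text
```

Real LLMs work with billions of parameters and far richer context than a single preceding word, but the core idea of returning the statistically most plausible continuation of the prompt is the same.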
With AI, there is no need to hand-craft extensive rules as with traditional algorithms; the model derives its own rules. On one hand, this is very powerful, because it can develop its own logic for almost all cases automatically and in a short time. On the other hand, transparency and trust can suffer as a result.
A conventional, hand-programmed algorithm might only reach a 40% accuracy rate (or cover 40% of the cases), but it is clear which 60% of the cases it fails on. Advanced AI models, on the other hand, return a much higher share of correct results, often up to 90% of queries. However, one can never be sure which cases fall into the wrong 10% (the numbers are just examples).
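The following minimal Python sketch makes this trade-off concrete. The data is synthetic, the rule thresholds are arbitrary, and the model is a generic scikit-learn classifier, so nothing here comes from a real project: the hand-written rule covers fewer cases but explicitly refuses to answer where it cannot decide, whereas the trained model answers everything and gives no hint which of its answers are wrong.

```python
# Toy comparison on synthetic data: a transparent hand-written rule vs. a
# trained model. Thresholds, features and the model choice are arbitrary.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

def rule_based(x):
    """Hand-written rule: only decides when the evidence is clear-cut."""
    if x[0] > 1.0:
        return 1
    if x[0] < -1.0:
        return 0
    return None  # explicitly no answer -> we know exactly which cases it skips

rule_preds = [rule_based(x) for x in X]
covered = [i for i, p in enumerate(rule_preds) if p is not None]
print(f"Rule covers {len(covered) / len(X):.0%} of cases, "
      f"accuracy on those: {np.mean([rule_preds[i] == y[i] for i in covered]):.0%}")

model = RandomForestClassifier(random_state=0).fit(X[:800], y[:800])
print(f"Model answers 100% of cases, accuracy: {model.score(X[800:], y[800:]):.0%}")
# ...but nothing in its output tells us *which* of its answers are the wrong ones.
```

The point is not the exact percentages, which are synthetic here as well, but that the rule's failure set is enumerable while the model's is not.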
Even though the capabilities behind these models are already very impressive, today we want to look at some cases where trust in AI was placed too early, and which serve as cautionary examples against blindly trusting the technology.
Case 1: Amazon's Hiring Tool (2018)
Even before large LLMs and generative AI models like ChatGPT became prominent, Amazon attempted to automate its hiring process using AI. However, it quickly became apparent that the model disadvantaged women in its evaluations, because its training data reflected the historically male-dominated developer industry. Even when the developers filtered all incoming applications for words and phrases that could reveal the applicant's gender, women were still disadvantaged by the model. Further analysis showed that the model relied on words that were used more often by male applicants, such as "executed" and "captured".
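A small, entirely hypothetical Python sketch of this proxy effect (toy resumes and deliberately biased toy labels, not Amazon's data or model): even with no gender-related word in the text, a classifier trained on biased historical outcomes attaches positive weight to verbs that merely correlate with those outcomes.

```python
# Toy demonstration with invented resumes and deliberately biased labels
# (not Amazon's data or model): proxy words carry the bias even after all
# explicitly gendered words have been removed.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

resumes = [
    "executed migration captured requirements built pipeline",
    "executed rollout captured metrics optimized queries",
    "collaborated on migration mentored team built pipeline",
    "collaborated on rollout mentored team optimized queries",
] * 25
hired = [1, 1, 0, 0] * 25  # biased historical outcomes used as training labels

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(resumes)
clf = LogisticRegression().fit(features, hired)

weights = dict(zip(vectorizer.get_feature_names_out(), clf.coef_[0]))
for word in ("executed", "captured", "collaborated", "mentored"):
    print(f"{word:12s} weight: {weights[word]:+.2f}")
# "executed" and "captured" get positive weights purely because they correlate
# with the biased historical outcomes -- no gendered word appears anywhere.
```

Filtering out explicit gender terms therefore does not remove the bias; the correlated vocabulary simply takes over as a stand-in.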
Case 2: Machine Learning and COVID (2021)
Since the COVID-19 pandemic began, numerous organizations have sought to apply machine learning (ML) algorithms to help hospitals diagnose or triage patients faster. But according to the UK’s Turing Institute, a national center for data science and AI, these tools made little to no difference in practice. MIT Technology Review has chronicled a number of failures, most of which stem from errors in the way the tools were trained or tested. The use of mislabeled data or data from unknown sources was a common culprit.
Derek Driggs, a machine learning researcher at the University of Cambridge, together with his colleagues, published a paper in Nature Machine Intelligence that explored the use of deep learning models for diagnosing the virus. The paper concluded that the technique was not fit for clinical use. For example, Driggs’ group found that their own model was flawed because it was trained on a dataset that mixed scans of patients who were lying down with scans of patients who were standing up. Since the patients who were lying down were much more likely to be seriously ill, the algorithm learned to identify COVID risk based on the position of the person in the scan.
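The mechanism behind this failure, often called shortcut learning, can be reproduced with a few lines of Python on synthetic data. The features, proportions, and model below are illustrative assumptions, not the Cambridge study's setup:

```python
# Synthetic illustration of shortcut learning (not the Cambridge study's data):
# a spurious feature that correlates with the label in training gets picked up
# by the model and breaks it on data where the correlation no longer holds.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
severe = rng.integers(0, 2, n)                   # ground truth: seriously ill?
signal = severe + rng.normal(scale=2.0, size=n)  # weak genuine medical signal
# In training, 90% of seriously ill patients happened to be scanned lying down.
lying_down = np.where(rng.random(n) < 0.9, severe, 1 - severe)

clf = LogisticRegression().fit(np.column_stack([signal, lying_down]), severe)
print("weight on genuine signal:", round(clf.coef_[0][0], 2))
print("weight on scan position :", round(clf.coef_[0][1], 2))  # much larger

# New data where scan position is unrelated to illness: performance collapses.
severe_new = rng.integers(0, 2, n)
signal_new = severe_new + rng.normal(scale=2.0, size=n)
position_new = rng.integers(0, 2, n)
print("accuracy on new data:",
      round(clf.score(np.column_stack([signal_new, position_new]), severe_new), 2))
```

The model looks accurate on its training distribution, yet what it has actually learned is the scan position, not the illness.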
Case 3: ChatGPT is no doctor! (2023)
OpenAI's ChatGPT is no closer to replacing your family physician: the increasingly advanced chatbot failed to accurately diagnose the vast majority of hypothetical pediatric cases. The findings come from a new study published in JAMA Pediatrics on Jan. 2, conducted by researchers from Cohen Children's Medical Center in New York. The researchers analyzed the bot's responses to requests for medical diagnoses of child illnesses and found that it had an 83 percent error rate across the tested cases.
Case 4: Faked Court Case (2024)
A New York lawyer is facing possible discipline for citing a non-existent case generated by artificial intelligence, marking AI's latest disruption for attorneys and courts learning to navigate the emerging technology. She had included a non-existent state court decision in an appeal to revive her client's lawsuit claiming that a Queens doctor botched an abortion. In November, the court ordered the lawyer to submit a copy of the cited decision after it was unable to find the case. She responded that she was "unable to furnish a copy of the decision."
Summary: These examples show that when deploying AI, it is important to choose the right tool for the right application. A language model is not always the best choice for medical examinations. Moreover, such techniques are best used where it does not matter exactly which individual cases are answered incorrectly, but where the goal is simply to maximize overall accuracy.
Inspired by data! will be published every second Tuesday at 12pm CET (next one on Apr 30th)
We turn data into intelligence! Since 2014, MUNICH DIGITAL has been helping large and mid-sized corporations with their data-driven business models, especially in the context of marketing, communications, sales and service. www.munich-digital.com