ChatGPT & Scientific Research: What are the Risks?
"ai neural network futuristic style" created in Midjourney

Enthought has been leveraging AI and machine learning (ML) techniques for our customers in science-driven industries for decades. So while we’re not leaping blindly into the most recent hype around Large Language Model (LLM) technologies like ChatGPT, we wholeheartedly agree that the advancement is a world-changing milestone. (Side note: all the images in this article were created with Midjourney.)

ChatGPT (Chat Generative Pre-trained Transformer), developed by OpenAI, uses unsupervised learning to generate human-like conversation. Unlike traditional natural language processing (NLP) technologies, ChatGPT requires no custom training to use, can pick up new phrasing without additional programming, and can process massive data sets much faster. The extension of the technology into other applications like text-to-image/video/audio, the introduction of plugins, and the break-neck pace of advancement are mind-blowing to even the most avid technophiles.
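
To make the "no custom training" point concrete, here is a minimal sketch of what zero-setup use looked like with the OpenAI Python client circa early 2023 (the v0.27-era API). The model name, prompt, and key handling are illustrative assumptions, not part of any Enthought workflow:

```python
# Minimal sketch: zero-shot use of ChatGPT via the OpenAI Python client
# (v0.27-era API, circa early 2023). No fine-tuning, labeled data, or
# model hosting involved; model name and prompt are illustrative.
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: supplied via secure config

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful research assistant."},
        {"role": "user", "content": "Summarize the risks of using LLMs "
                                    "in scientific research."},
    ],
)

print(response.choices[0].message.content)
```

The barrier to entry is a single API call, which is exactly why both the opportunities and the risks discussed below arrived so quickly.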

AI in Scientific Research

When you’re in high-pressure innovation industries like biopharmaceuticals, materials science, and semiconductors, where time to discovery and generating IP are king, integrating advanced technologies is the only way to become and stay competitive. We’ve used AI and ML to automate polymer formulation scale-up in specialty materials product development, enable visualization of sub-pixel particles in fluid properties testing, unlock the value of high-throughput screening pipelines in drug discovery, and much more. Beyond the entertainment factor, what are the implications of ChatGPT and generative AI for scientific research in business? We, like others, are deeply considering the new opportunities and risks around LLMs.

The Risks are Real

It’s easy to get distracted by the long list of potential research and operational opportunities with ChatGPT/generative AI. The stakes are extremely high in R&D, however, so the very real and significant risks must be thoroughly deliberated first. The risks below are at the top of our current conversations at Enthought and with our customers:

1. Validity and Reliability.

The world witnessed Google’s $100 billion Bard mistake in February 2023 and, through it, quickly learned about the reliability of generative AI. Google’s own employees admitted at the time of the product launch that Bard was “a pathological liar,” and ChatGPT is not as reliable as it appears either. There are many stories of ChatGPT simply making things up in a convincing and authoritative way: saying someone was deceased when they were very much alive, providing incorrect medical advice, and fabricating research findings and citations. Andrew Ng and others have also shown that the longer the response ChatGPT produces to a question or prompt, the more likely it is to contain a wrong answer, because each token it produces has some probability of veering off the path of a right answer (see the sketch below). And don’t forget that ChatGPT still has limited knowledge of events after 2021. Takeaway: The bar is currently too low for high-quality scientific research.
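
A back-of-the-envelope sketch of that compounding-error argument (the per-token reliability figure is an assumption chosen purely for illustration):

```python
# Illustrative only: if each generated token independently stays "on
# track" with probability p, the chance that an entire n-token response
# contains no error decays geometrically as p**n.
per_token_ok = 0.999  # assumption: 99.9% per-token reliability

for n_tokens in (50, 200, 500, 1000):
    p_no_error = per_token_ok ** n_tokens
    print(f"{n_tokens:5d} tokens -> P(no error) ~ {p_no_error:.2f}")

# Prints roughly 0.95, 0.82, 0.61, 0.37: even excellent per-token
# accuracy leaves long answers likely to contain at least one mistake.
```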

2. Algorithmic Bias.

Compounding ChatGPT’s (and other models’) propensity to “hallucinate” is clear evidence that its responses are biased. ChatGPT is trained on vast amounts of data that are inherently biased already. Take the internet, for example. As Timnit Gebru, former co-lead of Google’s AI ethics team, says, “The text that you're using from the internet to train these models is going to be encoding the people who remain online, who are not bullied off—all of the sexist and racist things that are on the internet, all of the hegemonic views that are on the internet. So, we were not surprised to see racist, and sexist, and homophobic, and ableist, et cetera, outputs.” Takeaway: The diversity and range of training data, not just its quantity, determines usability.

What Midjourney v5 thinks of when prompted with only "scientist"

3. Privacy and Data Protection.

As Harvard Business Review puts it, “Generative AI Has an Intellectual Property Problem,” and there are risks on both the input side and the generated-content side. Companies are already scrambling to manage employees sharing confidential trade secrets and business information. In fact, Cyberhaven reports that the average company leaks confidential material to ChatGPT hundreds of times per week. Samsung employees shared confidential meeting notes with ChatGPT, while others used it to help fix proprietary source code. Not only is all that data now in the hands of OpenAI, it can be used to train ChatGPT and resurface in responses to other users. In addition, legal questions are being raised: does copyright, patent, or trademark infringement apply to AI-generated content? Takeaway: Safeguarding IP and data just got significantly harder for businesses.
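
One operational response teams are experimenting with is screening prompts before they ever leave the corporate network. The sketch below is a deliberately naive, hypothetical filter (the patterns and policy are invented for illustration); real safeguards require far more than regex matching:

```python
import re

# Hypothetical pre-send guard: block prompts that appear to contain
# confidential material before they reach an external LLM API.
# Patterns here are illustrative assumptions, not a vetted policy.
CONFIDENTIAL_PATTERNS = [
    re.compile(r"\bconfidential\b", re.IGNORECASE),
    re.compile(r"\btrade\s+secrets?\b", re.IGNORECASE),
    re.compile(r"\bPROJ-\d{4}\b"),  # assumption: internal project codes
]

def is_safe_to_send(prompt: str) -> bool:
    """Return False if the prompt matches any confidentiality pattern."""
    return not any(p.search(prompt) for p in CONFIDENTIAL_PATTERNS)

prompt = "Please summarize these confidential meeting notes: ..."
print("OK to send" if is_safe_to_send(prompt) else "Blocked: possible leak")
```

Even a filter like this only catches the obvious cases; once data reaches a third-party service, contractual and technical controls on retention and training use matter far more.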

"artificial intelligence legal scales futuristic" created in Midjourney
"artificial intelligence legal scales futuristic" created in Midjourney

What are your thoughts?

Notwithstanding the risks, there are massive opportunities ahead for scientists and R&D, which we will discuss soon. What ChatGPT discussions are happening in your team and organization? Comment to share your thoughts.


Questions? Message us or email [email protected] to connect with an expert.
