“GPThibault Pulse” vol. 4 - your weekly fix of Prompt Engineering, insider tips and news on Generative AI, and Life Sciences

Thibault GEOUI

Science CDO - Head of AI/ML for Drug R&D - Bridging Science, Data, and Technology (AI) to Help Life Sciences Companies Bring Better Products to Market Faster

Published: March 25, 2023

Welcome to “GPThibault Pulse” vol. 4 - your weekly fix of Prompt Engineering, insider tips and news on Generative AI, and Life Sciences.

In this week's issue, we will discuss the following topics:

1. Another busy week for Generative AI companies, with news from Nvidia, Google, and OpenAI.

2. What is the value of Generative AI beyond "generic" use cases? In other words, what can it do for me in my industry? In this case, we'll look at it from the Pharma angle.

3. You can't make omelets without breaking eggs, and GenAI is breaking a lot of eggs lately! We'll talk about private user data leaks and IP problems.

4. Philosophy Friday: Some say Generative AI is showing "sparks of intelligence" and that it "understands," while others say it's just an algorithm that predicts the next word. So, what is it? As usual, the truth is somewhere in the middle.

1 - A week of launches from some of the key players in the world of Generative AI

Google #Bard went public.

Google launched its #ChatGPT competitor, Bard (see the Google blog post here), and it was initially received with mixed feelings, although people seem more positive now. However, it is only available in the UK and the USA. Elena Alston from Zapier wrote a nice head-to-head comparison of ChatGPT and Bard (link here). The key differences between them are the data sources and models they're trained on. ChatGPT, using GPT-3.5 or GPT-4 depending on the version, was trained on a massive dataset of text from the internet, while Google Bard relies on real-time, current research pulled from the internet using Google's Language Model for Dialogue Applications (LaMDA). Bard's ability to draw responses from the internet means it can offer more up-to-date information, which makes it a stronger personal assistant than ChatGPT. However, ChatGPT is better at textual functions and can write large amounts of text, making it a handy tool for writing emails, coming up with content marketing ideas, and summarizing text (my personal favourite use-case!). When it comes to the user experience, Bard is miles ahead with its user-friendly interface, formatted text, and the ability to view multiple responses prepared by the AI. In the end, competition is good, and more options mean that users are the winners.

OpenAI launched ChatGPT plugins.

Plugins can be “eyes and ears” for language models, giving them access to information that is too recent, too personal, or too specific to be included in the training data. In response to a user’s explicit request, plugins can also enable language models to perform safe, constrained actions on their behalf, increasing the usefulness of the system overall. After countless ChatGPT extensions appeared for Google #Chrome, for example, ChatGPT itself is now getting extended and turning into a platform. That’s the start of a new #GenerativeAI economy – think Apple #AppStore!

Image: An example of ChatGPT plugins

Nvidia, the big daddy of AI, had its Launch day ... and it was MASSIVE!

It was a significant week for Nvidia with its GTC event.

Despite initial skepticism from analysts, Nvidia's strong focus and investment in AI over the past decade has led to its dominance in the AI and generative AI market.

A lot of Nvidia's current success can be attributed to a few clever investments, such as its cloud services for custom language and generative AI models, Nvidia Picasso and Nvidia NeMo. Additionally, Nvidia's research team has consistently produced cutting-edge papers on generative AI topics, even when others questioned their strategy. Nvidia's long-standing commitment to producing high-performance computing hardware has also been essential. Despite some early doubts, Nvidia's investment in AI has proven to be a wise decision, and the company continues to be the major player in the industry.

This week Nvidia made several announcements, including the introduction of its inferencing platform for Generative AI. It announced the DGX H100 compute platform for AI inferencing, which offers high-speed scalability and performance. Microsoft will be the first hyperscaler to use it and will provide early access. Nvidia also announced the L4 accelerator for AI, video, and graphics, and the H100 NVL for real-time large language model inferencing. Google Cloud has announced early access to L4.

AI inferencing is the process of using pre-trained models to make predictions or decisions based on new data.

In contrast to training, which requires significant computing power and time to develop new models, inferencing is a more lightweight process that can be done in real-time. By improving the speed and scalability of AI inferencing, Nvidia is helping to unlock new use cases and applications for AI, which could have significant benefits for businesses and consumers alike.
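To make the training-versus-inferencing distinction above a bit more tangible, here is a minimal sketch in plain Python. It is a toy linear model, not a neural network, and nothing in it comes from Nvidia's platform: the function names `train` and `infer`, the data, and the hyperparameters are all assumptions chosen for illustration. The point is only that training loops over the data many times, while inference is a single cheap calculation with the weights already learned.

```python
# Toy illustration only: a tiny linear model, not a real neural network.
# All names, data, and hyperparameters here are hypothetical.

def train(xs, ys, steps=2000, lr=0.05):
    """Training: many passes over the data to find the weights (the expensive part)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def infer(model, x):
    """Inference: one cheap forward pass using the already-trained weights."""
    w, b = model
    return w * x + b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

model = train(xs, ys)            # done once, ahead of time
prediction = infer(model, 10.0)  # done on every new input
print(round(prediction, 2))
```

Because the toy data follow y = 2x + 1, the prediction for 10 should land very close to 21. In a real deployment the "train once, infer many times" pattern is the same, only with billions of weights instead of two, which is why fast inferencing hardware matters.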

Finally, some of the first customers are already taking delivery of this cutting-edge hardware. Japanese conglomerate Mitsui & Co., Ltd. is collaborating with Nvidia to create #Tokyo-1, an initiative to help the pharmaceutical sector through the use of high-resolution molecular dynamics simulations and AI in drug discovery. The move is intended to supercharge Japan's pharma industry, the world's third-largest after the US and China. Initially, the project will consist of 16 Nvidia DGX H100 systems at Xeureka, a Mitsui & Co., Ltd. subsidiary, with customers able to access a dedicated server on the supercomputer network and technical support from the company. Major pharma firms including Astellas Pharma, Daiichi Sankyo US and ONO Pharmaceutical are planning to use the technology.

Read the announcement here

The Generative AI ecosystem is evolving on a daily basis, with new players coming from everywhere. It's challenging to have an overview of how this space is shaping up. One thing is clear: Nvidia is playing a central role in this ecosystem. They have developed the hardware and software that everyone needs to run their models. Small Large Language Models (#sLLM ... I just made this up ... but it seems to be the best way to describe this new breed of LLM that is not as large as GPT-4 and its main rivals) are emerging, and they can be run on a Raspberry Pi (e.g., Stanford University's #Alpaca), but the more traditional ones are still power-hungry, and Nvidia is uniquely placed to support them.

2 - Generative AI beyond simple use-cases

I understand that there is a lot of hype around Generative AI, and hype usually has one of two effects: it either attracts more people or turns them off.

While some people have dismissed Generative AI as only useful for basic tasks, like creating simple art or music, the reality is that this technology can be incredibly powerful in a variety of professional applications.

Some of the most common use cases include generating text, images, and even entire websites. In the pharmaceutical industry, Generative AI has the potential to be a game-changer, helping researchers identify new drug candidates, design more effective clinical trials, and even develop personalized treatment plans for patients. With the right tools and expertise, the possibilities for Generative AI in healthcare are truly endless.

Cem Dilmegani wrote an excellent article (link here) about ten use-cases for Generative AI in life sciences:

1. Novel molecule generation: Using generative models like VAEs and GANs to design new drug-like molecules with specific properties.

2. Protein sequence design: Creating new protein sequences with desired functionalities or properties, which can be useful in protein engineering and drug development.

3. Synthetic gene design: Designing synthetic gene sequences for use in synthetic biology, such as creating new biosynthetic pathways or optimizing gene expression.

4. Data augmentation for model training: Generating synthetic data to augment existing datasets and improve the performance of AI models.

5. Imputation of missing data: Filling in missing medical data in life science datasets to improve downstream analysis and modeling.

6. Virtual patient generation: Creating synthetic patient and healthcare data for training AI models, simulating clinical trials, or studying rare diseases.

7. Single-cell RNA sequencing (scRNA-seq) data denoising: Removing noise or unwanted variations from scRNA-seq data to improve downstream analysis like cell-type identification and gene expression quantification.

8. Image-to-image translation: Converting one type of biological image to another, such as transforming fluorescence microscopy images into electron microscopy images.

9. Text-to-image generation: Generating images of biological structures or processes based on textual descriptions.

10. Simulating biological processes: Creating realistic simulations of biological processes, such as cellular signaling or metabolic pathways, to better understand these systems and predict their behavior under different conditions.

I'm also very excited about Generative AI's capabilities to aggregate and summarize information since it is a key challenge in an information-rich domain such as drug R&D. Decision-making to move from one step to the other involves the aggregation of countless documents, their analysis, and summarization – a task very well-suited to Generative AI. While today, most of the AI applications in Pharma R&D are rather scientific, I'm convinced that in the near future, supporting leadership decision-making and accelerating regulatory filings will be the most exciting use-cases!

What's your favourite use-case in your industry?

3 - Private data leaks and potential lawsuits over IP infringements … let the fun begin with Generative AI

Don’t leak personal stuff!

There is not a day without “fun and excitement” in the world of Generative AI.

This week, ChatGPT users had the unpleasant surprise of seeing titles from other users' conversation history. And while it might not be too bad when it comes to searching for new chocolate cake recipes, you can imagine that it is much more problematic when it comes to very personal information! As a result, OpenAI took ChatGPT offline as it scrambled to deal with the problem.

The company's status page indicated that it was having problems with its "web experience" and was working to restore conversation history. Users were warned that they were temporarily unable to access their chat history. OpenAI showed a range of error messages to users.

The bug in ChatGPT affected a small percentage of users, but OpenAI did not give an indication of how many. The problem was the result of a bug in an open-source library, but OpenAI did not specify which library. OpenAI has found a fix for the problem and is currently rolling it out. However, users will not be able to see their chat history for some of Monday as a result (I haven't been able to use ChatGPT for a week ... I wonder if this is related?). OpenAI has stated that it will follow up with a technical postmortem of the privacy issue. Additionally, users are warned not to share sensitive information with ChatGPT, as conversations can be reviewed and used to train the system. This issue serves as a reminder of the importance of transparency and accountability in AI development as technology continues to advance. Companies must prioritize user privacy and security to maintain trust in their products and services.

Read the BBC report here

Respect other people's IP!

Nothing new under the sun: another conflict has emerged between publishers of online content and AI companies such as Microsoft and Google over the use of the publishers' content to train AI tools that can generate natural language responses. These AI tools, such as ChatGPT by OpenAI and Bard by Google, have been praised for their ability to carry on conversations, make up sonnets and ace the #LSAT. However, the publishers claim that their content is valuable and that they should be compensated when it is used to “train” these tools, as they invest a lot of human work and resources to produce it. They also fear that these tools might reduce their traffic and advertising revenue by providing comprehensive answers to user queries without linking to their websites or citing their sources. The article (here) discusses some legal cases and legislative efforts related to this issue, such as Getty Images suing an AI art company for infringing on its copyrights, and a bill that would allow publishers to negotiate collectively with AI suppliers without violating antitrust laws. The article also reports on some discussions between publishers and AI companies, such as Reddit, Inc. talking with Microsoft and News Corp talking with an unnamed party. The article explores the question of whether AI companies have the legal right to scrape content off the internet and feed it into their training models under the provision of “fair use”, which allows copyrighted material to be used without permission in certain circumstances. The article quotes OpenAI CEO Sam Altman saying that they have done a lot with fair use with ChatGPT, which was trained on two-year-old data.

Battles over IP infringement with content creators are not new … 10 years ago, German publishers were seeking an ancillary copyright law that would guarantee them royalties when search engines like Google and Bing display snippets of newspaper articles whose content they developed. The search engines did not pay royalties but earned a lot of money from advertising: their model is to link and earn. Google argued that its news search listing services have the right to try to earn money and that this right should not be interfered with. While publishers have the option to opt out of listings, they would prefer to see something similar to France's model enshrined in law. Some legal opinion, however, contends that ancillary copyright is unconstitutional, as such a law would encroach on a user's fundamental right to freely inform themselves.

4 - Philosophy Friday: AI intelligence; the concept of “understanding” and what it means to us “Humans”

"The age of AI has begun"

In his latest article, Bill Gates (link here) reflects on the two technological breakthroughs he has witnessed in his lifetime - the graphical user interface and the development of advanced artificial intelligence (AI) by OpenAI in 2022.

Gates challenged the team at OpenAI to create an AI capable of passing an Advanced Placement biology exam, and to everyone's surprise, the AI, known as GPT, was able to complete the challenge in just a few months.

Gates believes that the development of artificial intelligence is as fundamental as the creation of the microprocessor, personal computer, internet, and mobile phone. AI will transform the way people work, learn, travel, obtain healthcare, and communicate with one another. AI can also help to reduce some of the world's worst inequities in fields like healthcare, education, and climate change.

Change of this magnitude comes with risks, but there are potential mitigation strategies. AI is a disruptive technology that raises difficult questions about the workforce, the legal system, privacy, bias, and more. However, Gates believes that how governments and foundations approach the regulation and development of AI will determine its impact on society. He suggests creating regulations to protect privacy, prevent bias, develop new educational curricula, and invest in research into AI safety.

Regulation is already starting

Canada, the birthplace of AI, is also one of the first countries to work on regulating it. Canada is home to 20 public AI research labs, 75 AI incubators and accelerators, 60 groups of AI investors from across the country, and over 850 AI-related start-up businesses. The Canadian government has tabled the Artificial Intelligence and Data Act (AIDA) as part of Bill C-27, the Digital Charter Implementation Act. The framework proposed in the AIDA is the first step towards a new regulatory system designed to guide AI innovation in a positive direction and encourage the responsible adoption of AI technologies by Canadians and Canadian businesses. The AIDA is intended to protect Canadians, ensure the development of responsible AI in Canada, and position Canadian firms as leaders in global AI development. The AIDA defines high-impact AI systems and sets out specific requirements, including oversight and enforcement, criminal prohibitions, and a consultation timeline for further development through stakeholder input. Canada is also working with international partners to align its approach to AI with global standards. AI is a powerful enabler, and Canada has a leadership role in this significant technology area.

Link to “The Artificial Intelligence and Data Act (AIDA) – Companion document” here

Demis Hassabis, Google DeepMind’s CEO, is another prominent voice advocating for the regulation of AI. He is on a mission to build an artificial general intelligence (AGI) capable of solving humanity's toughest problems. DeepMind is a leading AI lab, and Hassabis wants to use it to create a "cathedral of knowledge" for AI innovation. While AI has the potential to make extraordinary contributions to society, there are also concerns about its potential misuse, such as creating deadly new chemicals or generating hate speech. Hassabis is worried about the trend of moving fast and breaking things in the tech industry, which has led to companies like Facebook being unprepared for the negative consequences of their products. He believes that AI is now capable of creating tools that could be detrimental to human civilization and urges his competitors to proceed with more caution. He warns that not everyone is thinking about the potential dangers of AI, and we are the ones who will suffer the consequences of experimentation. Hassabis is focused on creating AGI that will benefit humanity, but he is also aware of the risks and is urging others to prioritize safety in their pursuit of AI innovation.

Read the full article here

Are LLMs capable of understanding? And what does understanding really mean?

So now, we are talking about Artificial General Intelligence (AGI). A few months ago, it was a wild dream, science-fiction… it was decades away. Some thought that it would never happen and that intelligence was a human thing (or at least a thing of the living), but a few papers are now being published following the launch of ChatGPT (especially in its latest iteration with GPT-4), so let’s dive into that.

Some will argue that LLMs don’t understand, they just calculate which word is most likely to come after another word. However, their behavior suggests otherwise, and people (me included) tend to overuse the term “it understands”. Melanie Mitchell and David C. Krakauer from the Santa Fe Institute published a paper entitled “The Debate Over Understanding in AI’s Large Language Models” (link here) about the heated debate that has arisen in the AI community on whether machines can now be said to understand natural language and the physical and social situations that language can describe. This has real stakes in how much humans can trust AI systems to act robustly in tasks that affect humans. Recent Large Language Models (LLMs) have emerged, showing impressive feats of language processing and seemingly human-like reasoning abilities. However, there are two sides to this debate, with one faction arguing that these models understand language and can perform reasoning in a general way, and another faction arguing that LLMs cannot possess understanding because they have no experience or mental models of the world. A survey of AI researchers showed that half agreed that LLMs could understand language in some non-trivial sense, and half disagreed.

The debate surrounding understanding in Large Language Models (LLMs) is centered around the questions of whether they are capable of creating the same kind of concept-based mental models of the world that humans have. While LLMs have exhibited formal linguistic competence, they still lack the conceptual understanding required for human-like functional language abilities. The performance of LLMs on language understanding and reasoning tasks such as GLUE and SuperGLUE has suggested that they may have some kind of understanding, but it remains unclear if this is simply a result of learning statistical correlations or a more human-like understanding. Cognitive science research has long focused on understanding the nature of concepts, how understanding arises from coherent, hierarchical sets of relations among them, and how they are represented in the brain. Researchers disagree on the extent to which concepts are domain-specific and innate versus more general-purpose and learned. To make progress, scientists will need to develop new kinds of benchmarks and probing methods that can yield insight into the mechanisms of diverse types of intelligence and understanding. This will be necessary to understand the capabilities and limitations of LLMs and to learn how to integrate different kinds of cognition.
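For readers who want to see what "predicting the most likely next word from statistical correlations" looks like in its most stripped-down form, here is a toy sketch. It is a simple bigram counter, which is an assumption chosen for illustration and not how GPT-4 or any real LLM works (those use neural networks trained on vastly more text). The corpus and the function names `build_bigram_counts` and `most_likely_next` are made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; real LLMs train on vastly more text and use neural networks.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."


def build_bigram_counts(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


counts = build_bigram_counts(corpus)
next_after_sat = most_likely_next(counts, "sat")
next_after_the = most_likely_next(counts, "the")
print(next_after_sat, next_after_the)  # prints: on cat
```

This little model has no concept of cats or mats; it only knows which word tends to follow which. Whether scaling this idea up by many orders of magnitude produces something that deserves the word "understanding" is exactly what the debate above is about.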

And now something even more controversial: are they intelligent?

A new paper claims that GPT-4 shows “sparks of artificial general intelligence” (here)

Let’s start with “what is intelligence”? A very simple (and restrictive) definition, according to Oxford Languages, is

“the ability to acquire and apply knowledge and skills”…

In Wikipedia, we find a more complex definition:

“Intelligence has been defined in many ways: the capacity for abstraction, logic, understanding, self-awareness, learning, emotional knowledge, reasoning, planning, creativity, critical thinking, and problem-solving. More generally, it can be described as the ability to perceive or infer information, and to retain it as knowledge to be applied towards adaptive behaviors within an environment or context.”

In 2020, Will Douglas Heaven published “Artificial general intelligence: Are we close, and does it even make sense to try?” in the MIT Technology Review (link here):

“A machine that could think like a person has been the guiding vision of AI research since the earliest days—and remains its most divisive idea.”

Artificial General Intelligence (AGI) is a controversial goal of AI research, often seen as a stand-in for any AI that has not been built yet. While AGI is not about conscious machines or thinking robots, some of the biggest AI labs in the world take AGI seriously, with OpenAI aiming to build a machine with human-like reasoning abilities, while DeepMind's unofficial but widely stated mission is to "solve intelligence". The inconsistency of what is meant by AGI makes it a bugbear for some researchers, leading Facebook's head of AI, Jerome Pesenti, to say:

"I don't like the term AGI. I don't know what it means."

In just two years, it seems that things have evolved dramatically! A recent paper by Microsoft Research scientists (link here) has claimed that the OpenAI language model that powers Bing AI shows "sparks" of human-level intelligence, or Artificial General Intelligence (AGI). The researchers characterize GPT-4's prowess as

"only a first step towards a series of increasingly generally intelligent systems."

Still, they contend that GPT-4 is part of a new cohort of Large Language Models (LLMs) that exhibit more general intelligence than previous AI models. GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. The researchers admit, however, that while GPT-4 is "at or beyond human-level for many tasks," its overall "patterns of intelligence are decidedly not human-like."

According to this paper, GPT-4 is able to:

- Use tools with minimal instruction and no demonstration

- Pass technical interviews

- Code (sometimes better than humans)

- Succeed at (some) International Mathematical Olympiad problems (requires creativity)

- Solve Fermi questions (e.g., how many golf balls can fit in a swimming pool?)

- Act as a personal assistant

- Create mental maps

- Demonstrate Theory of Mind (ToM): a mental model of what other people are thinking

This will surely be very controversial… but a bit of controversy is often good for science, isn’t it? And it makes for a good story.

With that being said, we’ve reached the end of our (LONG) weekly newsletter. I hope you had a good read. Feel free to share with your friends and send me your comments! See you next week!

Thibault