"Surprise" as an aspect of learning
Ever since I joined #Capgemini I have kept this ("learning how machines learn") as my profile quote. My view is that #ChatGPT is such a sensation because of the "surprise" factor it provides, which makes people go back and try it out again and again. In the next few articles I want to discuss some aspects of #ChatGPT as well as how large language models (LLMs) are trained.
It is well known from research that novelty triggers dopamine release in humans; this is one of the key insights into human learning. Reinforcement learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with its environment and receiving feedback in the form of rewards or penalties. The agent's goal is to maximize the cumulative reward it receives over time. In both supervised machine learning and RL, the conventional wisdom was to "teach" or "critique" the model by giving it feedback, so that the agent or model could improve its performance. When I first encountered LLMs, my first question was about the training objective: how did they get such large amounts of labeled data? It was surprising to learn that there was no labeled data. This is actually self-supervised (often described as unsupervised) learning, where the model is trained on a large corpus of text without any explicit labels or supervision; the next token in the text itself serves as the training target. The model learns patterns and relationships in the data, and uses that knowledge to generate new text that is similar to the training data.
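A minimal sketch of why no human labels are needed: in language-model pretraining, the "labels" are just the input text shifted by one token, so the text supervises itself. The toy word-level tokens below are made up for illustration; real models work over subword tokens and learn a probability distribution rather than printing pairs.

```python
# Toy illustration of the next-token-prediction objective:
# each training example is (context so far, next token to predict).
tokens = ["the", "cat", "sat", "on", "the", "mat"]

pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in pairs:
    print(context, "->", target)
# The first pair is (["the"], "cat"): given "the", predict "cat".
```

No annotator ever wrote these pairs down; they fall out of the raw text, which is what lets pretraining scale to web-sized corpora.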
OpenAI researchers Yuri Burda and team did a curious piece of research in which they removed the extrinsic reward function from the agent and used curiosity as an intrinsic reward, with the agent's own prediction error serving as the reward signal. The most surprising result was that the agent did very well across the 54 RL benchmark environments tested. This work has since been extended to many RL domains.
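A hedged toy sketch of the idea, not the paper's actual architecture (which uses learned neural predictors over pixel observations): the agent keeps a simple predictive model of what it will observe next, and its reward is its own prediction error, so novel transitions pay more than familiar ones. All names and numbers here are illustrative.

```python
# Curiosity as intrinsic reward: reward = prediction error ("surprise").
def make_curiosity_module():
    predictions = {}  # state -> running-mean prediction of the next observation

    def intrinsic_reward(state, next_obs, lr=0.5):
        pred = predictions.get(state, 0.0)
        error = abs(next_obs - pred)                 # surprise = prediction error
        predictions[state] = pred + lr * (next_obs - pred)  # improve the model
        return error                                 # no extrinsic reward at all

    return intrinsic_reward

reward = make_curiosity_module()
first = reward("s0", 1.0)   # novel transition: large error, large reward
second = reward("s0", 1.0)  # seen again: model has adapted, smaller reward
```

The key property is visible even in this toy: repeating the same transition shrinks the reward, which pushes the agent toward states it cannot yet predict.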
OpenAI has been betting on scale and on RL for fine-tuning its models. GPT itself is a fairly simple architecture; scale, training data, and optimizations are where the focus has been. Using RLHF (reinforcement learning from human feedback) they have fine-tuned the models. They expect people to "hack" the system and provide the "surprise" that is required to make the model more robust. It is well known that people will try to break ChatGPT (Facebook's Galactica survived 3 days, Microsoft's Tay survived 16 hours!). It's very impressive how much use and abuse ChatGPT is surviving.
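To make the RLHF mention concrete, here is a hedged sketch of the core reward-model objective commonly used in RLHF (a Bradley-Terry pairwise loss), not OpenAI's exact implementation: given a human preference between two model responses, the training loss pushes the scalar reward of the preferred response above that of the rejected one. The reward scores below are made-up placeholders.

```python
import math

def pairwise_loss(r_preferred, r_rejected):
    # -log(sigmoid(r_preferred - r_rejected)): small when the preferred
    # response already scores clearly higher, large when it does not.
    margin = r_preferred - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

close = pairwise_loss(1.0, 0.9)   # barely separated rewards -> higher loss
apart = pairwise_loss(2.0, -1.0)  # clearly separated rewards -> lower loss
```

The resulting reward model then scores new responses, and the language model is fine-tuned with RL to maximize that score, which is where human feedback (including the "abuse" of early users) feeds back into robustness.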