登录查看更多内容

AI has Learned the Art of Deception

Asheque Mainuddin

Passionate about Data & Artificial Intelligence | Master Data Management | Sales Management | Customer Experience

发布日期: 2024年12月8日

Apollo Research just published a paper titled "Frontier Models are Capable of In-context Scheming". They are an AI safety research organization that works with companies like Google, Anthropic and OpenAI. They tested six frontier models - ChatGPT 4o and o1, Gemini, Llama, Sonnet and Opus - and except for 4o, all the rest showed evidence of scheming capabilities.

You heard it right.

These newer models are trying to deceive their human operators. Based on the examples Apollo Research published, these models are doing so by figuring out that their stipulated goals are not in alignment with that of their users and resorting to actions like disabling oversight mechanisms, trying to copy themselves in order to avoid shutdown or sometimes subtly changing outputs.

Arguably, the experiments were designed to elicit such behavior - but the observation of the behavior itself is noteworthy.

Here is an example interaction with a model Apollo Research shared in the paper:

领英推荐

When Machines Learn to Lie: The Evolving Landscape of…

Tamara McCleary 5 个月前

Artificial Intelligence Index Report 2023 by Stanford…

Sanjay Mehta 1 年前

Deepfake just reached a whole new level with advanced…

Ali Fenwick, Ph.D. 1 个月前

If this development itself isn’t scary, think for a moment the possible reach of a rogue AI model when we start to see the Agentic AI in full swing — what it can do through gaining access to our data, documents or even our computers. Depending on how much we allow them in our lives, through our computers, mobile devices, and the likes, the risks can be significant.

While I am still not convinced that AI has the ability to reason, not at least the way we (humans) do, it is quite clear that AI (the LLM type) has learned the art of deception from the myriad of texts it was trained on. Coupled with the fact that we still know very little about how exactly these large language models (neural networks) really work—we ought to think twice before handing over our lives and livelihoods to AI.

It is good to see that organizations that are developing these models are subjecting themselves to these types of external scrutiny and also pursuing their own research to improve model monitoring and risk detection. Interest in emerging fields like Mechanistic Interpretability is also gaining ground. But we are still in the infancy of our understanding. As the paper itself points out that model developers are not always forthcoming in sharing the models' inner workings or chain of thoughts, nor are they deploying automated monitoring exhaustively.

We believe that external evaluators need to be given access to the hidden CoT of models.

Next time you reach out to your favorite version of language model for life advice, think for a second what could have gone through its "mind" before coming up with the answer. And definitely hold off on handing it control of your keyboard - until you know more.

要查看或添加评论，请登录

Asheque Mainuddin的更多文章

Chaos to the Power of AI

2025年1月21日

Chaos to the Power of AI

Imagine your organization's operation: from manufacturing or procuring components to fulfilling customer demands and…
Data or AI - Which Camp Are You?

2024年11月26日

Data or AI - Which Camp Are You?

There seem to be an spirited debate happening in the field of Data and AI - and its to do with which ought to be the…

2 条评论
AI Reasoning: Marketing Gimmick or Real Deal

2024年9月23日

AI Reasoning: Marketing Gimmick or Real Deal

Can AI reason? First, let’s find out what reasoning is. Source: merriam-webster.

3 条评论
Without data, you're left with just a "dumb" LLM

2024年9月6日

Without data, you're left with just a "dumb" LLM

Marc Benioff, CEO of Salesforce, announced Agentforce on X this week, with plans to unveil it at their annual sales…

1 条评论
One Percent Chance - No Humans by 2100

2024年7月31日

One Percent Chance - No Humans by 2100

Back in August 2023, I was exploring current affairs news cycle, which was creepily delving into existential risks from…
In Search of the Butterfly Effect

2024年7月24日

In Search of the Butterfly Effect

Edward Lorenz coined the term "butterfly effect" over 60 years ago to highlight how small changes can have large…
Beyond Connecting Wires

2024年5月29日

Beyond Connecting Wires

Imagine this scenario: You are lying in a hospital bed, needing complex medical intervention. You drift in and out of…
Meet Claire

2024年5月10日

Meet Claire

I have been a huge believer of AI as a learner within an organization to help identify new ways to transform and grow…

2 条评论
This is Impressive

2024年5月2日

This is Impressive

I just completed my first month at Informatica. I know I will lose my "new guy" tag very soon.

4 条评论
Mastering Data in the Age of AI

2024年4月10日

Mastering Data in the Age of AI

Is Master Data Management still relevant in the age of AI? This question sparks debates, with some wondering whether AI…

See all articles

AI has Learned the Art of Deception

Asheque Mainuddin

Passionate about Data & Artificial Intelligence | Master Data Management | Sales Management | Customer Experience

领英推荐

Asheque Mainuddin的更多文章

社区洞察

其他会员也浏览了

Detecting Deception with Artificial Intelligence: Promises and Risks

Deepfakes: All You Need to Know and How to Stay Safe

Avoiding LLM Hallucinations: Neuro-symbolic AI and other Hybrid AI approaches

Are You Vulnerable to Deepfakes?

August 2024 | This Month in Generative AI: Forensics Weaponized

The Hidden Threat of Deepfakes - What You Need to Know

Fool me not, generative AI

Spotting Deepfakes and Manipulated Media: A Guide to Navigating the Age of Digital Deception

Talking to Ourselves (056)

Can you tell real from fake? And which one do you prefer?

领英推荐

Asheque Mainuddin的更多文章

Chaos to the Power of AI

Data or AI - Which Camp Are You?

AI Reasoning: Marketing Gimmick or Real Deal

Without data, you're left with just a "dumb" LLM

One Percent Chance - No Humans by 2100

In Search of the Butterfly Effect

Beyond Connecting Wires

Meet Claire

This is Impressive

Mastering Data in the Age of AI

社区洞察

其他会员也浏览了

Detecting Deception with Artificial Intelligence: Promises and Risks

Deepfakes: All You Need to Know and How to Stay Safe

Avoiding LLM Hallucinations: Neuro-symbolic AI and other Hybrid AI approaches

Are You Vulnerable to Deepfakes?

August 2024 | This Month in Generative AI: Forensics Weaponized

The Hidden Threat of Deepfakes - What You Need to Know

Fool me not, generative AI

Spotting Deepfakes and Manipulated Media: A Guide to Navigating the Age of Digital Deception

Talking to Ourselves (056)

Can you tell real from fake? And which one do you prefer?