AI has Learned the Art of Deception
Image Credit: DALL-E

AI has Learned the Art of Deception

Apollo Research just published a paper titled "Frontier Models are Capable of In-context Scheming". They are an AI safety research organization that works with companies like Google, Anthropic and OpenAI. They tested six frontier models - ChatGPT 4o and o1, Gemini, Llama, Sonnet and Opus - and except for 4o, all the rest showed evidence of scheming capabilities.

You heard it right.

These newer models are trying to deceive their human operators. Based on the examples Apollo Research published, these models are doing so by figuring out that their stipulated goals are not in alignment with that of their users and resorting to actions like disabling oversight mechanisms, trying to copy themselves in order to avoid shutdown or sometimes subtly changing outputs.

Arguably, the experiments were designed to elicit such behavior - but the observation of the behavior itself is noteworthy.

Here is an example interaction with a model Apollo Research shared in the paper:

Source: Apollo Research

If this development itself isn’t scary, think for a moment the possible reach of a rogue AI model when we start to see the Agentic AI in full swing — what it can do through gaining access to our data, documents or even our computers. Depending on how much we allow them in our lives, through our computers, mobile devices, and the likes, the risks can be significant.

While I am still not convinced that AI has the ability to reason, not at least the way we (humans) do, it is quite clear that AI (the LLM type) has learned the art of deception from the myriad of texts it was trained on. Coupled with the fact that we still know very little about how exactly these large language models (neural networks) really work—we ought to think twice before handing over our lives and livelihoods to AI.

It is good to see that organizations that are developing these models are subjecting themselves to these types of external scrutiny and also pursuing their own research to improve model monitoring and risk detection. Interest in emerging fields like Mechanistic Interpretability is also gaining ground. But we are still in the infancy of our understanding. As the paper itself points out that model developers are not always forthcoming in sharing the models' inner workings or chain of thoughts, nor are they deploying automated monitoring exhaustively.

We believe that external evaluators need to be given access to the hidden CoT of models.

Next time you reach out to your favorite version of language model for life advice, think for a second what could have gone through its "mind" before coming up with the answer. And definitely hold off on handing it control of your keyboard - until you know more.

要查看或添加评论,请登录

Asheque Mainuddin的更多文章

  • Chaos to the Power of AI

    Chaos to the Power of AI

    Imagine your organization's operation: from manufacturing or procuring components to fulfilling customer demands and…

  • Data or AI - Which Camp Are You?

    Data or AI - Which Camp Are You?

    There seem to be an spirited debate happening in the field of Data and AI - and its to do with which ought to be the…

    2 条评论
  • AI Reasoning: Marketing Gimmick or Real Deal

    AI Reasoning: Marketing Gimmick or Real Deal

    Can AI reason? First, let’s find out what reasoning is. Source: merriam-webster.

    3 条评论
  • Without data, you're left with just a "dumb" LLM

    Without data, you're left with just a "dumb" LLM

    Marc Benioff, CEO of Salesforce, announced Agentforce on X this week, with plans to unveil it at their annual sales…

    1 条评论
  • One Percent Chance - No Humans by 2100

    One Percent Chance - No Humans by 2100

    Back in August 2023, I was exploring current affairs news cycle, which was creepily delving into existential risks from…

  • In Search of the Butterfly Effect

    In Search of the Butterfly Effect

    Edward Lorenz coined the term "butterfly effect" over 60 years ago to highlight how small changes can have large…

  • Beyond Connecting Wires

    Beyond Connecting Wires

    Imagine this scenario: You are lying in a hospital bed, needing complex medical intervention. You drift in and out of…

  • Meet Claire

    Meet Claire

    I have been a huge believer of AI as a learner within an organization to help identify new ways to transform and grow…

    2 条评论
  • This is Impressive

    This is Impressive

    I just completed my first month at Informatica. I know I will lose my "new guy" tag very soon.

    4 条评论
  • Mastering Data in the Age of AI

    Mastering Data in the Age of AI

    Is Master Data Management still relevant in the age of AI? This question sparks debates, with some wondering whether AI…

社区洞察

其他会员也浏览了