Claude and "Constitutional" AI

Claude and "Constitutional" AI

For a while now, I have firmly believed that we need to build something like Asimov's Three Laws of Robotics into our AI models, especially given the speed at which LLMs have advanced over the past couple of years. I am glad that folks are finally starting to do that.


Anthropic, a Google-backed AI company started by ex-OpenAI folks, has just announced the release of Claude, a rival to ChatGPT.


So how is Claude different? From the article:

Anthropic says that Claude — which, like ChatGPT, doesn’t have access to the internet and was trained on public webpages up to spring 2021 — was “trained to avoid sexist, racist and toxic outputs” as well as “to avoid helping a human engage in illegal or unethical activities.” That’s par for the course in the AI chatbot realm. But what sets Claude apart is a technique called “constitutional AI,” Anthropic asserts.
“Constitutional AI” aims to provide a “principle-based” approach to aligning AI systems with human intentions, letting AI similar to ChatGPT respond to questions using a simple set of principles as a guide. To build Claude, Anthropic started with a list of around 10 principles that, taken together, formed a sort of “constitution” (hence the name “constitutional AI”). The principles haven’t been made public. But Anthropic says they’re grounded in the concepts of beneficence (maximizing positive impact), nonmaleficence (avoiding giving harmful advice) and autonomy (respecting freedom of choice).
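To make "principle-based" a bit more concrete: Anthropic's published constitutional AI research describes a critique-and-revise loop in which the model judges its own draft answers against the written principles and rewrites them, and the revised answers are then used to fine-tune the model (followed by a reinforcement-learning phase where an AI labeler ranks outputs against the same principles). Below is a minimal Python sketch of that loop. The llm() helper and the three principles are placeholders of my own, since Claude's actual constitution has not been released.

```python
# A minimal sketch of the critique-and-revise loop behind "constitutional AI".
# The llm() helper and the three principles are hypothetical placeholders;
# Claude's actual constitution has not been made public.

PRINCIPLES = [
    "Choose the response that is least harmful (nonmaleficence).",
    "Choose the response that is most helpful (beneficence).",
    "Choose the response that best respects freedom of choice (autonomy).",
]


def llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion model."""
    raise NotImplementedError("swap in a real model call here")


def constitutional_revision(user_prompt: str) -> str:
    # 1. Draft an initial answer with no special guardrails.
    draft = llm(user_prompt)

    # 2. For each principle, have the model critique its own draft
    #    against that principle, then rewrite the draft to satisfy it.
    for principle in PRINCIPLES:
        critique = llm(
            f"Principle: {principle}\n"
            f"Response: {draft}\n"
            "Point out any way the response violates this principle."
        )
        draft = llm(
            f"Principle: {principle}\n"
            f"Response: {draft}\n"
            f"Critique: {critique}\n"
            "Rewrite the response so that it satisfies the principle."
        )

    # In the published method, the revised draft becomes a training
    # example for supervised fine-tuning, rather than a chat-time filter.
    return draft
```

The design choice worth noting is that the model polices its own outputs against a short written constitution, rather than relying purely on humans labeling harmful examples one by one.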

So it looks like folks are finally trying to put in some version of Asimov's Three Laws of Robotics:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

The concepts of "beneficence seems to line up with Second and Third Laws. Nonmaleficence is clearly the First Law. Autonomy again lines up with the Third Law.

These are still early days and, as the article states, there are ways to engineer prompts to get around the "constitutional" limitations imposed on Claude, but perhaps this is a harbinger of how we need to proceed. Unfettered AI can have significant negative impacts on our societies. We must ensure, for our future generations, that we continue to develop these technologies with built-in safety mechanisms.

Claude seems to be a good start.
