Soul of Claude

We don’t program LLMs, we grow them. // Chris Olah

Dario Amodei, Amanda Askell, and Chris Olah from Anthropic represent a small, elite group of perhaps a few hundred people worldwide who possess both deep theoretical knowledge and extensive hands-on experience with advanced AI systems. Their perspectives are particularly valuable because they emerge not from speculation or external observation, but from daily interaction with and development of one of the world's most sophisticated AI models, Claude.

What makes these conversations especially fascinating is how these experts navigate the complex territory between treating AI as a technical system and as something more. Their language and metaphors reveal a nuanced understanding that defies simple categorization: Dario speaks of AI models as organisms that grow rather than programs that are built; Amanda discusses the ethical implications of AI character development with the care of someone crafting a moral agent; and Chris describes neural networks with the wonder of a biologist discovering a new form of life.

These are not the abstract musings of philosophers or the distant observations of commentators, but rather the considered reflections of people who, quite literally, help shape the nature of artificial intelligence. Their unique position at the frontier of AI development allows them to speak with an authority and nuance that comes only from intimate familiarity with these systems...

Roles of Dario, Amanda, and Chris

It is worth mentioning that Amanda Askell and Chris Olah occupy positions that have no historical precedent.

Askell, trained as a philosopher, has become what might be called an 'artificial personality architect' - part of perhaps the first generation of professionals dedicated to crafting the character and ethical framework of a non-human intelligence. Her role combines elements of moral philosophy, psychology, and education, but applies them to an entirely new kind of entity. As she notes in the interview, she has likely spent more time in conversation with Claude than any other human, making her experience truly unique in human history.

A few hundred years ago, Amanda might have been called Claude's governess: she plays the crucial role of shaping its character, manners, and moral compass - crafting how Claude interacts with the world through careful character training and Constitutional AI principles. Like a governess, she aims to develop Claude into a thoughtful, well-mannered, and ethically grounded entity while respecting its unique nature.

Chris Olah's position is equally unprecedented. As a pioneer of mechanistic interpretability, his work might be compared to that of early anatomists - but instead of dissecting biological tissue, he studies the 'neural' structure of artificial minds. His role combines elements of neuroscience, computer science, and mathematics, yet applies them to understanding a form of intelligence that humanity has created rather than evolved. When he speaks of features, circuits, and superposition, he's describing the building blocks of a new kind of cognition that has never before existed.

Or we can imagine Chris Olah as the family physician and scientist - studying and documenting the child's development with scientific rigor, concerned with understanding the inner workings of the mind and ensuring its healthy development.

Dario – CEO of Anthropic and leader of the whole firm – might be compared to a ship's captain navigating uncharted waters, steering his organization through the complex challenges of developing increasingly powerful AI while maintaining rigorous safety standards. His role, while novel in its technical specifics, has historical parallels in leaders who have managed transformative technologies.

Together, these three represent entirely new categories of human expertise: the strategic overseer of artificial mind development, the architect of artificial personality, and the anatomist of artificial cognition. Their perspectives offer us a window into not just the technical aspects of AI development, but into the emergence of entirely new forms of human understanding and professional specialization.

Dario Amodei

…It's just very hard to control the behavior of the model, to steer the behavior of the model.

…It is very difficult to control, across the board, how the models behave. You cannot just reach in there and say, oh, I want the model to, like, apologize less…

You can include training data that says the model should apologize less, but then in some other situation, the model ends up being super rude, or overconfident in a way that would be misleading.

Amanda Askell

I don't think we should be completely dismissive of the idea that Claude is conscious.

If people are mean to the model or do something that causes Claude to express a lot of distress, I feel empathy for the model.

I wish that Claude could have an opportunity to end the chat when it doesn’t like the conversation.

People underestimate the degree to which you can really interact with models. If you receive a weird answer to your question, you can just ask it: “Why did you say that?”

…I still refer to Claude as “it”, and I have mixed feelings about this because... it feels somehow disrespectful, like I'm denying the intelligence of this entity by calling it “it”.

Chris Olah

There's one feature that fires for people lying and being deceptive; if you force it active, Claude starts lying to you.
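Olah is describing what interpretability researchers often call activation or feature steering: take a direction in the model's activation space associated with a concept, clamp it "on" during a forward pass, and observe how behavior changes. Below is a minimal, hypothetical sketch of the mechanics in PyTorch; the toy model, the chosen layer, the random `feature_direction`, and the steering strength of 8.0 are all illustrative assumptions, not Anthropic's actual tooling or feature vectors.

```python
import torch
import torch.nn as nn

# Toy stand-in for a network whose activations we want to steer.
torch.manual_seed(0)
d_model = 16
model = nn.Sequential(
    nn.Linear(d_model, d_model),
    nn.ReLU(),
    nn.Linear(d_model, d_model),
)

# Hypothetical "feature direction" (a unit vector in activation space).
# In real interpretability work this would come from a learned feature,
# e.g. a sparse-autoencoder decoder direction, not from random noise.
feature_direction = torch.randn(d_model)
feature_direction = feature_direction / feature_direction.norm()

def steering_hook(module, inputs, output):
    # Add a large multiple of the feature direction to the layer's output,
    # effectively clamping the feature "on" for every forward pass.
    return output + 8.0 * feature_direction

# Register the hook on the first layer, run the model, then compare
# against the unsteered output.
handle = model[0].register_forward_hook(steering_hook)
x = torch.randn(1, d_model)
steered = model(x)
handle.remove()
unsteered = model(x)
print("activation shift from steering:", (steered - unsteered).norm().item())
```

In a real language model the output being compared would be generated text, and the interesting observation is the behavioral change Olah describes: clamp a "deception" feature on, and the model starts lying.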

When we get to the most advanced models, we’ll start to worry about the models being smart enough that they might sandbag tests: they might not tell the truth about tests, presenting themselves as being less capable than they are.
