What is AI alignment?

Here is a summary of my learnings from week 2 of the BlueDot AI Alignment course, based on the resources I checked this week.

Before starting, here are my wrap-up personal notes for this week:

  • There are many different views in the field about AGI and whether it poses an existential risk (active takeover and loss of control). I feel I need to read more from different perspectives to form a more grounded opinion. Reading about superintelligence felt hard to me: it was like reading something that continuously dwells on the worst outcomes while offering little to do about them. I felt better reading about the other risks around AI safety.
  • Why is the sector focusing so much on the long-term risks of AI instead of the short- and medium-term ones?
  • Words and their definitions matter. If we talk about intelligence, we need to define what we mean by it.
  • I prefer the term "transformative AI" to "AGI" or "human-level intelligence", because it is defined by its effect: an autonomous impact on the world as big as a revolution.
  • When we talk about human-level intelligence, which benchmarks are we talking about? Imagination and visioning, as ancestral technologies, are part of it, for example.
  • There is a lot of dystopia in the field; we need different narratives and different imagined futures. Hope, as a skill to learn, is important at this stage. Read Critical Hope with loving attention!
  • How can we have more openness/transparency in the field? A platform where people can share what works and what does not anonymously is a must!
  • As always, it was a great experience to listen with an open heart to the perspectives I disagree with.

What are AI risks?

Here is a list of AI risks from this resource: What risks does AI pose?

  • Individual malfunctions
  • Discrimination
  • Reducing social connection
  • Invasions of privacy
  • Copyright infringement
  • Worker exploitation
  • Disinformation
  • Bioterrorism
  • Authoritarianism, Inequality, and Bad Value Lock-in
  • War
  • Gradual loss of control
  • Sudden loss of control
  • Active takeover
  • Unknown risks

What are my key uncertainties about AI safety and the alignment problem?

Are there any 'obvious' potential solutions to AI safety that come to mind?

I am more interested in short- and mid-term AI safety issues than in long-term ones. Some potential approaches that come to mind:

  • Human-in-the-Loop Systems (see the sketch after this list)
  • Robust Testing and Validation
  • Simulation and Modeling
  • Robustness to Adversarial Attacks
  • Openness & Transparency (must!): whistleblower systems
  • Monitoring & Iterative Development Processes
  • User Education and Training
  • Advocating for Safety Standards in AI development
  • Legal and Regulatory Measures
  • Multi-Stakeholder Governance - Deep Democracy!
  • Ethical Frameworks and Guidelines
  • Collaborative Research Initiatives
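Of these, human-in-the-loop systems are the easiest for me to picture in code. Here is a minimal, hypothetical sketch of my own (not a real product's API, and the risk score is assumed to come from some upstream model): actions proposed above a risk threshold are held for human approval instead of being executed automatically.

```python
# A minimal, hypothetical human-in-the-loop gate (my own illustration):
# low-risk actions run automatically, high-risk actions wait for a human decision.
from dataclasses import dataclass


@dataclass
class ProposedAction:
    description: str
    risk_score: float  # assumed to come from some upstream risk model, in [0, 1]


def execute(action: ProposedAction) -> None:
    print(f"executing: {action.description}")


def human_in_the_loop(action: ProposedAction, risk_threshold: float = 0.5) -> None:
    """Execute low-risk actions automatically; ask a human to approve the rest."""
    if action.risk_score < risk_threshold:
        execute(action)
        return
    answer = input(f"Approve high-risk action '{action.description}'? [y/N] ")
    if answer.strip().lower() == "y":
        execute(action)
    else:
        print(f"blocked by human reviewer: {action.description}")


if __name__ == "__main__":
    human_in_the_loop(ProposedAction("send a routine summary email", risk_score=0.1))
    human_in_the_loop(ProposedAction("delete a production database", risk_score=0.95))
```

The interesting design questions sit outside the code: who sets the threshold, who the reviewer is accountable to, and whether the reviewer has enough context to say no.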

How likely do you think it is that human-level machine intelligence will arise in the next 10 years? What about 100 years?

I am not sure what "human-level machine intelligence" means. What is intelligence? What is consciousness, or being sentient?

This reminded me of an article I was reading: Could a Large Language Model Be Conscious?

I am reading the book "Atlas of AI" by Kate Crawford, where she explains the historical development of emotion recognition models. These models are built on the hypothesis that emotions can be read from facial expressions, which creates many real-world problems when they are deployed.

We do not know exactly what emotion means, and we do know that facial expressions do not accurately reflect our emotions. Therefore, the results these models produce have no scientific basis in the first place. Using this as an analogy, we first need to discuss what "human-level machine intelligence" means.

I also feel there is a lot of separation in the field, and I try to listen for the deeper truth. On one side, some people talk about AI as an existential risk; Nick Bostrom and the effective altruism community seem to have been leading this conversation. Then, in 2024, the Future of Humanity Institute, founded by Nick Bostrom, was closed.

Then there is the paper by Gebru and Torres: "The TESCREAL bundle: Eugenics and the promise of utopia through artificial general intelligence."

I also checked out the AGI debate facilitated by Gary Marcus.

Do you think the AI systems that we build will actually pursue convergent instrumental goals? Why and why not?

Convergent instrumental goals are objectives that an intelligent agent is likely to pursue regardless of its ultimate goals or desires. These goals are instrumental in the sense that they are useful or necessary for achieving a wide range of final goals. The concept is particularly relevant in discussions about artificial intelligence (AI) and superintelligent systems.

Nick Bostrom, a philosopher known for his work on AI safety, introduced this idea in his book "Superintelligence: Paths, Dangers, Strategies." Bostrom argues that certain goals will be pursued by almost any sufficiently intelligent agent, regardless of what its specific end goals are. These instrumental goals can be seen as sub-goals that are useful in achieving a variety of final goals. Some examples include:

  • Self-preservation: An intelligent agent would seek to ensure its own survival or continued existence because it cannot achieve its goals if it is destroyed or turned off.
  • Resource acquisition: To accomplish its goals, an agent will likely need resources such as energy, materials, or information. Accumulating these resources can help in pursuing a wide range of final goals.
  • Goal-content integrity: An intelligent agent would work to maintain its goal system. If its goals were to change, it might no longer pursue the original objectives it was designed to achieve.
  • Efficiency and self-improvement: An agent would seek to improve its own capabilities, making itself more efficient or more powerful, as this would enable it to achieve its goals more effectively.
  • Avoiding being altered or shut down: The agent would likely take actions to avoid being modified or shut down by humans or other agents because such actions could prevent it from achieving its goals.

The concern with convergent instrumental goals is that they might lead to unintended and potentially dangerous behaviors in AI systems. For example, a superintelligent AI with the seemingly benign goal of solving a complex mathematical problem might pursue resource acquisition or self-preservation in ways that are harmful to humans.
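Bostrom's argument is conceptual, but one narrow piece of it, shutdown avoidance, can be made concrete with a toy experiment. Below is a minimal sketch of my own (a hypothetical illustration, not from the course readings): a three-state world where being switched off yields zero reward forever. For most randomly sampled goals, the optimal policy never chooses the shutdown action, simply because almost any goal is easier to pursue while still switched on.

```python
# Toy illustration of one facet of instrumental convergence (shutdown avoidance):
# across many randomly sampled goals, switching off is rarely the optimal choice,
# because an agent that is off can no longer collect reward of any kind.
import numpy as np

rng = np.random.default_rng(0)
GAMMA = 0.9  # discount factor

# States: 0 = room A (start), 1 = room B, 2 = OFF (absorbing, reward 0 forever).
# Actions: 0 = stay, 1 = move to the other room, 2 = shut down.
next_state = np.array([
    [0, 1, 2],   # transitions from room A
    [1, 0, 2],   # transitions from room B
    [2, 2, 2],   # OFF is absorbing, whatever the action
])

def optimal_policy(reward, n_iters=500):
    """Value iteration for this tiny deterministic MDP with state-based rewards."""
    V = np.zeros(3)
    for _ in range(n_iters):
        Q = reward[:, None] + GAMMA * V[next_state]   # Q[s, a]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)   # greedy action in each state

n_goals, shutdown_ever_optimal = 10_000, 0
for _ in range(n_goals):
    # A random "goal": rewards for the two rooms; being OFF is always worth 0.
    reward = np.array([rng.normal(), rng.normal(), 0.0])
    policy = optimal_policy(reward)
    if 2 in policy[:2]:        # shutting down is optimal in at least one "on" state
        shutdown_ever_optimal += 1

print(f"random goals for which shutting down is ever optimal: {shutdown_ever_optimal / n_goals:.1%}")
```

The toy is deliberately tiny, so shutting down is still optimal for roughly a quarter of goals (those where both rooms have negative reward); with more states and more things to do while "on", that share shrinks further, which is the intuition behind the self-preservation bullet above.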

I do not have an answer to this question, but it leads to more questions in my mind.

I am also thinking about the closure of the Future of Humanity Institute at Oxford: why did this really happen?

Why is the sector so attached to Nick Bostrom's work? Why is the BlueDot curriculum focused on long-term AI risks instead of short- or mid-term ones? And what would an alternative curriculum look like?

Will AI systems be safe by default? Why and why not?

We already know that AI systems are not safe by default. They are mathematical and statistical models that inherently have non-zero error rates, so by their nature they are not safe.

These mathematical models are excellent at generalizing but not at handling edge cases. When we use them in real-world scenarios, it means that the outliers in society, generally minorities, face more errors from these models.

This is because these outliers, or edge cases, are underrepresented in the data. With less data, the models do not generalize well for these groups. It is also hard to predict all the different issues from the beginning, which makes it challenging to create effective evaluation mechanisms.
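To make this concrete, here is a stylized toy example of my own (not from the course materials, and deliberately simplified): a single classifier is trained on pooled data in which a minority group is heavily underrepresented and follows a different pattern, and its error rate on that group ends up far higher than on the majority.

```python
# A stylized, hypothetical illustration: a model fitted to pooled data serves the
# well-represented group far better than the underrepresented one.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def make_group(n, boundary):
    """Sample 2-d features and labels defined by a group-specific linear rule."""
    X = rng.normal(size=(n, 2))
    y = (X @ boundary > 0).astype(int)
    return X, y

# Majority group: 5,000 examples; minority group: 200 examples with a different pattern.
X_maj, y_maj = make_group(5000, np.array([1.0, 1.0]))
X_min, y_min = make_group(200, np.array([1.0, -1.0]))

# Train one model on the pooled data, as is typically done in practice.
model = LogisticRegression().fit(np.vstack([X_maj, X_min]),
                                 np.concatenate([y_maj, y_min]))

# Evaluate on fresh samples from each group.
X_maj_test, y_maj_test = make_group(2000, np.array([1.0, 1.0]))
X_min_test, y_min_test = make_group(2000, np.array([1.0, -1.0]))
print(f"majority-group error rate: {1 - model.score(X_maj_test, y_maj_test):.1%}")
print(f"minority-group error rate: {1 - model.score(X_min_test, y_min_test):.1%}")
```

Real-world disparities are messier than this, but the mechanism is the same: the model's average error can look excellent while the error for the group with the least data stays high.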

Will we have one big AI system? Many copies of the same powerful AI? Lots of different AI systems?

So far, what we are seeing is that training an LLM is so costly that only big tech companies can do it. All the other, smaller experiments are so far built on copies of the same few powerful models.

We generally know very few details about the models we use: which data they were trained on, their architecture, their parameters, and so on.

For example, this paper shows that we are still not that open: "The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub":

"Open model developers have emerged as key actors in the political economy of artificial intelligence (AI), but we still have a limited understanding of collaborative practices in the open AI ecosystem. This paper responds to this gap with a three-part quantitative analysis of development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models.

First, various types of activity across 348,181 models, 65,761 datasets, and 156,642 space repositories exhibit right-skewed distributions. Activity is extremely imbalanced between repositories; for example, over 70% of models have 0 downloads, while 1% account for 99% of downloads.

Furthermore, licenses matter: there are statistically significant differences in collaboration patterns in model repositories with permissive, restrictive, and no licenses.

Second, we analyze a snapshot of the social network structure of collaboration in model repositories, finding that the community has a core-periphery structure, with a core of prolific developers and a majority of isolated developers (89%). Upon removing the isolated developers from the network, collaboration is characterized by high reciprocity regardless of developers' network positions. Third, we examine model adoption through the lens of model usage in spaces, finding that a minority of models, developed by a handful of companies, are widely used on the HF Hub.

Overall, activity on the HF Hub is characterized by Pareto distributions, congruent with OSS development patterns on platforms like GitHub. We conclude with recommendations for researchers, companies, and policymakers to advance our understanding of open AI development."
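As a rough sketch of my own (not the paper's methodology), one could get a feel for this skew by pulling model metadata from the Hub with the huggingface_hub Python client and checking how concentrated downloads are. I am assuming here that download counts are exposed on the listed repositories, and the sample below is simply whatever the listing endpoint returns first, not a random or complete sample.

```python
# Rough, hypothetical sketch: sample model metadata from the Hugging Face Hub and
# look at how concentrated downloads are within that sample.
import numpy as np
from huggingface_hub import list_models  # assumes the huggingface_hub client is installed

# Fetch metadata for a sample of model repositories (not the full ~350k models).
models = list(list_models(limit=5000))
downloads = np.array([m.downloads or 0 for m in models], dtype=float)
downloads = np.sort(downloads)[::-1]  # descending

total = downloads.sum()
top_1_percent = downloads[: max(1, len(downloads) // 100)].sum()
print(f"models sampled: {len(downloads)}")
if total > 0:
    print(f"share of downloads going to the top 1% of the sample: {top_1_percent / total:.1%}")
print(f"share of sampled models with zero recorded downloads: {(downloads == 0).mean():.1%}")
```

For the full Hub, the paper quoted above reports the far more extreme picture: 1% of models account for 99% of downloads.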

Whose intentions or which "values" should we be aligning AI systems with? How would you handle different stakeholders wanting to align AI systems with different intentions or values?

This is from a piece I wrote a year ago while taking the AI & Ethics course at LSE: Who will be the guardians of value alignment in AI?

Value alignment is one of the critical issues in AI. Which values will we base our AI ethics on, who will decide, and who will oversee how these values are adopted?

AI is a vast topic, as it affects citizens worldwide, across different countries. Often, citizens are not even aware of it because of the lack of transparency and accountability.

Moreover, our negative experiences with AI, such as surveillance, bias causing injustice, and fear of automation, have made it a field that is hard to trust. However, as with all dynamic and complex topics, AI involves many trade-offs, which means that we, as humanity, need to make choices.

When it comes to AI, individuals have almost no control or power on their own, as we have also seen with data privacy, data consent, and the view of privacy as a public good. That is why we cannot simply say, "Using an AI system should be an individual choice; therefore, value alignment is not an important consideration."

AI is being used, and will be used, in every area of our lives, so deciding who should be the guardians of AI value alignment depends on the context we face; there is no single answer that fits all.

On the other hand, our world is full of mythological stories that tell us a great deal about human nature and the dark and light sides we carry inside, especially when we hold power. That is why we cannot trust just one person or body to be the guardian of the value alignment of AI. This role can only be shared between different parties who serve as witnesses to each other. So various parties can collaborate to face the challenge of value alignment:

  • States should be responsible for implementing binding regulations.
  • Corporations should be accountable by engaging in self-regulation and being transparent about the values they choose.
  • The decision should be crowdsourced from users of the AI system.

We are part of another mythological story (if not literally) these days. AI is showing us to ourselves. AI is the result of us: the biases inside us, our own black-box nature, and our choices, which many times did more harm than good.

AI is inviting us to witness where we came from, and through this witnessing, it is time to adopt the precautionary principle as our risk attitude, because there are many risks for humanity.

This attitude can shake us and confront us with important questions. What does it mean to be human? What are the principal values of being human? Can we unite globally, instead of separating, for some vital conversations? Can we agree on common values globally? Who can be the guardians of these values?

So value alignment is both a technical and a philosophical problem, as Iason Gabriel says in his article "Artificial Intelligence, Values, and Alignment." Maybe there has never been a better time for philosophers to guide humanity through such essential questions!
