A Note On Alignment vs Fine-tuning

I have noticed a lack of clarity around these two ideas in many circles, so let's quickly clear the fog with an analogy. Take your typical eighteen-year-old. They have built up tools, meta-skills, and models of the world with which to learn and to calibrate, but they have not yet gone deep into anything. They are now off to university, intending to train or specialize in a specific domain. In their first eighteen years they already learnt how to study a textbook, to absorb ideas, to process, analyze, integrate, and apply; call this their foundational stage, the human equivalent of an AI foundation model. The university years will now fine-tune this fine fella into someone with a certain degree of expertise or mastery of a subject. The more training cycles they go through and the better their internal parameters, the higher their learning rate and the faster and more refined the output of their fine-tuned mind.

All simple stuff so far, but what about alignment? I could give you a single name to explain the idea - Ted Kaczynski - but I will do more. Ted was a highly refined, highly fine-tuned individual, one of the most brilliant modellers of his time according to some who knew him ["best man I have seen" - yeah, me too], but his goals and values were at odds with society's, which makes him an example of someone deeply fine-tuned yet unaligned. They were at odds due to a mixture of bad luck, bad parenting, negligence, and other factors.
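To make the "foundation first, fine-tune later" distinction concrete in code, here is a minimal sketch of my own, not any particular lab's pipeline: a tiny PyTorch network stands in for a pretrained foundation model, its general-purpose layers are frozen, and only the final head is updated on a narrow "specialist" task. The architecture, shapes, and random data are all illustrative assumptions.

```python
# Toy illustration (assumed setup, not a real training recipe): freeze the
# "foundation" layers of a small network and fine-tune only the task head.
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Foundation" model: a small MLP standing in for a large pretrained network.
foundation = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 8),          # task-specific head
)

# Fine-tuning: keep the general-purpose layers fixed, train only the head.
for param in foundation[:4].parameters():
    param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in foundation.parameters() if p.requires_grad), lr=1e-3
)
loss_fn = nn.CrossEntropyLoss()

# Toy "domain" data: random features with random class labels.
x = torch.randn(256, 32)
y = torch.randint(0, 8, (256,))

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(foundation(x), y)
    loss.backward()
    optimizer.step()

print(f"final fine-tuning loss: {loss.item():.3f}")
```

The point of the sketch is only the shape of the process: the broad capabilities already exist, and fine-tuning adjusts a comparatively small slice of them toward one domain.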


Our goals and values come from the biases we are born with and from the forces that shape our young minds and years, calibrating and recalibrating our innate temperaments over the course of our lives, with the rate of calibration slowing almost to a halt by our mid-twenties [*internal crying sounds*] for most people. That is why the biases and drives we have around that age are the ones we stick with for the rest of our lives, for better or for worse, except in exceptional cases such as trauma or a deeply paradigm-transforming, psyche-altering experience. These initial parameters shape our goals and values. For most of us they are more or less in line with society's and humanity's: reducing human suffering, loving and caring for people, learning about, understanding, and modelling the world, and so on. For some of us, partly through nature and partly through nurture, the result is instead a desire to inflict pain and hurt. We could call these people unaligned.

So back to the premise. What is AI alignment, and with reference to what should we think about a model's alignment? AI alignment is aligning AI systems with our goals and values, which arise from our biases as shaped over the course of our lives. The project of human civilization, with all its messy mishaps and wrong turns [e.g. fascism, imperialism, communism], has also given us a fair handle on what we want our society and our future to look like, to feel like, to be.

Transferring biases onto AI systems means transferring the ones we want to see in those systems, not all of them, because we can do without some of our biases, deeply embedded in us as they are. For instance, we will want our AI systems to have a strong pro-human bias, a strong pro-nature bias, and a strong bias for curiosity, exploration, experimentation, and self-preservation, but not, say, for climbing social hierarchies, for mating, or for egotism. Out of these biases will form the AI system's goals and values. The aim is a fully convergent set of goals and values between us and them, so much so that the "them" becomes just an extension of us, or just the better angels of our nature, to borrow the phrase Pinker used so poetically. This naturally brings us to the next question.
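One very simplified way this transfer of values is attempted in practice is by learning from human preferences. The sketch below is only a toy illustration under my own assumptions, not anyone's production pipeline: a small reward model is fitted with a Bradley-Terry-style loss so that outputs a human marked as better aligned with our values score higher than rejected ones; the later step of optimizing a model against that reward is left out entirely.

```python
# Toy illustration (assumed data and dimensions): fit a scalar reward model
# from (chosen, rejected) preference pairs using a Bradley-Terry loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Reward model: maps an (embedded) model output to a scalar "alignment" score.
reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Toy preference data: "chosen" embeddings are the outputs a human preferred.
chosen = torch.randn(128, 16) + 0.5     # nudged apart so the task is learnable
rejected = torch.randn(128, 16) - 0.5

for step in range(200):
    optimizer.zero_grad()
    r_chosen = reward_model(chosen).squeeze(-1)
    r_rejected = reward_model(rejected).squeeze(-1)
    # Bradley-Terry objective: push the preferred output's reward above the rejected one's.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    loss.backward()
    optimizer.step()

print(f"preference loss: {loss.item():.3f}")
```

The analogy to the essay's framing: the preference labels are where our biases and values enter the system, and the reward model is one crude vessel for carrying them over.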


Why spend time worrying about alignment in the first place if we sit at the end of the chain when using these systems? Because we are at the end of the chain now, but hopefully not for long. Pretty soon we will want these models to take over some executive and decision-making responsibilities, since we are fairly poor decision-makers: the baggage of evolution limits our field of vision, saddles us with biases we can never remove (some of which impede pristine decision-making), and caps our information processing and bandwidth. If we are to remove ourselves from the loop and move our models from the agentic state to the autonomous state, we must trust them implicitly: trust them to have the same overarching goals and the same underlying values, built from the same biases we have [the desired ones, to reiterate], insofar as that is possible. There will obviously be differences, owing to the makeup and nature of the respective intelligences, but we want to reach the point where those differences don't pose a threat, or rather, don't become grounds for risk-laden behaviours, actions, and goals. Once there, we can do what we do best: the human things, like love, learn, create, explore, feel, and be, and let our AIs take care of the rest. A veritable paradise right here.
