Towards Self-Propelled AI: Is It Possible?

What is the nature of curiosity? Is there any scientific way to understand the origin of this mysterious force that drives the behavior of even the “stupidest” naturally intelligent systems and is completely absent in their “smartest” artificial analogs? Can we build AI systems that could be curious about something, systems that would have an intrinsic motivation to learn? Is such a motivation quantifiable? Is it implementable? Will we ever see artificially built systems having their views, values or goals? Or maybe the only mission of AI is to imitate intelligence, fool Turing test judges and build next-generation gadgets? These are the main questions I’m trying to address in my article entitled

"Can Turing machine be curious about its Turing test results? Three informal lectures on physics of intelligence"

which has just appeared as arXiv e-print arXiv:1606.08109 [cs.AI].

I'm discussing these questions from the standpoint of physics. Treating intelligence as a physical phenomenon not only allows us to understand what its driving force is, but also gives us a powerful formalism capable of studying it mathematically in a systematic and unified way. The relationship between physics and intelligence is a consequence of the fact that “correctly predicted information” is nothing but an energy resource, and the process of thinking can be viewed as a process of accumulating and spending this resource through the acts of perception and, respectively, decision making. The natural motivation of any autonomous system to keep this accumulation/spending balance as high as possible allows one to treat the problem of describing the dynamics of thinking processes as a resource optimization problem. 

In this article, I’m proposing and discussing a simple theoretical model of such an autonomous system which I call the Autonomous Turing Machine (ATM). The potential attractiveness of ATM lies in the fact that it is the model of a self-propelled AI for which the only available energy resource is the information itself. For ATM, the problem of optimal thinking, learning, and decision-making becomes conceptually simple and mathematically well tractable. This circumstance makes the ATM an ideal playground for studying the dynamics of intelligent behavior and allows one to quantify many seemingly unquantifiable features of genuine intelligence.

A closer look at this subject reveals its cross-disciplinary nature: it turns out that there are many striking parallels between diverse branches of artificial intelligence on the one hand and theoretical physics and business economics on the other hand. For this reason, I wanted to target this text to a maximally broad audience, including physicists, computer scientists, business analysts and philosophers of science. 

The full text of this article (its pdf file) is freely downloadable from 

https://arxiv.org/ftp/arxiv/papers/1606/1606.08109.pdf

Below is a slightly modified and shortened version of its introductory part.

------- : -------

Is motivation quantifiable?

Many of today’s AI systems look astonishingly smart. They can perform complex tasks, learn fast, outperform humans in many areas and even pass the Turing test. Tomorrow’s AI systems will probably look smarter. Huge progress in this direction is stimulated by impressive advances in the area of “deep learning” with all its sub-branches such as sparse auto-encoders, restricted Boltzmann machines and others. We are not yet in a position to present a universal algorithm capable of learning from any data, but many researchers believe that sooner or later such an algorithm will be created. However, when created, will it represent a truly intelligent system?

Since there is no accepted definition of “intelligence”, the audience may split in answering this question. Probably the overwhelming majority will say yes. I do not belong to this group because I do not believe that intelligence can be reduced to the simple ability of a system to learn. If we define intelligence this way, we will miss the main point – the motivation to evolve towards intelligence. What should motivate, for example, the auto-encoders or restricted Boltzmann machines to have the learning-supporting architectures they have? After all, what motivates them to learn? These are not naive questions, not at all. If the motivation to learn is absent, the system can hardly be qualified as intelligent and autonomous. Indeed, to be intelligent, it is not sufficient to be able to calculate, solve problems and answer questions – one should need to calculate, need to solve problems and need to answer questions. Even more: one should need to ask questions. All these particular needs should originate from an intrinsic need of a system to do something useful not only for us, its creators (and this is what the existing AI systems already do), but for itself. This “artificial egocentrism” or “machine curiosity” – whatever we call it – is something we need to take very seriously, because otherwise we will be forced to always deal with “universal answering machines”. Those could be very helpful, no doubt, but not “naturally” intelligent. They will never have an internal drive to increase the level of their intelligence and will never be independent of us, humans. What I would like to have instead is, metaphorically speaking, a certain “universal asking machine” which, as I sincerely hope, will be kind enough and not too busy with its own problems to be willing to answer some of my questions too.

But do we need such machines? Are we ready for this switch from “machine learning” to “machine asking”? Is it not too dangerous for humankind? Is “AI slavery”, so to speak, a safer solution for us than “AI partnership”? These are all rhetorical and futurological questions. The problem I want to discuss here is much more practical from today’s standpoint: is it possible, at least in principle, to build such a machine?

I think the global answer to this question should be yes, which follows from the very fact of our existence: nature has already demonstrated the feasibility of this program. I do not think it is necessary to mimic how nature has done it – the only thing we need is to understand the basic idea and, once it is understood, start looking for shortcuts, as we have already done many times in the past. And this is where physics can help us.

The magic formula

The secret word linking intelligence to physics is “energy”. Each time physicists have succeeded in relating energy to something else, huge breakthroughs in both science and industry have followed. This is quite understandable, because energy is the main resource supporting our life, and any clue shedding light on the ways of obtaining and controlling it is of primary importance for us.

One of the simplest and best-known examples of such relationships is Einstein’s famous formula E = mc^2, which establishes the equivalence between mass and energy. Another example is Planck’s formula E = hν, establishing the relation between the frequency of a light wave and the energy of its quantum (the photon). The role played by these two formulas in our lives is hard to overestimate. It is huge: one can safely say that most of today’s technologies are directly or indirectly based on them.

Here I am going to discuss another formula for energy which is not as widely known as the previous two but whose impact on our life may be even higher. This formula was derived by Rolf Landauer in 1961 and has a very simple-looking form:

E = k * T * I * ln2.

Here, k is the so-called Boltzmann’s constant, T is temperature, and I is the amount of information measured in bits. Why is this formula so important? Because it establishes the equivalence between information and energy, which, in turn, creates the link between intelligence and physics. This formula is just a quantitative manifestation of the amazing fact that information is a resource, exactly in the same sense as gas is a resource for our cars and food is a resource for all living beings. It is simply a certain low-entropy stuff that can be consumed and then converted into useful work, as any other fuel can. Consumption of information is what we call perception, and the term “useful work” may stand for any of the useful actions the intelligent system can perform.

Of course, the energy scales characterizing these consumption-action processes are negligibly small in comparison with the scales we deal with on an everyday basis and at which today’s computers operate. Indeed, Landauer’s formula shows that 1 bit of information at room temperature can be converted into about 1/300,000,000,000,000,000,000 Joules of energy. Even in terms of the maximal capacities of personal computers, this looks like a rather small amount. For example, 1 TB of information may give us an energy of only 1/40,000,000 Joules, which, roughly speaking, is the energy of a grain of rice moving at a velocity of 1 cm/sec. However, this tiny amount could be quite noticeable at microscopic scales (or, more precisely, at the molecular level) at which our future computers may work. And this is the key point, because it allows us to say that future intelligent systems working on molecular scales may be motivated to behave in a way that maximizes the balance between their perceptions and actions!
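To make these orders of magnitude easy to reproduce, here is a minimal numerical sketch of the Landauer bound in Python (the physical constants are standard; the rounding matches the rough estimates quoted above):

import math

k_B = 1.380649e-23   # Boltzmann's constant, in J/K
T = 300.0            # room temperature, in K

def landauer_energy(bits, temperature=T):
    # Energy (in Joules) equivalent to `bits` of information at `temperature`,
    # according to E = k * T * I * ln 2.
    return k_B * temperature * bits * math.log(2)

print(landauer_energy(1))       # ~2.9e-21 J per bit, i.e. roughly 1/(3*10^20) J
print(landauer_energy(8e12))    # 1 TB = 8*10^12 bits -> ~2.3e-8 J, roughly 1/(4*10^7) J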

Thinking as a resource optimization process

So we see that one of the benefits of treating information as a resource is that the notion of motivation arises in this case in a very natural way. And this also leads us to the conclusion that internally motivated intelligent behavior could be mathematically describable as the process of optimizing the usage of a resource. Since the latter is a physical quantity, this opens the possibility of studying the behavior of intelligent systems by using a purely physical language. And this is an intriguing possibility. 

Technically, every resource optimization problem reduces to two closely related questions: how to maximize consumption and how to minimize spending.

To answer the first question, we can imagine a hypothetical and somewhat idealized situation where the system we are interested in is microscopic, and the energy scales of information are quite comparable with its energy needs. In this case, the system would be interested in extracting energy directly from the information that surrounds it. There is, however, a little problem with such an extraction, because not just any information can be used as fuel. To be a fuel (or, in other words, to be able to perform some useful work), the information must be known a priori. To be more precise, the energy value contained in one bit of information depends on the extent to which we know it in advance. In other words, the only way for a system to increase the energy value of the information it consumes is to predict it better. This fact creates a strong internal motivation for a system to learn – i.e., to discover the relationships between different spatial and temporal parts of the external world.

What about the second question? At first glance the answer seems quite straightforward: the system should maximally reduce the number of its actions, because each action leads to a loss of energy. However, this simple recipe hides a rather serious problem. The point is that the system cannot simply skip all of its actions regardless of their kind. Some of them must be performed, no matter what, again and again. The most typical examples of such non-skippable actions are those that are crucial for the system’s survival, such as the search for new possible locations of fuel. There are other non-skippable actions as well, like learning new patterns, recovering from errors, etc. This situation creates another strong motivation for a system: to be maximally disciplined in spending its resource and to decide carefully which actions to perform and which not.

Combining these two answers, we arrive at a picture in which the thinking process appears as a chain of observations and decisions or, in the resource-based language, as a chain of consumptions and spendings. The maximization of the consumption/spending balance along this chain is a very non-trivial optimization problem. The better the system can solve it, the less external energy supply it will need for normal functioning.

It is very tempting to stop here and jump into a discussion of this energy saving problem because of its huge theoretical importance and many practical applications.  However, we will do that later.  Now we want to go a little bit further and ask the following question: If the process of thinking is the result of optimizing the cumulative consumption and spending balance, is it possible to optimize it in such a way that it would remain positive all the time? Or, in other words, can we close the perception-decision chain and convert it into a perception-decision loop? Answering “yes” to this question would mean that the intelligence could be self-propelled – i.e., not requiring any extra sources of the traditional fuel at all. Anything that such a system would need for normal functioning it could find in the information that surrounds it.

The last statement probably needs some clarification, simply because of the equivalence between information and fuel we have just stated. Indeed, how can an AI system distinguish between what to look for, fuel or information, if they are the same? Of course, it cannot, but the main idea behind this approach is precisely to avoid treating information and fuel as two different types of resources. We want to use only one type instead, because this is methodologically much easier. It is convenient to choose the information for this role – and the formula E = k * T * I * ln2 allows us to do so.

The goal of this paper is to explore the very possibility of such bit-level self-propulsion – which is the road to autonomous AI systems. True motivation, and thus true intelligence, is not achievable without true autonomy. Autonomy is the real driver of evolution, because the motivation for increasing efficiency, and thus complexity, comes from the need to survive in situations where the external supply of resources is not guaranteed.

The autonomous Turing machines

To start realizing this program in a consistent way, we will need some toy model that is simple enough from a purely technical perspective and, at the same time, sufficiently rich from the standpoint of practical usefulness and further scalability. This model should lie somewhere on the border between intelligent, living and non-living systems and, to some extent, have features of all of them. Ideally, the role of such a model should be similar to the role played by the hydrogen atom in physics or by the “Hello World!” program in programming. If one understands physics or programming at this level, then one probably has every chance of understanding the rest.

But how shall we find an adequate toy model suitable for our needs? Fortunately, things are not as complicated as they may seem at first glance. The model which may satisfy us at least at the initial stage does exist, is describable in a very simple way and can be thought of as a very natural extension of the good old Turing machine. Because of this similarity, we will call our model the Autonomous Turing Machine.

First, let us recall some basic facts about the Standard Turing machine (STM). The STM is a hypothetical computing device living in a hypothetical memory space divided into an infinite number of cells. This device is capable of successively visiting diverse memory cells, reading the information stored in them, processing it and writing the results back to the same memory space. The machine itself can be in one of a finite number of internal states. Its central processing unit – the so-called finite state automaton – operates as follows: it takes as input the current state of the machine together with the symbol it has just read from the current cell and produces the output. The latter includes a) the new symbol which overwrites the old one in the same cell, b) the new state of the machine and c) the new move, which is the instruction to go in a given direction or just to stop.

The initial content of the memory space is our assignment – this is what we want the machine to do for us, and it includes both the data and the program, i.e., the instructions for how to process the data. After completing the computation, the machine stops. The content of the memory space at that moment is what we call the result of the computation.
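For readers who prefer code to prose, here is a minimal sketch of such a machine in Python. The transition table, the halting convention and the toy “unary incrementer” program are illustrative choices of mine, not constructions taken from the article:

def run_stm(tape, rules, state="start", head=0, max_steps=1000):
    # The transition table maps (state, symbol) -> (new_symbol, new_state, move),
    # exactly as described above; "_" marks an empty memory cell.
    cells = dict(enumerate(tape))            # sparse stand-in for the infinite memory space
    for _ in range(max_steps):
        symbol = cells.get(head, "_")
        if (state, symbol) not in rules:
            break                            # no applicable rule: the machine halts
        new_symbol, state, move = rules[(state, symbol)]
        cells[head] = new_symbol             # overwrite the current cell
        if move == "STOP":
            break
        head += 1 if move == "R" else -1     # move the head right or left
    return "".join(cells[i] for i in sorted(cells))

# Toy program: walk right over a block of 1s and append one more 1.
rules = {
    ("start", "1"): ("1", "start", "R"),
    ("start", "_"): ("1", "done", "STOP"),
}
print(run_stm("111", rules))                 # prints "1111"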

The importance of STM lies in the fact that it is capable of performing any computation and, at the same time, it is simple enough to be considered as a convenient playground for developing and testing diverse theories of computer science.  One of the most important distinguishing properties of the traditional Turing machine is that it does something useful for us, i.e., its users. We write the programs for STM, we provide it with data, we start it, we wait until it finishes and we become nervous if it does not stop. The machine itself does not care what it does, why it does it and how it does it.  It is simply a tool for serving our needs and simplifying our work in fulfilling these needs – it is like a shovel, hammer or bulldozer, but nothing more.

Can we use such a machine for programming artificial intelligence? Probably yes, but only to some extent – only as long as we can anticipate in advance all the tasks the machine may need to perform for us. We may be very smart and build very smart machines which fulfill all our current needs. But in any case, this will not be genuine intelligence, because genuine intelligence assumes the ability to learn new things and – what is even more important – the ability to define new problems.

We can say that traditional computers help us to answer our questions. But what we intuitively expect from intelligent systems is the ability to ask questions. We want to deal with computing devices which would help us stay on the cutting edge of progress and drive it by creating new knowledge about nature. But if all this knowledge has to exist in advance on the tape presented to the Turing machine, then what would we expect that machine to do for us?

Here we consider another version of Turing machine which, as we think, is free of the problems we mentioned above. We can call it the Autonomous Turing Machine (ATM).  The idea is to let the machine itself decide what it needs.  The first thing the ATM may need is energy. Having the energy, it can move, perform calculations and maybe even produce something useful for us. But how can the machine find this energy? Do we need to equip it with batteries or power supplies? Not necessarily. Here is a good place to remember that information can serve as a fuel, so theoretically our ATM may find everything it needs in the memory space where it lives and moves. 

This circumstance makes the ATM similar to an autonomous robot placed in an unknown fuel-bearing terrain and trying to survive in it. To survive, the robot needs to consume fuel. But to consume fuel, it needs to find fuel deposits. But to find fuel deposits, it needs to understand the patterns of their possible distribution – i.e., to learn from experience. Then it needs to move to the places where it thinks the chances of finding fuel are high. All these actions require energy, which the robot can extract only from the fuel it consumes. We arrive at an infinite logical loop: the robot consumes fuel to be able to act and acts to be able to consume fuel again. And this is what we call life. If we replace the word “fuel” with the word “information” and the word “robot” with the word “ATM”, we will get an idea of how the ATM’s dynamics in memory space may look.
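The following toy sketch in Python illustrates this consume-to-act, act-to-consume loop. Everything in it – the bit-valued memory, the frequency-counting “model”, the move cost and the unit of fuel – is my own simplifying assumption, chosen only to make the cycle concrete:

import random

def atm_loop(memory, steps=50, move_cost=0.7, start_energy=5.0):
    # The machine earns one unit of fuel for every memory cell it predicts
    # correctly (perception of known information) and pays `move_cost` for
    # every step it takes (action). It "dies" when the balance hits zero.
    energy, position = start_energy, 0
    counts = {0: 1, 1: 1}                         # crude learned model of the memory content
    for _ in range(steps):
        if energy <= 0:
            return "died", round(energy, 2)
        prediction = max(counts, key=counts.get)  # predict the most frequent bit seen so far
        observed = memory[position % len(memory)]
        if prediction == observed:
            energy += 1.0                         # correctly predicted information acts as fuel
        counts[observed] += 1                     # perception doubles as learning
        energy -= move_cost                       # every action spends energy
        position += 1
    return "alive", round(energy, 2)

random.seed(0)
memory = [1] * 80 + [random.randint(0, 1) for _ in range(20)]   # a mostly predictable "terrain"
print(atm_loop(memory))                           # a predictable memory keeps the machine alive

On average, a memory filled with unpredictable noise would drain this toy machine's energy instead – which is exactly the point: its survival depends on how well it learns the patterns around it.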

What we plan to discuss here

This cycle consists of the following three lectures:

  1. The life of information
  2. Intelligence: an internal point of view
  3. Intelligence: an external point of view

Lecture one is mostly based on results obtained in the middle of the 20th century in attempts to resolve the famous Maxwell’s demon paradox – results which actually state the equivalence between information and fuel. Freely rephrasing these results, I will show how to use this equivalence for building information-driven engines capable of converting any predictable information into useful work. The main concluding messages of this lecture are:

  1. To create new information we need to spend energy.
  2. To extract energy from existing information we need to know it in advance.
  3. The energy extraction process kills information – it becomes waste.
  4. One can build information-driven engines that are tolerant to errors.
  5. The effectiveness of these engines is describable by the Kullback-Leibler formula for information gain (illustrated numerically below).
  6. Different information patterns require different engines and vice versa: simply speaking, patterns and engines should match. In other words, there is no such thing as an absolute value of information – this value can only be defined and measured relative to a certain engine, and this is probably the central point of this lecture.

The rest of the lecture is devoted to the energy aspects of computation. The main message of this part is that computation, if properly organized, does not require any energy. It does not directly lead to any energy gains either. This raises the question of why we may need computation at all if it does nothing. The answer is given in lectures 2 and 3.
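As a small illustration of the Kullback-Leibler point above, here is a minimal sketch in Python. It assumes one common form of the mismatch statement from the physics-of-information literature – that an engine optimized for a model distribution q, when fed symbols actually distributed as p, loses roughly k*T*D_KL(p||q) of extractable work per symbol compared with a perfectly matched engine; the specific distributions used here are purely illustrative:

import math

k_B, T = 1.380649e-23, 300.0          # Boltzmann's constant (J/K) and room temperature (K)

def kl_divergence(p, q):
    # Kullback-Leibler information gain D_KL(p || q), in nats.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.9, 0.1]                        # the true pattern written on the tape
q_matched = [0.85, 0.15]              # an engine whose model almost matches the pattern
q_mismatched = [0.5, 0.5]             # an engine that treats the tape as pure noise

for label, q in (("nearly matched engine", q_matched), ("mismatched engine", q_mismatched)):
    lost = k_B * T * kl_divergence(p, q)
    print(f"{label}: ~{lost:.1e} J of work lost per consumed symbol")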

In lecture two we provide an internal view of computation by discussing it from the standpoint of the computing device. We demonstrate that, by using properly organized computations, one can increase the initial energy value of practically any information reaching an energy extraction device specified a priori. In other words, computation can be viewed as a process of making information ready for consumption by a given engine. In a sense, this process is similar to the process of cooking food before eating it. We call this procedure the information refining process and show that it lies at the basis of any statistical learning algorithm. Computation can also be considered as a unitary evolution operator applied to the input vector of registry bits. Being itself energy-neutral, it transforms this input vector into an output vector having maximal energy value, i.e., matching a given engine as well as possible. The energy extraction procedure is, however, non-unitary and leads to a collapse of the registry vector – it forgets everything. This makes the process similar to what happens to the wavefunction of a quantum system after a measurement. The rest of this lecture is devoted to autonomous Turing machines considered from the standpoint of their internal organization. We describe the two main building blocks of these machines and outline their functionality. These two blocks are responsible for (i) understanding and (ii) making decisions. From the energy perspective, the “understanding block” maximizes energy accumulation while the “decision block” minimizes its spending. The optimal dynamics of the ATM can be derived from the problem of statistical optimization of the difference between accumulation and spending.

In lecture three we consider the problem of computation from the external point of view. As an example, we discuss autonomous Turing machines treated as point-wise objects and show that the problem of finding the optimal trajectory of such an ATM in memory space reveals striking similarities to the famous Least Action Principle applied to the optimal trajectory of a mechanical particle moving in an external potential. Building on this equivalence, we present the exact solution of this problem and discuss it in the context of diverse physical systems. The most interesting aspect of this discussion is the similarity between the mathematical structures appearing in the description of open and closed, living and non-living, and even physical and socio-economic systems. We then discuss some general questions related to the relationship between models of artificial intelligence, theoretical physics and business economics. We show that all these systems can be treated and examined using the resource maximization principle, which seems to be a natural generalization of the least action principle from closed conservative systems to open autonomous ones. It is interesting that this purely theoretical discussion may have practical implications, too. For example, the almost literal analogy between businesses and simple mechanical systems like swings allows one to conjecture that one of the ways of facilitating business growth could be based on the effects of parametric resonance, which seem to be typical for both mechanical and economic processes.

As seen from this plan, the subject I am trying to cover here is very cross-disciplinary and assumes a rather broad target audience, including people working in diverse areas of physics, computer science and business economics. For this reason, I have tried to keep the discussion at the most general level possible. If you expect to find ready-for-implementation solutions, new algorithms or hardware architectures here, you will be disappointed. However, if you are ready to look at the existing problems from a somewhat different angle, these lectures are for you.

Most of the facts mentioned in this text in connection with the physics of information should not surprise people with a background in theoretical physics or evolutionary biology who work in the area of computer science, especially in the business environment. There are many excellent reviews, books and research papers discussing this subject from different angles and partially overlapping with the factual material presented here. I have already mentioned some of them earlier.

Nevertheless, the goal of this text is not a simple exposition of known facts. It is rather an attempt to unify them into a new piece of knowledge which, as I believe, may have its own spectrum of practical applications. It is, in fact, an attempt to approach a set of very specific, extremely complex and seemingly unrelated problems of artificial intelligence, theoretical physics and business economics from the standpoint of a much broader but substantially simpler problem. I believe that such a unification would not only simplify the development of new methods in machine learning and applied business economics but also give us a fresh view of the problems of theoretical physics. I could not resist a strong desire to find a systematic way of exposing these ideas as a whole, which gradually crystallized into the decision to write this text and essentially determined its form as a cycle of lectures. These lectures have never been delivered to any real audience – the chosen format simply reflects my attempt to organize my thoughts better and share them with people from diverse industries and areas of research who may find this subject useful for any reason, be it theoretical or practical.

------- : -------

You can read the full article at https://arxiv.org/ftp/arxiv/papers/1606/1606.08109.pdf
