As an example, recall the metaphor I used in chapter 4 relating the random movements of molecules in a gas to the random movements of evolutionary change. Molecules in a gas move randomly with no apparent sense of direction. Despite this, virtually every molecule in a gas in a beaker, given sufficient time, will leave the beaker. I noted that this provides a perspective on an important question concerning the evolution of intelligence. Like molecules in a gas, evolutionary changes also move every which way with no apparent direction. Yet we nonetheless see a movement toward greater complexity and greater intelligence, indeed to evolution's supreme achievement of evolving a neocortex capable of hierarchical thinking. So we are able to gain an insight into how an apparently purposeless and directionless process can achieve an apparently purposeful result in one field (biological evolution) by looking at another field (thermodynamics).
I mentioned earlier how Charles Lyell's insight that minute changes to rock formations by streaming water could carve great valleys over time inspired Charles Darwin to make a similar observation about continual minute changes to the characteristics of organisms within a species. This metaphor search would be another continual background process.
We should provide a means of stepping through multiple lists simultaneously to provide the equivalent of structured thought. A list might be the statement of the constraints that a solution to a problem must satisfy. Each step can generate a recursive search through the existing hierarchy of ideas or a search through available literature. The human brain appears to be able to handle only four simultaneous lists at a time (without the aid of tools such as computers), but there is no reason for an artificial neocortex to have such a limitation.
We will also want to enhance our artificial brains with the kind of intelligence that computers have always excelled in, which is the ability to master vast databases accurately and implement known algorithms quickly and efficiently. Wolfram Alpha uniquely combines a great many known scientific methods and applies them to carefully collected data. This type of system is also going to continue to improve given Dr. Wolfram's observation of an exponential decline in error rates.
Finally, our new brain needs a purpose. A purpose is expressed as a series of goals. In the case of our biological brains, our goals are established by the pleasure and fear centers that we have inherited from the old brain. These primitive drives were initially set by biological evolution to foster the survival of species, but the neocortex has enabled us to sublimate them. Watson's goal was to respond to Jeopardy! queries. Another simply stated goal could be to pass the Turing test. To do so, a digital brain would need a human narrative of its own fictional story so that it can pretend to be a biological human. It would also have to dumb itself down considerably, for any system that displayed the knowledge of, say, Watson would be quickly unmasked as nonbiological.
More interestingly, we could give our new brain a more ambitious goal, such as contributing to a better world. A goal along these lines, of course, raises a lot of questions: Better for whom? Better in what way? For biological humans? For all conscious beings? If that is the case, who or what is conscious?
As nonbiological brains become as capable as biological ones of effecting changes in the world-indeed, ultimately far more capable than unenhanced biological ones-we will need to consider their moral education. A good place to start would be with one old idea from our religious traditions: the golden rule.
CHAPTER 8
THE MIND AS COMPUTER
Shaped a little like a loaf of French country bread, our brain is a crowded chemistry lab, bustling with nonstop neural conversations. Imagine the brain, that shiny mound of being, that mouse-gray parliament of cells, that dream factory, that petit tyrant inside a ball of bone, that huddle of neurons calling all the plays, that little everywhere, that fickle pleasuredome, that wrinkled wardrobe of selves stuffed into the skull like too many clothes into a gym bag.-Diane Ackerman

Brains exist because the distribution of resources necessary for survival and the hazards that threaten survival vary in space and time.-John M. Allman

The modern geography of the brain has a deliciously antiquated feel to it-rather like a medieval map with the known world encircled by terra incognita where monsters roam.-David Bainbridge

In mathematics you don't understand things. You just get used to them.-John von Neumann
Ever since the emergence of the computer in the middle of the twentieth century, there has been ongoing debate not only about the ultimate extent of its abilities but about whether the human brain itself could be considered a form of computer. As far as the latter question was concerned, the consensus has veered from viewing these two kinds of information-processing entities as being essentially the same to their being fundamentally different. So is the brain a computer?
When computers first became a popular topic in the 1940s, they were immediately regarded as thinking machines. The ENIAC, which was announced in 1946, was described in the press as a "giant brain." As computers became commercially available in the following decade, ads routinely referred to them as brains capable of feats that ordinary biological brains could not match.
A 1957 ad showing the popular conception of a computer as a giant brain.
Computer programs quickly enabled the machines to live up to this billing. The "general problem solver," created in 1959 by Herbert A. Simon, J. C. Shaw, and Allen Newell at Carnegie Mellon University, was able to devise a proof to a theorem that mathematicians Bertrand Russell (1872–1970) and Alfred North Whitehead (1861–1947) had been unable to solve in their famous 1913 work Principia Mathematica. What became apparent in the decades that followed was that computers could significantly exceed unassisted human capability in such intellectual exercises as solving mathematical problems, diagnosing disease, and playing chess but had difficulty with controlling a robot tying shoelaces or with understanding the commonsense language that a five-year-old child could comprehend. Computers are only now starting to master these sorts of skills. Ironically, the evolution of computer intelligence has proceeded in the opposite direction of human maturation.
The issue of whether or not the computer and the human brain are at some level equivalent remains controversial today. In the introduction I mentioned that there were millions of links for quotations on the complexity of the human brain. Similarly, a Google inquiry for "Quotations: the brain is not a computer" also returns millions of links. In my view, statements along these lines are akin to saying, "Applesauce is not an apple." Technically that statement is true, but you can make applesauce from an apple. Perhaps more to the point, it is like saying, "Computers are not word processors." It is true that a computer and a word processor exist at different conceptual levels, but a computer can become a word processor if it is running word processing software and not otherwise. Similarly, a computer can become a brain if it is running brain software. That is what researchers including myself are attempting to do.
The question, then, is whether or not we can find an algorithm that would turn a computer into an entity that is equivalent to a human brain. A computer, after all, can run any algorithm that we might define because of its innate universality (subject only to its capacity). The human brain, on the other hand, is running a specific set of algorithms. Its methods are clever in that they allow for significant plasticity and the restructuring of its own connections based on its experience, but these functions can be emulated in software.
The universality of computation (the concept that a general-purpose computer can implement any algorithm)-and the power of this idea-emerged at the same time as the first actual machines. There are four key concepts that underlie the universality and feasibility of computation and its applicability to our thinking. They are worth reviewing here, because the brain itself makes use of them. The first is the ability to communicate, remember, and compute information reliably. Around 1940, if you used the word "computer," people assumed you were talking about an analog computer, in which numbers were represented by different levels of voltage, and specialized components could perform arithmetic functions such as addition and multiplication. A big limitation of analog computers, however, was that they were plagued by accuracy issues. Numbers could only be represented with an accuracy of about one part in a hundred, and as voltage levels representing them were processed by increasing numbers of arithmetic operators, errors would accumulate. If you wanted to perform more than a handful of computations, the results would become so inaccurate as to be meaningless.
Anyone who can remember the days of recording music with analog tape machines will recall this effect. There was noticeable degradation on the first copy, as it was a little noisier than the original. (Remember that "noise" represents random inaccuracies.) A copy of the copy was noisier still, and by the tenth generation the copy was almost entirely noise. It was assumed that the same problem would plague the emerging world of digital computers. We can understand such concerns if we consider the communication of digital information through a channel. No channel is perfect and each one will have some inherent error rate. Suppose we have a channel that has a .9 probability of correctly transmitting each bit. If I send a message that is one bit long, the probability of accurately transmitting it through that channel will be .9. Suppose I send two bits? Now the accuracy is .9^2 = .81. How about if I send one byte (eight bits)? I have less than an even chance (.43 to be exact) of sending it correctly. The probability of accurately sending five bytes is about 1 percent.
An obvious solution to circumvent this problem is to make the channel more accurate. Suppose the channel makes only one error in a million bits. If I send a file consisting of a half million bytes (about the size of a modest program or database), the probability of correctly transmitting it is less than 2 percent, despite the very high inherent accuracy of the channel. Given that a single-bit error can completely invalidate a computer program and other forms of digital data, that is not a satisfactory situation. Regardless of the accuracy of the channel, since the likelihood of an error in a transmission grows rapidly with the size of the message, this would seem to be an intractable barrier.
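A few lines of Python are enough to verify these figures, assuming only that bit errors occur independently, so an n-bit message arrives intact with probability p^n:

```python
# A minimal sketch (illustrative, not from the original text) checking the
# transmission probabilities quoted above: a per-bit success rate p gives a
# message-level success rate of p**n for an n-bit message.

def prob_message_ok(p_bit_ok: float, n_bits: int) -> float:
    """Probability that every bit of an n-bit message arrives correctly."""
    return p_bit_ok ** n_bits

# Channel that transmits each bit correctly 90 percent of the time.
print(prob_message_ok(0.9, 1))    # 0.9    (one bit)
print(prob_message_ok(0.9, 2))    # 0.81   (two bits)
print(prob_message_ok(0.9, 8))    # ~0.43  (one byte)
print(prob_message_ok(0.9, 40))   # ~0.015 (five bytes, about 1 percent)

# Much better channel: one error per million bits, half-million-byte file.
print(prob_message_ok(1 - 1e-6, 500_000 * 8))  # ~0.018, still under 2 percent
```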
Analog computers approached this problem through graceful degradation (meaning that users only presented problems in which they could tolerate small errors); however, if users of analog computers limited themselves to a constrained set of calculations, the computers did prove somewhat useful. Digital computers, on the other hand, require continual communication, not just from one computer to another, but within the computer itself. There is communication from its memory to and from the central processing unit. Within the central processing unit, there is communication from one register to another and back and forth to the arithmetic unit, and so forth. Even within the arithmetic unit, there is communication from one bit register to another. Communication is pervasive at every level. If we consider that error rates escalate rapidly with increased communication and that a single-bit error can destroy the integrity of a process, digital computation was doomed-or so it seemed at the time.
Remarkably, that was the common view until American mathematician Claude Shannon (1916–2001) came along and demonstrated how we can create arbitrarily accurate communication using even the most unreliable communication channels. What Shannon stated in his landmark paper "A Mathematical Theory of Communication," published in the Bell System Technical Journal in July and October 1948, and in particular in his noisy channel-coding theorem, was that if you have available a channel with any error rate (except for exactly 50 percent per bit, which would mean that the channel was just transmitting pure noise), you are able to transmit a message in which the error rate is as small as you desire. In other words, the error rate of the transmission can be one bit out of n bits, where n can be as large as you define. So, for example, in the extreme, if you have a channel that correctly transmits bits of information only 51 percent of the time (that is, it transmits the correct bit just slightly more often than the wrong bit), you can nonetheless transmit messages such that only one bit out of a million is incorrect, or one bit out of a trillion or a trillion trillion.
How is this possible? The answer is through redundancy. That may seem obvious now, but it was not at the time. As a simple example, if I transmit each bit three times and take the majority vote, I will have substantially increased the reliability of the result. If that is not good enough, simply increase the redundancy until you get the reliability you need. Simply repeating information is the easiest way to achieve arbitrarily high accuracy rates from low-accuracy channels, but it is not the most efficient approach. Shannon's paper, which established the field of information theory, presented optimal methods of error detection and correction codes that can achieve any target accuracy through any nonrandom channel.
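A minimal sketch of this idea in Python, assuming a simple channel that flips each bit with probability q: send each bit three times and take the majority vote, so a voted bit is wrong only when at least two of the three copies are flipped (probability 3q^2(1-q) + q^3, much smaller than q for small q).

```python
# Illustrative repetition-code example; the channel model and the error rate
# q = 0.1 are assumptions chosen for demonstration.
import random

def send_bit(bit: int, q: float) -> int:
    """Transmit one bit through a channel that flips it with probability q."""
    return bit ^ (random.random() < q)

def send_bit_with_voting(bit: int, q: float, copies: int = 3) -> int:
    """Transmit the bit several times and return the majority vote."""
    votes = sum(send_bit(bit, q) for _ in range(copies))
    return 1 if votes > copies // 2 else 0

# Empirically compare raw and voted error rates on a noisy channel.
trials, q = 100_000, 0.1
raw_errors = sum(send_bit(0, q) for _ in range(trials)) / trials
voted_errors = sum(send_bit_with_voting(0, q) for _ in range(trials)) / trials
print(raw_errors)    # ~0.10
print(voted_errors)  # ~0.028  (theory: 3*0.01*0.9 + 0.001 = 0.028)
```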
Older readers will recall telephone modems, which transmitted information through noisy analog phone lines. These lines featured audibly obvious hisses and pops and many other forms of distortion, but nonetheless were able to transmit digital data with very high accuracy rates, thanks to Shannon's noisy channel theorem. The same issue and the same solution exist for digital memory. Ever wonder how CDs, DVDs, and program disks continue to provide reliable results even after the disk has been dropped on the floor and scratched? Again, we can thank Shannon.
Computation consists of three elements: communication-which, as I mentioned, is pervasive both within and between computers-memory, and logic gates (which perform the arithmetic and logical functions). The accuracy of logic gates can also be made arbitrarily high by similarly using error detection and correction codes. It is due to Shannon's theorem and theory that we can handle arbitrarily large and complex digital data and algorithms without the processes being disturbed or destroyed by errors. It is important to point out that the brain uses Shannon's principle as well, although the evolution of the human brain clearly predates Shannon's own! Most of the patterns or ideas (and an idea is also a pattern), as we have seen, are stored in the brain with a substantial amount of redundancy. A primary reason for the redundancy in the brain is the inherent unreliability of neural circuits.
The second important idea on which the information age relies is the one I mentioned earlier: the universality of computation. In 1936 Alan Turing described his "Turing machine," which was not an actual machine but another thought experiment. His theoretical computer consists of an infinitely long memory tape with a 1 or a 0 in each square. Input to the machine is presented on this tape, which the machine can read one square at a time. The machine also contains a table of rules-essentially a stored program-that consists of numbered states. Each rule specifies one action if the square currently being read is a 0, and a different action if the current square is a 1. Possible actions include writing a 0 or 1 on the tape, moving the tape one square to the right or left, or halting. Each state will then specify the number of the next state that the machine should be in.
The input to the Turing machine is presented on the tape. The program runs, and when the machine halts, it has completed its algorithm, and the output of the process is left on the tape. Note that even though the tape is theoretically infinite in length, any actual program that does not get into an infinite loop will use only a finite portion of the tape, so if we limit ourselves to a finite tape, the machine will still solve a useful set of problems.
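The following Python sketch simulates such a machine. The rule-table format and the sample program (which increments a binary number) are illustrative choices rather than anything Turing specified, and halting is expressed here as a designated halt state.

```python
# Illustrative Turing machine simulator. The rule table maps
# (state, symbol) -> (symbol_to_write, move, next_state), mirroring the
# description above; unused squares read as BLANK.
from collections import defaultdict

BLANK = None

def run_turing_machine(rules, tape_input, state="q0", max_steps=10_000):
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))
    head, steps = 0, 0
    while state != "halt" and steps < max_steps:
        write, move, state = rules[(state, tape[head])]
        tape[head] = write
        head += {"R": 1, "L": -1}[move]
        steps += 1
    # Return the nonblank contents of the tape, left to right.
    return [cell for _, cell in sorted(tape.items()) if cell is not BLANK]

# Example program: increment a binary number (most significant bit first).
# q0 scans right to the blank after the number; q1 adds 1 with carry, moving left.
rules = {
    ("q0", 0): (0, "R", "q0"),
    ("q0", 1): (1, "R", "q0"),
    ("q0", BLANK): (BLANK, "L", "q1"),
    ("q1", 0): (1, "L", "halt"),      # no carry needed
    ("q1", 1): (0, "L", "q1"),        # carry propagates left
    ("q1", BLANK): (1, "R", "halt"),  # carried past the leftmost digit
}

print(run_turing_machine(rules, [1, 0, 1, 1]))  # [1, 1, 0, 0]  (1011 + 1 = 1100)
```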
If the Turing machine sounds simple, it is because that was its inventor's objective. Turing wanted his machine to be as simple as possible (but no simpler, to paraphrase Einstein). Turing and Alonzo Church (1903–1995), his former professor, went on to develop the Church-Turing thesis, which states that if a problem that can be presented to a Turing machine is not solvable by it, it is also not solvable by any machine, following natural law. Even though the Turing machine has only a handful of commands and processes only one bit at a time, it can compute anything that any computer can compute. Another way to say this is that any machine that is "Turing complete" (that is, that has equivalent capabilities to a Turing machine) can compute any algorithm (any procedure that we can define).
A block diagram of a Turing machine with a head that reads and writes the tape and an internal program consisting of state transitions.
"Strong" interpretations of the Church-Turing thesis propose an essential equivalence between what a human can think or know and what is computable by a machine. The basic idea is that the human brain is likewise subject to natural law, and thus its information-processing ability cannot exceed that of a machine (and therefore of a Turing machine).
We can properly credit Turing with establishing the theoretical foundation of computation with his 1936 paper, but it is important to note that he was deeply influenced by a lecture that Hungarian American mathematician John von Neumann (1903–1957) gave in Cambridge in 1935 on his stored program concept, a concept enshrined in the Turing machine.1 In turn, von Neumann was influenced by Turing's 1936 paper, which elegantly laid out the principles of computation, and made it required reading for his colleagues in the late 1930s and early 1940s.2 In the same paper Turing reports another unexpected discovery: that of unsolvable problems. These are problems that are well defined with unique answers that can be shown to exist, but that we can also prove can never be computed by any Turing machine-that is to say, by any machine, a reversal of what had been a nineteenth-century dogma that problems that could be defined would ultimately be solved. Turing showed that there are as many unsolvable problems as solvable ones. Austrian American mathematician and philosopher Kurt Gödel reached a similar conclusion in his 1931 "incompleteness theorem." We are thus left with the perplexing situation of being able to define a problem, to prove that a unique answer exists, and yet know that the answer can never be found.
Turing had shown that at its essence, computation is based on a very simple mechanism. Because the Turing machine (and therefore any computer) is capable of basing its future course of action on results it has already computed, it is capable of making decisions and modeling arbitrarily complex hierarchies of information.
In 1939 Turing designed an electronic calculator called Bombe that helped decode messages that had been encrypted by the Nazi Enigma coding machine. By 1943, an engineering team influenced by Turing completed what is arguably the first computer, the Colossus, that enabled the Allies to continue decoding messages from more sophisticated versions of Enigma. The Bombe and Colossus were designed for a single task and could not be reprogrammed for a different one. But they performed this task brilliantly and are credited with having enabled the Allies to overcome the three-to-one advantage that the German Luftwaffe enjoyed over the British Royal Air Force and win the crucial Battle of Britain, as well as to continue anticipating Nazi tactics throughout the war.
It was on these foundations that John von Neumann created the architecture of the modern computer, which represents our third major idea. Called the von Neumann machine, it has remained the core structure of essentially every computer for the past sixty-seven years, from the microcontroller in your washing machine to the largest supercomputers. In a paper dated June 30, 1945, and titled "First Draft of a Report on the EDVAC," von Neumann presented the ideas that have dominated computation ever since.3 The von Neumann model includes a central processing unit, where arithmetical and logical operations are carried out; a memory unit, where the program and data are stored; mass storage; a program counter; and input/output channels. Although this paper was intended as an internal project document, it has become the bible for computer designers. You never know when a seemingly routine internal memo will end up revolutionizing the world.
The Turing machine was not designed to be practical. Turing's theorems were concerned not with the efficiency of solving problems but rather with examining the range of problems that could in theory be solved by computation. Von Neumann's goal, on the other hand, was to create a feasible concept of a computational machine. His model replaces Turing's one-bit computations with multiple-bit words (generally some multiple of eight bits). Turing's memory tape is sequential, so Turing machine programs spend an inordinate amount of time moving the tape back and forth to store and retrieve intermediate results. In contrast, von Neumann's memory is random access, so that any data item can be immediately retrieved.
One of von Neumann's key ideas is the stored program, which he had introduced a decade earlier: placing the program in the same type of random access memory as the data (and often in the same block of memory). This allows the computer to be reprogrammed for different tasks as well as for self-modifying code (if the program store is writable), which enables a powerful form of recursion. Up until that time, virtually all computers, including the Colossus, were built for a specific task. The stored program makes it possible for a computer to be truly universal, thereby fulfilling Turing's vision of the universality of computation.
Another key aspect of the von Neumann machine is that each instruction includes an operation code specifying the arithmetic or logical operation to be performed and the address of an operand from memory.
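A toy interpreter makes the idea tangible: instructions and data share one memory, and each instruction is an operation code plus an operand address. The instruction names and memory layout below are invented for illustration and are not the EDVAC's actual instruction set.

```python
# Illustrative stored-program machine in the von Neumann style.
def run(memory, acc=0, pc=0):
    """Interpret (opcode, operand_address) pairs stored in memory."""
    while True:
        op, addr = memory[pc]
        pc += 1
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "HALT":
            return memory

# The program occupies addresses 0-3; the data occupies addresses 4-6.
memory = {
    0: ("LOAD", 4),   # acc <- memory[4]
    1: ("ADD", 5),    # acc <- acc + memory[5]
    2: ("STORE", 6),  # memory[6] <- acc
    3: ("HALT", 0),
    4: 2, 5: 3, 6: 0,
}
print(run(memory)[6])  # 5
```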
Von Neumann's concept of how a computer should be architected was introduced with his publication of the design of the EDVAC, a project he conducted with collaborators J. Presper Eckert and John Mauchly. The EDVAC itself did not actually run until 1951, by which time there were other stored-program computers, such as the Manchester Small-Scale Experimental Machine, ENIAC, EDSAC, and BINAC, all of which had been deeply influenced by von Neumann's paper and involved Eckert and Mauchly as designers. Von Neumann was a direct contributor to the design of a number of these machines, including a later version of ENIAC, which supported a stored program.
There were a few precursors to von Neumann's architecture, although with one surprising exception, none are true von Neumann machines. In 1944 Howard Aiken introduced the Mark I, which had an element of programmability but did not use a stored program. It read instructions from a punched paper tape and then executed each command immediately. It also lacked a conditional branch instruction.
In 1941 German scientist Konrad Zuse (1910–1995) created the Z-3 computer. It also read its program from a tape (in this case, coded on film) and also had no conditional branch instruction. Interestingly, Zuse had support from the German Aircraft Research Institute, which used the device to study wing flutter, but his proposal to the Nazi government for funding to replace his relays with vacuum tubes was turned down. The Nazis deemed computation as "not war important." That perspective goes a long way, in my view, toward explaining the outcome of the war.
There is actually one genuine forerunner to von Neumann's concept, and it comes from a full century earlier! English mathematician and inventor Charles Babbage's (1791–1871) Analytical Engine, which he first described in 1837, did incorporate von Neumann's ideas and featured a stored program via punched cards borrowed from the Jacquard loom.4 Its random access memory included 1,000 words of 50 decimal digits each (the equivalent of about 21 kilobytes). Each instruction included an op code and an operand number, just like modern machine languages. It did include conditional branching and looping, so it was a true von Neumann machine. It was based entirely on mechanical gears, and it appears that the Analytical Engine was beyond Babbage's design and organizational skills. He built parts of it but it never ran. It is unclear whether the twentieth-century pioneers of the computer, including von Neumann, were aware of Babbage's work.
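The kilobyte figure follows from the fact that each decimal digit carries log2(10), or about 3.32, bits of information; a quick check (an illustrative calculation, not Babbage's own):

```python
# Illustrative conversion of Babbage's memory capacity into bytes.
import math

bits = 1_000 * 50 * math.log2(10)  # 1,000 words of 50 decimal digits each
print(bits / 8)                    # ~20,760 bytes, roughly 21 kilobytes
```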
Babbage's computer did result in the creation of the field of software programming. English writer Ada Byron (1815–1852), Countess of Lovelace and the only legitimate child of the poet Lord Byron, was the world's first computer programmer. She wrote programs for the Analytical Engine, which she needed to debug in her own mind (since the computer never worked), a practice well known to software engineers today as "desk checking." She translated an article by the Italian mathematician Luigi Menabrea on the Analytical Engine and added extensive notes of her own, writing that "the Analytical Engine weaves algebraic patterns, just as the Jacquard loom weaves flowers and leaves." She went on to provide perhaps the first speculations on the feasibility of artificial intelligence, but concluded that the Analytical Engine has "no pretensions whatever to originate anything."
Babbage's conception is quite miraculous when you consider the era in which he lived and worked. However, by the mid-twentieth century, his ideas had been lost in the mists of time (although they were subsequently rediscovered). It was von Neumann who conceptualized and articulated the key principles of the computer as we know it today, and the world recognizes this by continuing to refer to the von Neumann machine as the principal model of computation. Keep in mind, though, that the von Neumann machine continually communicates data between its various units and within these units, so it could not be built without Shannon's theorems and the methods he devised for transmitting and storing reliable digital information.
That brings us to the fourth important idea, which is to go beyond Ada Byron's conclusion that a computer could not think creatively and find the key algorithms employed by the brain and then use these to turn a computer into a brain. Alan Turing introduced this goal in his 1950 paper "Computing Machinery and Intelligence," which includes his now-famous Turing test for ascertaining whether or not an AI has achieved a human level of intelligence.
In 1956 von Neumann began preparing a series of lectures intended for the prestigious Silliman lecture series at Yale University. Due to the ravages of cancer, he never delivered these talks nor did he complete the manuscript from which they were to be given. This unfinished document nonetheless remains a brilliant and prophetic foreshadowing of what I regard as humanity's most daunting and important project. It was published posthumously as The Computer and the Brain in 1958. It is fitting that the final work of one of the most brilliant mathematicians of the last century and one of the pioneers of the computer age was an examination of intelligence itself. This project was the earliest serious inquiry into the human brain from the perspective of a mathematician and computer scientist. Prior to von Neumann, the fields of computer science and neuroscience were two islands with no bridge between them.
Von Neumann starts his discussion by articulating the similarities and differences between the computer and the human brain. Given when he wrote this manuscript, it is remarkably accurate. He noted that the output of neurons was digital-an axon either fired or it didn't. This was far from obvious at the time, in that the output could have been an analog signal. The processing in the dendrites leading into a neuron and in the soma, the neuron's cell body, however, was analog, and he described its calculations as a weighted sum of inputs with a threshold. This model of how neurons work led to the field of connectionism, which built systems based on this neuron model in both hardware and software. (As I described in the previous chapter, the first such connectionist system was created by Frank Rosenblatt as a software program on an IBM 704 computer at Cornell in 1957, immediately after von Neumann's draft lectures became available.) We now have more sophisticated models of how neurons combine inputs, but the essential idea of analog processing of dendrite inputs using neurotransmitter concentrations has remained valid.
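In code, the model von Neumann described is only a few lines; the weights and threshold below are illustrative values, not drawn from his manuscript:

```python
# Illustrative threshold neuron: an analog weighted sum of inputs compared
# against a threshold, the building block of later connectionist systems.
def threshold_neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of inputs reaches the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Example: two inputs with equal weights and a threshold of 1.5 behave
# like a logical AND gate.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, threshold_neuron([a, b], [1.0, 1.0], 1.5))
```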
Von Neumann applied the concept of the universality of computation to conclude that even though the architecture and building blocks appear to be radically different between brain and computer, we can nonetheless conclude that a von Neumann machine can simulate the processing in a brain. The converse does not hold, however, because the brain is not a von Neumann machine and does not have a stored program as such (albeit we can simulate a very simple Turing machine in our heads). Its algorithm or methods are implicit in its structure. Von Neumann correctly concludes that neurons can learn patterns from their inputs, which we have now established are coded in part in dendrite strengths. What was not known in von Neumann's time is that learning also takes place through the creation and destruction of connections between neurons.
Von Neumann presciently notes that the speed of neural processing is extremely slow, on the order of a hundred calculations per second, but that the brain compensates for this through massive parallel processing-another unobvious and key insight. Von Neumann argued that each one of the brain's 10^10 neurons (a tally that itself was reasonably accurate; estimates today are between 10^10 and 10^11) was processing at the same time. In fact, each of the connections (with an average of about 10^3 to 10^4 connections per neuron) is computing simultaneously.
Von Neumann's estimates and his descriptions of neural processing are remarkable, given the primitive state of neuroscience at the time. One aspect of his work that I do disagree with, however, is his assessment of the brain's memory capacity. He assumes that the brain remembers every input for its entire life. Von Neumann assumes an average life span of 60 years, or about 2 × 10^9 seconds. With about 14 inputs to each neuron per second (which is actually low by at least three orders of magnitude) and with 10^10 neurons, he arrives at an estimate of about 10^20 bits for the brain's memory capacity. The reality, as I have noted earlier, is that we remember only a very small fraction of our thoughts and experiences, and even these memories are not stored as bit patterns at a low level (such as a video image), but rather as sequences of higher-level patterns.
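His arithmetic is easy to reproduce; the inputs below are von Neumann's assumptions as restated above, not measured quantities:

```python
# Illustrative reproduction of von Neumann's back-of-the-envelope estimate.
seconds_per_year = 60 * 60 * 24 * 365
lifetime_seconds = 60 * seconds_per_year          # ~1.9e9, about 2 x 10^9
inputs_per_neuron_per_second = 14
neurons = 1e10

bits = lifetime_seconds * inputs_per_neuron_per_second * neurons
print(f"{bits:.1e} bits")  # ~2.6e+20, on the order of 10^20 bits
```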
As von Neumann describes each mechanism in the brain, he shows how a modern computer could accomplish the same thing, despite their apparent differences. The brain's analog mechanisms can be simulated through digital ones because digital computation can emulate analog values to any desired degree of precision (and the precision of analog information in the brain is quite low). The brain's massive parallelism can be simulated as well, given the significant speed advantage of computers in serial computation (an advantage that has vastly expanded over time). In addition, we can also use parallel processing in computers by using parallel von Neumann machines-which is exactly how supercomputers work today.
Von Neumann concludes that the brain's methods cannot involve lengthy sequential algorithms, when one considers how quickly humans are able to make decisions combined with the very slow computational speed of neurons. When a third baseman fields a ball and decides to throw to first rather than to second base, he makes this decision in a fraction of a second, which is only enough time for each neuron to go through a handful of cycles. Von Neumann concludes correctly that the brain's remarkable powers come from all its 100 billion neurons being able to process information simultaneously. As I have noted, the visual cortex makes sophisticated visual judgments in only three or four neural cycles.
There is considerable plasticity in the brain, which enables us to learn. But there is far greater plasticity in a computer, which can completely restructure its methods by changing its software. Thus, in that respect, a computer will be able to emulate the brain, but the converse is not the case.
When von Neumann compared the capacity of the brain's massively parallel organization to the (few) computers of his time, it was clear that the brain had far greater memory and speed. By now the first supercomputer to achieve specifications matching some of the more conservative estimates of the speed required to functionally simulate the human brain (about 10^16 operations per second) has been built.5 (I estimate that this level of computation will cost $1,000 by the early 2020s.) With regard to memory we are even closer. Even though it was remarkably early in the history of the computer when his manuscript was written, von Neumann nonetheless had confidence that both the hardware and software of human intelligence would ultimately fall into place, which was his motivation for having prepared these lectures.
Von Neumann was deeply aware of the increasing pace of progress and its profound implications for humanity's future. A year after his death in 1957, fellow mathematician Stan Ulam quoted him as having said in the early 1950s that "the ever accelerating progress of technology and changes in the mode of human life give the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue." This is the first known use of the word "singularity" in the context of human technological history.
Von Neumann's fundamental insight was that there is an essential equivalence between a computer and the brain. Note that the emotional intelligence of a biological human is part of its intelligence. If von Neumann's insight is correct, and if one accepts my own leap of faith that a nonbiological entity that convincingly re-creates the intelligence (emotional and otherwise) of a biological human is conscious (see the next chapter), then one would have to conclude that there is an essential equivalence between a computer-with the right software-and a (conscious) mind. So is von Neumann correct?
Most computers today are entirely digital, whereas the human brain combines digital and analog methods. But analog methods are easily and routinely re-created by digital ones to any desired level of accuracy. American computer scientist Carver Mead (born in 1934) has shown that we can directly emulate the brain's analog methods in silicon, which he has demonstrated with what he calls "neuromorphic" chips.6 Mead has demonstrated how this approach can be thousands of times more efficient than digitally emulating analog methods. As we codify the massively repeated neocortical algorithm, it will make sense to use Mead's approach. The IBM Cognitive Computing Group, led by Dharmendra Modha, has introduced chips that emulate neurons and their connections, including the ability to form new connections.7 Called "SyNAPSE," one of the chips provides a direct simulation of 256 neurons with about a quarter million synaptic connections. The goal of the project is to create a simulated neocortex with 10 billion neurons and 100 trillion connections-close to a human brain-that uses only one kilowatt of power.
As von Neumann described over a half century ago, the brain is extremely slow but massively parallel. Today's digital circuits are at least 10 million times faster than the brain's electrochemical switches. Conversely, all 300 million of the brain's neocortical pattern recognizers process simultaneously, and all quadrillion of its interneuronal connections are potentially computing at the same time. The key issue for providing the requisite hardware to successfully model a human brain, though, is the overall memory and computational throughput required. We do not need to directly copy the brain's architecture, which would be a very inefficient and inflexible approach.
Let's estimate what those hardware requirements are. Many projects have attempted to emulate the type of hierarchical learning and pattern recognition that takes place in the neocortical hierarchy, including my own work with hierarchical hidden Markov models. A conservative estimate from my own experience is that emulating one cycle in a single pattern recognizer in the biological brain's neocortex would require about 3,000 calculations. Most simulations run at a fraction of this estimate. With the brain running at about 10^2 (100) cycles per second, that comes to 3 × 10^5 (300,000) calculations per second per pattern recognizer. Using my estimate of 3 × 10^8 (300 million) pattern recognizers, we get about 10^14 (100 trillion) calculations per second, a figure that is consistent with my estimate in The Singularity Is Near. In that book I projected that to functionally simulate the brain would require between 10^14 and 10^16 calculations per second (cps) and used 10^16 cps to be conservative. AI expert Hans Moravec's estimate, based on extrapolating the computational requirement of the early (initial) visual processing across the entire brain, is 10^14 cps, which matches my own assessment here.
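The multiplication is straightforward to check; the three figures below simply restate the assumptions given above:

```python
# Illustrative throughput estimate using the stated assumptions.
calcs_per_recognizer_cycle = 3_000
cycles_per_second = 100            # ~10^2 neocortical cycles per second
pattern_recognizers = 3e8          # ~300 million

cps = calcs_per_recognizer_cycle * cycles_per_second * pattern_recognizers
print(f"{cps:.0e} calculations per second")  # 9e+13, on the order of 10^14
```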
Routine desktop machines can reach 10^10 cps, although this level of performance can be significantly amplified by using cloud resources. The fastest supercomputer, Japan's K Computer, has already reached 10^16 cps.8 Given that the algorithm of the neocortex is massively repeated, the approach of using neuromorphic chips such as the IBM SyNAPSE chips mentioned above is also promising.
In terms of memory requirement, we need about 30 bits (about four bytes) for one connection to address one of 300 million other pattern recognizers. If we estimate an average of eight inputs to each pattern recognizer, that comes to 32 bytes per recognizer. If we add a one-byte weight for each input, that brings us to 40 bytes. Add another 32 bytes for downward connections, and we are at 72 bytes. Note that the branching-up-and-down figure will often be much higher than eight, though these very large branching trees are shared by many recognizers. For example, there may be hundreds of recognizers involved in recognizing the letter "p." These will feed up into thousands of such recognizers at this next higher level that deal with words and phrases that include "p." However, each "p" recognizer does not repeat the tree of connections that feeds up to all of the words and phrases that include "p"-they all share one such tree of connections. The same is true of downward connections: A recognizer that is responsible for the word "APPLE" will tell all of the thousands of "E" recognizers at a level below it that an "E" is expected if it has already seen "A," "P," "P," and "L." That tree of connections is not repeated for each word or phrase recognizer that wants to inform the next lower level that an "E" is expected. Again, they are shared. For this reason, an overall estimate of eight up and eight down on average per pattern recognizer is reasonable. Even if we increase this particular estimate, it does not significantly change the order of magnitude of the resulting estimate.
With 3 × 10^8 (300 million) pattern recognizers at 72 bytes each, we get an overall memory requirement of about 2 × 10^10 (20 billion) bytes. That is actually a quite modest number that routine computers today can exceed.
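Again, the estimate can be checked in a few lines, using the per-recognizer byte counts given above:

```python
# Illustrative memory estimate using the stated per-recognizer assumptions.
bytes_per_upward_connection = 4     # ~30 bits to address one of 3e8 recognizers
upward_connections = 8
weight_bytes = 8 * 1                # one-byte weight per input
downward_bytes = 32

bytes_per_recognizer = (upward_connections * bytes_per_upward_connection
                        + weight_bytes + downward_bytes)      # 72 bytes
pattern_recognizers = 3e8

total_bytes = pattern_recognizers * bytes_per_recognizer
print(f"{total_bytes:.1e} bytes")   # ~2.2e+10, about 20 billion bytes
```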
These estimates are intended only to provide rough estimates of the order of magnitude required. Given that digital circuits are inherently about 10 million times faster than the biological neocortical circuits, we do not need to match the human brain for parallelism-modest parallel processing (compared with the trillions-fold parallelism of the human brain) will be sufficient. We can see that the necessary computational requirements are coming within reach. The brain's rewiring of itself-dendrites are continually creating new synapses-can also be emulated in software using links, a far more flexible system than the brain's method of plasticity, which as we have seen is impressive but limited.
The redundancy used by the brain to achieve robust invariant results can certainly be replicated in software emulations. The mathematics of optimizing these types of self-organizing hierarchical learning systems is well understood. The organization of the brain is far from optimal. Of course it didn't need to be-it only needed to be good enough to achieve the threshold of being able to create tools that would compensate for its own limitations.
Another restriction of the human neocortex is that there is no process that eliminates or even reviews contradictory ideas, which accounts for why human thinking is often ma.s.sively inconsistent. We have a weak mechanism to address this called critical thinking, but this skill is not practiced nearly as often as it should be. In a software-based neocortex, we can build in a process that reveals inconsistencies for further review.
It is important to note that the design of an entire brain region is simpler than the design of a single neuron. As discussed earlier, models often get simpler at a higher level-consider an analogy with a computer. We do need to understand the detailed physics of semiconductors to model a transistor, and the equations underlying a single real transistor are complex. A digital circuit that multiplies two numbers requires hundreds of them. Yet we can model this multiplication circuit very simply with one or two formulas. An entire computer with billions of transistors can be modeled through its instruction set and register description, which can be described on a handful of written pages of text and formulas. The software programs for an operating system, language compilers, and assemblers are reasonably complex, but modeling a particular program-for example, a speech recognition program based on hierarchical hidden Markov modeling-may likewise be described in only a few pages of equations. Nowhere in such a description would be found the details of semiconductor physics or even of computer architecture.
A similar observation holds true for the brain. A particular neocortical pattern recognizer that detects a particular invariant visual feature (such as a face) or that performs a bandpass filtering (restricting input to a specific frequency range) on sound or that evaluates the temporal proximity of two events can be described with far fewer specific details than the actual physics and chemical relations controlling the neurotransmitters, ion channels, and other synaptic and dendritic variables involved in the neural processes. Although all of this complexity needs to be carefully considered before advancing to the next higher conceptual level, much of it can be simplified as the operating principles of the brain are revealed.
CHAPTER 9
THOUGHT EXPERIMENTS ON THE MIND
Minds are simply what brains do.-Marvin Minsky, The Society of Mind
When intelligent machines are constructed, we should not be surprised to find them as confused and as stubborn as men in their convictions about mind-matter, consciousness, free will, and the like.-Marvin Minsky, The Society of Mind
Who Is Conscious?

The real history of consciousness starts with one's first lie.-Joseph Brodsky
Suffering is the sole origin of consciousness.-Fyodor Dostoevsky, Notes from Underground
There is a kind of plant that eats organic food with its flowers: when a fly settles upon the blossom, the petals close upon it and hold it fast till the plant has absorbed the insect into its system; but they will close on nothing but what is good to eat; of a drop of rain or a piece of stick they will take no notice. Curious! that so unconscious a thing should have such a keen eye to its own interest. If this is unconsciousness, where is the use of consciousness?-Samuel Butler, 1871
We have been examining the brain as an entity that is capable of certain levels of accomplishment. But that perspective essentially leaves our selves out of the picture. We appear to live in our brains. We have subjective lives. How does the objective view of the brain that we have discussed up until now relate to our own feelings, to our sense of being the person having the experiences?
British philosopher Colin McGinn (born in 1950) writes that discussing "consciousness can reduce even the most fastidious thinker to blabbering incoherence." The reason for this is that people often have unexamined and inconsistent views on exactly what the term means.
Many observers consider consciousness to be a form of performance-for example, the capacity for self-reflection, that is, the ability to understand one's own thoughts and to explain them. I would describe that as the ability to think about one's own thinking. Presumably, we could come up with a way of evaluating this ability and then use this test to separate conscious things from unconscious things.
However, we quickly get into trouble in trying to implement this approach. Is a baby conscious? A dog? They're not very good at describing their own thinking process. There are people who believe that babies and dogs are not conscious beings precisely because they cannot explain themselves. How about the computer known as Watson? It can be put into a mode where it actually does explain how it came up with a given answer. Because it contains a model of its own thinking, is Watson therefore conscious whereas the baby and the dog are not?
Before we proceed to parse this question further, it is important to reflect on the most significant distinction relating to it: What is it that we can ascertain from science, versus what remains truly a matter of philosophy? One view is that philosophy is a kind of halfway house for questions that have not yet yielded to the scientific method. According to this perspective, once science advances sufficiently to resolve a particular set of questions, philosophers can then move on to other concerns, until such time that science resolves them also. This view is endemic where the issue of consciousness is concerned, and specifically the question "What and who is conscious?"
Consider these statements by philosopher John Searle: "We know that brains cause consciousness with specific biological mechanisms.... The essential thing is to recognize that consciousness is a biological process like digestion, lactation, photosynthesis, or mitosis.... The brain is a machine, a biological machine to be sure, but a machine all the same. So the first step is to figure out how the brain does it and then build an artificial machine that has an equally effective mechanism for causing consciousness."1 People are often surprised to see these quotations because they assume that Searle is devoted to protecting the mystery of consciousness against reductionists like Ray Kurzweil.
The Australian philosopher David Chalmers (born in 1966) has coined the term "the hard problem of consciousness" to describe the difficulty of pinning down this essentially indescribable concept. Sometimes a brief phrase encapsulates an entire school of thought so well that it becomes emblematic (for example, Hannah Arendt's "the banality of evil"). Chalmers's famous formulation accomplishes this very well.
When discussing consciousness, it becomes very easy to slip into considering the observable and measurable attributes that we associate with being conscious, but this approach misses the very essence of the idea. I just mentioned the concept of metacognition-the idea of thinking about one's own thinking-as one such correlate of consciousness. Other observers conflate emotional intelligence or moral intelligence with consciousness. But, again, our ability to express a loving sentiment, to get the joke, or to be sexy are simply types of performances-impressive and intelligent perhaps, but skills that can nonetheless be observed and measured (even if we argue about how to assess them). Figuring out how the brain accomplishes these sorts of tasks and what is going on in the brain when we do them constitutes Chalmers's "easy" question of consciousness. Of course, the "easy" problem is anything but and represents perhaps the most difficult and important scientific quest of our era. Chalmers's "hard" question, meanwhile, is so hard that it is essentially ineffable.