The above patterns are constituents of the next higher level of pattern, which is a category called printed letters (there is no such formal category within the neocortex, however; indeed, there are no formal categories).
"A":
Two different patterns, either of which constitutes "A," and two different patterns at a higher level ("APPLE" and "PEAR") of which "A" is a part.
"P":
Patterns that are part of the higher-level pattern "P."
"L":
Patterns that are part of the higher-level pattern "L."
"E":
Patterns that are part of the higher-level pattern "E."
These letter patterns feed up to an even higher-level pattern in a category called words. (The word "words" is our language category for this concept, but the neocortex just treats them as patterns.) "APPLE":
In a different part of the cortex is a comparable hierarchy of pattern recognizers processing actual images of objects (as opposed to printed letters). If you are looking at an actual apple, low-level recognizers will detect curved edges and surface color patterns leading up to a pattern recognizer firing its axon and saying in effect, "Hey guys, I just saw an actual apple." Yet other pattern recognizers will detect combinations of frequencies of sound leading up to a pattern recognizer in the auditory cortex that might fire its axon indicating, "I just heard the spoken word 'apple.'"
Keep in mind the redundancy factor-we don't just have a single pattern recognizer for "apple" in each of its forms (written, spoken, visual). There are likely to be hundreds of such recognizers firing, if not more. The redundancy not only increases the likelihood that you will successfully recognize each instance of an apple but also deals with the variations in real-world apples. For apple objects, there will be pattern recognizers that deal with the many varied forms of apples: different views, colors, shadings, shapes, and varieties.
Also keep in mind that the hierarchy shown above is a hierarchy of concepts. These recognizers are not physically placed above each other; because of the thin construction of the neocortex, it is physically only one pattern recognizer high. The conceptual hierarchy is created by the connections between the individual pattern recognizers.
An important attribute of the PRTM is how the recognitions are made inside each pattern recognition module. Stored in the module is a weight for each input dendrite indicating how important that input is to the recognition. The pattern recognizer has a threshold for firing (which indicates that this pattern recognizer has successfully recognized the pattern it is responsible for). Not every input pattern has to be present for a recognizer to fire. The recognizer may still fire if an input with a low weight is missing, but it is less likely to fire if a high-importance input is missing. When it fires, a pattern recognizer is basically saying, "The pattern I am responsible for is probably present."
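To make the weighted-input idea concrete, here is a minimal sketch in Python of such a module, assuming a simple ratio-of-weights firing rule. The input names, weights, and threshold are illustrative assumptions, not parameters given in the text.

```python
# Minimal sketch of a weighted-threshold pattern recognizer (illustrative only).
# Each weight says how important a lower-level input is to the recognition;
# the module "fires" when the weighted evidence crosses its threshold.

class PatternRecognizer:
    def __init__(self, name, weights, threshold):
        self.name = name
        self.weights = weights      # importance of each lower-level input
        self.threshold = threshold  # fraction of total weight needed to fire

    def recognize(self, active_inputs):
        """active_inputs: set of lower-level patterns currently firing."""
        total = sum(self.weights.values())
        evidence = sum(w for name, w in self.weights.items() if name in active_inputs)
        return evidence / total >= self.threshold

apple = PatternRecognizer(
    "APPLE",
    weights={"A": 1.5, "P1": 1.0, "P2": 1.0, "L": 0.8, "E": 0.4},
    threshold=0.75,
)

print(apple.recognize({"A", "P1", "P2", "L"}))  # True: only the low-weight "E" is missing
print(apple.recognize({"P1", "P2", "L", "E"}))  # False: the high-importance "A" is missing
```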
Successful recognition by a module of its pattern goes beyond just counting the input signals that are activated (even a count weighted by the importance parameter). The size (of each input) matters. There is another parameter (for each input) indicating the expected size of the input, and yet another indicating how variable that size is. To appreciate how this works, suppose we have a pattern recognizer that is responsible for recognizing the spoken word "steep." This spoken word has four sounds: [s], [t], [E], and [p]. The [t] phoneme is what is known as a "dental consonant," meaning that it is created by the tongue creating a burst of noise when air breaks its contact with the upper teeth. It is essentially impossible to articulate the [t] phoneme slowly. The [p] phoneme is considered a "plosive consonant" or "oral occlusive," meaning that it is created when the vocal tract is suddenly blocked (by the lips in the case of [p]) so that air no longer passes. It is also necessarily quick. The [E] vowel is caused by resonances of the vocal cord and open mouth. It is considered a "long vowel," meaning that it persists for a much longer period of time than consonants such as [t] and [p]; however, its duration can be quite variable. The [s] phoneme is known as a "sibilant consonant," and is caused by the passage of air against the edges of the teeth, which are held close together. Its duration is typically shorter than that of a long vowel such as [E], but it is also variable (in other words, the [s] can be said quickly or you can drag it out).
In our work in speech recognition, we found that it is necessary to encode this type of information in order to recognize speech patterns. For example, the words "step" and "steep" are very similar. Although the [e] phoneme in "step" and the [E] in "steep" are somewhat different vowel sounds (in that they have different resonant frequencies), it is not reliable to distinguish these two words based on these often confusable vowel sounds. It is much more reliable to consider the observation that the [e] in "step" is relatively brief compared with the [E] in "steep."
We can encode this type of information with two numbers for each input: the expected size and the degree of variability of that size. In our "steep" example, [t] and [p] would both have a very short expected duration as well as a small expected variability (that is, we do not expect to hear long t's and p's). The [s] sound would have a short expected duration but a larger variability because it is possible to drag it out. The [E] sound has a long expected duration as well as a high degree of variability.
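As a rough illustration of how expected size and variability might be encoded and used, the sketch below scores observed phoneme durations against stored (expected duration, variability) pairs, assuming a Gaussian model of duration. Both the Gaussian assumption and the millisecond values are mine for illustration; they are not taken from the text.

```python
import math

# Score an observed input "size" (here, phoneme duration in milliseconds) against the
# module's stored expected size and variability, using a Gaussian purely for illustration.

def size_log_likelihood(observed, expected, variability):
    """Log-likelihood of the observed duration under N(expected, variability^2)."""
    return -0.5 * math.log(2 * math.pi * variability ** 2) \
           - (observed - expected) ** 2 / (2 * variability ** 2)

# Hypothetical stored parameters for the spoken word "steep": (expected ms, variability ms).
steep_params = {"s": (90, 40), "t": (30, 10), "E": (200, 80), "p": (30, 10)}

observed = {"s": 110, "t": 35, "E": 260, "p": 25}  # durations from one particular utterance
score = sum(size_log_likelihood(observed[ph], mu, sigma)
            for ph, (mu, sigma) in steep_params.items())
print(round(score, 2))  # higher (less negative) means a better match to the stored durations
```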
In our speech examples, the "size" parameter refers to duration, but time is only one possible dimension. In our work in character recognition, we found that comparable spatial information was important in order to recognize printed letters (for example the dot over the letter "i" is expected to be much smaller than the portion under the dot). At much higher levels of abstraction, the neocortex will deal with patterns with all sorts of continuums, such as levels of attractiveness, irony, happiness, frustration, and myriad others. We can draw similarities across rather diverse continuums, as Darwin did when he related the physical size of geological canyons to the amount of differentiation among species.
In a biological brain, the source of these parameters comes from the brain's own experience. We are not born with an innate knowledge of phonemes; indeed different languages have very different sets of them. This implies that multiple examples of a pattern are encoded in the learned parameters of each pattern recognizer (as it requires multiple instances of a pattern to ascertain the expected distribution of magnitudes of the inputs to the pattern). In some AI systems, these types of parameters are hand-coded by experts (for example, linguists who can tell us the expected durations of different phonemes, as I articulated above). In my own work, we found that having an AI system discover these parameters on its own from training data (similar to the way the brain does it) was a superior approach. Sometimes we used a hybrid approach; that is, we primed the system with the intuition of human experts (for the initial settings of the parameters) and then had the AI system automatically refine these estimates using a learning process from real examples of speech.
What the pattern recognition module is doing is computing the probability (that is, the likelihood based on all of its previous experience) that the pattern that it is responsible for recognizing is in fact currently represented by its active inputs. Each particular input to the module is active if the corresponding lower-level pattern recognizer is firing (meaning that that lower-level pattern was recognized). Each input also encodes the observed size (on some appropriate dimension such as temporal duration or physical magnitude or some other continuum) so that the size can be compared (with the stored size parameters for each input) by the module in computing the overall probability of the pattern.
How does the brain (and how can an AI system) compute the overall probability that the pattern (that the module is responsible for recognizing) is present given (1) the inputs (each with an observed size), (2) the stored parameters on size (the expected size and the variability of size) for each input, and (3) the parameters of the importance of each input? In the 1980s and 1990s, I and others pioneered a mathematical method called hierarchical hidden Markov models for learning these parameters and then using them to recognize hierarchical patterns. We used this technique in the recognition of human speech as well as the understanding of natural language. I describe this approach further in chapter 7.
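The hierarchical hidden Markov models mentioned above build on the ordinary hidden Markov model. As background only, the following sketch implements the standard forward algorithm for a small, hand-made, non-hierarchical HMM whose hidden states loosely correspond to the phonemes of "steep"; the states, transition probabilities, and emission probabilities are invented for illustration and are not the author's models.

```python
# Forward algorithm for a tiny hidden Markov model (a building block of the
# hierarchical HMMs mentioned in the text). All numbers are made-up illustrations.

states = ["s", "t", "E", "p"]                     # hidden states: phonemes of "steep"
start = {"s": 1.0, "t": 0.0, "E": 0.0, "p": 0.0}  # always start in [s]
trans = {                                         # mostly left-to-right transitions
    "s": {"s": 0.6, "t": 0.4, "E": 0.0, "p": 0.0},
    "t": {"s": 0.0, "t": 0.3, "E": 0.7, "p": 0.0},
    "E": {"s": 0.0, "t": 0.0, "E": 0.8, "p": 0.2},
    "p": {"s": 0.0, "t": 0.0, "E": 0.0, "p": 1.0},
}
emit = {                                          # P(acoustic frame label | state)
    "s": {"hiss": 0.90, "burst": 0.05, "vowel": 0.05},
    "t": {"hiss": 0.10, "burst": 0.80, "vowel": 0.10},
    "E": {"hiss": 0.05, "burst": 0.05, "vowel": 0.90},
    "p": {"hiss": 0.10, "burst": 0.80, "vowel": 0.10},
}

def forward(observations):
    """Probability of the observation sequence under the model."""
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[prev] * trans[prev][s] for prev in states) * emit[s][obs]
                 for s in states}
    return sum(alpha.values())

print(forward(["hiss", "hiss", "burst", "vowel", "vowel", "burst"]))
```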
Getting back to the flow of recognition from one level of pattern recognizers to the next, in the above example we see the information flow up the conceptual hierarchy from basic letter features to letters to words. Recognitions will continue to flow up from there to phrases and then more complex language structures. If we go up several dozen more levels, we get to higher-level concepts like irony and envy. Even though every pattern recognizer is working simultaneously, it does take time for recognitions to move upward in this conceptual hierarchy. Traversing each level takes between a few hundredths and a few tenths of a second. Experiments have shown that a moderately high-level pattern such as a face takes at least a tenth of a second. It can take as long as an entire second if there are significant distortions. If the brain were sequential (like conventional computers) and were performing each pattern recognition in sequence, it would have to consider every possible low-level pattern before moving on to the next level. Thus it would take many millions of cycles just to go through each level. That is exactly what happens when we simulate these processes on a computer. Keep in mind, however, that computers process millions of times faster than our biological circuits.
A very important point to note here is that information flows down the conceptual hierarchy as well as up. If anything, this downward flow is even more significant. If, for example, we are reading from left to right and have already seen and recognized the letters "A," "P," "P," and "L," the "APPLE" recognizer will predict that it is likely to see an "E" in the next position. It will send a signal down to the "E" recognizer saying, in effect, "Please be aware that there is a high likelihood that you will see your 'E' pattern very soon, so be on the lookout for it." The "E" recognizer then adjusts its threshold such that it is more likely to recognize an "E." So if an image appears next that is vaguely like an "E," but is perhaps smudged such that it would not have been recognized as an "E" under "normal" circumstances, the "E" recognizer may nonetheless indicate that it has indeed seen an "E," since it was expected.
The neocortex is, therefore, predicting what it expects to encounter. Envisaging the future is one of the primary reasons we have a neocortex. At the highest conceptual level, we are continually making predictions-who is going to walk through the door next, what someone is likely to say next, what we expect to see when we turn the corner, the likely results of our own actions, and so on. These predictions are constantly occurring at every level of the neocortex hierarchy. We often misrecognize people and things and words because our threshold for confirming an expected pattern is too low.
In addition to positive signals, there are also negative or inhibitory signals which indicate that a certain pattern is less likely to exist. These can come from lower conceptual levels (for example, the recognition of a mustache will inhibit the likelihood that a person I see in the checkout line is my wife), or from a higher level (for example, I know that my wife is on a trip, so the person in the checkout line can't be she). When a pattern recognizer receives an inhibitory signal, it raises the recognition threshold, but it is still possible for the pattern to fire (so if the person in line really is her, I may still recognize her).
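A toy sketch of how a top-down expectation and an inhibitory signal might shift a recognizer's effective threshold is shown below. The adjustment amounts and the evidence scale are arbitrary assumptions made only to illustrate the direction of the effect.

```python
# Illustrative sketch: expectation lowers the effective threshold, inhibition raises it,
# and in both cases the recognizer can still fire if the evidence is strong enough.

class AdjustableRecognizer:
    def __init__(self, base_threshold):
        self.base_threshold = base_threshold
        self.expected = False   # set when a higher-level recognizer predicts this pattern
        self.inhibited = False  # set by an inhibitory signal from above or below

    def effective_threshold(self):
        threshold = self.base_threshold
        if self.expected:
            threshold -= 0.2    # "be on the lookout": easier to fire
        if self.inhibited:
            threshold += 0.2    # less likely, but still possible, to fire
        return threshold

    def recognize(self, evidence):
        return evidence >= self.effective_threshold()

e_recognizer = AdjustableRecognizer(base_threshold=0.7)
print(e_recognizer.recognize(0.6))  # False: a smudged "E" on its own falls short

e_recognizer.expected = True        # "APPLE" has seen A, P, P, L and predicts an E
print(e_recognizer.recognize(0.6))  # True: the expected, smudged "E" is now accepted
```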
The Nature of the Data Flowing into a Neocortical Pattern Recognizer
Let's consider further what the data for a pattern looks like. If the pattern is a face, the data exists in at least two dimensions. We cannot say that the eyes necessarily come first, followed by the nose, and so on. The same thing is true for most sounds. A musical piece has at least two dimensions. There may be more than one instrument and/or voice making sounds at the same time. Moreover, a single note of a complex instrument such as the piano consists of multiple frequencies. A single human voice consists of varying levels of energy in dozens of different frequency bands simultaneously. So a pattern of sound may be complex at any one instant, and these complex instants stretch out over time. Tactile inputs are also two-dimensional, since the skin is a two-dimensional sense organ, and such patterns may change over the third dimension of time.
So it would seem that the input to a neocortex pattern processor must comprise two- if not three-dimensional patterns. However, we can see in the structure of the neocortex that the pattern inputs are only one-dimensional lists. All of our work in the field of creating artificial pattern recognition systems (such as speech recognition and visual recognition systems) demonstrates that we can (and did) represent two- and three-dimensional phenomena with such one-dimensional lists. I'll describe how these methods work in chapter 7, but for now we can proceed with the understanding that the input to each pattern processor is a one-dimensional list, even though the pattern itself may inherently reflect more than one dimension.
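As a deliberately simple illustration that two-dimensional data can be carried on a one-dimensional list, the snippet below flattens a tiny bitmap of the letter "L" row by row. This is not the method described in chapter 7; it only demonstrates that the reduction is possible.

```python
# Flatten a 5-row-by-4-column bitmap of the letter "L" into a one-dimensional input list.

bitmap = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]

flattened = [pixel for row in bitmap for pixel in row]  # row-major, one-dimensional list
print(flattened)
# [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1]
```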
We should factor in at this point the insight that the patterns we have learned to recognize (for example, a specific dog or the general idea of a "dog," a musical note or a piece of music) are exactly the same mechanism that is the basis for our memories. Our memories are in fact patterns organized as lists (where each item in each list is another pattern in the cortical hierarchy) that we have learned and then recognize when presented with the appropriate stimulus. In fact, memories exist in the neocortex in order to be recognized.
The only exception to this is at the lowest possible conceptual level, in which the input data to a pattern represents specific sensory information (for example, image data from the optic nerve). Even this lowest level of pattern, however, has been significantly transformed into simple patterns by the time it reaches the cortex. The lists of patterns that constitute a memory are in forward order, and we are able to remember our memories only in that order, hence the difficulty we have in reversing our memories.
A memory needs to be triggered by another thought/memory (these are the same thing). We can experience this mechanism of triggering when we are perceiving a pattern. When we perceived "A," "P," "P," and "L," the "APPLE" pattern predicted that we would see an "E" and signaled the "E" recognizer that its pattern is now expected. Our cortex is thereby "thinking" of seeing an "E" even before we see it. If this particular interaction in our cortex has our attention, we will think about "E" before we see it or even if we never see it. A similar mechanism triggers old memories. Usually there is an entire chain of such links. Even if we do have some level of awareness of the memories (that is, the patterns) that triggered the old memory, memories (patterns) do not have language or image labels. This is the reason why old memories may seem to suddenly jump into our awareness. Having been buried and not activated for perhaps years, they need a trigger in the same way that a Web page needs a Web link to be activated. And just as a Web page can become "orphaned" because no other page links to it, the same thing can happen to our memories.
Our thoughts are largely activated in one of two modes, undirected and directed, both of which use these same cortical links. In the undirected mode, we let the links play themselves out without attempting to move them in any particular direction. Some forms of meditation (such as Transcendental Meditation, which I practice) are based on letting the mind do exactly this. Dreams have this quality as well.
In directed thinking we attempt to step through a more orderly process of recalling a memory (a story, for example) or solving a problem. This also involves stepping through lists in our neocortex, but the less structured flurry of undirected thought will also accompany the process. The full content of our thinking is therefore very disorderly, a phenomenon that James Joyce illuminated in his "stream of consciousness" novels.
As you think through the memories/stories/patterns in your life, whether they involve a chance encounter on a walk with a mother pushing a baby carriage or the more important narrative of how you met your spouse, your memories consist of a sequence of patterns. Because these patterns are not labeled with words or sounds or pictures or videos, when you try to recall a significant event, you will essentially be reconstructing the images in your mind, because the actual images do not exist.
If we were to "read" the mind of someone and peer at exactly what is going on in her neocortex, it would be very difficult to interpret her memories, whether we were to take a look at patterns that are simply stored in the neocortex waiting to be triggered or those that have been triggered and are currently being experienced as active thoughts. What we would "see" is the simultaneous activation of millions of pattern recognizers. A hundredth of a second later, we would see a different set of a comparable number of activated pattern recognizers. Each such pattern would be a list of other patterns, and each of those patterns would be a list of other patterns, and so on until we reached the most elementary simple patterns at the lowest level. It would be extremely difficult to interpret what these higher-level patterns meant without actually copying all all of the information at every level into our own cortex. Thus each pattern in our neocortex is meaningful only in light of all the information carried in the levels below it. Moreover, other patterns at the same level and at higher levels are also relevant in interpreting a particular pattern because they provide context. True mind reading, therefore, would necessitate not just detecting the activations of the relevant axons in a person's brain, but examining essentially her entire neocortex with all of its memories to understand these activations. of the information at every level into our own cortex. Thus each pattern in our neocortex is meaningful only in light of all the information carried in the levels below it. Moreover, other patterns at the same level and at higher levels are also relevant in interpreting a particular pattern because they provide context. True mind reading, therefore, would necessitate not just detecting the activations of the relevant axons in a person's brain, but examining essentially her entire neocortex with all of its memories to understand these activations.
As we experience our own thoughts and memories, we "know" what they mean, but they do not exist as readily explainable thoughts and recollections. If we want to share them with others, we need to translate them into language. This task is also accomplished by the neocortex, using pattern recognizers trained with patterns that we have learned for the purpose of using language. Language is itself highly hierarchical and evolved to take advantage of the hierarchical nature of the neocortex, which in turn reflects the hierarchical nature of reality. The innate ability of humans to learn the hierarchical structures in language that Noam Chomsky wrote about reflects the structure of the neocortex. In a 2002 paper he coauthored, Chomsky cites the attribute of "recursion" as accounting for the unique language faculty of the human species.4 Recursion, according to Chomsky, is the ability to put together small parts into a larger chunk, and then use that chunk as a part in yet another structure, and to continue this process iteratively. In this way we are able to build the elaborate structures of sentences and paragraphs from a limited set of words. Although Chomsky was not explicitly referring here to brain structure, the capability he is describing is exactly what the neocortex does.
Lower species of mammals largely use up their neocortex with the challenges of their particular lifestyles. The human species acquired additional capacities by having grown substantially more cortex to handle spoken and written language. Some people have learned such skills better than others. If we have told a particular story many times, we will begin to actually learn the sequence of language that describes the story as a series of separate sequences. Even in this case our memory is not a strict sequence of words, but rather of language structures that we need to translate into specific word sequences each time we deliver the story. That is why we tell a story a bit differently each time we share it (unless we learn the exact word sequence as a pattern).
For each of these descriptions of specific thought processes, we also need to consider the issue of redundancy. As I mentioned, we don't have a single pattern representing the important entities in our lives, whether those entities constitute sensory categories, language concepts, or memories of events. Every important pattern-at every level-is repeated many times. Some of these recurrences represent simple repetitions, whereas many represent different perspectives and vantage points. This is a principal reason why we can recognize a familiar face from various orientations and under a range of lighting conditions. Each level up the hierarchy has substantial redundancy, allowing sufficient variability that is consistent with that concept.
So if we were to imagine examining your neocortex when you were looking at a particular loved one, we would see a great many firings of the axons of the pattern recognizers at every level, from the basic level of primitive sensory patterns up to many different patterns representing that loved one's image. We would also see massive numbers of firings representing other aspects of the situation, such as that person's movements, what she is saying, and so on. So if the experience seems much richer than just an orderly trip up a hierarchy of features, it is.
A computer simulation of the firings of many simultaneous pattern recognizers in the neocortex.
But the basic mechanism of going up a hierarchy of pattern recognizers in which each higher conceptual level represents a more abstract and more integrated concept remains valid. The flow of information downward is even greater, as each activated level of recognized pattern sends predictions to the next lower-level pattern recognizer of what it is likely to be encountering next. The apparent lushness of human experience is a result of the fact that all of the hundreds of millions of pattern recognizers in our neocortex are considering their inputs simultaneously.
In chapter 5 I'll discuss the flow of information from touch, vision, hearing, and other sensory organs into the neocortex. These early inputs are processed by cortical regions that are devoted to relevant types of sensory input (although there is enormous plasticity in the assignment of these regions, reflecting the basic uniformity of function in the neocortex). The conceptual hierarchy continues above the highest concepts in each sensory region of the neocortex. The cortical association areas integrate input from the different sensory inputs. When we hear something that perhaps sounds like our spouse's voice, and then see something that is perhaps indicative of her presence, we don't engage in an elaborate process of logical deduction; rather, we instantly perceive that our spouse is present from the combination of these sensory recognitions. We integrate all of the germane sensory and perceptual cues-perhaps even the smell of her perfume or his cologne-as one multilevel perception.
At a conceptual level above the cortical sensory association areas, we are capable of dealing with-perceiving, remembering, and thinking about-even more abstract concepts. At the highest level we recognize patterns such as that's funny, or she's pretty, or that's ironic, and so on. Our memories include these abstract recognition patterns as well. For example, we might recall that we were taking a walk with someone and that she said something funny, and we laughed, though we may not remember the actual joke itself. The memory sequence for that recollection has simply recorded the perception of humor but not the precise content of what was funny.
In the previous chapter I noted that we can often recognize a pattern even though we don't recognize it well enough to be able to describe it. For example, I believe I could pick out a picture of the woman with the baby carriage whom I saw earlier today from among a group of pictures of other women, despite the fact that I am unable to actually visualize her and cannot describe much specific about her. In this case my memory of her is a list of certain high-level features. These features do not have language or image labels attached to them, and they are not pixel images, so while I am able to think about her, I am unable to describe her. However, if I am presented with a picture of her, I can process the image, which results in the recognition of the same high-level features that were recognized the first time I saw her. I would be able to thereby determine that the features match and thus confidently pick out her picture.
Even though I saw this woman only once on my walk, there are probably already multiple copies of her pattern in my neocortex. However, if I don't think about her for a given period of time, then these pattern recognizers will become reassigned to other patterns. That is why memories grow dimmer with time: The amount of redundancy becomes reduced until certain memories become extinct. However, now that I have memorialized this particular woman by writing about her here, I probably won't forget her so easily.
Autoassociation and Invariance
In the previous chapter I discussed how we can recognize a pattern even if the entire pattern is not present, and also if it is distorted. The first capability is called autoassociation: the ability to associate a pattern with a part of itself. The structure of each pattern recognizer inherently supports this capability.
As each input from a lower-level pattern recognizer flows up to a higher-level one, the connection can have a "weight," indicating how important that particular element in the pattern is. Thus the more significant elements of a pattern are more heavily weighted in considering whether that pattern should trigger as "recognized." Lincoln's beard, Elvis's sideburns, and Einstein's famous tongue gesture are likely to have high weights in the patterns we've learned about the appearance of these iconic figures. The pattern recognizer computes a probability that takes the importance parameters into account. Thus the overall probability is lower if one or more of the elements is missing, though the threshold of recognition may nonetheless be met. As I pointed out, the computation of the overall probability (that the pattern is present) is more complicated than a simple weighted sum in that the size parameters also need to be considered.
If the pattern recognizer has received a signal from a higher-level recognizer that its pattern is "expected," then the threshold is effectively lowered (that is, made easier to achieve). Alternatively, such a signal may simply add to the total of the weighted inputs, thereby compensating for a missing element. This happens at every level, so that a pattern such as a face that is several levels up from the bottom may be recognized even with multiple missing features.
The ability to recognize patterns even when aspects of them are transformed is called feature invariance, and is dealt with in four ways. First, there are global transformations that are accomplished before the neocortex receives sensory data. We will discuss the voyage of sensory data from the eyes, ears, and skin in the section "The Sensory Pathway" on page 94.
The second method takes advantage of the redundancy in our cortical pattern memory. Especially for important items, we have learned many different perspectives and vantage points for each pattern. Thus many variations are separately stored and processed.
The third and most powerful method is the ability to combine two lists. One list can have a set of transformations that we have learned may apply to a certain category of pattern; the cortex will apply this same list of possible changes to another pattern. That is how we understand such language phenomena as metaphors and similes.
For example, we have learned that certain phonemes (the basic sounds of language) may be missing in spoken speech (for example, "goin'"). If we then learn a new spoken word (for example, "driving"), we will be able to recognize that word if one of its phonemes is missing even if we have never experienced that word in that form before, because we have become familiar with the general phenomenon of certain phonemes being omitted. As another example, we may learn that a particular artist likes to emphasize (by making larger) certain elements of a face, such as the nose. We can then identify a face with which we are familiar to which that modification has been applied even if we have never seen that modification on that face. Certain artistic modifications emphasize the very features that are recognized by our pattern recognition-based neocortex. As mentioned, that is precisely the basis of caricature.
The fourth method derives from the size parameters that allow a single module to encode multiple instances of a pattern. For example, we have heard the word "steep" many times. A particular pattern recognition module that is recognizing this spoken word can encode these multiple examples by indicating that the duration of [E] has a high expected variability. If all the modules for words including [E] share a similar phenomenon, that variability could be encoded in the models for [E] itself. However, different words incorporating [E] (or many other phonemes) may have different amounts of expected variability. For example, the word "peak" is likely not to have the [E] phoneme as drawn out as in the word "steep."
Learning
Are we not ourselves creating our successors in the supremacy of the earth? Daily adding to the beauty and delicacy of their organization, daily giving them greater skill and supplying more and more of that self-regulating self-acting power which will be better than any intellect?-Samuel Butler, 1871
The principal activities of brains are making changes in themselves.-Marvin Minsky, The Society of Mind
So far we have examined how we recognize (sensory and perceptual) patterns and recall sequences of patterns (our memory of things, people, and events). However, we are not born with a neocortex filled with any of these patterns. Our neocortex is virgin territory when our brain is created. It has the capability of learning and therefore of creating connections between its pattern recognizers, but it gains those connections from experience.
This learning process begins even before we are born, occurring simultaneously with the biological process of actually growing a brain. A fetus already has a brain at one month, although it is essentially a reptile brain, as the fetus actually goes through a high-speed re-creation of biological evolution in the womb. The natal brain is distinctly a human brain with a human neocortex by the time it reaches the third trimester of pregnancy. At this time the fetus is having experiences, and the neocortex is learning. She can hear sounds, especially her mother's heartbeat, which is one likely reason that the rhythmic qualities of music are universal to human culture. Every human civilization ever discovered has had music as part of its culture, which is not the case with other art forms, such as pictorial art. It is also the case that the beat of music is comparable to our heart rate. Music beats certainly vary-otherwise music would not keep our interest-but heartbeats vary also. An overly regular heartbeat is actually a symptom of a diseased heart. The eyes of a fetus are partially open twenty-six weeks after conception, and are fully open most of the time by twenty-eight weeks after conception. There may not be much to see inside the womb, but there are patterns of light and dark that the neocortex begins to process.
So while a newborn baby has had a bit of experience in the womb, it is clearly limited. The neocortex may also learn from the old brain (a topic I discuss in chapter 5), but in general at birth the child has a lot to learn-everything from basic primitive sounds and shapes to metaphors and sarcasm.
Learning is critical to human intelligence. If we were to perfectly model and simulate the human neocortex (as the Blue Brain Project is attempting to do) and all of the other brain regions that it requires to function (such as the hippocampus and thalamus), it would not be able to do very much-in the same way that a newborn infant cannot do much (other than to be cute, which is definitely a key survival adaptation).
Learning and recognition take place simultaneously. We start learning immediately, and as soon as we've learned a pattern, we immediately start recognizing it. The neocortex is continually trying to make sense of the input presented to it. If a particular level is unable to fully process and recognize a pattern, it gets sent to the next higher level. If none of the levels succeeds in recognizing a pattern, it is deemed to be a new pattern. Classifying a pattern as new does not necessarily mean that every aspect of it is new. If we are looking at the paintings of a particular artist and see a cat's face with the nose of an elephant, we will be able to identify each of the distinctive features but will notice that this combined pattern is something novel, and are likely to remember it. Higher conceptual levels of the neocortex, which understand context-for example, the circumstance that this picture is an example of a particular artist's work and that we are attending an opening of a showing of new paintings by that artist-will note the unusual combination of patterns in the cat-elephant face but will also include these contextual details as additional memory patterns.
New memories such as the cat-elephant face are stored in an available pattern recognizer. The hippocampus plays a role in this process, and we'll discuss what is known about the actual biological mechanisms in the following chapter. For the purposes of our neocortex model, it is sufficient to say that patterns that are not otherwise recognized are stored as new patterns and are appropriately connected to the lower-level patterns that form them. The cat-elephant face, for example, will be stored in several different ways: The novel arrangement of facial parts will be stored as well as contextual memories that include the artist, the situation, and perhaps the fact that we laughed when we first saw it.
Memories that are successfully recognized may also result in the creation of a new pattern to achieve greater redundancy. If patterns are not perfectly recognized, they are likely to be stored as reflecting a different perspective of the item that was recognized.
What, then, is the overall method for determining what patterns get stored? In mathematical terms, the problem can be stated as follows: Using the available limits of pattern storage, how do we optimally represent the input patterns that have thus far been presented? While it makes sense to allow for a certain amount of redundancy, it would not be practical to fill up the entire available storage area (that is, the entire neocortex) with repeated patterns, as that would not allow for a sufficient diversity of patterns. A pattern such as the [E] phoneme in spoken words is something we have experienced countless times. It is a simple pattern of sound frequencies and it undoubtedly enjoys significant redundancy in our neocortex. We could fill up our entire neocortex with repeated patterns of the [E] phoneme. There is a limit, however, to useful redundancy, and a common pattern such as this clearly has reached it.
There is a mathematical solution to this optimization problem called linear programming, which solves for the best possible allocation of limited resources (in this case, a limited number of pattern recognizers) that would represent all of the cases on which the system has trained. Linear programming is designed for systems with one-dimensional inputs, which is another reason why it is optimal to represent the input to each pattern recognition module as a linear string of inputs. We can use this mathematical approach in a software system, and though an actual brain is further constrained by the physical connections it has available between pattern recognizers, which it can adapt, the method is nonetheless similar.
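As a loose analogy rather than a model of the brain, the sketch below uses linear programming (via scipy.optimize.linprog) to allocate a fixed budget of pattern recognizers across patterns, maximizing a notional coverage value subject to per-pattern caps on useful redundancy. The pattern names, values, caps, and the use of a continuous relaxation are all my assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Toy allocation problem: spend a limited budget of recognizer modules across patterns.
patterns = ["[E] phoneme", "spouse's face", "cat-elephant face"]
value_per_copy = np.array([1.0, 3.0, 5.0])  # marginal usefulness of one more stored copy
max_useful_copies = [400, 300, 50]          # redundancy beyond this adds nothing
total_modules = 600                         # limited storage budget

# linprog minimizes, so negate the values to maximize total usefulness.
result = linprog(
    c=-value_per_copy,
    A_ub=np.ones((1, len(patterns))),       # sum of allocations <= total budget
    b_ub=[total_modules],
    bounds=[(0, cap) for cap in max_useful_copies],
)

for name, copies in zip(patterns, result.x):
    print(f"{name}: {copies:.0f} recognizers")
```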
An important implication of this optimal solution is that experiences that are routine are recognized but do not result in a permanent memory's being made. With regard to my walk, I experienced millions of patterns at every level, from basic visual edges and shadings to objects such as lampposts and mailboxes and people and animals and plants that I passed. Almost none of what I experienced was unique, and the patterns that I recognized had long since reached their optimal level of redundancy. The result is that I recall almost nothing from this walk. The few details that I do remember are likely to get overwritten with new patterns by the time I take another few dozen walks-except for the fact that I have now memorialized this particular walk by writing about it.
One important point that applies to both our biological neocortex and attempts to emulate it is that it is difficult to learn too many conceptual levels simultaneously. We can essentially learn one or at most two conceptual levels at a time. Once that learning is relatively stable, we can go on to learn the next level. We may continue to fine-tune the learning in the lower levels, but our learning focus is on the next level of abstraction. This is true at both the beginning of life, as newborns struggle with basic shapes, and later in life, as we struggle to learn new subject matter, one level of complexity at a time. We find the same phenomenon in machine emulations of the neocortex. However, if they are presented with increasingly abstract material one level at a time, machines are capable of learning just as humans do (although not yet with as many conceptual levels).
The output of a pattern can feed back to a pattern at a lower level or even to the pattern itself, giving the human brain its powerful recursive ability. An element of a pattern can be a decision point based on another pattern. This is especially useful for lists that compose actions-for example, getting another tube of toothpaste if the current one is empty. These conditionals exist at every level. As anyone who has attempted to program a procedure on a computer knows, conditionals are vital to describing a course of action.
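A minimal sketch of an action list containing a conditional element, along the lines of the toothpaste example, might look like the following; the pattern names and the structure of the check are hypothetical.

```python
# An action list in which one element is a decision point based on another pattern.

def brush_teeth(recognized):
    """recognized: set of patterns currently active, e.g. {"tube_empty"}."""
    steps = ["pick up toothbrush"]
    if "tube_empty" in recognized:        # conditional element of the stored list
        steps.append("get a new tube of toothpaste")
    steps += ["apply toothpaste", "brush"]
    return steps

print(brush_teeth(set()))
print(brush_teeth({"tube_empty"}))
```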
The Language of Thought
The dream acts as a safety-valve for the over-burdened brain.-Sigmund Freud, The Interpretation of Dreams, 1911
Brain: an apparatus with which we think we think.-Ambrose Bierce, The Devil's Dictionary
To summarize what we've learned so far about the way the neocortex works, please refer to the diagram of the neocortical pattern recognition module on page 42.
a) Dendrites enter the module that represents the pattern. Even though patterns may seem to have two- or three-dimensional qualities, they are represented by a one-dimensional sequence of signals. The pattern must be present in this (sequential) order for the pattern recognizer to be able to recognize it. Each of the dendrites is connected ultimately to one or more axons of pattern recognizers at a lower conceptual level that have recognized a lower-level pattern that constitutes part of this pattern. For each of these input patterns, there may be many lower-level pattern recognizers that can generate the signal that the lower-level pattern has been recognized. The necessary threshold to recognize the pattern may be achieved even if not all of the inputs have signaled. The module computes the probability that the pattern it is responsible for is present. This computation considers the "importance" and "size" parameters (see [f] below). Note that some of the dendrites transmit signals into the module and some out of the module. If all of the input dendrites to this pattern recognizer are signaling that their lower-level patterns have been recognized except for one or two, then this pattern recognizer will send a signal down to the pattern recognizer(s) recognizing the lower-level patterns that have not yet been recognized, indicating that there is a high likelihood that that pattern will soon be recognized and that lower-level recognizer(s) should be on the lookout for it.
b) When this pattern recognizer recognizes its pattern (based on all or most of the input dendrite signals being activated), the axon (output) of this pattern recognizer will activate. In turn, this axon can connect to an entire network of dendrites connecting to many higher-level pattern recognizers that this pattern is input to. This signal will transmit magnitude information so that the pattern recognizers at the next higher conceptual level can consider it.
c) If a higher-level pattern recognizer is receiving a positive signal from all or most of its constituent patterns except for the one represented by this pattern recognizer, then that higher-level recognizer might send a signal down to this recognizer indicating that its pattern is expected. Such a signal would cause this pattern recognizer to lower its threshold, meaning that it would be more likely to send a signal on its axon (indicating that its pattern is considered to have been recognized) even if some of its inputs are missing or unclear.
d) Inhibitory signals from below would make it less likely that this pattern recognizer will recognize its pattern. This can result from recognition of lower-level patterns that are inconsistent with the pattern associated with this pattern recognizer (for example, recognition of a mustache by a lower-level recognizer would make it less likely that this image is "my wife").
e) Inhibitory signals from above would also make it less likely that this pattern recognizer will recognize its pattern. This can result from a higher-level context that is inconsistent with the pattern associated with this recognizer.
f) For each input, there are stored parameters for importance, expected size, and expected variability of size. The module computes an overall probability that the pattern is present based on all of these parameters and the current signals indicating which of the inputs are present and their magnitudes. A mathematically optimal way to accomplish this is with a technique called hidden Markov models.
When such models are organized in a hierarchy (as they are in the neocortex or in attempts to simulate a neocortex), we call them hierarchical hidden Markov models.
Patterns triggered in the neocortex trigger other patterns. Partially complete patterns send signals down the conceptual hierarchy; completed patterns send signals up the conceptual hierarchy. These neocortical patterns are the language of thought. Just like language, they are hierarchical, but they are not language per se. Our thoughts are not conceived primarily in the elements of language, although since language also exists as hierarchies of patterns in our neocortex, we can have language-based thoughts. But for the most part, thoughts are represented in these neocortical patterns.
As I discussed above, if we were able to detect the pattern activations in someone's neocortex, we would still have little idea what those pattern activations meant without also having access to the entire hierarchy of patterns above and below each activated pattern. That would pretty much require access to that person's entire neocortex. It is hard enough for us to understand the content of our own thoughts, but understanding another person's requires mastering a neocortex different from our own. Of course we don't yet have access to someone else's neocortex; we need instead to rely on her attempts to express her thoughts into language (as well as other means such as gestures). People's incomplete ability to accomplish these communication tasks adds another layer of complexity-it is no wonder that we misunderstand one another as much as we do.
We have two modes of thinking. One is nondirected thinking, in which thoughts trigger one another in a nonlogical way. When we experience a sudden recollection of a memory from years or decades ago while doing something else, such as raking the leaves or walking down the street, the experience is recalled-as all memories are-as a sequence of patterns. We do not immediately visualize the scene unless we can call upon a lot of other memories that enable us to synthesize a more robust recollection. If we do visualize the scene in that way, we are essentially creating it in our mind from hints at the time of recollection; the memory itself is not stored in the form of images or visualizations. As I mentioned earlier, the triggers that led this thought to pop into our mind may or may not be evident. The sequence of relevant thoughts may have been immediately forgotten. Even if we do remember it, it will be a nonlinear and circuitous sequence of associations.
The second mode of thinking is directed thinking, which we use when we attempt to solve a problem or formulate an organized response. For example, we might be rehearsing in our mind something we plan to say to someone, or we might be formulating a passage we want to write (in a book on the mind, perhaps). As we think about tasks such as these, we have already broken down each one into a hierarchy of subtasks. Writing a book, for example, involves writing chapters; each chapter has sections; each section has paragraphs; each paragraph contains sentences that express ideas; each idea has its configuration of elements; each element and each relationship between elements is an idea that needs to be articulated; and so on. At the same time, our neocortical structures have learned certain rules that should be followed. If the task is writing, then we should try to avoid unnecessary repetition; we should try to make sure that the reader can follow what is being written; we should try to follow rules about grammar and style; and so on. The writer needs therefore to build a model of the reader in his mind, and that construct is hierarchical as well. In doing directed thinking, we are stepping through lists in our neocortex, each of which expands into extensive hierarchies of sublists, each with its own considerations. Keep in mind that elements in a list in a neocortical pattern can include conditionals, so our subsequent thoughts and actions will depend on assessments made as we go through the process.
Moreover, each such directed thought will trigger hierarchies of undirected thoughts. A continual storm of ruminations attends both our sensory experiences and our attempts at directed thinking. Our actual mental experience is complex and messy, made up of these lightning storms of triggered patterns, which change about a hundred times a second.