Complexity - A Guided Tour
Genetic Regulatory Networks.
As I mentioned in chapter 7, humans have about 25,000 genes, roughly the same number as the mustard plant Arabidopsis. What seems to generate the complexity of humans as compared to, say, plants is not how many genes we have but how those genes are organized into networks.
There are many genes whose function is to regulate other genes, that is, to control whether or not the regulated genes are expressed. A well-known simple example of gene regulation is the control of lactose metabolism in E. coli bacteria. These bacteria usually live off glucose, but they can also metabolize lactose. The ability to metabolize lactose requires the cell to contain three particular protein enzymes, each encoded by a separate gene. Let's call these genes A, B, and C. There is a fourth gene that encodes a protein, called a lactose repressor, which binds to genes A, B, and C, in effect turning these genes off. If there is no lactose in the bacterium's local environment, lactose repressors are continually formed, and no lactose metabolism takes place. However, if the bacterium suddenly finds itself in a glucose-free but lactose-rich environment, then lactose molecules bind to the lactose repressor and detach it from genes A, B, and C, which then proceed to produce the enzymes that allow lactose metabolism.
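The on/off logic of this example can be written as a tiny Boolean rule. The sketch below is only a toy reading of the paragraph above: the function name and the reduction of the whole mechanism to a single true/false input are assumptions made for illustration, not a model of real E. coli biochemistry.

```python
def lactose_genes_expressed(lactose_present: bool) -> bool:
    """Toy Boolean version of the regulation described in the text."""
    # The repressor is continually produced and stays bound to genes A, B, C
    # unless lactose molecules bind to it and detach it.
    repressor_bound = not lactose_present
    # Genes A, B, and C produce their enzymes only when the repressor is off.
    return not repressor_bound


for lactose in (False, True):
    state = "expressed" if lactose_genes_expressed(lactose) else "turned off"
    print(f"lactose present: {lactose} -> genes A, B, C {state}")
```

In network terms, the repressor gene is a node with three outgoing regulatory links, one to each of A, B, and C.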
Regulatory interactions like this, some much more intricate, are the heart and soul of complexity in genetics. Network thinking played a role in understanding these interactions as early as the 1960s, with the work of Stuart Kauffman (more on this in chapter 18). More recently, network scientists teaming up with geneticists have demonstrated evidence that at least some networks of these interactions are approximately scale-free. Here, the nodes are individual genes, and each node links to all other genes it regulates (if any).
Resilience is mandatory for genetic regulatory networks. The processes of gene transcription and gene regulation are far from perfect; they are inherently error-ridden and often affected by pathogens such as viruses. Having a scale-free structure helps the system to be mostly impervious to such errors.
Metabolic Networks.
As I described in chapter 12, cells in most organisms have hundreds of different metabolic pathways, many interconnecting, forming networks of metabolic reactions. Albert-Laszlo Barabasi and colleagues looked in detail at the structure of metabolic networks in forty-three different organisms and found that they all were "well fitted" by a power-law distribution, i.e., are scale-free. Here the nodes in the network are chemical substrates: the fodder and product of chemical reactions. One substrate is considered to be linked to another if the first participates in a reaction that produces the second. For example, in the second step of the pathway called glycolysis, the substrate glucose-6-phosphate produces the substrate fructose-6-phosphate, so there would be a link in the network from the first substrate to the second.
Since metabolic networks are scale-free, they have a small number of hubs that are the products of a large number of reactions involving many different substrates. These hubs turn out to be largely the same chemicals in all the diverse organisms studied: the chemicals that are known to be most essential for life. It has been hypothesized that metabolic networks evolved to be scale-free so as to ensure robustness of metabolism and to optimize "communication" among different substrates.
Epidemiology.
In the early 1980s, in the early stages of the worldwide AIDS epidemic, epidemiologists at the Centers for Disease Control in Atlanta identified a Canadian flight attendant, Gaetan Dugas, as part of a cluster of men with AIDS who were responsible for infecting large numbers of other gay men in many different cities around the world. Dugas was later vilified in the media as "patient zero," the first North American with AIDS, who was responsible for introducing and widely spreading the AIDS virus in the United States and elsewhere. Although later studies debunked the theory that Dugas was the source of the North American epidemic, there is no question that Dugas, who claimed to have had hundreds of different sexual partners each year, infected many people. In network terms, Dugas was a hub in the network of sexual contacts.
Epidemiologists studying sexually transmitted diseases often look at networks of sexual contacts, in which nodes are people and links represent sexual partnerships between two people. Recently, a group consisting of sociologists and physicists analyzed data from a Swedish survey of sexual behavior and found that the resulting network has a scale-free structure; similar results have been found in studies of other sexual networks.
In this case, the vulnerability of such networks to the removal of hubs can work in our favor. It has been suggested that safe-sex campaigns, vaccinations, and other kinds of interventions should mainly be targeted at such hubs.
How can these hubs be identified without having to map out huge networks of people, for which data on sexual partners may not be available?
A clever yet simple method was proposed by another group of network scientists: choose a set of random people from the at-risk population and ask each to name a partner. Then vaccinate that partner. People with many partners will be more likely to be named, and thus vaccinated, under this scheme.
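To see why naming a partner tends to find hubs, here is a minimal simulation sketch. Everything about the network is an assumption made for illustration (2,000 people in a ring, 20 artificial hubs with 100 extra partners each, fixed random seeds); it is not the data or the procedure used by the researchers, only a toy instance of the "ask a random person to name a partner" idea.

```python
import random


def make_toy_network(n=2000, hubs=20, hub_links=100, seed=1):
    """Hypothetical contact network: a ring of people plus a few hubs."""
    rng = random.Random(seed)
    partners = {i: set() for i in range(n)}
    # Link each person to the next one so that nobody is isolated.
    for i in range(n):
        j = (i + 1) % n
        partners[i].add(j)
        partners[j].add(i)
    # People 0..hubs-1 are the hubs: each gets many extra partners.
    for h in range(hubs):
        for j in rng.sample(range(hubs, n), hub_links):
            partners[h].add(j)
            partners[j].add(h)
    return partners


def mean_degree(partners, people):
    """Average number of partners over the given list of people."""
    return sum(len(partners[p]) for p in people) / len(people)


def named_partners(partners, sample, rng):
    """Each sampled person names one of their partners at random."""
    return [rng.choice(sorted(partners[p])) for p in sample]


rng = random.Random(2)
network = make_toy_network()
sample = rng.sample(sorted(network), 200)      # random people
named = named_partners(network, sample, rng)   # the partners they name

print("mean partners, random people: ", mean_degree(network, sample))
print("mean partners, named partners:", mean_degree(network, named))
```

In this toy network the named partners have, on average, many more partners than the randomly chosen people who named them, because a person with many links has many chances to be named.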
This strategy, of course, can be exported to other situations in which "hub-targeting" is desired, such as fighting computer viruses transmitted by e-mail: in this case, one should target anti-virus methods to the computers of people with large address books, rather than depending on all computer users to perform virus detection.
FIGURE 16.1. Example of a food web. (Illustration from USGS Alaska Science Center, http://www.absc.usgs.gov/research/seabird_foragefish/marinehabitat/home.html.)
Ecologies and Food Webs.
In the science of ecology, the common notion of food chain has been extended to food web, a network in which a node represents a species or group of species; if species B is part of the diet of species A, then there is a link from node A to node B. Figure 16.1 shows a simple example of a food web.
Mapping the food webs of various ecosystems has been an important part of ecological science for some time. Recently, researchers have been applying network science to the analysis of these webs in order to understand biodiversity and the implications of different types of disruptions to that biodiversity in ecosystems.
Several ecologists have claimed that (at least some) food webs possess the small-world property, and that some of these have scale-free degree distributions, which evolved presumably to give food webs resilience to the random deletion of species. Other ecologists have disagreed that food webs have scale-free structure, and the ecology research community has recently seen a lot of debate on this issue, mainly due to the difficulty of interpreting real-world data.
Significance of Network Thinking.
The examples above are only a small sampling of the ways in which network thinking is affecting various areas of science and technology. Scale-free degree distributions, clustering, and the existence of hubs are the common themes; these features give rise to networks with small-world communication capabilities and resilience to deletion of random nodes. Each of these properties is significant for understanding complex systems, both in science and in technology.
In science, network thinking is providing a novel language for expressing commonalities across complex systems in nature, thus allowing insights from one area to influence other, disparate areas. In a self-referential way, network science itself plays the role of a hub: the common connection among otherwise far-flung scientific disciplines.
In technology, network thinking is providing novel ways to think about difficult problems such as how to do efficient search on the Web, how to control epidemics, how to manage large organizations, how to preserve ecosystems, how to target diseases that affect complex networks in the body, how to target modern criminal and terrorist organizations, and, more generally, what kind of resilience and vulnerabilities are intrinsic to natural, social, and technological networks, and how to exploit and protect such systems.
Where Do Scale-Free Networks Come From?
No one purposely designed the Web to be scale-free. The Web's degree distribution, like that of the other networks I've mentioned above, is an emergent outcome of the way in which the network was formed, and how it grows.
In 1999 physicists Albert-Laszlo Barabasi and Reka Albert proposed that a particular growing process for networks, which they called preferential attachment, is the explanation for the existence of most (if not all) scale-free networks in the real world. The idea is that networks grow in such a way that nodes with higher degree receive more new links than nodes with lower degree. Intuitively this makes sense. People with many friends tend to meet more new people and thus make more new friends than people with few friends. Web pages with many incoming links are easier to find than those with few incoming links, so more new Web pages link to the high-degree ones. In other words, the rich get richer, or perhaps the linked get more linked. Barabasi and Albert showed that growth by preferential attachment leads to scale-free degree distributions. (Unbeknownst to them at the time, this process and its power-law outcome had been discovered independently at least three times before.)
The growth of so-called scientific citation networks is one example of the effects of preferential attachment. Here the nodes are papers in the scientific literature; each paper receives a link from all other papers that cite it. Thus the more citations others have given to your paper, the higher its degree in the network. One might assume that a large number of citations is an indicator of good work; for example, in academia, this measure is routinely used in making decisions about tenure, pay increases, and other rewards. However, it seems that preferential attachment often plays a large role. Suppose you and Joe Scientist have independently written excellent articles about the same topic. If I happen to cite your article but not Joe's in my latest opus, then others who read only my paper will be more likely to cite yours (usually without reading it). Other people will read their papers, and also be more likely to cite you than to cite Joe. The situation for Joe gets worse and worse as your situation gets better and better, even though your paper and Joe's were both of the same quality.
Preferential attachment is one mechanism for getting to what the writer Malcolm Gladwell called tipping points: points at which some process, such as citation, spread of fads, and so on, starts increasing dramatically in a positive-feedback cycle. Alternatively, tipping points can refer to failures in a system that induce an accelerating systemwide spread of additional failures, which I discuss below.
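The "rich get richer" growth rule is easy to simulate. The sketch below is a simplified variant that I am assuming for illustration (each new node brings exactly one link, and the network starts from two linked nodes); it is not claimed to be the exact procedure from Barabasi and Albert's paper. The function name and the list-of-endpoints trick are likewise illustrative choices.

```python
import random
from collections import Counter


def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time with preferential attachment.

    Returns a list in which entry i is the degree of node i.
    """
    rng = random.Random(seed)
    degree = [1, 1]      # start with two nodes joined by one link
    # Each node appears in 'endpoints' once per link it has, so a uniform
    # draw from this list picks a node with probability proportional to
    # its current degree.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        degree.append(1)          # the new node has its one new link
        degree[target] += 1       # the chosen node gains a link
        endpoints.extend([new_node, target])
    return degree


degree = grow_network(10_000)
counts = Counter(degree)
print("largest degree:", max(degree))
print("nodes with exactly one link:", counts[1])
for k in (1, 2, 4, 8, 16, 32):
    print(f"degree {k:>2}: {counts[k]} nodes")
```

In runs of this sketch most nodes end up with a single link while a handful of early nodes collect a very large number, which is the hub-dominated pattern the text describes.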
Power Laws and Their Skeptics.
So far I have implied that scale-free networks are ubiquitous in nature due to the adaptive properties of robustness and fast communication associated with power-law degree distributions, and that the mechanism by which they form is growth by preferential attachment. These notions have given scientists new ways of thinking about many different scientific problems.
However compelling all this may seem, scientists are supposed to be skeptical by nature, especially of new, relatively untested ideas, and even more particularly of ideas that claim generality over many disciplines. Such skepticism is not only healthy, it is also essential for the progress of science. Thus, fortunately, not everyone has jumped on the network-science bandwagon, and even many who have are skeptical concerning some of the most optimistic statements about the significance of network science for complex systems research. This skepticism is founded on the following arguments.
1. Too many phenomena are being described as power-law or scale-free. It's typically rather difficult to obtain good data about real-world network degree distributions. For example, the data used by Barabasi and colleagues for analyzing metabolic networks came from a Web-based database to which biologists from all over the world contributed information. Such biological databases, while invaluable to research, are invariably incomplete and error-ridden. Barabasi and colleagues had to rely on statistics and curve-fitting to determine the degree distributions in various metabolic networks, an imperfect method, yet the one that is most often used in analyzing real-world data. A number of networks previously identified to be "scale-free" using such techniques have later been shown to in fact have non-scale-free distributions.
As noted by philosopher and historian of biology Evelyn Fox Keller, "Current assessments of the commonality of power laws are probably overestimates." Physicist and network scientist Cosma Shalizi had a less polite phrasing of the same sentiments: "Our tendency to hallucinate power laws is a disgrace." As I write this, there are still considerable controversies over which real-world networks are indeed scale-free.
2. Even for networks that are actually scale-free, there are many possible causes for power law degree distributions in networks; preferential attachment is not necessarily the one that actually occurs in nature. As Cosma Shalizi succinctly said: "there turn out to be nine and sixty ways of constructing power laws, and every single one of them is right." When I was at the Santa Fe Institute, it seemed that there was a lecture every other day on a new hypothesized mechanism that resulted in power law distributions. Some are similar to preferential attachment, some work quite differently. It's not obvious how to decide which ones are the mechanisms that are actually causing the power laws observed in the real world.
3. The claimed significance of network science relies on models that are overly simplified and based on unrealistic assumptions. The small-world and scale-free network models are just that, models, which means that they make simplifying assumptions that might not be true of real-world networks. The hope in creating such simplified models is that they will capture at least some aspects of the phenomenon they are designed to represent. As we have seen, these two network models, in particular the scale-free model, indeed seem to capture something about degree distributions, clustering, and resilience in a large number of real-world systems (though point 1 above suggests that the number might not be as large as some think).
However, simplified models of networks, in and of themselves, cannot explain everything about their real-world counterparts. In both the small-world and scale-free models, all nodes are assumed to be identical except for their degree; and all links are the same type and have the same strength. This is not the case in real-world networks. For example, in the real version of my social network (whose simplified model was shown in figure 14.2), some friendship links are stronger than others. Kim and Gar are both friends of mine but I know Kim much better, so I might be more likely to tell her about important personal events in my life. Furthermore, Kim is a woman and Gar is a man, which might increase my likelihood of confiding in her but not in Gar. Similarly, my friend Greg knows and cares a lot more about math than Kim, so if I wanted to share some neat mathematical fact I learned, I'd be much more likely to tell Greg about it than Kim. Such differences in link and node types as well as link strength can have very significant effects on how information spreads in a network, effects that are not captured by the simplified network models.
Information Spreading and Cascading Failure in Networks.
In fact, understanding the ways in which information spreads in networks is one of the most important open problems in network science. The results I have described in this and the previous chapter are all about the structure of networks (e.g., their static degree distributions) rather than the dynamics of spreading information in a network.
What do I mean by "spreading information in a network"? Here I'm using the term information to capture any kind of communication among nodes. Some examples of information spreading are the spread of rumors, gossip, fads, opinions, epidemics (in which the communication between people is via germs), electrical currents, Internet packets, neurotransmitters, calories (in the case of food webs), vote counts, and a more general network-spreading phenomenon called "cascading failure."
The phenomenon of cascading failure emphasizes the need to understand information spreading and how it is affected by network structure. Cascading failure in a network happens as follows: Suppose each node in the network is responsible for performing some task (e.g., transmitting electrical power). If a node fails, its task gets passed on to other nodes. This can result in the other nodes getting overloaded and failing, passing on their task to still other nodes, and so forth. The result is an accelerating domino effect of failures that can bring down the entire network.
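The domino effect described above can be sketched with a very small load-sharing model. The details are assumptions chosen for illustration and do not come from any particular study: each node carries a numeric load, has a fixed capacity, and when it fails its load is split equally among its surviving neighbors.

```python
from collections import deque


def cascade(neighbors, load, capacity, first_failure):
    """Return the set of nodes that fail after 'first_failure' goes down."""
    load = dict(load)                  # work on a copy of the loads
    failed = set()
    queue = deque([first_failure])
    while queue:
        node = queue.popleft()
        if node in failed:
            continue                   # already processed
        failed.add(node)
        alive = [m for m in neighbors[node] if m not in failed]
        if not alive:
            continue                   # nobody left to take over the task
        share = load[node] / len(alive)
        for m in alive:
            load[m] += share           # the task is passed on
            if load[m] > capacity[m]:
                queue.append(m)        # overloaded: this node fails too
    return failed


# Hypothetical example: six nodes in a ring, each carrying one unit of load.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
load = {i: 1.0 for i in ring}
tight = {i: 1.4 for i in ring}   # little spare capacity
roomy = {i: 1.6 for i in ring}   # a bit more spare capacity

print("tight capacities:", sorted(cascade(ring, load, tight, 0)))
print("roomy capacities:", sorted(cascade(ring, load, roomy, 0)))
```

With the tight capacities the single failure of node 0 overloads its two neighbors and the failure spreads around the whole ring; with slightly more headroom the neighbors absorb the extra load and only node 0 is lost. The small difference in spare capacity is what separates a local fault from a network-wide collapse in this toy model.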
Examples of cascading failure are all too common in our networked world. Here are two fairly recent examples that made the national news:
August 2003: A massive power outage hit the Midwestern and Northeastern United States, caused by cascading failure due to a shutdown at one generating plant in Ohio. The reported cause of the shutdown was that electrical lines, overloaded by high demand on a very hot day, sagged too far down and came into contact with overgrown trees, triggering an automatic shutdown of the lines, whose load had to be shifted to other parts of the electrical network, which themselves became overloaded and shut down. This pattern of overloading and subsequent shutdown spread rapidly, eventually resulting in about 50 million customers in the Eastern United States and Canada losing electricity, some for more than three days.
August 2007: The computer system of the U.S. Customs and Border Protection Agency went down for nearly ten hours, resulting in more than 17,000 passengers being stuck in planes sitting on the tarmac at Los Angeles International Airport. The cause turned out to be a malfunction in a single network card on a desktop computer. Its failure quickly caused a cascading failure of other network cards, and within about an hour of the original failure, the entire system shut down. The Customs agency could not process arriving international passengers, some of whom had to wait on airplanes for more than five hours.
A third example shows that cascading failures can also happen when network nodes are not electronic devices but rather corporations.
August-September 1998: Long-Term Capital Management (LTCM), a private financial hedge fund with credit from several large financial firms, lost nearly all of its equity value due to risky investments. The U.S. Federal Reserve feared that this loss would trigger a cascading failure in worldwide financial markets because, in order to cover its debts, LTCM would have to sell off much of its investments, causing prices of stocks and other securities to drop, which would force other companies to sell off their investments, causing a further drop in prices, et cetera. At the end of September 1998, the Federal Reserve acted to prevent such a cascading failure by brokering a bailout of LTCM by its major creditors.
The network resilience I talked about earlier (the ability of networks to maintain short average path lengths in spite of the failure of random nodes) doesn't take into account the cascading failure scenario in which the failure of one node causes the failure of other nodes. Cascading failures provide another example of "tipping points," in which small events can trigger accelerating feedback, causing a minor problem to balloon into a major disruption. Although many people worry about malicious threats to our world's networked infrastructure from hackers or "cyber-terrorists," it may be that cascading failures pose a much greater risk. Such failures are becoming increasingly common and dangerous as our society becomes more dependent on computer networks, networked voting machines, missile defense systems, electronic banking, and the like. As Andreas Antonopoulos, a scientist who studies such systems, has pointed out, "The threat is complexity itself."
Indeed, a general understanding of cascading failures and strategies for their prevention are some of the most active current research areas in network science. Two current approaches are theories called Self-Organized Criticality (SOC) and Highly Optimized Tolerance (HOT). SOC and HOT are examples of the many theories that propose mechanisms different from preferential attachment for how scale-free networks arise. SOC and HOT each propose a general set of mechanisms for cascading failures in both evolved and engineered systems.
The simplified models of small-world networks and scale-free networks described in the previous chapter have been extraordinarily useful, as they have opened up the idea of network thinking to many different disciplines and established network science as a field in its own right. The next step is understanding the dynamics of information and other quantities in networks. To understand the dynamics of information in networks such as the immune system, ant colonies, and cellular metabolism (cf. chapter 12), network science will have to characterize networks in which the nodes and links continually change in both time and space. This will be a major challenge, to say the least. As Duncan Watts eloquently writes: "Next to the mysteries of dynamics on a network - whether it be epidemics of disease, cascading failures in power systems, or the outbreak of revolutions - the problems of networks that we have encountered up to now are just pebbles on the seashore."
CHAPTER 17.
The Mystery of Scaling.
The previous two chapters showed how network thinking is having profound effects on many areas of science, particularly biology. Quite recently, a kind of network thinking has led to a proposed solution for one of biology's most puzzling mysteries: the way in which properties of living organisms scale with size.
Scaling in Biology.
Scaling describes how one property of a system will change if a related property changes. The scaling mystery in biology concerns the question of how the average energy used by an organism while resting (the basal metabolic rate) scales with the organism's body mass. Since metabolism, the conversion by cells of food, water, air, and light to usable energy, is the key process underlying all living systems, this relation is enormously important for understanding how life works.
It has long been known that the metabolism of smaller animals runs faster relative to their body size than that of larger animals. In 1883, German physiologist Max Rubner tried to determine the precise scaling relationship by using arguments from thermodynamics and geometry. Recall from chapter 3 that processes such as metabolism, which convert energy from one form to another, always give off heat. An organism's metabolic rate can be defined as the rate at which its cells convert nutrients to energy, which is used for all the cell's functions and for building new cells. The organism gives off heat at this same rate as a by-product. An organism's metabolic rate can thus be inferred by measuring this heat production.
If you hadn't already known that smaller animals have faster metabolisms relative to body size than large ones, a naive guess might be that metabolic rate scales linearly with body mass: for example, that a hamster with eight times the body mass of a mouse would have eight times that mouse's metabolic rate, or even more extreme, that a hippopotamus with 125,000 times the body mass of a mouse would have a metabolic rate 125,000 times higher.
The problem is that the hamster, say, would generate eight times the amount of heat as the mouse. However, the total surface area of the hamster's body, from which the heat must radiate, would be only about four times the total surface of the mouse. This is because as an animal gets larger, its surface area grows more slowly than its mass (or equivalently, its volume).
This is illustrated in figure 17.1, in which a mouse, hamster, and hippo are represented by spheres. You might recall from elementary geometry that the formula for the volume of a sphere is four-thirds pi times the radius cubed, where pi ≈ 3.14159. Similarly, the formula for the surface area of a sphere is four times pi times the radius squared. We can say that "volume scales as the cube of the radius" whereas "surface area scales as the square of the radius." Here "scales as" just means "is proportional to"; that is, ignore the constants 4/3 pi and 4 pi. As illustrated in figure 17.1, the hamster sphere has twice the radius of the mouse sphere, and it has four times the surface area and eight times the volume of the mouse sphere. The radius of the hippo sphere (not drawn to scale) is fifty times the mouse sphere's radius; the hippo sphere thus has 2,500 times the surface area and 125,000 times the volume of the mouse sphere. You can see that as the radius is increased, the surface area grows (or "scales") much more slowly than the volume. Since the surface area scales as the radius squared and the volume scales as the radius cubed, we can say that "the surface area scales as the volume raised to the two-thirds power." (See the notes for the derivation of this.)
FIGURE 17.1. Scaling properties of animals (represented as spheres). (Drawing by David Moser.)
Raising volume to the two-thirds power is shorthand for saying "square the volume, and then take its cube root."
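For readers who want the two-thirds relation spelled out, here is a short derivation from the two sphere formulas quoted above. The symbols r, V, and S (radius, volume, surface area) are my own labels; this is a sketch of the standard algebra and not necessarily the wording used in the book's notes.

```latex
\begin{align*}
V &= \tfrac{4}{3}\pi r^{3}
  \;\Longrightarrow\;
  r = \left(\frac{3V}{4\pi}\right)^{1/3}, \\
S &= 4\pi r^{2}
   = 4\pi\left(\frac{3V}{4\pi}\right)^{2/3}
   = (36\pi)^{1/3}\, V^{2/3}
   \;\propto\; V^{2/3}.
\end{align*}
```

The numbers in the text agree with this: a volume 8 times larger gives a surface area 8^(2/3) = 4 times larger (the hamster), and a volume 125,000 times larger gives a surface area 125,000^(2/3) = 2,500 times larger (the hippo).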
Generating eight times the heat with only four times the surface area to radiate it would result in one very hot hamster. Similarly, the hippo would generate 125,000 times the heat of the mouse but that heat would radiate over a surface area of only 2,500 times the mouse's. Ouch! That hippo is seriously burning.
Nature has been very kind to animals by not using that naive solution: our metabolisms thankfully do not scale linearly with our body mass. Max Rubner reasoned that nature had figured out that in order to safely radiate the heat we generate, our metabolic rate should scale with body mass in the same way as surface area. Namely, he proposed that metabolic rate scales with body mass to the two-thirds power. This was called the "surface hypothesis," and it was accepted for the next fifty years. The only problem was that the actual data did not obey this rule.
This was discovered in the 1930s by a Swiss animal scientist, Max Kleiber, who performed a set of careful measurements of the metabolic rate of different animals. His data showed that metabolic rate scales with body mass to the three-fourths power: that is, metabolic rate is proportional to $(\text{body mass})^{3/4}$. You'll no doubt recognize this as a power law with exponent 3/4. This result was surprising and counterintuitive. Having an exponent of 3/4 rather than 2/3 means that animals, particularly large ones, are able to maintain a higher metabolic rate than one would expect, given their surface area. This means that animals are more efficient than simple geometry predicts.
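To get a feel for how much the three candidate exponents differ, the following sketch compares them for the two mass ratios used in the text (a hamster at 8 times and a hippo at 125,000 times the mass of a mouse). The function name is an illustrative choice, and the output is a ratio relative to the mouse, not a measured metabolic rate.

```python
def metabolic_ratio(mass_ratio, exponent):
    """Predicted metabolic rate relative to the mouse, for a given exponent."""
    return mass_ratio ** exponent


animals = [("hamster", 8), ("hippo", 125_000)]   # mass ratios from the text
exponents = [("linear (1)", 1.0),
             ("surface hypothesis (2/3)", 2 / 3),
             ("Kleiber (3/4)", 3 / 4)]

for name, mass_ratio in animals:
    print(f"{name}: {mass_ratio:,} times the mass of a mouse")
    for label, exponent in exponents:
        ratio = metabolic_ratio(mass_ratio, exponent)
        print(f"  {label:<26} {ratio:>12,.1f} times the metabolic rate")
```

For the hamster the 2/3 and 3/4 predictions are close, but for the hippo the 3/4 exponent predicts a metabolic rate well over twice what the surface hypothesis allows, which is why the difference shows up most clearly in large animals.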
Figure 17.2 illustrates such scaling for a number of different animals. The horizontal axis gives the body mass in kilograms and the vertical axis gives the average basal metabolic rate measured in watts. The labeled dots are the actual measurements for different animals, and the straight line is a plot of metabolic rate scaling with body mass to exactly the three-fourths power. The data do not exactly fit this line, but they are pretty close. Figure 17.2 is a special kind of plot, technically called a double logarithmic (or log-log) plot, in which the numbers on both axes increase by a power of ten with each tick on the axis. If you plot a power law on a double logarithmic plot, it will look like a straight line, and the slope of that line will be equal to the power law's exponent. (See the notes for an explanation of this.)
FIGURE 17.2. Metabolic rate of various animals as a function of their body mass. (From K. Schmidt-Nielsen, Scaling: Why Is Animal Size So Important? Copyright 1984 by Cambridge University Press. Reprinted with permission of Cambridge University Press.)
This power law relation is now called Kleiber's law. Such 3/4-power scaling has more recently been claimed to hold not only for mammals and birds, but also for the metabolic rates of many other living beings, such as fish, plants, and even single-celled organisms.
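The straight-line property of log-log plots follows from taking logarithms: if y = c x^a, then log y = log c + a log x, which is the equation of a line with slope a. The sketch below checks this numerically with an ordinary least-squares slope. The masses and the constant 2.0 are made-up values for illustration only; they are not Kleiber's measurements or the data behind figure 17.2.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    denominator = sum((a - mean_x) ** 2 for a in lx)
    return numerator / denominator


# Hypothetical body masses (kg) and an exact 3/4-power law with an
# arbitrary constant of 2.0; neither comes from the text.
masses = [0.01, 0.1, 1, 10, 100, 1000]
rates = [2.0 * m ** 0.75 for m in masses]

print("fitted slope:", round(loglog_slope(masses, rates), 6))
```

Because the synthetic data follow the power law exactly, the fitted slope recovers the exponent 3/4. With real measurements the points scatter around the line, and the fitted slope is only an estimate of the exponent.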
Kleiber's law is based only on observation of metabolic rates and body masses; Kleiber offered no explanation for why his law was true. In fact, Kleiber's law was baffling to biologists for over fifty years. The mass of living systems has a huge range: from bacteria, which weigh less than one one-trillionth of a gram, to whales, which can weigh over 100 million grams. Not only does the law defy simple geometric reasoning; it is also surprising that such a law seems to hold so well for organisms over such a vast variety of sizes, species types, and habitat types. What common aspect of nearly all organisms could give rise to this simple, elegant law?
Several other related scaling relationships had also long puzzled biologists. For example, the larger a mammal is, the longer its life span. The life span for a mouse is typically two years or so; for a pig it is more like ten years, and for an elephant it is over fifty years. There are some exceptions to this general rule, notably humans, but it holds for most mammalian species. It turns out that if you plot average life span versus body mass for many different species, the relationship is a power law with exponent 1/4. If you plot average heart rate versus body mass, you get a power law with exponent -1/4 (the larger an animal, the slower its heart rate). In fact, biologists have identified a large collection of such power law relationships, all having fractional exponents with a 4 in the denominator. For that reason, all such relationships have been called quarter-power scaling laws. Many people suspected that these quarter-power scaling laws were a signature of something very important and common in all these organisms. But no one knew what that important and common property was.
An Interdisciplinary Collaboration.
By the mid-1990s, James Brown, an ecologist and professor at the University of New Mexico, had been thinking about the quarter-power scaling problem for many years. He had long realized that solving this problem (understanding the reason for these ubiquitous scaling laws) would be a key step in developing any general theory of biology. A biology graduate student named Brian Enquist, also deeply interested in scaling issues, came to work with Brown, and they attempted to solve the problem together.