The biggest challenge, however, is assembling all this information into a coherent whole. What are all the things that affect your risk of heart disease, and how do they interact? All Newton needed was three laws of motion and one of gravitation, but a complete model of a cell, an organism, or a society is more than any one human can discover. As knowledge grows, scientists specialize ever more narrowly, but no one is able to put the pieces together because there are far too many pieces. Scientists collaborate, but language is a very slow medium of communication. Scientists try to keep up with others’ research, but the volume of publications is so high that they fall farther and farther behind. Often, redoing an experiment is easier than finding the paper that reported it. Machine learning comes to the rescue, scouring the literature for relevant information, translating one area’s jargon into another’s, and even making connections that scientists weren’t aware of. Increasingly, machine learning acts as a giant hub, through which modeling techniques invented in one field make their way into others.

Two hundred and fifty years after Hume set off his bombshell, it was given elegant mathematical form by David Wolpert, a physicist turned machine learner. His result, known as the “no free lunch” theorem, sets a limit on how good a learner can be. The limit is pretty low: no learner can be better than random guessing! OK, we can go home: the Master Algorithm is just flipping coins. Seriously, though, how is it that no learner can beat coin flipping? And if that’s so, how come the world is full of highly successful learners, from spam filters to (any day now) self-driving cars?

Overfitting happens when you have too many hypotheses and not enough data to tell them apart. The bad news is that even for the simple conjunctive learner, the number of hypotheses grows exponentially with the number of attributes. Exponential growth is a scary thing. An E.
coli bacterium can divide into two roughly every fifteen minutes; given enough nutrients it can grow into a mass of bacteria the size of Earth in about a day. When the number of things an algorithm needs to do grows exponentially with the size of its input, computer scientists call it a combinatorial explosion and run for cover. In machine learning, the number of possible instances of a concept is an exponential function of the number of attributes: if the attributes are Boolean, each new attribute doubles the number of possible instances by taking each previous instance and extending it with a yes or no for that attribute. In turn, the number of possible concepts is an exponential function of the number of possible instances: since a concept labels each instance as positive or negative, adding an instance doubles the number of possible concepts. As a result, the number of concepts is an exponential function of an exponential function of the number of attributes! In other words, machine learning is a combinatorial explosion of combinatorial explosions. Perhaps we should just give up and not waste our time on such a hopeless problem?

One such rule is: If Socrates is human, then he’s mortal. This does the job, but is not very useful because it’s specific to Socrates. But now we apply Newton’s principle and generalize the rule to all entities: If an entity is human, then it’s mortal. Or, more succinctly: All humans are mortal. Of course, it would be rash to induce this rule from Socrates alone, but we know similar facts about other humans:

[Image: pic_11.jpg]

A living cell is a quintessential example of a nonlinear system. The cell performs all of its functions by turning raw materials into end products through a complex web of chemical reactions.
We can discover the structure of this network using symbolist methods like inverse deduction, as we saw in the last chapter, but to build a complete model of a cell we need to get quantitative, learning the parameters that couple the expression levels of different genes, relate environmental variables to internal ones, and so on. This is difficult because there is no simple linear relationship between these quantities. Rather, the cell maintains its stability through interlocking feedback loops, leading to very complex behavior. Backpropagation is well suited to this problem because of its ability to efficiently learn nonlinear functions. If we had a complete map of the cell’s metabolic pathways and enough observations of all the relevant variables, backprop could in principle learn a detailed model of the cell, with a multilayer perceptron to predict each variable as a function of its immediate causes.

Stripped down to its bare essentials (no giggles, please), sexual reproduction consists of swapping material between chromosomes from the mother and father, a process called crossing over. This produces two new chromosomes, one of which consists of the mother’s chromosome up to the crossover point and the father’s thereafter, and the other of which is the opposite.

Nurturing nature

Luckily, we have since cracked the problem, and the Master Algorithm now looks that much closer. We’ll see how we did it in Chapter 9 and take it from there. But first we need to gather a very important, still-missing piece of the puzzle: how to learn from very little data. That might seem unnecessary in these days of data deluge, but the truth is that we often find ourselves with reams of data about some parts of the problem we want to solve and almost none about others. This is where one of the most important ideas in machine learning comes in: analogy. All of
the tribes we’ve met so far have one thing in common: they learn an explicit model of the phenomenon under consideration, whether it’s a set of rules, a multilayer perceptron, a genetic program, or a Bayesian network. When they don’t have enough data to do that, they’re stumped. But analogizers can learn from as little as one example because they never form a model. Let’s see what they do instead.

We can represent a cluster by its prototypical element: the image of your mother that you see with your mind’s eye or the quintessential cat, sports car, country house, or tropical beach. Peoria, Illinois, is the average American town, according to marketing lore. Bob Burns, a fifty-three-year-old building maintenance supervisor in Windham, Connecticut, is America’s most ordinary citizen, at least if you believe Kevin O’Keefe’s book The Average American. Anything described by numeric attributes (say, people’s heights, weights, girths, shoe sizes, hair lengths, and so on) makes it easy to compute the average member: his height is the average height of all the cluster members, his weight the average of all the weights, and so on. For categorical attributes, like gender, hair color, zip code, or favorite sport, the “average” is simply the most frequent value. The average member described by this set of attributes may or may not be a real person, but either way it’s a useful reference to have: if you’re brainstorming how to market a new product, picturing Peoria as the town where you’re launching it or Bob Burns as your target customer beats thinking of abstract entities like “the market” or “the consumer.”

Clustering and dimensionality reduction get us closer to human learning, but there’s still something very important missing. Children don’t just passively observe the world; they do things. They pick up objects they see, play with them, run around, eat, cry, and ask questions.
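
Returning briefly to the cluster prototype described above: the rule of averaging numeric attributes and taking the most frequent value of categorical ones can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything from the original text; the function name `cluster_prototype`, the attribute names, and the three sample people are all hypothetical.

```python
from collections import Counter
from statistics import mean


def cluster_prototype(members, numeric_keys, categorical_keys):
    """Build the 'average member' of a cluster of attribute dictionaries.

    Hypothetical helper for illustration: numeric attributes are averaged,
    categorical attributes take their most frequent value.
    """
    if not members:
        raise ValueError("cluster must contain at least one member")
    prototype = {}
    for key in numeric_keys:
        # Average height, average weight, and so on.
        prototype[key] = mean(m[key] for m in members)
    for key in categorical_keys:
        # most_common(1) returns [(value, count)]; on ties, the value
        # encountered first wins.
        prototype[key] = Counter(m[key] for m in members).most_common(1)[0][0]
    return prototype


# Made-up sample data, purely for illustration.
people = [
    {"height_cm": 170, "weight_kg": 70, "hair_color": "brown"},
    {"height_cm": 180, "weight_kg": 80, "hair_color": "brown"},
    {"height_cm": 175, "weight_kg": 90, "hair_color": "black"},
]

prototype = cluster_prototype(
    people,
    numeric_keys=["height_cm", "weight_kg"],
    categorical_keys=["hair_color"],
)
print(prototype)
```

As in the text, the resulting prototype need not match any real member of the cluster: here no single person is both 175 cm tall and 80 kg.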
Even the most advanced visual system is of no use to Robby if it doesn’t help him interact with the environment. Robby needs to know not just what’s where but what to do at each moment. In principle we could teach him using step-by-step instructions, pairing sensor readings with the appropriate actions to take in response, but this is viable only for narrow tasks. The actions you take depend on your goals, not just whatever you are currently perceiving, and those goals can be far in the future. Step-by-step supervision shouldn’t be needed, in any case. Parents don’t teach their children to crawl, walk, or run; they figure it out on their own. But none of the learning algorithms we’ve seen so far can do this.

The first step accomplished, you hurry on to the Bayesian district. Even from a distance, you can see how it clusters around the Cathedral of Bayes’ Theorem. MCMC Alley zigzags randomly along the way. This is going to take a while. You take a shortcut onto Belief Propagation Street, but it seems to loop around forever. Then you see it: the Most Likely Avenue, rising majestically toward the Posterior Probability Gate. Rather than average over all models, you can head straight for the most probable one, confident that the resulting predictions will be almost the same. And you can let genetic search pick the model’s structure and gradient descent its parameters. With a sigh of relief, you realize that’s all the probabilistic inference you’ll need, at least until it’s time to answer questions using the model.

A company like this could quickly become one of the most valuable in the world. As Alexis Madrigal of the Atlantic points out, today your profile can be bought for half a cent or less, but the value of a user to the Internet advertising industry is more like $1,200 per year. Google’s sliver of your data is worth about $20, Facebook’s $5, and so on.
Add to that all the slivers that no one has yet, and the fact that the whole is more than the sum of the parts (a model of you based on all your data is much better than a thousand models based on a thousand slivers), and we’re looking at easily over a trillion dollars per year for an economy the size of the United States. It doesn’t take a large cut of that to make a Fortune 500 company. If you decide to take up the challenge and wind up becoming a billionaire, remember where you first got the idea.

The ferret brain rewiring experiments are described in “Visual behaviour mediated by retinal projections directed to the auditory pathway,” by Laurie von Melchner, Sarah Pallas, and Mriganka Sur (Nature, 2000). Ben Underwood’s story is told in “Seeing with sound,” by Joanna Moorhead (Guardian, 2007), and at www.benunderwood.com. Otto Creutzfeldt makes the case that the cortex is one algorithm in “Generality of the functional structure of the neocortex” (Naturwissenschaften, 1977), as does Vernon Mountcastle in “An organizing principle for cerebral function: The unit model and the distributed system,” in The Mindful Brain, edited by Gerald Edelman and Vernon Mountcastle (MIT Press, 1978). Gary Marcus, Adam Marblestone, and Tom Dean make the case against in “The atoms of neural computation” (Science, 2014).

More on the various tribes’ paths to the Master Algorithm in the corresponding sections below.