Homo sapiens is the species that adapts the world to itself instead of adapting itself to the world. Machine learning is the newest chapter in this million-year saga: with it, the world senses what you want and changes accordingly, without you having to lift a finger. Like a magic forest, your surroundings, virtual today and physical tomorrow, rearrange themselves as you move through them. The path you picked out between the trees and bushes grows into a road. Signs pointing the way spring up in the places where you got lost.

Life's infinite variety is the result of a single mechanism: natural selection. Even more remarkable, this mechanism is of a type very familiar to computer scientists: iterative search, where we solve a problem by trying many candidate solutions, selecting and modifying the best ones, and repeating these steps as many times as necessary. Evolution is an algorithm. Paraphrasing Charles Babbage, the Victorian-era computer pioneer, God created not species but the algorithm for creating species. The "endless forms most beautiful" Darwin spoke of in the conclusion of The Origin of Species belie a most beautiful unity: all of those forms are encoded in strings of DNA, and all of them come about by modifying and combining those strings. Who would have guessed, given only a description of this algorithm, that it could produce you and me? If evolution can learn us, it can conceivably also learn everything that can be learned, provided we implement it on a powerful enough computer. Indeed, evolving programs by simulating natural selection is a popular endeavor in machine learning. Evolution, then, is another promising path to the Master Algorithm.

A related, frequently heard objection is "Data can't replace human intuition." In fact, it's the other way around: human intuition can't replace data. Intuition is what you use when you don't know the facts, and since you often don't, intuition is precious. But when the evidence is before you, why would you deny it? Statistical analysis beats talent scouts in baseball (as Michael Lewis memorably documented in Moneyball), it beats connoisseurs at wine tasting, and every day we see new examples of what it can do. Because of the influx of data, the boundary between evidence and intuition is shifting rapidly, and as with any revolution, entrenched ways have to be overcome. If I'm the expert on X at company Y, I don't like to be overridden by some guy with data. There's a saying in industry: "Listen to your customers, not to the HiPPO," HiPPO being short for "highest paid person's opinion." If you want to be tomorrow's authority, ride the data, don't fight it.

We can get even fancier by allowing rules for intermediate concepts to evolve, and then chaining these rules at performance time. For example, we could evolve the rules "If the e-mail contains the word loan, then it's a scam" and "If the e-mail is a scam, then it's spam." Since a rule's consequent is no longer always spam, this requires introducing additional bits in rule strings to represent their consequents. Of course, the computer doesn't literally use the word scam; it just comes up with some arbitrary bit string to represent the concept, but that's good enough for our purposes. Sets of rules like this, which Holland called classifier systems, are one of the workhorses of the machine-learning tribe he founded: the evolutionaries. Like multilayer perceptrons, classifier systems face the credit-assignment problem (what is the fitness of rules for intermediate concepts?), and Holland devised the so-called bucket brigade algorithm to solve it. Nevertheless, classifier systems are much less widely used than multilayer perceptrons.
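To make the evolutionaries' loop concrete, here is a minimal sketch of evolution as iterative search over rule strings. It is a toy of my own devising, not Holland's actual classifier system or the bucket brigade: the vocabulary, the made-up e-mails, and the single-rule bit-string encoding are all invented for illustration, and real classifier systems evolve whole sets of chained rules rather than one rule.

```python
# Toy illustration of evolution as iterative search: candidate rules are
# bit strings over a vocabulary ("flag spam if any marked word appears"),
# the fittest rules survive, get mutated, and the loop repeats.
# VOCAB and EMAILS are invented for this sketch.
import random

VOCAB = ["loan", "viagra", "meeting", "lunch", "winner", "report"]

# Made-up training data: (set of words in the e-mail, is_spam)
EMAILS = [
    ({"loan", "winner"}, True), ({"viagra"}, True), ({"loan"}, True),
    ({"meeting", "report"}, False), ({"lunch"}, False), ({"report"}, False),
]

def fitness(rule):
    """Number of e-mails the rule classifies correctly."""
    flagged = {w for w, bit in zip(VOCAB, rule) if bit}
    return sum((len(flagged & words) > 0) == is_spam
               for words, is_spam in EMAILS)

def mutate(rule):
    """Flip one random bit of the rule string."""
    i = random.randrange(len(rule))
    return rule[:i] + [1 - rule[i]] + rule[i + 1:]

# Iterative search: try candidates, select the best, modify, repeat.
population = [[random.randint(0, 1) for _ in VOCAB] for _ in range(10)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]                               # selection
    population = survivors + [mutate(r) for r in survivors]  # variation

best = max(population, key=fitness)
print("evolved rule flags:", [w for w, b in zip(VOCAB, best) if b])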
Now that we know how to (more or less) solve the inference problem, we're ready to learn Bayesian networks from data, because for Bayesians learning is just another kind of probabilistic inference. All you have to do is apply Bayes' theorem with the hypotheses as the possible causes and the data as the observed effect:

P(hypothesis | data) = P(hypothesis) × P(data | hypothesis) / P(data)

Match me if you can

The dotted border separates the positive and negative examples just fine, but it comes dangerously close to stepping on the landmines at A and B. These examples are support vectors: delete one of them, and the maximum-margin border moves to a different place. In general, the border can be curved, of course, making the margin harder to visualize, but we can think of the border as a snake slithering down the no-man's-land, and the margin is how fat the snake can be. If a very fat snake can slither all the way down without blowing itself to smithereens, then the SVM can separate the positive and negative examples very well, and Vapnik showed that in this case we can be confident that the SVM didn't overfit. Intuitively, compared to a thin snake, there are fewer ways a fat snake can slither down while avoiding the landmines; and likewise, compared to a low-margin SVM, a high-margin one has fewer chances of overfitting by drawing an overly intricate border.

The single most surprising property of SVMs, however, is that no matter how curvy the frontiers they form, those frontiers are always just straight lines (or hyperplanes, in general). The reason that's not a contradiction is that the straight lines are in a different space. Suppose the examples live on the (x, y) plane, and the boundary between the positive and negative regions is the parabola y = x². There's no way to represent it with a straight line, but if we add a third coordinate z, meaning the data now lives in (x, y, z) space, and we set each example's z coordinate to the square of its x coordinate, the frontier is now just the diagonal plane defined by y = z. In effect, the data points rise up into the third dimension, some rise more than others by just the right amount, and presto: in this new dimension the positive and negative examples can be separated by a plane. It turns out that we can view what SVMs do with kernels, support vectors, and weights as mapping the data to a higher-dimensional space and finding a maximum-margin hyperplane in that space. For some kernels, the derived space has infinite dimensions, but SVMs are completely unfazed by that. Hyperspace may be the Twilight Zone, but SVMs have figured out how to navigate it.
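Here is a small numeric sketch of that lifting trick, with made-up data (the parabola boundary itself comes straight from the passage): points separated by y = x² are not linearly separable in the plane, but after adding the coordinate z = x², a single plane suffices.

```python
# Toy demo of the dimension-lifting idea described above; data is invented.
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(-2, 2, 500)
y = rng.uniform(-1, 4, 500)
labels = y > x**2          # positive examples lie above the parabola

# No straight line in (x, y) separates the classes. Lift each point into
# (x, y, z) by setting z to the square of its x coordinate:
z = x**2

# In the lifted space, the diagonal plane y = z separates them exactly.
predictions = (y - z) > 0
print("agreement with true labels:", (predictions == labels).mean())  # 1.0
```

An SVM with a polynomial kernel performs this kind of lifting implicitly, through the kernel, without ever materializing the extra coordinate.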
Analogizers' neatest trick, however, is learning across problem domains. Humans do it all the time: an executive can move from, say, a media company to a consumer-products one without starting from scratch because many of the same management skills still apply. Wall Street hires lots of physicists because physical and financial problems, although superficially very different, often have a similar mathematical structure. Yet all the learners we've seen so far would fall flat if we, say, trained them to predict Brownian motion and then asked them to predict the stock market. Stock prices and the velocities of particles suspended in a fluid are just different variables, so the learner wouldn't even know where to start. But analogizers can do this using structure mapping, an algorithm invented by Dedre Gentner, a psychologist at Northwestern University. Structure mapping takes two descriptions, finds a coherent correspondence between some of their parts and relations, and then, based on that correspondence, transfers further properties from one structure to the other. For example, if the structures are the solar system and the atom, we can map planets to electrons and the sun to the nucleus and conclude, as Bohr did, that electrons revolve around the nucleus. The truth is more subtle, of course, and we often need to refine analogies after we make them. But being able to learn from a single example like this is surely a key attribute of a universal learner. When we're confronted with a new type of cancer (and that happens all the time, because cancers keep mutating), the models we've learned for previous ones don't apply. Neither do we have time to gather data on the new cancer from a lot of patients; there may be only one, and she urgently needs a cure. Our best hope is then to compare the new cancer with known ones and try to find one whose behavior is similar enough that some of the same lines of attack will work.

So far we've only seen how to learn one level of clusters, but the world is, of course, much richer than that, with clusters within clusters all the way down to individual objects: living things cluster into plants and animals, animals into mammals, birds, fishes, and so on, all the way down to Fido the family dog. No problem: once we've learned one set of clusters, we can treat them as objects and cluster them in turn, and so on up to the cluster of all things. Alternatively, we can start with a coarse clustering and then further divide each cluster into subclusters: Robby's toys divide into stuffed animals, construction toys, and so on; stuffed animals into teddy bears, plush kittens, and so on. Children seem to start out in the middle and then work their way up and down. For example, they learn dog before they learn animal or beagle. This might be a good strategy for Robby, as well.

After a long day's journey, the sun is rapidly nearing the horizon, and you need to hurry before it gets dark. The city's outer wall has five massive gates, each controlled by one of the tribes and leading to its district in Optimization Town. Let us enter through the Gradient Descent Gate, after whispering the watchword ("deep learning") to the guard, and spiral in toward the Towers of Representation. From the gate the street ascends steeply up the hill to the citadel's Squared Error Gate, but instead you turn left toward the evolutionary sector. The houses in the gradient descent district are all smooth curves and densely intertwined patterns, almost more like a jungle than a city. But when gradient descent gives way to genetic search, the picture changes abruptly. Here the houses rise higher, with structure piled on structure, but the structures are spare, almost vacant, as if waiting to be filled in by gradient descent's curves. That's it: the way to combine the two is to use genetic search to find the structure of the model and let gradient descent fill in its parameters. This is what nature does: evolution creates brain structures, and individual experience modulates them.
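A minimal sketch of that division of labor, under invented assumptions (a toy sine-curve regression task, with "structure" reduced to a single choice, the width of one hidden layer): an evolutionary outer loop proposes structures, and an inner gradient-descent loop fills in each structure's parameters.

```python
# Hybrid search sketch: genetic search over structures, gradient descent
# over parameters. The data, network, and loop sizes are all made up.
import random
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: learn y = sin(x).
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X)

def train_and_score(hidden, steps=300, lr=0.05):
    """Fit a one-hidden-layer tanh net by gradient descent; return its MSE."""
    W1 = rng.normal(0, 1, (1, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)              # forward pass
        err = (H @ W2 + b2) - y
        gW2 = H.T @ err / len(X)              # gradients of 1/2 * MSE
        gb2 = err.mean(0)
        dH = (err @ W2.T) * (1 - H**2)        # backpropagate through tanh
        gW1 = X.T @ dH / len(X)
        gb1 = dH.mean(0)
        W2 -= lr * gW2; b2 -= lr * gb2        # gradient descent step
        W1 -= lr * gW1; b1 -= lr * gb1
    return float((err**2).mean())

# Evolutionary outer loop: propose hidden-layer widths, keep the fitter
# half (lower error), mutate the survivors, repeat.
population = [random.randint(1, 20) for _ in range(6)]
for generation in range(5):
    population.sort(key=train_and_score)
    survivors = population[:3]
    mutants = [max(1, h + random.choice([-2, -1, 1, 2])) for h in survivors]
    population = survivors + mutants

print("best structure found: hidden width =", min(population, key=train_and_score))
```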
A better way for all concerned is to focus on your specific, unusual attributes that are highly predictive of a match, in the sense that they pick out people you like whom not everyone else likes, and for whom there is therefore less competition. Your job (and your prospective date's) is to provide these attributes. The matcher's job is to learn from them, in the same way that an old-fashioned matchmaker would. Compared to a village matchmaker, Match.com's algorithm has the advantage that it knows vastly more people, but the disadvantage is that it knows them much more superficially. A naïve learner, such as a perceptron, will be content with broad generalizations like "gentlemen prefer blondes." A more sophisticated one will find patterns like "people with the same unusual musical tastes are often good matches." If Alice and Bob both like Beyoncé, that alone hardly singles them out for each other. But if they both like Bishop Allen, that makes them at least a little bit more likely to be potential soul mates. If they're both fans of a band the learner does not know about, that's even better, but only a relational algorithm like Alchemy can pick it up. The better the learner, the more it's worth your time to teach it about you. But as a rule of thumb, you want to differentiate yourself enough so that it won't confuse you with the "average person" (remember Bob Burns from Chapter 8), but not be so unusual that it can't fathom you.

Hahahaha! Seriously, though, should we worry that machines will take over? The signs seem ominous. With every passing year, computers don't just do more of the world's work; they make more of the decisions. Who gets credit, who buys what, who gets what job and what raise, which stocks will go up and down, how much insurance costs, where police officers patrol and therefore who gets arrested, how long their prison terms will be, who dates whom and therefore who will be born: machine-learned models already play a part in all of these. The point where we could turn off all our computers without causing the collapse of modern civilization has long passed. Machine learning is the last straw: if computers can start programming themselves, all hope of controlling them is surely lost. Distinguished scientists like Stephen Hawking have called for urgent research on this issue before it's too late.