Hundreds of new learning algorithms are invented every year, but they're all based on the same few basic ideas. These are what this book is about, and they're all you really need to know to understand how machine learning is changing the world. Far from esoteric, and quite aside even from their use in computers, they are answers to questions that matter to all of us: How do we learn? Is there a better way? What can we predict? Can we trust what we've learned?

Rival schools of thought within machine learning have very different answers to these questions. There are five main ones, and we'll devote a chapter to each. Symbolists view learning as the inverse of deduction and take ideas from philosophy, psychology, and logic. Connectionists reverse engineer the brain and are inspired by neuroscience and physics. Evolutionaries simulate evolution on the computer and draw on genetics and evolutionary biology. Bayesians believe learning is a form of probabilistic inference and have their roots in statistics. Analogizers learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization. Driven by the goal of building learning machines, we'll tour a good chunk of the intellectual history of the last hundred years and see it in a new light.

More generally, Chomsky is critical of all statistical learning. He has a list of things statistical learners can't do, but the list is fifty years out of date. Chomsky seems to equate machine learning with behaviorism, where animal behavior is reduced to associating responses with rewards. But machine learning is not behaviorism. Modern learning algorithms can learn rich internal representations, not just pairwise associations between stimuli.

The rationalist likes to plan everything in advance before making the first move. The empiricist prefers to try things and see how they turn out.
I don't know if there's a gene for rationalism or one for empiricism, but looking at my computer scientist colleagues, I've observed time and again that they are almost like personality traits: some people are rationalistic to the core and could never have been otherwise; and others are empiricist through and through, and that's what they'll always be. The two sides can converse with each other and sometimes draw on each other's results, but they can understand each other only so much. Deep down, each believes that what the other does is secondary and not very interesting.

Hopfield noticed an interesting similarity between spin glasses and neural networks: an electron's spin responds to the behavior of its neighbors much like a neuron does. In the electron's case, it flips up if the weighted sum of the neighbors exceeds a threshold and flips (or stays) down otherwise. Inspired by this, he defined a type of neural network that evolves over time in the same way that a spin glass does and postulated that the network's minimum energy states are its memories. Each such state has a "basin of attraction" of initial states that converge to it, and in this way the network can do pattern recognition: for example, if one of the memories is the pattern of black-and-white pixels formed by the digit nine and the network sees a distorted nine, it will converge to the "ideal" one and thereby recognize it. Suddenly, a vast body of physical theory was applicable to machine learning, and a flood of statistical physicists poured into the field, helping it break out of the local minimum it had been stuck in.

Whenever the learner's "retina" sees a new image, that signal propagates forward through the network until it produces an output. Comparing this output with the desired one yields an error signal, which then propagates back through the layers until it reaches the retina.
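The Hopfield dynamics described above can be sketched in a few lines. This is a toy illustration using the standard textbook formulation (Hebbian weights, binary +1/-1 units), not anything from the book itself; the pattern and function names are invented for the example. One pattern is stored, a corrupted copy is presented, and each unit repeatedly flips up or down according to the weighted sum of its neighbors until the network settles into the stored memory.

```python
# Toy Hopfield-style network: unit states are +1/-1, weights are Hebbian.
def store(patterns, n):
    # w[i][j] accumulates x_i * x_j over the stored patterns (no self-connections)
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, sweeps=10):
    n = len(state)
    s = list(state)
    for _ in range(sweeps):
        for i in range(n):
            # Flip up if the weighted sum of the neighbors exceeds the
            # threshold (here zero), flip or stay down otherwise.
            total = sum(w[i][j] * s[j] for j in range(n))
            s[i] = 1 if total >= 0 else -1
    return s

memory = [1, -1, 1, -1, 1, -1]   # the "ideal" pattern
w = store([memory], len(memory))
noisy = [1, -1, -1, -1, 1, -1]   # one "pixel" corrupted
print(recall(w, noisy))          # falls back into the basin of attraction
```

The corrupted pattern lies inside the stored memory's basin of attraction, so the update rule drives it back to the ideal pattern, which is the sense in which the network "recognizes" a distorted nine.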
Based on this returning signal and on the inputs it had received during the forward pass, each neuron adjusts its weights. As the network sees more and more images of your grandmother and other people, the weights gradually converge to values that let it discriminate between the two. Backpropagation, as this algorithm is known, is phenomenally more powerful than the perceptron algorithm. A single neuron could only learn straight lines. Given enough hidden neurons, a multilayer perceptron, as it's called, can represent arbitrarily convoluted frontiers. This makes backpropagation, or simply backprop, the connectionists' master algorithm.

But first he had to graduate. Prudently, he picked a more conservative topic for his dissertation, Boolean circuits with cycles, and in 1959 he earned the world's first PhD in computer science. His PhD advisor, Arthur Burks, nevertheless encouraged Holland's interest in evolutionary computation and was instrumental in getting him a faculty job at Michigan and shielding him from senior colleagues who didn't think that stuff was computer science. Burks himself was so open-minded because he had been a close collaborator of John von Neumann, who had proved the possibility of self-reproducing machines. Indeed, it had fallen to him to complete the work when von Neumann died of cancer in 1957. That von Neumann could prove that such machines are possible was quite remarkable, given the primitive state of genetics and computer science at the time. But his automaton just made exact copies of itself; evolving automata had to wait for Holland.

[Image: pic_13.jpg]

It often happens that, even after we take all conditional independences into account, some nodes in a Bayesian network still have too many parents. Some networks are so dense with arrows that when we print them, the page turns solid black.
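The backprop loop just described, forward pass, error signal, weight adjustment, can be sketched in plain Python. This is a minimal illustration with assumed details (sigmoid units, squared error, a hand-picked learning rate, and XOR as an example of a frontier no single straight line can capture); it is not the book's code, and the variable names are invented.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# A tiny multilayer perceptron: 2 inputs, 4 hidden neurons, 1 output.
n_in, n_hid = 2, 4
w_hid = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]  # +1: bias
w_out = [random.uniform(-1, 1) for _ in range(n_hid + 1)]

def forward(x):
    # Forward pass: the signal propagates from the "retina" to the output.
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hid]
    y = sigmoid(sum(w * v for w, v in zip(w_out, h + [1.0])))
    return h, y

def backprop(x, target, lr=0.5):
    h, y = forward(x)
    # Error signal at the output...
    d_out = (y - target) * y * (1 - y)
    # ...propagated back to the hidden layer (using the pre-update weights).
    d_hid = [d_out * w_out[j] * h[j] * (1 - h[j]) for j in range(n_hid)]
    # Each neuron adjusts its weights from the error signal and its inputs.
    for j, v in enumerate(h + [1.0]):
        w_out[j] -= lr * d_out * v
    for j in range(n_hid):
        for i, v in enumerate(x + [1.0]):
            w_hid[j][i] -= lr * d_hid[j] * v

# XOR: a frontier no single neuron (a straight line) can represent.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

def total_error():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

before = total_error()
for _ in range(2000):
    for x, t in data:
        backprop(x, t)
print(before, total_error())  # the squared error shrinks as the weights adjust
```

The hidden layer is what buys the extra power: with only the output neuron, this loop degenerates to the perceptron's straight-line frontier.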
(The physicist Mark Newman calls them "ridiculograms.") A doctor needs to simultaneously diagnose all the possible diseases a patient could have, not just one, and every disease is a parent of many different symptoms. A fever could be caused by any number of conditions besides the flu, but it's hopeless to try to predict its probability given every possible combination of conditions. All is not lost. Instead of a table specifying the node's conditional probability for every state of its parents, we can learn a simpler distribution. The most popular choice is a probabilistic version of the logical OR operation: any cause alone can provoke a fever, but each cause has a certain probability of failing to do so, even if it's usually sufficient. Heckerman and others have learned Bayesian networks that diagnose hundreds of infectious diseases in this way. Google uses a giant Bayesian network of this type in its AdSense system for automatically choosing ads to place on web pages. The network relates a million content variables to each other and to twelve million words and phrases via over three hundred million arrows, all learned from a hundred billion text snippets and search queries.

This is a radical departure from the way science is usually done. It's like saying, "Actually, neither Copernicus nor Ptolemy was right; let's just predict the planets' future trajectories assuming Earth goes round the sun and vice versa and average the results."

CHAPTER SEVEN: You Are What You Resemble

Nearest-neighbor is the simplest and fastest learning algorithm ever invented. In fact, you could even say it's the fastest algorithm of any kind that could ever be invented. It consists of doing exactly nothing, and therefore takes zero time to run. Can't beat that. If you want to learn to recognize faces and have a vast database of images labeled face/not face, just let it sit there. Don't worry, be happy.
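The "probabilistic version of logical OR" described above has a standard name in the literature: the noisy-OR gate. Here is a minimal sketch; the causes and their failure probabilities are invented for illustration, and real models usually also include a small "leak" probability for unmodeled causes, which this sketch omits.

```python
# Noisy-OR: P(fever | causes) = 1 - product over active causes of P(cause fails).
# One failure probability per parent, instead of a table with a row for every
# combination of parent states. The numbers below are made up.
fail_prob = {"flu": 0.1, "malaria": 0.2, "food_poisoning": 0.6}

def p_fever(active_causes):
    p_no_fever = 1.0
    for cause in active_causes:
        p_no_fever *= fail_prob[cause]  # fever is absent only if every cause fails
    return 1.0 - p_no_fever

print(p_fever(["flu"]))             # 1 - 0.1 = 0.9: flu alone usually suffices
print(p_fever(["flu", "malaria"]))  # 1 - 0.1 * 0.2 = 0.98: two causes, likelier still
print(p_fever([]))                  # no active causes, no fever (no leak term here)
```

The payoff is in the parameter count: a node with twenty parents needs twenty failure probabilities rather than a table with about a million rows.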
Without knowing it, those images already implicitly form a model of what a face is. Suppose you're Facebook and you want to automatically identify faces in photos people upload as a prelude to tagging them with their friends' names. It's nice to not have to do anything, given that Facebook users upload upward of three hundred million photos per day. Applying any of the learners we've seen so far to them, with the possible exception of Naïve Bayes, would take a truckload of computers. And Naïve Bayes is not smart enough to recognize faces.

The same idea of forming a local model rather than a global one applies beyond classification. Scientists routinely use linear regression to predict continuous variables, but most phenomena are not linear. Luckily, they're locally linear, because smooth curves are locally well approximated by straight lines. So if instead of trying to fit a straight line to all the data, you just fit it to the points near the query point, you now have a very powerful nonlinear regression algorithm. Laziness pays. If Kennedy had needed a complete theory of international relations to decide what to do about the Soviet missiles in Cuba, he would have been in trouble. Instead, he saw an analogy between that crisis and the outbreak of World War I, and that analogy guided him to the right decisions.

You can probably tell just by looking at this plot that the main street in Palo Alto runs southwest-northeast. You didn't draw a street, but you can intuit that it's there from the fact that all the points fall along a straight line (or close to it; they can be on different sides of the street). Indeed, the street is University Avenue, and if you want to shop or eat out in Palo Alto, that's the place to go.
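Nearest-neighbor's division of labor, zero work at learning time, all the work at query time, can be sketched in a few lines. The two-dimensional points below are made-up stand-ins for images, and the labels and coordinates are illustrative only.

```python
# Nearest-neighbor "learning": store the labeled examples and do nothing else.
# At query time, find the closest stored example and copy its label.
def nearest_neighbor(examples, query):
    # examples: list of ((x, y), label) pairs; distance is plain Euclidean
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    _, label = min(examples, key=lambda e: dist2(e[0], query))
    return label

examples = [((1.0, 1.0), "face"), ((1.2, 0.9), "face"),
            ((5.0, 5.0), "not face"), ((5.5, 4.8), "not face")]
print(nearest_neighbor(examples, (1.1, 1.1)))  # closest stored example is a face
print(nearest_neighbor(examples, (5.2, 5.1)))  # closest stored example is not
```

The stored examples implicitly carve the plane into regions, one per example; that frontier is the "model," even though nothing was ever computed in advance.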
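The locally linear recipe above, fit a straight line only to the points near the query, can be sketched directly. This is a toy version assuming one-dimensional inputs and a hand-picked neighborhood size k; the function names and the x-squared example are invented for illustration.

```python
# Locally linear regression: to predict at x, fit an ordinary least-squares
# line to only the k nearest training points, then evaluate that line at x.
def predict(xs, ys, x, k=5):
    # take the k points closest to the query
    neighbors = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    nx = [xs[i] for i in neighbors]
    ny = [ys[i] for i in neighbors]
    # least-squares slope and intercept on just those points
    mx = sum(nx) / k
    my = sum(ny) / k
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(nx, ny)) \
        / sum((xi - mx) ** 2 for xi in nx)
    return my + slope * (x - mx)

# A smooth nonlinear curve, y = x^2, sampled on a grid.
xs = [i / 10.0 for i in range(-20, 21)]
ys = [x * x for x in xs]
print(predict(xs, ys, 1.0))  # close to 1.0, though no single line fits x^2
```

No global line fits a parabola, but near any query point the curve is nearly straight, so the local fit tracks it well; that is the sense in which laziness pays.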
As a bonus, once you know that the shops are on University Avenue, you don't need two numbers to locate them, just one: the street number (or, if you wanted to be really precise, the distance from the shop to the Caltrain station, on the southwest corner, which is where University Avenue begins).

You can download the learner I've just described from alchemy.cs.washington.edu. We christened it Alchemy to remind ourselves that, despite all its successes, machine learning is still in the alchemy stage of science. If you do download it, you'll see that it includes a lot more than the basic algorithm I've described, but also that it is still missing a few things I said the universal learner ought to have, like crossover. Nevertheless, let's use the name Alchemy to refer to our candidate universal learner for simplicity.

Chapter Two.