Computers write their own programs. Now that's a powerful idea, maybe even a little scary. If computers start to program themselves, how will we control them? Turns out we can control them quite well, as we'll see. A more immediate objection is that perhaps this sounds too good to be true. Surely writing algorithms requires intelligence, creativity, problem-solving chops: things that computers just don't have? How is machine learning distinguishable from magic? Indeed, as of today people can write many programs that computers can't learn. But, more surprisingly, computers can learn programs that people can't write. We know how to drive cars and decipher handwriting, but these skills are subconscious; we're not able to explain to a computer how to do these things. If we give a learner a sufficient number of examples of each, however, it will happily figure out how to do them on its own, at which point we can turn it loose. That's how the post office reads zip codes, and that's why self-driving cars are on the way.

Learning algorithms are the matchmakers: they find producers and consumers for each other, cutting through the information overload. If they're smart enough, you get the best of both worlds: the vast choice and low cost of the large scale, with the personalized touch of the small. Learners are not perfect, and the last step of the decision is usually still for humans to make, but learners intelligently reduce the choices to something a human can manage.

A billion Bill Clintons

You might think the 2012 election was a fluke: most elections are not close enough for machine learning to be the deciding factor. But machine learning will cause more elections to be close in the future. In politics, as in everything, learning is an arms race. In the days of Karl Rove, a former direct marketer and data miner, the Republicans were ahead. By 2012, they'd fallen behind, but now they're catching up again.
We don't know who'll be ahead in the next election cycle, but both parties will be working hard to win. That means understanding the voters better and tailoring the candidates' pitches (even choosing the candidates themselves) accordingly. The same applies to entire party platforms, during and between election cycles: if detailed voter models, based on hard data, say a party's current platform is a losing one, the party will change it. As a result, major events aside, gaps between candidates in the polls will be smaller and shorter-lived. Other things being equal, the candidates with the better voter models will win, and voters will be better served for it.

The Master Algorithm's impact on technology will not be limited to AI. A universal learner is a phenomenal weapon against the complexity monster. Systems that today are too complex to build will no longer be. Computers will do more with less help from us. They will not repeat the same mistakes over and over again, but learn with practice, like people do. Sometimes, like the butlers of legend, they'll even guess what we want before we express it. If computers make us smarter, computers running the Master Algorithm will make us feel like geniuses. Technological progress will noticeably speed up, not just in computer science but in many different fields. This in turn will add to economic growth and speed poverty's decline. With the Master Algorithm to help synthesize and distribute knowledge, the intelligence of an organization will be more than the sum of its parts, not less. Routine jobs will be automated and replaced by more interesting ones. Every job will be done better than it is today, whether by a better-trained human, a computer, or a combination of the two. Stock-market crashes will be fewer and smaller.
With a fine grid of sensors covering the globe and learned models to make sense of its output moment by moment, we will no longer be flying blind; the health of our planet will take a turn for the better. A model of you will negotiate the world on your behalf, playing elaborate games with other people's and entities' models. And as a result of all this, our lives will be longer, happier, and more productive.

Two hundred and fifty years after Hume set off his bombshell, it was given elegant mathematical form by David Wolpert, a physicist turned machine learner. His result, known as the "no free lunch" theorem, sets a limit on how good a learner can be. The limit is pretty low: no learner can be better than random guessing! OK, we can go home: the Master Algorithm is just flipping coins. Seriously, though, how is it that no learner can beat coin flipping? And if that's so, how come the world is full of highly successful learners, from spam filters to (any day now) self-driving cars?

For each pair of facts, we construct the rule that allows us to infer the second fact from the first one and generalize it by Newton's principle. When the same general rule is induced over and over again, we can
have some confidence that it's true.

Socrates is a philosopher.

As with rule learning, we don't want to induce a tree that perfectly predicts the classes of all the training examples, because it would probably overfit. As before, we can use significance tests or a penalty on the size of the tree to prevent this.

Hopfield noticed an interesting similarity between spin glasses and neural networks: an electron's spin responds to the behavior of its neighbors much like a neuron does. In the electron's case, it flips up if the weighted sum of the neighbors exceeds a threshold and flips (or stays) down otherwise. Inspired by this, he defined a type of neural network that evolves over time in the same way that a spin glass does and postulated that the network's minimum energy states are its memories. Each such state has a "basin of attraction" of initial states that converge to it, and in this way the network can do pattern recognition: for example, if one of the memories is the pattern of black-and-white pixels formed by the digit nine and the network sees a distorted nine, it will converge to the "ideal" one and thereby recognize it. Suddenly, a vast body of physical theory was applicable to machine learning, and a flood of statistical physicists poured into the field, helping it break out of the local minimum it had been stuck in.

Koza's confidence stands out even in a field not known for its shrinking violets. He sees genetic programming as an invention machine, a silicon Edison for the twenty-first century. He and other evolutionaries believe it can learn any program, making it their entry in the Master Algorithm sweepstakes. In 2004, they instituted the annual Humie Awards to recognize "human-competitive" genetic creations; thirty-nine have been awarded to date.

From Eugene Onegin to Siri

Humans do have one constant guide: their emotions. We seek pleasure and avoid pain. When you touch a hot stove, you instinctively recoil.
That's the easy part. The hard part is learning not to touch the stove in the first place. That requires moving to avoid a sharp pain that you have not yet felt. Your brain does this by associating the pain not just with the moment you touch the stove, but with the actions leading up to it. Edward Thorndike called this the law of effect: actions that lead to pleasure are more likely to be repeated in the future; actions that lead to pain, less so. Pleasure travels back through time, so to speak, and actions can eventually become associated with effects that are quite remote from them. Humans can do this kind of long-range reward seeking better than any other animal, and it's crucial to our success. In a famous experiment, children were presented with a marshmallow and told that if they resisted eating it for a few minutes, they could have two. The ones who succeeded went on to do better in school and adult life. Perhaps less obviously, companies using machine learning to improve their websites or their business practices face a similar problem. A company may make a change that brings in more revenue in the short term (like selling an inferior product that costs less to make for the same price as the original superior product) but miss seeing that doing this will lose customers in the longer term.

Connectionists' models are inspired by the brain, with networks of S curves that correspond to neurons and weighted connections between them corresponding to synapses. In Alchemy, two variables are connected if they appear together in some formula, and the probability of a variable given its neighbors is an S curve. (Although I won't show why, it's a direct consequence of the master equation we saw in the previous section.) The connectionists' master algorithm is backpropagation, which they use to figure out which neurons are responsible for which errors and adjust their weights accordingly.
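As a rough sketch of that kind of error-driven weight adjustment, here is gradient descent on a single S-curve "neuron" in Python. The data, learning rate, and number of passes are invented for illustration; they are not from the text.

```python
import math

def sigmoid(x):
    # The S curve that plays the role of a neuron's response.
    return 1.0 / (1.0 + math.exp(-x))

# Made-up training data: input -> desired output.
data = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (-1.0, 0.0)]

w, b = 0.0, 0.0   # weight and bias, initially zero
rate = 0.5        # step size down the error surface

for _ in range(1000):
    for x, target in data:
        out = sigmoid(w * x + b)
        # Gradient of the squared error with respect to the weighted sum,
        # chained through the S curve's derivative out * (1 - out).
        grad = (out - target) * out * (1 - out)
        # Step downhill: adjust each parameter against its gradient.
        w -= rate * grad * x
        b -= rate * grad

# After training, the neuron's output is high for positive inputs
# and low for negative ones.
print(round(sigmoid(w * 2.0 + b), 2), round(sigmoid(w * -1.0 + b), 2))
```

Backpropagation chains this same gradient computation backward through many layers of such neurons, assigning each connection its share of the blame for the error.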
Backpropagation is a form of gradient descent, which Alchemy uses to optimize the weights of a Markov logic network.

The two leading decision tree learners are presented in C4.5: Programs for Machine Learning,* by J. Ross Quinlan (Morgan Kaufmann, 1992), and Classification and Regression Trees,* by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone (Chapman and Hall, 1984). "Real-time human pose recognition in parts from single depth images,"* by Jamie Shotton et al. (Communications of the ACM, 2013), explains how Microsoft's Kinect uses decision trees to track gamers' motions. "Competing approaches to predicting Supreme Court decision making," by Andrew Martin et al. (Perspectives on Politics, 2004), describes how decision trees beat legal experts at predicting Supreme Court votes and shows the decision tree for Justice Sandra Day O'Connor.
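Hopfield's pattern-completion idea from earlier can also be made concrete in a few lines of Python. This is a minimal sketch under stated assumptions: an arbitrary eight-unit pattern with +1/-1 states, zero thresholds, and Hebbian weights (the product of the two units' states in the stored memory); none of these specifics come from the text.

```python
import random

random.seed(0)

# An arbitrary pattern to memorize (unit states are +1 or -1).
memory = [1, 1, -1, -1, 1, -1, 1, -1]
n = len(memory)

# Hebbian storage: the connection between two units is the product of
# their states in the memorized pattern (no self-connections).
w = [[0 if i == j else memory[i] * memory[j] for j in range(n)]
     for i in range(n)]

# Start from a corrupted version of the memory: two bits flipped.
state = memory[:]
state[0] = -state[0]
state[3] = -state[3]

# Asynchronous updates with the rule described above: a unit flips up
# if the weighted sum of its neighbors exceeds the threshold (zero),
# and down otherwise.
for _ in range(100):
    i = random.randrange(n)
    total = sum(w[i][j] * state[j] for j in range(n))
    state[i] = 1 if total > 0 else -1

print(state == memory)
```

Because the corrupted start lies inside the stored pattern's basin of attraction, the threshold updates pull the state back into the minimum-energy configuration, i.e., the memory.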