You get home and walk to the mailbox. You have a letter from a friend, routed to you by a learning algorithm that can read handwritten addresses. ThereвЂ™s also the usual junk, selected for you by other learning algorithms (oh, well). You stop for a moment to take in the cool night air. Crime in your city is noticeably down since the police started using statistical learning to predict where crimes are most likely to occur and concentrating beat officers there. You eat dinner with your family. The mayor is in the news. You voted for him because he personally called you on election day, after a learning algorithm pinpointed you as a key undecided voter. After dinner, you watch the ball game. Both teams selected their players with the help ofstatistical learning. Or perhaps you play games on your Xbox with your kids, and KinectвЂ™s learning algorithm figures out where you are and what youвЂ™re doing. Before going to sleep, you take your medicine, which was designed and tested with the help of yet more learning algorithms. Your doctor, too, may have used machine learning to help diagnose you, from interpreting X-rays to figuring out an unusual set of symptoms.. An algorithm is a sequence of instructions telling a computer what to do. Computers are made of billions of tiny switches called transistors, and algorithms turn those switches on and off billions of times per second. The simplest algorithm is: flip a switch. The state of one transistor is one bit of information: one if the transistor is on, and zero if itвЂ™s off. One bit somewhere in your bankвЂ™s computers says whether your account is overdrawn or not. Another bit somewhere in the Social Security AdministrationвЂ™s computers says whether youвЂ™re alive or dead. The second simplest algorithm is: combine two bits. Claude Shannon, better known as the father of information theory, was the first to realize that what transistors are doing, as they switch on and off in response to other transistors, is reasoning. (That was his masterвЂ™s thesis at MIT-the most important masterвЂ™s thesis of all time.) If transistor A turns on only when transistorsB and C are both on, itвЂ™s doing a tiny piece of logical reasoning. If A turns on when either B or C is on, thatвЂ™s another tiny logical operation. And if A turns on whenever B is off, and vice versa, thatвЂ™s a third operation. Believe it or not, every algorithm, no matter how complex, can be reduced to just these three operations: AND, OR, and NOT. Simple algorithms can be represented by diagrams, using different symbols for the AND, OR, and NOT operations. For example, if a fever can be caused by influenza or malaria, and you should take Tylenol for a fever and a headache, this can be expressed as follows:. In the end, the proof is in the pudding. Statistical language learners work, and hand-engineered language systems donвЂ™t. The first eye-opener came in the 1970s, when DARPA, the PentagonвЂ™s research arm, organized the first large-scale speech recognition project. To everyoneвЂ™s surprise, a simple sequential learner of the type Chomsky derided handily beat a sophisticated knowledge-based system. Learners like it are now used in just about every speech recognizer, including Siri. Fred Jelinek, head of the speech group at IBM, famously quipped that вЂњevery time I fire a linguist, the recognizerвЂ™s performance goes up.вЂќ Stuck in the knowledge-engineering mire, computational linguistics had a near-death experience in the late 1980s. Since then, learning-based methods have swept the field, to the point where itвЂ™s hard to find a paper devoid of learning in a computational linguistics conference. Statistical parsers analyze language with accuracy close to that of humans, where hand-coded ones lagged far behind. Machine translation, spelling correction, part-of-speech tagging, word sense disambiguation, question answering, dialogue, summarization: the best systems in these areas all use learning. Watson, theJeopardy! computer champion, would not have been possible without it.. Philosophers have debated HumeвЂ™s problem of induction ever since he posed it, but no one has come up with a satisfactory answer. Bertrand Russell liked to illustrate the problem with the story of the inductivist turkey. On his first morning at the farm, the turkey was fed at 9:00 a.m., but being a good inductivist, he didnвЂ™t jump to conclusions. He first collected many observations on many different days under many different circumstances. Having been fed consistently at 9:00 a.m. for many consecutive days, he finally concluded that yes, he would always be fed at 9:00 a.m. Then came the morning of Christmas eve, and his throat was cut.. A game of twenty questions. What the McCulloch-Pitts neuron doesnвЂ™t do is learn. For that we need to give variable weights to the connections between neurons, resulting in whatвЂ™s called a perceptron. Perceptrons were invented in the late 1950s by Frank Rosenblatt, a Cornell psychologist. A charismatic speaker and lively character, Rosenblatt did more than anyone else to shape the early days of machine learning. The nameperceptron derives from his interest in applying his models to perceptual tasks like speech and character recognition. Rather than implement perceptrons in software, which was very slow in those days, Rosenblatt built his own devices. The weights were implemented by variable resistors like those found in dimmable light switches, and weight learning was carried out by electric motors that turned the knobs on the resistors. (Talk about high tech!). Since perceptrons can only learn linear boundaries, they canвЂ™t learn XOR. And if they canвЂ™t do even that, theyвЂ™re not a very good model of how the brain learns, or a viable candidate for the Master Algorithm.. The S curve is not just important as a model in its own right; itвЂ™s also the jack-of-all-trades of mathematics. If you zoom in on its midsection, it approximates a straight line. Many phenomena we think of as linear are in fact S curves, because nothing can grow without limit. Because of relativity, andcontra Newton, acceleration does not increase linearly with force, but follows an S curve centered at zero. So does electric current as a function of voltage in the resistors found in electronic circuits, or in a light bulb (until the filament melts, which is itself another phase transition). If you zoom out from an S curve, it approximates a step function, with the output suddenly changing from zero to one at the threshold. So depending on the input voltages, the same curve represents the workings of a transistor in both digital computers and analog devices like amplifiers and radio tuners. The early part of an S curve is effectively an exponential, and near the saturation point it approximates exponential decay. When someone talks about exponential growth, ask yourself: How soon will it turn into an S curve? When will the population bomb peter out, MooreвЂ™s law lose steam, or the singularity fail to happen? Differentiate an S curve and you get a bell curve: slow, fast, slow becomes low, high, low. Add a succession of staggered upward and downward S curves, and you get something close to a sine wave. In fact, every function can be closely approximated by a sum of S curves: when the function goes up, you add an S curve; when it goes down, you subtract one. ChildrenвЂ™s learning is not a steady improvement but an accumulation of S curves. So is technological change. Squint at the New York City skyline and you can see a sum of S curves unfolding across the horizon, each as sharp as a skyscraperвЂ™s corner.. BackpropвЂ™s applications are now too many to count. As its fame has grown, more of its history has come to light. It turns out that, as is often the case in science, backprop was invented more than once. Yann LeCun in France and others hit on it at around the same time as Rumelhart. A paper on backprop was rejected by the leading AI conference in the early 1980s because, according to the reviewers, Minsky and Papert had already proved that perceptrons donвЂ™t work. In fact, Rumelhart is credited with inventing backprop by the Columbus test: Columbus was not the first person to discover America, but the last. It turns out that Paul Werbos, a graduate student at Harvard, had proposed a similar algorithm in his PhD thesis in 1974. And in a supreme irony, Arthur Bryson and Yu-Chi Ho, two control theorists, had done the same even earlier: in 1969, the same year that Minsky and Papert publishedPerceptrons! Indeed, the history of machine learning itself shows why we need learning algorithms. If algorithms that automatically find related papers in the scientific literature had existed in 1969, they could have potentially helped avoid decades of wasted time and accelerated who knows what discoveries.. If the states and observations are continuous variables instead of discrete ones, the HMM becomes whatвЂ™s known as a Kalman filter. Economists use Kalman filters to remove noise from time series of quantities like GDP, inflation, and unemployment. The вЂњtrueвЂќ GDP values are the hidden states; at each time step, the true value should be similar to the observed one, but also to the previous true value, since the economy seldom makes abrupt jumps. The Kalman filter trades off these two, yielding a smoother curve that still accords with the observations. When a missile cruises to its target, itвЂ™s a Kalman filter that keeps it on track. Without it, there would have been no man on the moon.. Being classic AI types, Newell, Simon, and their students and followers were strong believers in the primacy of problem solving. If the problem solver is powerful, the learner can piggyback on it and be simple. Indeed, learning is just another kind of problem solving. Newell and company made a concerted effort to reduce all learning to chunking and all cognition to Soar, but in the end they failed. One problem was that, as the problem solver learned more chunks, and more complicated ones, the cost of trying them often became so high that the program got slower instead of faster. Somehow humans avoid this, but so far researchers in this area have not figured out how. On top of that, trying to reduce reinforcement learning, supervised learning, and everything else to chunking ultimately created more problems than it solved. Eventually, the Soar researchers conceded defeat and incorporated those other types of learning into Soar as separate mechanisms. Nevertheless, chunking remains a preeminent example of a learning algorithm inspired by psychology, and the true Master Algorithm, whatever it turns out to be, must surely share its ability to improve with practice.. If we endow Robby the robot with all the learning abilities weвЂ™ve seen so far in this book, heвЂ™ll be pretty smart but still a bit autistic. HeвЂ™ll see the world as a bunch of separate objects, which he can identify, manipulate, and even make predictions about, but he wonвЂ™t understand that the world is a web of interconnections. Robby the doctor would be very good at diagnosing someone with the flu based on his symptoms but unable to suspect that the patient has swine flu because he has been in contact with someone infected with it. Before Google, search engines decided whether a web page was relevant to your query by looking at its content-what else? Brin and PageвЂ™s insight was that the strongest sign a page is relevant is that relevant pages link to it. Similarly, if you want to predict whether a teenager is at risk of starting to smoke, by far the best thing you can do is check whether her close friends smoke. An enzymeвЂ™s shape is as inseparable from the shapes of the molecules it brings together as a lock is from its key. Predator and prey have deeply entwined properties, each evolved to defeat the otherвЂ™s properties. In all of these cases, the best way to understand an entity-whether itвЂ™s a person, an animal, a web page, or a molecule-is to understand how it relates to other entities. This requires a new kind of learning that doesnвЂ™t treat the data as a random sample of unrelated objects but as a glimpse into a complex network. Nodes in the network interact; what you do to one affects the others and comes back to affect you. Relational learners, as theyвЂ™re called, may not quite have social intelligence, but theyвЂ™re the next best thing. In traditional statistical learning, every man is an island, entire of itself. In relational learning, every man is a piece of the continent, a part of the main. Humans arerelational learners, wired to connect, and if we want Robby to grow into a perceptive, socially adept robot, we need to wire him to connect, too.. The neatest trick a relational learner can do is to turn a sporadic teacher into an assiduous one. For an ordinary classifier, examples without classes are useless. If IвЂ™m given a patientвЂ™s symptoms, but not the diagnosis, that doesnвЂ™t help me learn to diagnose. But if I know that some of the patientвЂ™s friends have the flu, thatвЂ™s indirect evidence that he may have the flu as well. Diagnosing a few people in a network and then propagating those diagnosesto their friends, and their friendsвЂ™ friends, is the next best thing to diagnosing everyone. The inferred diagnoses may be noisy, but the overall statistics of how symptoms correlate with the flu will probably be a lot more accurate and complete than if I had only a handful of isolated diagnoses to draw on. Children are very good at making the most of the sporadic supervision they get (provided they donвЂ™t choose to ignore it). Relational learners share some of that ability.. Now that youвЂ™ve toured the machine learning wonderland, letвЂ™s switch gears and see what it all means to you. Like the red pill inThe Matrix, the Master Algorithm is the gateway to a different reality: the one you already live in but didnвЂ™t know it yet. From dating to work, from self-knowledge to the future of society, from data sharing to war, and from the dangers of AI to the next step in evolution, a new world is taking shape, and machine learning is the key that unlocks it. This chapter will help you make the most of it in your life and be ready for what comes next. Machine learning will not single-handedly determine the future, any more than any other technology; itвЂ™s what we decide to do with it that counts, and now you have the tools to decide.. Google + Master Algorithm = Skynet?.