During a break you check on your mutual funds. Most of them use learning algorithms to help pick stocks, and one of them is completely run by a learning system. At lunchtime you walk down the street, smart phone in hand, looking for a place to eat. Yelp's learning system helps you find it. Your cell phone is chock-full of learning algorithms. They're hard at work correcting your typos, understanding your spoken commands, reducing transmission errors, recognizing bar codes, and much else. Your phone can even anticipate what you're going to do next and advise you accordingly. For example, as you're finishing lunch, it discreetly alerts you that your afternoon meeting with an out-of-town visitor will have to start late because her flight has been delayed.

It would be prohibitively expensive, though, if we had to build a new computer for every different thing we want to do. Rather, a modern computer is a vast assembly of transistors that can do many different things, depending on which transistors are activated. Michelangelo said that all he did was see the statue inside the block of marble and carve away the excess stone until the statue was revealed. Likewise, an algorithm carves away the excess transistors in the computer until the intended function is revealed, whether it's an airliner's autopilot or a new Pixar movie.

In politics, as in business and war, there is nothing worse than seeing your opponent make moves that you don't understand and don't know what to do about until it's too late. That's what happened to the Romney campaign. They could see the other side buying ads on particular cable stations in particular towns but couldn't tell why; their crystal ball was too fuzzy. In the end, Obama won every battleground state save North Carolina, and by larger margins than even the most accurate pollsters had predicted.
The most accurate pollsters, in turn, were the ones (like Nate Silver) who used the most sophisticated prediction techniques; they were less accurate than the Obama campaign because they had fewer resources. But they were a lot more accurate than the traditional pundits, whose predictions were based on their expertise.

The most important argument for the brain being the Master Algorithm, however, is that it's responsible for everything we can perceive and imagine. If something exists but the brain can't learn it, we don't know it exists. We may just not see it, or we may think it's random. Either way, if we implement the brain in a computer, that algorithm can learn everything we can. Thus one route, arguably the most popular one, to inventing the Master Algorithm is to reverse engineer the brain. Jeff Hawkins took a stab at this in his book On Intelligence. Ray Kurzweil pins his hopes for the Singularity, the rise of artificial intelligence that greatly exceeds the human variety, on doing just that and takes a stab at it himself in his book How to Create a Mind. Nevertheless, this is only one of several possible approaches, as we'll see. It's not even necessarily the most promising one, because the brain is phenomenally complex and we're still in the very early stages of deciphering it. On the other hand, if we can't figure out the Master Algorithm, the Singularity won't happen any time soon.

Suppose you've been diagnosed with cancer, and the traditional treatments (surgery, chemotherapy, and radiation therapy) have failed. What happens next will determine whether you live or die. The first step is to get the tumor's genome sequenced. Companies like Foundation Medicine in Cambridge, Massachusetts, will do that for you: send them a sample of the tumor and they will send back a list of the known cancer-related mutations in its genome. This is needed because every cancer is different, and no single drug is likely to work for all.
Cancers mutate as they spread through your body, and by natural selection, the mutations most resistant to the drugs you're taking are the most likely to grow. The right drug for you may be one that works for only 5 percent of patients, or you may need a combination of drugs that has never been tried before. Perhaps it will take a new drug designed specifically for your cancer, or a sequence of drugs to parry the cancer's adaptations. Yet these drugs may have side effects that are deadly for you but not for most other people. No doctor can keep track of all the information needed to predict the best treatment for you, given your medical history and your cancer's genome. It's an ideal job for machine learning, and yet today's learners aren't up to it. Each has some of the needed capabilities but is missing others. The Master Algorithm is the complete package. Applying it to vast amounts of patient and drug data, combined with knowledge mined from the biomedical literature, is how we will cure cancer.

So far we haven't done anything that the "divide and conquer" algorithm couldn't do. Suppose, however, that instead of knowing that Socrates, Plato, and Aristotle are human, we just know that they're philosophers. We still want to conclude that they're mortal, and we have previously induced or been told that all humans are mortal. What's missing now? A different rule: All philosophers are human. This is also a valid generalization (at least until we solve AI and robots start philosophizing), and it "fills the hole" in our reasoning.

If the history of machine learning were a Hollywood movie, the villain would be Marvin Minsky. He's the evil queen who gives Snow White a poisoned apple, leaving her in suspended animation. (In a 1988 essay, Seymour Papert even compared himself, tongue in cheek, to the huntsman the queen sent to kill Snow White in the forest.) And Prince Charming would be a Caltech physicist by the name of John Hopfield.
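To make the missing-rule point concrete, chaining "All philosophers are human" with "All humans are mortal" is just repeated rule application until nothing new follows. Here is a minimal sketch of such a forward chainer (the data structures are illustrative, not from the book):

```python
# Facts are (predicate, subject) pairs; rules say "premise implies conclusion"
# for unary predicates. This is the bare minimum needed for the syllogism.
facts = {("philosopher", "Socrates"),
         ("philosopher", "Plato"),
         ("philosopher", "Aristotle")}
rules = [("philosopher", "human"),   # All philosophers are human.
         ("human", "mortal")]        # All humans are mortal.

# Forward chaining: keep applying rules until no new fact is derived.
changed = True
while changed:
    changed = False
    for premise, conclusion in rules:
        for pred, subj in list(facts):
            if pred == premise and (conclusion, subj) not in facts:
                facts.add((conclusion, subj))
                changed = True

print(("mortal", "Socrates") in facts)  # the hole is filled
```

Without the first rule, the chain from "philosopher" to "mortal" cannot be completed, which is exactly the gap the induced rule fills.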
In 1982, Hopfield noticed a striking analogy between the brain and spin glasses, an exotic material much beloved of statistical physicists. This set off a connectionist renaissance that culminated a few years later in the invention of the first algorithms capable of solving the credit-assignment problem, ushering in a new era in which machine learning replaced knowledge engineering as the dominant paradigm in AI.

Bayesians and symbolists agree that prior assumptions are inevitable, but they differ in the kinds of prior knowledge they allow. For Bayesians, knowledge goes in the prior distribution over the structure and parameters of the model. In principle, the parameter prior could be anything we please, but ironically, Bayesians tend to choose uninformative priors (like assigning the same probability to all hypotheses) because they're easier to compute with. In any case, humans are not very good at estimating probabilities. For structure, Bayesian networks provide an intuitive way to incorporate knowledge: draw an arrow from A to B if you think that A directly causes B. But symbolists are much more flexible: you can provide as prior knowledge to your learner anything you can encode in logic, and practically anything can be encoded in logic, provided it's black and white.

One of the most popular algorithms for nonlinear dimensionality reduction, called Isomap, does just this. It connects each data point in a high-dimensional space (a face, say) to all nearby points (very similar faces), computes the shortest distances between all pairs of points along the resulting network, and finds the reduced coordinates that best approximate these distances. In contrast to PCA, faces' coordinates in this space are often quite meaningful: one may represent which direction the face is facing (left profile, three quarters, head-on, etc.); another how the face looks (very sad, a little sad, neutral, happy, very happy, etc.); and so on.
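The three steps just described (build a neighbor graph, compute shortest-path distances along it, then find low-dimensional coordinates that approximate those distances) can be sketched in a few lines of NumPy. This is a bare-bones illustration under the simplest possible choices (a fixed neighbor count, Floyd-Warshall for shortest paths, classical MDS for the embedding), not the optimized implementation a real library would use:

```python
import numpy as np

def isomap(X, n_neighbors=4, n_components=2):
    """Minimal Isomap: neighbor graph -> geodesic distances -> classical MDS."""
    n = len(X)
    # Step 1: Euclidean distances; keep only edges to each point's nearest neighbors.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nn = np.argsort(D[i])[1:n_neighbors + 1]   # skip self at index 0
        G[i, nn] = D[i, nn]
        G[nn, i] = D[i, nn]                        # keep the graph symmetric
    # Step 2: shortest distances between all pairs along the network (Floyd-Warshall).
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    # Step 3: classical MDS on the geodesic distances gives the reduced coordinates.
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * H @ (G ** 2) @ H
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:n_components]       # directions of largest spread
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# For points lying along a curled-up curve in 3-D, the first recovered
# coordinate tracks position along the curve rather than straight-line distance.
t = np.linspace(0, 3, 40)
X = np.column_stack([np.cos(t), np.sin(t), t])
Y = isomap(X, n_neighbors=4, n_components=2)
```

The crucial difference from PCA is step 2: distances are measured along the network of similar points, so the embedding respects the curved "shape" of the data rather than cutting straight through it.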
From understanding motion in video to detecting emotion in speech, Isomap has a surprising ability to zero in on the most important dimensions of complex data.

Representation is the formal language in which the learner expresses its models. The symbolists' formal language is logic, of which rules and decision trees are special cases. The connectionists' is neural networks. The evolutionaries' is genetic programs, including classifier systems. The Bayesians' is graphical models, an umbrella term for Bayesian networks and Markov networks. The analogizers' is specific instances, possibly with weights, as in an SVM.

Of course, don't be deceived by the simple MLN above for predicting the spread of flu. Picture instead an MLN for diagnosing and curing cancer. The MLN represents a probability distribution over the states of a cell. Every part of the cell, every organelle, every metabolic pathway, every gene and protein is an entity in the MLN, and the MLN's formulas encode the dependencies between them. We can ask the MLN, "Is this cell cancerous?" and probe it with different drugs and see what happens. We don't have an MLN like this yet, but later in this chapter I'll envisage how it might come about.

In sum, all four kinds of data sharing have problems. These problems all have a common solution: a new type of company that is to your data what your bank is to your money. Banks don't steal your money (with rare exceptions). They're supposed to invest it wisely, and your deposits are FDIC-insured. Many companies today offer to consolidate your data somewhere in the cloud, but they're still a far cry from your personal data bank. If they're cloud providers, they try to lock you in, a big no-no. (Imagine depositing your money with Bank of America and not knowing if you'll be able to transfer it to Wells Fargo somewhere down the line.)
Some startups offer to hoard your data and then mete it out to advertisers in return for discounts, but to me that misses the point. Sometimes you want to give information to advertisers for free because it's in your interests, sometimes you don't want to give it at all, and what to share when is a problem that only a good model of you can solve.

The main problem with this scenario, as you may have already guessed, is that letting robots learn ethics by observing humans may not be such a good idea. The robot is liable to get seriously confused when it sees that humans' actions often violate their ethical principles. We can clean up the training data by including only the examples where, say, a panel of ethicists agrees that the soldier made the right decision, and the panelists can also inspect and tweak the model post-learning to their satisfaction. Agreement may be hard to reach, however, particularly if the panel includes all the different kinds of people it should. Teaching ethics to robots, with their logical minds and lack of baggage, will force us to examine our assumptions and sort out our contradictions. In this, as in many other areas, the greatest benefit of machine learning may ultimately be not what the machines learn but what we learn by teaching them.