The second goal of this book is thus to enable you to invent the Master Algorithm. You’d think this would require heavy-duty mathematics and severe theoretical work. On the contrary, what it requires is stepping back from the mathematical arcana to see the overarching pattern of learning phenomena; and for this the layman, approaching the forest from a distance, is in some ways better placed than the specialist, already deeply immersed in the study of particular trees. Once we have the conceptual solution, we can fill in the mathematical details; but that is not for this book, and not the most important part. Thus, as we visit each tribe, our goal is to gather its piece of the puzzle and understand where it fits, mindful that none of the blind men can see the whole elephant. In particular, we’ll see what each tribe can contribute to curing cancer, and also what it’s missing. Then, step by step, we’ll assemble all the pieces into the solution, or rather, a solution that is not yet the Master Algorithm but is the closest anyone has come, and hopefully makes a good launch pad for your imagination. And we’ll preview the use of this algorithm as a weapon in the fight against cancer. As you read the book, feel free to skim or skip any parts you find troublesome; it’s the big picture that matters, and you’ll probably get more out of those parts if you revisit them after the puzzle is assembled.

A related, frequently heard objection is “Data can’t replace human intuition.” In fact, it’s the other way around: human intuition can’t replace data. Intuition is what you use when you don’t know the facts, and since you often don’t, intuition is precious. But when the evidence is before you, why would you deny it? Statistical analysis beats talent scouts in baseball (as Michael Lewis memorably documented in Moneyball), it beats connoisseurs at wine tasting, and every day we see new examples of what it can do.
Because of the influx of data, the boundary between evidence and intuition is shifting rapidly, and as with any revolution, entrenched ways have to be overcome. If I’m the expert on X at company Y, I don’t like to be overridden by some guy with data. There’s a saying in industry: “Listen to your customers, not to the HiPPO,” HiPPO being short for “highest paid person’s opinion.” If you want to be tomorrow’s authority, ride the data, don’t fight it.

The Master Algorithm’s impact on technology will not be limited to AI. A universal learner is a phenomenal weapon against the complexity monster. Systems that today are too complex to build will no longer be. Computers will do more with less help from us. They will not repeat the same mistakes over and over again, but learn with practice, like people do. Sometimes, like the butlers of legend, they’ll even guess what we want before we express it. If computers make us smarter, computers running the Master Algorithm will make us feel like geniuses. Technological progress will noticeably speed up, not just in computer science but in many different fields. This in turn will add to economic growth and speed poverty’s decline. With the Master Algorithm to help synthesize and distribute knowledge, the intelligence of an organization will be more than the sum of its parts, not less. Routine jobs will be automated and replaced by more interesting ones. Every job will be done better than it is today, whether by a better-trained human, a computer, or a combination of the two. Stock-market crashes will be fewer and smaller. With a fine grid of sensors covering the globe and learned models to make sense of its output moment by moment, we will no longer be flying blind; the health of our planet will take a turn for the better. A model of you will negotiate the world on your behalf, playing elaborate games with other people’s and entities’ models.
And as a result of all this, our lives will be longer, happier, and more productive.

In the meantime, the practical consequence of the “no free lunch” theorem is that there’s no such thing as learning without knowledge. Data alone is not enough. Starting from scratch will only get you to scratch. Machine learning is a kind of knowledge pump: we can use it to extract a lot of knowledge from data, but first we have to prime the pump.

In the meantime, one important application of inverse deduction is predicting whether new drugs will have harmful side effects. Failure during animal testing and clinical trials is the main reason new drugs take many years and billions of dollars to develop. By generalizing from known toxic molecular structures, we can form rules that quickly weed out many apparently promising compounds, greatly increasing the chances of successful trials on the remaining ones.

The symbolists’ core belief is that all intelligence can be reduced to manipulating symbols. A mathematician solves equations by moving symbols around and replacing symbols by other symbols according to predefined rules. The same is true of a logician carrying out deductions. According to this hypothesis, intelligence is independent of the substrate; it doesn’t matter if the symbol manipulations are done by writing on a blackboard, switching transistors on and off, firing neurons, or playing with Tinkertoys. If you have a setup with the power of a universal Turing machine, you can do anything. Software can be cleanly separated from hardware, and if your concern is figuring out how machines can learn, you (thankfully) don’t need to worry about the latter beyond buying a PC or cycles on Amazon’s cloud.

Suppose a perceptron has two continuous inputs x and y. (In other words, x and y can take on any numeric values, not just 0 and 1.)
Then each example can be represented by a point on the plane, and the boundary between positive examples (for which the perceptron outputs 1) and negative ones (output 0) is a straight line, namely the set of points where the weighted sum of x and y exactly equals the perceptron’s threshold.

Consider how a network’s error varies as a single weight changes. The optimal weight, where the error is lowest, is 2.0. If the network starts out with a weight of 0.75, for example, backprop will get to the optimum in a few steps, like a ball rolling downhill. But if it starts at 5.5, backprop will roll down to 7.0, a local minimum, and remain stuck there. Backprop, with its incremental weight changes, doesn’t know how to find the global error minimum, and local ones can be arbitrarily bad, like mistaking your grandmother for a hat. With one weight, you could try every possible value at increments of 0.01 and find the optimum that way. But with thousands of weights, let alone millions or billions, this is not an option because the number of points on the grid goes up exponentially with the number of weights. The global minimum is hidden somewhere in the unfathomable vastness of hyperspace, and good luck finding it.

Compared to the simple model in Fisher’s book, genetic algorithms are quite a leap forward. Darwin lamented his lack of mathematical ability, but if he had lived a century later he probably would have yearned for programming prowess instead. Indeed, capturing natural selection by a set of equations is extremely difficult, but expressing it as an algorithm is another matter, and can shed light on many otherwise vexing questions. Why do species appear suddenly in the fossil record? Where’s the evidence that they evolved gradually from earlier species? In 1972, Niles Eldredge and Stephen Jay Gould proposed that evolution consists of a series of “punctuated equilibria,” alternating long periods of stasis with short bursts of rapid change, like the Cambrian explosion. This sparked a heated debate, with critics of the theory nicknaming it “evolution by jerks” and Eldredge and Gould retorting that gradualism is “evolution by creeps.” Experience with genetic algorithms lends support to the jerks. If you run a genetic algorithm for one hundred thousand generations and observe the population at one-thousand-generation intervals, the graph of fitness against time will probably look like an uneven staircase, with sudden improvements followed by flat periods that tend to become longer over time. It’s also not hard to see why. Once the algorithm reaches a local maximum of fitness (a peak in the fitness landscape), it will stay there for a long time until a lucky mutation or crossover lands an individual on the slope to a higher peak, at which point that individual will multiply and climb up the slope with each passing generation. And the higher the current peak, the longer before that happens.
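To see the staircase for yourself, here is a minimal genetic-algorithm sketch. Everything in it is an illustrative assumption rather than anything the text specifies: the fitness function is in the spirit of the “Royal Road” functions from genetic-algorithm research (the 64-bit genome is cut into blocks of 8 bits, and a block scores only when all its bits are 1), parents are chosen by binary tournament, offspring come from one-point crossover plus bit-flip mutation, and the best individual is always carried over (elitism). In this toy landscape the “peaks” are flat plateaus rather than true local maxima, but the effect on the fitness-versus-time graph is similar. The run is also scaled down from the text’s one hundred thousand generations to 1,000, sampled every 50 generations, so it finishes in seconds.

```python
import random

BLOCK_SIZE = 8                       # bits per block (illustrative choice)
N_BLOCKS = 8                         # blocks per genome (illustrative choice)
GENOME_LEN = BLOCK_SIZE * N_BLOCKS   # 64 bits in total


def fitness(genome):
    """Royal Road-style score: only blocks made entirely of 1s count."""
    return sum(BLOCK_SIZE
               for start in range(0, GENOME_LEN, BLOCK_SIZE)
               if all(genome[start:start + BLOCK_SIZE]))


def mutate(genome, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if random.random() < rate else bit for bit in genome]


def crossover(mom, dad):
    """One-point crossover: a prefix of one parent plus the suffix of the other."""
    cut = random.randrange(1, GENOME_LEN)
    return mom[:cut] + dad[cut:]


def tournament(pop, scores):
    """Binary tournament selection: return the fitter of two random individuals."""
    i, j = random.randrange(len(pop)), random.randrange(len(pop))
    return pop[i] if scores[i] >= scores[j] else pop[j]


def run_ga(generations=1000, pop_size=50, report_every=50, seed=0):
    """Evolve a population; record the best fitness every `report_every` generations."""
    random.seed(seed)
    pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
           for _ in range(pop_size)]
    history = []
    for gen in range(generations):
        scores = [fitness(g) for g in pop]
        best = max(range(pop_size), key=lambda k: scores[k])
        if gen % report_every == 0:
            history.append(scores[best])
        # Elitism: the current best individual is copied unchanged, so the
        # best fitness in the population can never go down.
        children = [pop[best]]
        while len(children) < pop_size:
            child = crossover(tournament(pop, scores), tournament(pop, scores))
            children.append(mutate(child, 1 / GENOME_LEN))
        pop = children
    return history


if __name__ == "__main__":
    # One value per sampling interval; plotted against time, the sequence
    # tends to show flat stretches broken by sudden jumps.
    print(run_ga())
```

Because of elitism the recorded values never decrease, and because only whole blocks score, every change is a jump of at least one block’s worth of fitness; how long each flat stretch lasts depends on the random seed.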
Of course, natural evolution is more complicated than this: for one, the environment may change, either physically or because other organisms have themselves evolved, and an organism that was on a fitness peak may suddenly find itself under pressure to evolve again. So, while helpful, current genetic algorithms are far from the end of the story.

You could even use Naïve Bayes, tongue-in-cheek, on a much larger scale than Google’s: to model the whole universe. Indeed, if you believe in an omnipotent God, then you can model the universe as a vast Naïve Bayes distribution where everything that happens is independent given God’s will. The catch, of course, is that we can’t read God’s mind, but in Chapter 8 we’ll investigate how to learn Naïve Bayes models even when we don’t know the classes of the examples.

In general, we have to deal with many constraints at once (one per example, in the case of SVMs). Suppose you wanted to get as close as possible to the North Pole but couldn’t leave your room. Each of the room’s four walls is a constraint, and the solution is to follow the compass until you bump into the corner where the northeast and northwest walls meet. We say that these two walls are the active constraints because they’re what prevents you from reaching the optimum, namely the North Pole. If your room has a wall facing exactly north, that’s the sole active constraint, and the solution is a point in the middle of it. And if you’re Santa and your room is already over the North Pole, all constraints are inactive, and you can just sit there pondering the optimal toy distribution problem instead. (Traveling salesmen have it easy compared to Santa.) In an SVM, the active constraints are the support vectors, since their margin is already the smallest it’s allowed to be; moving the frontier would violate one or more constraints. All other examples are irrelevant, and their weight is zero.
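The link between active constraints and support vectors can be checked on a tiny example. The sketch below uses a hypothetical six-point dataset and, instead of the quadratic programming that real SVM solvers use, simply brute-forces every whole-degree line direction and a grid of offsets, keeping the line whose smallest margin (distance from the line, measured on the correct side) is largest. The names `DATA`, `margins`, `max_margin_line`, and `support_vectors` are invented for illustration. The examples whose margin equals that minimum are the active constraints; deleting any other example leaves the solution untouched, which is what “their weight is zero” means in practice.

```python
import math

# Hypothetical, linearly separable toy data: ((x, y), label).
DATA = [((2.0, 0.0), +1), ((3.0, 1.0), +1), ((3.0, -1.0), +1),
        ((-2.0, 0.0), -1), ((-3.0, 1.0), -1), ((-3.0, -1.0), -1)]


def margins(angle_deg, offset, data):
    """Signed distance of each example from the line w.(x, y) + offset = 0.

    w is the unit vector at `angle_deg`, so each value is a true distance,
    positive when the example lies on its correct side.
    """
    theta = math.radians(angle_deg)
    wx, wy = math.cos(theta), math.sin(theta)
    return [label * (wx * x + wy * y + offset) for (x, y), label in data]


def max_margin_line(data):
    """Brute-force search for the line whose smallest margin is largest.

    Tries every whole-degree direction and offsets from -2.0 to 2.0 in
    steps of 0.05. Returns (smallest_margin, angle_deg, offset).
    """
    best = None
    for angle in range(360):
        for step in range(-40, 41):
            offset = step / 20
            smallest = min(margins(angle, offset, data))
            if best is None or smallest > best[0]:
                best = (smallest, angle, offset)
    return best


def support_vectors(data, tol=1e-9):
    """Examples whose constraint is active: their margin equals the minimum."""
    smallest, angle, offset = max_margin_line(data)
    return [point for (point, _), m in zip(data, margins(angle, offset, data))
            if m - smallest < tol]


if __name__ == "__main__":
    print(max_margin_line(DATA))   # (2.0, 0, 0.0): the vertical line x = 0
    print(support_vectors(DATA))   # [(2.0, 0.0), (-2.0, 0.0)]
    # Removing an example that is not a support vector changes nothing.
    reduced = [d for d in DATA if d[0] != (3.0, 1.0)]
    print(max_margin_line(reduced) == max_margin_line(DATA))   # True
```

The grid search only works here because the data are two-dimensional and the optimum happens to lie on the grid; it is meant to make the geometry visible, not to stand in for an SVM trainer.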
Robby doesn’t have the benefit of your highly evolved visual system, though, so if you want him to go fetch your dry cleaning from Elite Cleaners and you only allow his map of Palo Alto to have one coordinate, he needs an algorithm to “discover” University Avenue from the GPS coordinates of the shops. The key to this is to notice that, if you put the origin of the x,y plane at the average of the shops’ locations and slowly rotate the axes, the shops are closest to the x axis when you’ve turned it by about 60 degrees, that is, when it lines up with University Avenue.

The doctor will see you now.

Your digital future begins with a realization: every time you interact with a computer, whether it’s your smart phone or a server thousands of miles away, you do so on two levels. The first one is getting what you want there and then: an answer to a question, a product you want to buy, a new credit card. The second level, and in the long run the most important one, is teaching the computer about you. The more you teach it, the better it can serve you, or manipulate you. Life is a game between you and the learners that surround you. You can refuse to play, but then you’ll have to live a twentieth-century life in the twenty-first. Or you can play to win. What model of you do you want the computer to have? And what data can you give it that will produce that model? Those two questions should always be in the back of your mind whenever you interact with a learning algorithm, as they are when you interact with other people. Alice knows that Bob has a mental model of her and seeks to shape it through her behavior. If Bob is her boss, she tries to come across as competent, loyal, and hardworking. If instead Bob is someone she’s trying to seduce, she’ll be at her most seductive. We could hardly function in society without this ability to intuit and respond to what’s on other people’s minds.
The novelty in the world today is that computers, not just people, are starting to have theories of mind. Their theories are still primitive, but they’re evolving quickly, and they’re what we have to work with to get what we want, no less than with other people. And so you need a theory of the computer’s mind, and that’s what the Master Algorithm provides, after plugging in the score function (what you think the learner’s goals are, or more precisely its owner’s) and the data (what you think it knows).

In the Land of Learning where the Data lies.