Supercharging the scientific method

The most important argument for the brain being the Master Algorithm, however, is that it's responsible for everything we can perceive and imagine. If something exists but the brain can't learn it, we don't know it exists. We may just not see it, or we may think it's random. Either way, if we implement the brain in a computer, that algorithm can learn everything we can. Thus one route to inventing the Master Algorithm, arguably the most popular one, is to reverse engineer the brain. Jeff Hawkins took a stab at this in his book On Intelligence. Ray Kurzweil pins his hopes for the Singularity (the rise of artificial intelligence that greatly exceeds the human variety) on doing just that, and takes a stab at it himself in his book How to Create a Mind. Nevertheless, this is only one of several possible approaches, as we'll see. It's not even necessarily the most promising one, because the brain is phenomenally complex and we're still in the very early stages of deciphering it. On the other hand, if we can't figure out the Master Algorithm, the Singularity won't happen any time soon.

Accuracy you can believe in

Learning a perceptron's weights means varying the direction of the straight line until all the positive examples are on one side and all the negative ones on the other. In one dimension, the boundary is a point; in two, it's a straight line; in three, it's a plane; and in more than three, it's a hyperplane. It's hard to visualize things in hyperspace, but the math works just the same way. In n dimensions, we have n inputs and the perceptron has n weights. To decide whether the perceptron fires or not, we multiply each weight by the corresponding input and compare the sum of all of them with the threshold.

The path to optimal learning begins with a formula that many people have heard of: Bayes' theorem.
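Stated as a formula, the rule reads P(hypothesis | evidence) = P(hypothesis) x P(evidence | hypothesis) / P(evidence). Here is a minimal sketch of its most familiar everyday use, updating belief after a medical test; the prevalence and accuracy figures below are invented for illustration, not taken from any real test:

```python
def bayes_update(prior, likelihood, likelihood_if_not):
    """Posterior probability of a hypothesis after seeing one piece of evidence.

    prior:             P(hypothesis) before the evidence
    likelihood:        P(evidence | hypothesis)
    likelihood_if_not: P(evidence | not hypothesis)
    """
    # Total probability of seeing the evidence at all, under both hypotheses.
    evidence = prior * likelihood + (1 - prior) * likelihood_if_not
    return prior * likelihood / evidence

# Illustrative numbers only: 1% prevalence, a test that catches 99% of true
# cases but also fires falsely on 1% of healthy people.
posterior = bayes_update(prior=0.01, likelihood=0.99, likelihood_if_not=0.01)
print(posterior)  # 0.5: a positive result lifts the probability from 1% to 50%, not 99%
```

The counterintuitive output is the whole point: when the hypothesis is rare to begin with, even a highly accurate test leaves plenty of doubt, because the false positives from the large healthy population rival the true positives from the small sick one.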
But here we'll see it in a whole new light and realize that it's vastly more powerful than you'd guess from its everyday uses. At heart, Bayes' theorem is just a simple rule for updating your degree of belief in a hypothesis when you receive new evidence: if the evidence is consistent with the hypothesis, the probability of the hypothesis goes up; if not, it goes down. For example, if you test positive for AIDS, your probability of having it goes up. Things get more interesting when you have many pieces of evidence, such as the results of multiple tests. To combine them all without suffering a combinatorial explosion, we need to make simplifying assumptions. Things get even more interesting when we consider many hypotheses at once, such as all the different possible diagnoses for a patient. Computing the probability of each disease from the patient's symptoms in a reasonable amount of time can take a lot of smarts. Once we know how to do all these things, we'll be ready to learn the Bayesian way. For Bayesians, learning is "just" another application of Bayes' theorem, with whole models as the hypotheses and the data as the evidence: as you see more data, some models become more likely and some less, until ideally one model stands out as the clear winner. Bayesians have invented fiendishly clever kinds of models. So let's get started.

After pioneering the application of machine learning to spam filtering, David Heckerman turned to using Bayesian networks in the fight against AIDS. The AIDS virus is a tough adversary because it mutates rapidly, making it difficult for any one vaccine or drug to pin it down for long. Heckerman noticed that this is the same cat-and-mouse game that spam filters play with spam and decided to apply a lesson he had learned there: attack the weakest link. In the case of spam, weak links include the URLs you have to use to take payment from the customer.
In the case of HIV, they're small regions of the virus protein that can't change without hurting the virus. If he could train the immune system to recognize these regions and attack the cells displaying them, he just might have an AIDS vaccine. Heckerman and coworkers used a Bayesian network to help identify the vulnerable regions and developed a vaccine delivery mechanism that could teach the immune system to attack just those regions. The delivery mechanism worked in mice, and clinical trials are now in preparation.

Analogizers took this line of reasoning to its logical conclusion, as we'll see in the next chapter. In the first decade of the new millennium, they in turn took over NIPS. Now the connectionists dominate once more, under the banner of deep learning. Some say that research goes in cycles, but it's more like a spiral, with loops winding around the direction of progress. In machine learning, the spiral converges to the Master Algorithm.

You'd think that Bayesians and symbolists would get along great, given that they both believe in a first-principles approach to learning rather than a nature-inspired one. Far from it. Symbolists don't like probabilities and tell jokes like "How many Bayesians does it take to change a lightbulb? They're not sure. Come to think of it, they're not sure the lightbulb is burned out." More seriously, symbolists point to the high price we pay for probability. Inference suddenly becomes a lot more expensive, all those numbers are hard to understand, we have to deal with priors, and hordes of zombie hypotheses chase us around forever. The ability to compose pieces of knowledge on the fly, so dear to symbolists, is gone. Worst of all, we don't know how to put probability distributions on many of the things we need to learn.
A Bayesian network is a distribution over a vector of variables, but what about distributions over networks, databases, knowledge bases, languages, plans, and computer programs, to name a few? All of these are easily handled in logic, and an algorithm that can't learn them is clearly not the Master Algorithm.

Of course, there's a price to pay, and the price comes at test time. Jane User has just uploaded a new picture. Is it a face? Nearest-neighbor's answer is: find the picture most similar to it in Facebook's entire database of labeled photos, its "nearest neighbor," and if that picture contains a face, so does this one. Simple enough, but now you have to scan through potentially billions of photos in (ideally) a fraction of a second. Like a lazy student who doesn't bother to study for the test, nearest-neighbor is caught unprepared and has to scramble. But unlike real life, where your mother taught you never to leave until tomorrow what you can do today, in machine learning procrastination can really pay off. In fact, the entire genre of learning that nearest-neighbor is part of is sometimes called "lazy learning," and in this context there's nothing pejorative about the term.

And so we have traveled through the territories of the five tribes, gathering their insights, negotiating the border crossings, wondering how the pieces might fit together. We know immensely more now than when we started out. But something is still missing. There's a gaping hole in the center of the puzzle, making it hard to see the pattern. The problem is that all the learners we've seen so far need a teacher to tell them the right answer. They can't learn to distinguish tumor cells from healthy ones unless someone labels them "tumor" or "healthy." But humans can learn without a teacher; they do it from the day they're born. Like Frodo at the gates of Mordor, our long journey will have been in vain if we don't find a way around this barrier. But there is a path past the ramparts and the guards, and the prize is near. Follow me…

The question, of course, is what algorithm should be running in Robby's brain at birth. Researchers influenced by child psychology look askance at neural networks, because the microscopic workings of a neuron seem a million miles from the sophistication of even a child's most basic behaviors, like reaching for an object, grasping it, and inspecting it with wide, curious eyes.
We need to model the child's learning at a higher level of abstraction, lest we miss the planet for the trees. Above all, even though children certainly get plenty of help from their parents, they learn mostly on their own, without supervision, and that's what seems most miraculous. None of the algorithms we've seen so far can do it, but we're about to see several that can, bringing us one step closer to the Master Algorithm.

Finally, we can turn Alchemy into a metalearner, like stacking, by encoding the individual classifiers as MLNs and adding or learning formulas to combine them. This is what DARPA did in its PAL project. PAL, the Personalized Assistant that Learns, was the largest AI project in DARPA history and the progenitor of Siri. PAL's goal was to build an automated secretary. It used Markov logic as its overarching representation, combining the outputs from different modules into the final decisions on what to do. This also allowed PAL's modules to learn from each other by evolving toward a consensus.

CHAPTER TEN: This Is the World on Machine Learning

Your digital future begins with a realization: every time you interact with a computer, whether it's your smartphone or a server thousands of miles away, you do so on two levels. The first one is getting what you want there and then: an answer to a question, a product you want to buy, a new credit card. The second level, and in the long run the most important one, is teaching the computer about you. The more you teach it, the better it can serve you, or manipulate you. Life is a game between you and the learners that surround you. You can refuse to play, but then you'll have to live a twentieth-century life in the twenty-first. Or you can play to win. What model of you do you want the computer to have? And what data can you give it that will produce that model?
Those two questions should always be in the back of your mind whenever you interact with a learning algorithm, as they are when you interact with other people. Alice knows that Bob has a mental model of her and seeks to shape it through her behavior. If Bob is her boss, she tries to come across as competent, loyal, and hardworking. If instead Bob is someone she's trying to seduce, she'll be at her most seductive. We could hardly function in society without this ability to intuit and respond to what's on other people's minds. The novelty in the world today is that computers, not just people, are starting to have theories of mind. Their theories are still primitive, but they're evolving quickly, and they're what we have to work with to get what we want, no less than with other people. And so you need a theory of the computer's mind, and that's what the Master Algorithm provides, after plugging in the score function (what you think the learner's goals are, or more precisely its owner's) and the data (what you think it knows).

If this book whetted your appetite for machine learning and the issues surrounding it, you'll find many suggestions in this section. Its aim is not to be comprehensive but to provide an entrance to machine learning's garden of forking paths (as Borges put it). Wherever possible, I chose books and articles appropriate for the general reader. Technical publications, which require at least some computational, statistical, or mathematical background, are marked with an asterisk (*). Even these, however, often have large sections accessible to the general reader. I didn't list volume, issue, or page numbers, since the web renders them superfluous; likewise for publishers' locations.