Computers write their own programs. Now that's a powerful idea, maybe even a little scary. If computers start to program themselves, how will we control them? Turns out we can control them quite well, as we'll see. A more immediate objection is that perhaps this sounds too good to be true. Surely writing algorithms requires intelligence, creativity, problem-solving chops: things that computers just don't have? How is machine learning distinguishable from magic? Indeed, as of today people can write many programs that computers can't learn. But, more surprisingly, computers can learn programs that people can't write. We know how to drive cars and decipher handwriting, but these skills are subconscious; we're not able to explain to a computer how to do these things. If we give a learner a sufficient number of examples of each, however, it will happily figure out how to do them on its own, at which point we can turn it loose. That's how the post office reads zip codes, and that's why self-driving cars are on the way.

So… what shall it be? Date or no date? Is there a pattern that distinguishes the yeses from the nos? And, most important, what does that pattern say about today?

Discovering rules in this way was the brainchild of Ryszard Michalski, a Polish computer scientist. Michalski's hometown of Kalusz was successively part of Poland, Russia, Germany, and Ukraine, which may have left him more attuned than most to disjunctive concepts. After immigrating to the United States in 1970, he went on to found the symbolist school of machine learning, along with Tom Mitchell and Jaime Carbonell. He had an imperious personality. If you gave a talk at a machine-learning conference, the odds were good that at the end he'd raise his hand to point out that you had just rediscovered one of his old ideas.

The deeper problem, however, is that most learners start out knowing too little, and no amount of knob-twiddling will get them to the finish line.
Without the guidance of an adult brain's worth of knowledge, they can easily go astray. Even though it's what most learners do, just assuming you know the form of the truth (for example, that it's a small set of rules) is not much to hang your hat on. A strict empiricist would say that that's all a newborn has, encoded in her brain's architecture, and indeed children overfit more than adults do, but we would like to learn faster than a child does. (Eighteen years is a long time, and that's not counting college.) The Master Algorithm should be able to start with a large body of knowledge, whether it was provided by humans or learned in previous runs, and use it to guide new generalizations from data. That's what scientists do, and it's as far as it gets from a blank slate. The "divide and conquer" rule induction algorithm can't do it, but there's another way to learn rules that can.

Robotic Park doesn't exist yet, but it may someday. I suggested it as a thought experiment at a DARPA workshop a few years ago, and one of the military brass present said matter-of-factly, "That's feasible." His willingness might seem less startling if you consider that the army already runs a full-blown mockup of an Afghan village in the California desert, complete with villagers, for training its troops, and a few billion dollars would be a small price to pay for the ultimate soldier.

Evolution searches for good structures, and neural learning fills them in: this combination is the easiest of the steps we'll take toward the Master Algorithm. This may come as a surprise to anyone familiar with the never-ending twists and turns of the nature versus nurture controversy, 2,500 years old and still going strong. Seeing life through the eyes of a computer clarifies a lot of things, however. "Nature" for a computer is the program it runs, and "nurture" is the data it gets.
The question of which one is more important is clearly absurd; there's no output without both program and data, and it's not like the output is, say, 60 percent caused by the program and 40 percent by the data. That's the kind of linear thinking that a familiarity with machine learning immunizes you against.

Replace cause by A and effect by B and omit the multiplication sign for brevity, and you get the ten-foot formula in the cathedral.

Everything is connected, but not directly. Is there anything analogy can't do? Not according to Douglas Hofstadter, cognitive scientist and author of Gödel, Escher, Bach: An Eternal Golden Braid. Hofstadter, who looks a bit like the Grinch's good twin, is probably the world's best-known analogizer. In their book Surfaces and Essences: Analogy as the Fuel and Fire of Thinking, Hofstadter and his collaborator Emmanuel Sander argue passionately that all intelligent behavior reduces to analogy. Everything we learn or discover, from the meaning of everyday words like mother and play to the brilliant insights of geniuses like Albert Einstein and Évariste Galois, is the result of analogy in action. When little Tim sees women looking after other children like his mother looks after him, he generalizes the concept "mommy" to mean anyone's mommy, not just his. That in turn is a springboard for understanding things like "mother ship" and "Mother Nature." Einstein's "happiest thought," out of which grew the general theory of relativity, was an analogy between gravity and acceleration: if you're in an elevator, you can't tell whether your weight is due to one or the other because their effects are the same. We swim in a vast ocean of analogies, which we both manipulate for our ends and are unwittingly manipulated by. Books have analogies on every page (like the title of this section, or the previous one's). Gödel, Escher, Bach is an extended analogy between Gödel's theorem, Escher's art, and Bach's music.
If the Master Algorithm is not analogy, it must surely be something like it.
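The simplest machine-learning embodiment of analogy is the nearest-neighbor algorithm: classify a new case by recalling the most similar case you have already seen and assuming the same answer applies. Here is a minimal sketch; the two-dimensional points, labels, and the plain Euclidean distance are invented for illustration.

```python
# Minimal nearest-neighbor classifier: analogy reduced to a few lines.
# Training data and labels below are made up for illustration.

def nearest_neighbor(query, examples):
    """Return the label of the stored example most similar to `query`.

    `examples` is a list of ((x, y), label) pairs; similarity here is
    just squared Euclidean distance in two dimensions.
    """
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    closest = min(examples, key=lambda ex: dist2(query, ex[0]))
    return closest[1]

# Invented examples: points labeled by which side of an imaginary border they fall on.
labeled = [((1.0, 1.0), "north"), ((1.2, 0.8), "north"),
           ((4.0, 4.2), "south"), ((3.8, 4.5), "south")]

print(nearest_neighbor((1.1, 0.9), labeled))  # nearest stored point is a "north" one
```

The learner does no work at training time at all; all the effort is deferred to the moment a query arrives, which is why nearest-neighbor is often called a lazy learner.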
Suppose I give you the GPS coordinates of all the shops in Palo Alto, California, and you plot a few of them on a piece of paper:

Chunking and reinforcement learning are not as widely used in business as supervised learning, clustering, or dimensionality reduction, but a simpler type of learning by interacting with the environment is: learning the effects of your actions (and acting accordingly). If the background color of your e-commerce site's home page is currently blue and you're wondering whether making it red would increase sales, try it out on a hundred thousand randomly chosen customers and compare the results with those of the regular site. This technique, called A/B testing, was at first used mainly in drug trials but has since spread to many fields where data can be gathered on demand, from marketing to foreign aid. It can also be generalized to try many combinations of changes at once, without losing track of which changes lead to which gains (or losses). Companies like Amazon and Google swear by it; you've probably participated in thousands of A/B tests without realizing it. A/B testing gives the lie to the oft-heard criticism that big data is only good for finding correlations, not causation. Philosophical fine points aside, learning causality is learning the effects of your actions, and anyone with a stream of data they can affect can do it, from a one-year-old splashing around in the bathtub to a president campaigning for reelection.

Relational learners can generalize from one network to another (e.g., learn a model of how flu spreads in Atlanta and apply it in Boston). They can also learn on more than one network (e.g., Atlanta and Boston, assuming, unrealistically, that no one in Atlanta is ever in contact with anyone in Boston).
But unlike "regular" learning, where all examples must have exactly the same number of attributes, in relational learning networks can vary in size; a larger network will just have more instances of the same templates than a smaller one. Of course, the generalization from a smaller network to a larger one may or may not be accurate, but the point is that nothing prevents it; and large networks often do behave locally like small ones.

All this power comes at a cost, however. In an ordinary classifier, such as a decision tree or a perceptron, inferring an entity's class from its attributes is a matter of a few lookups and a bit of arithmetic. In a network, each node's class depends indirectly on all the others', and we can't infer it in isolation. We can resort to the same kinds of inference techniques we used for Bayesian networks, like loopy belief propagation or MCMC, but the scale is different. A typical Bayesian network has perhaps thousands of variables, but a typical social network has millions of nodes or more. Luckily, because the model of the network consists of many repetitions of the same features with the same weights, we can often condense the network into "supernodes," each consisting of many nodes that we know will have the same probabilities, and solve a much smaller problem with the same result.

After a long day's journey, the sun is rapidly nearing the horizon, and you need to hurry before it gets dark. The city's outer wall has five massive gates, each controlled by one of the tribes and leading to its district in Optimization Town. Let us enter through the Gradient Descent Gate, after whispering the watchword, "deep learning," to the guard, and spiral in toward the Towers of Representation. From the gate the street ascends steeply up the hill to the citadel's Squared Error Gate, but instead you turn left toward the evolutionary sector.
The houses in the gradient descent district are all smooth curves and densely intertwined patterns, almost more like a jungle than a city. But when gradient descent gives way to genetic search, the picture changes abruptly. Here the houses rise higher, with structure piled on structure, but the structures are spare, almost vacant, as if waiting to be filled in by gradient descent's curves. That's it: the way to combine the two is to use genetic search to find the structure of the model and let gradient descent fill in its parameters. This is what nature does: evolution creates brain structures, and individual experience modulates them.

Take a moment to consider all the data about you that's recorded on all the world's computers: your e-mails, Office docs, texts, tweets, and Facebook and LinkedIn accounts; your web searches, clicks, downloads, and purchases; your credit, tax, phone, and health records; your Fitbit statistics; your driving as recorded by your car's microprocessors; your wanderings as recorded by your cell phone; all the pictures of you ever taken; brief cameos on security cameras; your Google Glass snippets; and so on and so forth. If a future biographer had access to nothing but this "data exhaust" of yours, what picture of you would he form? Probably a quite accurate and detailed one in many ways, but also one where some essential things would be missing. Why did you, one beautiful day, decide to change careers? Could the biographer have predicted it ahead of time? What about that person you met one day and secretly never forgot? Could the biographer wind back through the found footage and say "Ah, there"?
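The A/B test described earlier, blue home page versus red, comes down to assigning each visitor to one variant at random and comparing conversion rates afterward. Here is a toy simulation of that procedure; the visitor count matches the hundred thousand mentioned above, but the conversion rates are invented numbers, not data from any real site.

```python
# A/B test sketch: randomly split visitors between the blue (control) and
# red (treatment) home pages, then compare conversion rates per arm.
# The conversion rates are invented for illustration.
import random

def ab_test(n_visitors, rate_blue, rate_red, seed=0):
    """Simulate an A/B test; return the observed conversion rate per arm."""
    rng = random.Random(seed)
    counts = {"blue": [0, 0], "red": [0, 0]}  # arm -> [visitors, conversions]
    for _ in range(n_visitors):
        arm = rng.choice(["blue", "red"])           # random assignment
        rate = rate_blue if arm == "blue" else rate_red
        counts[arm][0] += 1
        counts[arm][1] += int(rng.random() < rate)  # did this visitor buy?
    return {arm: conv / seen for arm, (seen, conv) in counts.items()}

rates = ab_test(100_000, rate_blue=0.030, rate_red=0.033)
print(rates)  # with enough visitors, red's higher underlying rate shows up
```

Because the assignment is random, any systematic difference between the two arms can only come from the color change itself, which is exactly why the technique measures causation rather than mere correlation.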