The argument from statistics. In the early days of AI, machine learning seemed like the obvious path to computers with humanlike intelligence; Turing and others thought it was the only plausible path. But then the knowledge engineers struck back, and by 1970 machine learning was firmly on the back burner. For a moment in the 1980s, it seemed like knowledge engineering was about to take over the world, with companies and countries making massive investments in it. But disappointment soon set in, and machine learning began its inexorable rise, at first quietly, and then riding a roaring wave of data.

"No matter how smart your algorithm, there are some things it just can't learn." Outside of AI and cognitive science, the most common objections to machine learning are variants of this claim. Nassim Taleb hammered on it forcefully in his book The Black Swan. Some events are simply not predictable. If you've only ever seen white swans, you think the probability of ever seeing a black one is zero. The financial meltdown of 2008 was a "black swan."

One such rule is: If Socrates is human, then he's mortal. This does the job, but is not very useful because it's specific to Socrates. But now we apply Newton's principle and generalize the rule to all entities: If an entity is human, then it's mortal. Or, more succinctly: All humans are mortal. Of course, it would be rash to induce this rule from Socrates alone, but we know similar facts about other humans:

Backprop was invented in 1986 by David Rumelhart, a psychologist at the University of California, San Diego, with the help of Geoff Hinton and Ronald Williams. Among other things, they showed that backprop can learn XOR, enabling connectionists to thumb their noses at Minsky and Papert. Recall the Nike example: young men and middle-aged women are the most likely buyers of Nike shoes.
We can represent this with a network of three neurons: one that fires when it sees a young male, another that fires when it sees a middle-aged female, and another that fires when either of those does. And with backprop we can learn the appropriate weights, resulting in a successful Nike prospect detector. (So there, Marvin.)

[Image: pic_14.jpg]

CHAPTER SEVEN: You Are What You Resemble

Decision trees are not immune to the curse of dimensionality either. Let's say the concept you're trying to learn is a sphere: points inside it are positive, and points outside it are negative. A decision tree can approximate a sphere by the smallest cube it fits inside. Not perfect, but not too bad either: only the corners of the cube get misclassified. But in high dimensions, almost the entire volume of the hypercube lies outside the hypersphere. For every example you correctly classify as positive, you incorrectly classify many negative ones as positive, causing your accuracy to plummet.

In fact, no learner is immune to the curse of dimensionality. It's the second worst problem in machine learning, after overfitting. The term curse of dimensionality was coined by Richard Bellman, a control theorist, in the fifties. He observed that control algorithms that worked fine in three dimensions became hopelessly inefficient in higher-dimensional spaces, such as when you want to control every joint in a robot arm or every knob in a chemical plant. But in machine learning the problem is more than just computational cost: it's that learning itself becomes harder and harder as the dimensionality goes up.

Humans do have one constant guide: their emotions. We seek pleasure and avoid pain. When you touch a hot stove, you instinctively recoil. That's the easy part. The hard part is learning not to touch the stove in the first place. That requires moving to avoid a sharp pain that you have not yet felt.
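The claim above, that almost all of a high-dimensional hypercube lies outside the inscribed hypersphere, is easy to check numerically. Here is a minimal sketch (mine, not from the text) using the standard closed-form volume of a d-dimensional ball; the function name is my own:

```python
import math

def inscribed_sphere_fraction(d):
    """Fraction of the unit hypercube [0, 1]^d occupied by the
    inscribed ball of radius 0.5, using the standard formula
    V_d(r) = pi^(d/2) * r^d / Gamma(d/2 + 1)."""
    r = 0.5
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

for d in (2, 3, 10, 100):
    print(f"d = {d:3d}: sphere fills {inscribed_sphere_fraction(d):.2e} of the cube")
```

In two dimensions the inscribed disk covers about 79 percent of the square, and in three dimensions the ball still covers about 52 percent of the cube; by ten dimensions the ball fills only about a quarter of one percent of the cube, so the cube that a decision tree carves out is almost entirely outside the sphere it is meant to approximate.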
Your brain learns this by associating the pain not just with the moment you touch the stove, but with the actions leading up to it. Edward Thorndike called this the law of effect: actions that lead to pleasure are more likely to be repeated in the future; actions that lead to pain, less so. Pleasure travels back through time, so to speak, and actions can eventually become associated with effects that are quite remote from them. Humans can do this kind of long-range reward seeking better than any other animal, and it's crucial to our success. In a famous experiment, children were presented with a marshmallow and told that if they resisted eating it for a few minutes, they could have two. The ones who succeeded went on to do better in school and adult life. Perhaps less obviously, companies using machine learning to improve their websites or their business practices face a similar problem. A company may make a change that brings in more revenue in the short term (like selling an inferior product that costs less to make for the same price as the original superior product) but miss seeing that doing this will lose customers in the
longer term.

Rosenbloom and Newell set their chunking program to work on a series of problems, measured the time it took in each trial, and lo and behold, out popped a series of power law curves. But that was only the beginning. Next they incorporated chunking into Soar, a general theory of cognition that Newell had been working on with John Laird, another one of his students. Instead of working only within a predefined hierarchy of goals, the Soar program could define and solve a new subproblem every time it hit a snag. Once it formed a new chunk, Soar generalized it to apply to similar problems, in a manner similar to inverse deduction. Chunking in Soar turned out to be a good model of lots of learning phenomena besides the power law of practice. It could even be applied to learning new knowledge by chunking data and analogies. This led Newell, Rosenbloom, and Laird to hypothesize that chunking is the only mechanism needed for learning: in other words, the Master Algorithm.

For those of us who are not keen on online dating, a more immediately useful notion is to choose which interactions to record and where. If you don't want your Christmas shopping to leave Amazon confused about your tastes, do it on other sites. (Sorry, Amazon.) If you watch different kinds of videos at home and for work, keep two accounts on YouTube, one for each, and YouTube will learn to make the corresponding recommendations. And if you're about to watch some videos of a kind that you ordinarily have no interest in, log out first. Use Chrome's incognito mode not for guilty browsing (which you'd never do, of course) but for when you don't want the current session to influence future personalization. On Netflix, adding profiles for the different people using your account will spare you R-rated recommendations on family movie night.
If you don't like a company, click on its ads: this will not only waste its money now, but teach Google to waste it again in the future by showing the ads to people who are unlikely to buy the products. And if you have very specific queries that you want Google to answer correctly in the future, take a moment to trawl through the later results pages for the relevant links and click on them. More generally, if a system keeps recommending the wrong things to you, try teaching it by finding and clicking on a bunch of the right ones, and come back later to see whether it has learned.

The second kind of data should also be unproblematic, but it isn't, because it overlaps with the third. You share updates and pictures with your friends on Facebook, and they with you. But everyone shares their updates and pictures with Facebook. Lucky Facebook: it has a billion friends. Day by day, it learns a lot more about the world than any one person does. It would learn even more if it had better algorithms, and they are getting better every day, courtesy of us data scientists. Facebook's main use for all this knowledge is to target ads to you. In return, it provides the infrastructure for your sharing. That's the bargain you make when you use Facebook. As its learning algorithms improve, it gets more and more value out of the data, and some of that value returns to you in the form of more relevant ads and better service. The only problem is that Facebook is also free to do things with the data and the models that are not in your interest, and you have no way to stop it.

"The unreasonable effectiveness of data," by Alon Halevy, Peter Norvig, and Fernando Pereira (IEEE Intelligent Systems, 2009), argues for machine learning as the new discovery paradigm. Benoît Mandelbrot explores the fractal geometry of nature in the eponymous book* (Freeman, 1982). James Gleick's Chaos (Viking, 1987) discusses and depicts the Mandelbrot set.
The Langlands program, a research effort that seeks to unify different subfields of mathematics, is described in Love and Math, by Edward Frenkel (Basic Books, 2014). The Golden Ticket, by Lance Fortnow (Princeton University Press, 2013), is an introduction to NP-completeness and the P = NP problem. The Annotated Turing,* by Charles Petzold (Wiley, 2008), explains Turing machines by revisiting Turing's original paper on them.

Steven Pinker summarizes the symbolists' criticisms of connectionist models in chapter 2 of How the Mind Works (Norton, 1997). Seymour Papert gives his take on the debate in "One AI or Many?" (Daedalus, 1988). The Birth of the Mind, by Gary Marcus (Basic Books, 2004), explains how evolution could give rise to the human brain's complex abilities.