As you drive to work, your car continually adjusts fuel injection and exhaust recirculation to get the best gas mileage. You use Inrix, a traffic prediction system, to shorten your rush-hour commute, not to mention lowering your stress level. At work, machine learning helps you combat information overload. You use a data cube to summarize masses of data, look at it from every angle, and drill down on the most important bits. You have a decision to make: Will layout A or B bring more business to your website? A web-learning system tries both out and reports back. You need to check out a potential supplierвЂ™s website, but itвЂ™s in a foreign language. No problem: Google automatically translates it for you. Your e-mail conveniently sorts itself into folders, leaving only the most important messages in the inbox. Your word processor checks your grammar and spelling. You find a flight for an upcoming trip, but hold off on buying the ticket because Bing Travel predicts its price will go down soon. Without realizing it, you accomplish a lot more, hour by hour, than you would without the help of machine learning.. Cyberwar is an instance of asymmetric warfare, where one side canвЂ™t match the otherвЂ™s conventional military power but can still inflict grievous damage. A handful of terrorists armed with little more than box cutters can knock down the Twin Towers and kill thousands of innocents. All the biggest threats to US security today are in the realm of asymmetric warfare, and thereвЂ™s an effective weapon against all of them: information. If the enemy canвЂ™t hide, he canвЂ™t survive. The good news is that we have plenty of information, and thatвЂ™s also the bad news.. For example, consider NaГЇve Bayes, a learning algorithm that can be expressed as a single short equation. Given a database of patient records-their symptoms, test results, and whether or not they had some particular condition-NaГЇve Bayes can learn to diagnose the condition in a fraction of a second, often better than doctors who spent many years in medical school. It can also beat medical expert systems that took thousands of person-hours to build. The same algorithm is widely used to learn spam filters, a problem that at first sight has nothing to do with medical diagnosis. Another simple learner, called the nearest-neighbor algorithm, has been used for everything from handwriting recognition to controlling robot hands to recommending books and movies you might like. And decision tree learners are equally apt at deciding whether your credit-card application should be accepted, finding splice junctions in DNA, and choosing the next move in a game of chess.. More generally, Chomsky is critical of all statistical learning. He has a list of things statistical learners canвЂ™t do, but the list is fifty years out of date. Chomsky seems to equate machine learning with behaviorism, where animal behavior is reduced to associating responses with rewards. But machine learning is not behaviorism. Modern learning algorithms can learn rich internal representations, not just pairwise associations between stimuli.. Our quest will take us across the territory of each of the five tribes. The border crossings, where they meet, negotiate and skirmish, will be the trickiest part of the journey. Each tribe has a different piece of the puzzle, which we must gather. Machine learners, like all scientists, resemble the blind men and the elephant: one feels the trunk and thinks itвЂ™s a snake, another leans against the leg and thinks itвЂ™s a tree, yet another touches the tusk and thinks itвЂ™s a bull. Our aim is to touch each part without jumping to conclusions; and once weвЂ™ve touched all of them, we will try to picture the whole elephant. ItвЂ™s far from obvious how to combine all the pieces into one solution-impossible, according to some-but this is what we will do.. Overfitting happens when you have too many hypotheses and not enough data to tell them apart. The bad news is that even for the simple conjunctive learner, the number of hypotheses grows exponentially with the number of attributes. Exponential growth is a scary thing. AnE. coli bacterium can divide into two roughly every fifteen minutes; given enough nutrients it can grow into a mass of bacteria the size of Earth in about a day. When the number of things an algorithm needs to do grows exponentially with the size of its input, computer scientists call it a combinatorial explosion and run for cover. In machine learning, the number of possible instances of a concept is an exponential function of the number of attributes: if the attributes are Boolean, each new attribute doubles the number of possible instances by taking each previous instance and extending it with a yes or no for that attribute. In turn, the number of possible concepts is an exponential function of the number of possible instances: since a concept labels each instance as positive or negative, adding an instance doubles the number of possible concepts. As a result, the number of concepts is an exponential function of an exponential function of the number of attributes! In other words, machine learning is a combinatorial explosion of combinatorial explosions. Perhaps we should just give up and not waste our time on such a hopeless problem?. Even test-set accuracy is not foolproof. According to legend, in an early military application a simple learner detected tanks with 100 percent accuracy in both the training set and the test set, each consisting of one hundred images. Amazing-or suspicious? Turns out all the tank images were lighter than
the nontank ones, and thatвЂ™s all the learner was picking up. These days we have larger data sets, but the quality of data collection isnвЂ™t necessarily better, so caveat emptor. Hard-nosed empirical evaluation played an important role in the growth of machine learning from a fledgling field into a mature one. Up to the late 1980s, researchers in each tribe mostly believed their own rhetoric, assumed their paradigm was fundamentally better, and communicated little with the other camps. Then symbolists like Ray Mooney and Jude Shavlik started to systematically compare the different algorithms on the same data sets and-surprise, surprise-no clear winner emerged. Today the rivalry continues, but there is much more cross-pollination. Having a common experimental framework and a large repository of data sets maintained by the machine-learning group at the University of California, Irvine, did wonders for progress. And as weвЂ™ll see, our best hope of creating a universal learner lies in synthesizing ideas from different paradigms.. If your learnerвЂ™s test-set accuracy disappoints, you need to diagnose the problem. Was it blindness or hallucination? In machine learning, the technical terms for these arebias andvariance. A clock thatвЂ™s always an hour late has high bias but low variance. If instead the clock alternates erratically between fast and slow but on average tells the right time, it has high variance but low bias. Suppose youвЂ™re down at the pub with some friends, drinking and playing darts. Unbeknownst to them, youвЂ™ve been practicing for years, and youвЂ™re a master of the game. All your darts go straight to the bullвЂ™s-eye. You have low bias and low variance, which is shown in the bottom left corner of this diagram:. Decision trees are used in many different fields. In machine learning, they grew out of work in psychology. Earl Hunt and colleagues used them in the 1960s to model how humans acquire new concepts, and one of HuntвЂ™s graduate students, J. Ross Quinlan, later tried using them for chess. His original goal was to predict the outcome of king-rook versus king-knight endgames from the board positions. From those humble beginnings, decision trees have grown to be, according to surveys, the most widely used machine-learning algorithm. ItвЂ™s not hard to see why: theyвЂ™re easy to understand, fast to learn, and usually quite accurate without too much tweaking. Quinlan is the most prominent researcher in the symbolist school. An unflappable, down-to-earth Australian, he made decision trees the gold standard inclassification by dint of relentlessly improving them year after year, and writing beautifully clear papers about them.. Suppose a perceptron has two continuous inputsx andy. (In other words,x andy can take on any numeric values, not just 0 and 1.) Then each example can be represented by a point on the plane, and the boundary between positive examples (for which the perceptron outputs 1) and negative ones (output 0) is a straight line:. But now thereвЂ™s an important subtlety, in both natural and artificial evolution. We need to learn weights for every candidate structure along the way, not just the final one, in order to see how well it does in the struggle for life (in the natural case) or on the training data (in the artificial case). The structure we want to select at each step is the one that does best after learning weights, not before. So in reality, nature does not come before nurture; rather, they alternate, with each round of вЂњnurtureвЂќ learning setting the stage for the next round of вЂњnatureвЂќ learning and vice versa. Nature evolves for the nurture it gets. The evolutionary growth of the cortexвЂ™s associative areas builds on neural learning in the sensory areas, without which it would be useless. Goslings follow their mother around (evolved behavior) but that requires recognizing her (learned ability). If youвЂ™rethe first thing they see when they hatch, theyвЂ™ll follow you instead, as Konrad Lorenz memorably showed. The newborn brain already encodes features of the environment but not explicitly; rather, evolution optimized it to extract those features from the expected input. Likewise, in an algorithm that iteratively learns both structure and weights, each new structure is implicitly a function of the weights learned in previous rounds.. He who learns fastest wins. [РљР°СЂС‚РёРЅРєР°: pic_25.jpg]. This does not mean that there is nothing to worry about, however. The first big worry, as with any technology, is that AI could fall into the wrong hands. If a criminal or prankster programs an AI to take over the world, weвЂ™d better have an AI police capable of catching it and erasing it before it gets too far. The best insurance policy against vast AIs gone amok is vaster AIs keeping the peace.. Allen Newell and Herbert Simon formulated the hypothesis that all intelligence is symbol manipulation inвЂњComputer science as empirical enquiry: Symbols and searchвЂќ (Communications of the ACM, 1976). David Marr proposed his three levels of information processing inVision* (Freeman, 1982).Machine Learning: An Artificial Intelligence Approach,* edited by Ryszard Michalski, Jaime Carbonell, and Tom Mitchell (Tioga, 1983), gives a snapshot of the early days of symbolist research in machine learning.вЂњConnectionist AI, symbolic AI, and the brain,вЂќ* by Paul Smolensky (Artificial Intelligence Review, 1987), gives a connectionist view of symbolist models..