By taking automation to new heights, the machine-learning revolution will cause extensive economic and social changes, just as the Internet, the personal computer, the automobile, and the steam engine did in their time. One area where these changes are already apparent is business.

Sets of rules are vastly more powerful than conjunctive concepts. They're so powerful, in fact, that you can represent any concept using them. It's not hard to see why. If you give me a complete list of all the instances of a concept, I can just turn each instance into a rule that specifies all attributes of that instance, and the set of all those rules is the definition of the concept. Going back to the dating example, one rule would be: If it's a warm weekend night, there's nothing good on TV, and you propose going to a club, she'll say yes. The table only contains a few examples, but if it contained all 2 × 2 × 2 × 2 = 16 possible ones, with each labeled "Date" or "No date," turning each positive example into a rule in this way would do the trick.

If you're for cutting taxes, pro-choice, and pro-gun control, you're a Democrat.

[Image: pic_9.jpg]

Whenever the learner's "retina" sees a new image, that signal propagates forward through the network until it produces an output. Comparing this output with the desired one yields an error signal, which then propagates back through the layers until it reaches the retina. Based on this returning signal and on the inputs it had received during the forward pass, each neuron adjusts its weights. As the network sees more and more images of your grandmother and other people, the weights gradually converge to values that let it discriminate between the two. Backpropagation, as this algorithm is known, is phenomenally more powerful than the perceptron algorithm. A single neuron could only learn straight lines.
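The forward and backward passes described above can be sketched in a few lines of plain Python. This is only an illustrative toy, not code from the book: a tiny network with one hidden layer learning XOR, the classic concept that no single neuron can capture, because no straight line separates its positive from its negative examples. The network size, learning rate, and number of passes are arbitrary choices.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# XOR: not linearly separable, so a single neuron cannot learn it.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

H = 4  # hidden neurons (an arbitrary choice for this toy)
# Each hidden neuron has two input weights plus a bias (the last entry);
# the output neuron has one weight per hidden neuron plus a bias.
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]
w_o = [random.uniform(-1, 1) for _ in range(H + 1)]

def forward(x):
    # Forward pass: the signal propagates from the "retina" to the output.
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    o = sigmoid(sum(w_o[i] * h[i] for i in range(H)) + w_o[H])
    return h, o

def total_loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data)

before = total_loss()
lr = 0.5
for _ in range(10000):
    for x, y in data:
        h, o = forward(x)
        # Backward pass: the error signal propagates from the output
        # back toward the inputs, and each neuron adjusts its weights.
        d_o = (o - y) * o * (1 - o)
        for i in range(H):
            d_h = d_o * w_o[i] * h[i] * (1 - h[i])
            w_h[i][0] -= lr * d_h * x[0]
            w_h[i][1] -= lr * d_h * x[1]
            w_h[i][2] -= lr * d_h
        for i in range(H):
            w_o[i] -= lr * d_o * h[i]
        w_o[H] -= lr * d_o
after = total_loss()
# The squared error should drop substantially as the weights converge.
```

With this seed the trained outputs round to the XOR labels, but the point of the sketch is the mechanics: gradients flow backward through the same weights the signal flowed forward through.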
Given enough hidden neurons, a multilayer perceptron, as it's called, can represent arbitrarily convoluted frontiers. This makes backpropagation, or simply backprop, the connectionists' master algorithm.

Notice that we're only saying that fever and cough are independent given that you have the flu, not overall. Clearly, if we don't know whether you have the flu, fever and cough are highly correlated, since you're much more likely to have a cough if you already have a fever. P(fever, cough) is not equal to P(fever) × P(cough). All we're saying is that, if we know you have the flu, knowing whether you have a fever gives us no additional information about whether you have a cough. Likewise, if you don't know the sun is about to rise and you see the stars fade, your expectation that the sky will lighten increases; but if you already know that sunrise is imminent, seeing the stars fade makes no difference.

If the states and observations are continuous variables instead of discrete ones, the HMM becomes what's known as a Kalman filter. Economists use Kalman filters to remove noise from time series of quantities like GDP, inflation, and unemployment. The "true" GDP values are the hidden states; at each time step, the true value should be similar to the observed one, but also to the previous true value, since the economy seldom makes abrupt jumps. The Kalman filter trades off these two, yielding a smoother curve that still accords with the observations. When a missile cruises to its target, it's a Kalman filter that keeps it on track. Without it, there would have been no man on the moon.

The inference problem

The analogizers are the least cohesive of the five tribes. Unlike the others, which have a strong identity and common ideals, the analogizers are more of a loose collection of researchers, united only by their reliance on similarity judgments as the basis for learning.
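The trade-off the Kalman filter strikes, described earlier, can be sketched in one dimension. This is a minimal illustration, not production code: the hidden state is assumed to follow a random walk, and the series and noise parameters below are invented for the example.

```python
# A one-dimensional Kalman filter: each estimate is a compromise between
# the previous estimate (the hidden state seldom jumps) and the new
# observation (which should be close to the truth, up to noise).
def kalman_1d(observations, q=0.1, r=1.0):
    """q: how much the hidden state may drift per step (process noise);
    r: how noisy each observation is (measurement noise).
    Both values here are arbitrary choices for illustration."""
    x, p = observations[0], 1.0  # initial estimate and its uncertainty
    estimates = [x]
    for z in observations[1:]:
        p = p + q          # predict: uncertainty grows between steps
        k = p / (p + r)    # Kalman gain: how much to trust the observation
        x = x + k * (z - x)  # update: move partway toward the data
        p = (1 - k) * p    # the update reduces the uncertainty
        estimates.append(x)
    return estimates

noisy = [1.0, 1.3, 0.8, 1.1, 5.0, 1.2, 0.9, 1.4]  # one noisy spike
smooth = kalman_1d(noisy)
```

The spike at the fifth observation is pulled back toward the running estimate rather than followed outright, which is exactly the smoother-curve-that-still-accords-with-the-observations behavior described above.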
Some, like the support vector machine folks, might even object to being brought under such an umbrella. But it's raining deep models outside, and I think they would benefit greatly from making common cause. Similarity is one of the central ideas in machine learning, and the analogizers in all their guises are its keepers. Perhaps in a future decade, machine learning will be dominated by deep analogy, combining in one algorithm the efficiency of nearest-neighbor, the mathematical sophistication of support vector machines, and the power and flexibility of analogical reasoning. (There, I just gave away one of my secret research projects.)

The learners we saw in the previous chapters are all guided by instant gratification: every action, whether it's flagging a spam e-mail or buying a stock, gets an immediate reward or punishment from the teacher. But there's a whole subfield of machine learning dedicated to algorithms that explore
on their own, flail, hit on rewards, and figure out how to get them again in the future, much like babies crawling around and putting things in their mouths.

Take a moment to consider all the data about you that's recorded on all the world's computers: your e-mails, Office docs, texts, tweets, and Facebook and LinkedIn accounts; your web searches, clicks, downloads, and purchases; your credit, tax, phone, and health records; your Fitbit statistics; your driving as recorded by your car's microprocessors; your wanderings as recorded by your cell phone; all the pictures of you ever taken; brief cameos on security cameras; your Google Glass snippets; and so on and so forth. If a future biographer had access to nothing but this "data exhaust" of yours, what picture of you would he form? Probably a quite accurate and detailed one in many ways, but also one where some essential things would be missing. Why did you, one beautiful day, decide to change careers? Could the biographer have predicted it ahead of time? What about that person you met one day and secretly never forgot? Could the biographer wind back through the found footage and say "Ah, there"?

In sum, all four kinds of data sharing have problems. These problems all have a common solution: a new type of company that is to your data what your bank is to your money. Banks don't steal your money (with rare exceptions). They're supposed to invest it wisely, and your deposits are FDIC-insured. Many companies today offer to consolidate your data somewhere in the cloud, but they're still a far cry from your personal data bank. If they're cloud providers, they try to lock you in, a big no-no. (Imagine depositing your money with Bank of America and not knowing if you'll be able to transfer it to Wells Fargo somewhere down the line.) Some startups offer to hoard your data and then mete it out to advertisers in return for discounts, but to me that misses the point.
Sometimes you want to give information to advertisers for free because it's in your interests, sometimes you don't want to give it at all, and what to share when is a problem that only a good model of you can solve.

In the same way that culture coevolved with larger brains, we will coevolve with our creations. We always have: humans would be physically different if we had not invented fire or spears. We are Homo technicus as much as Homo sapiens. But a model of the cell of the kind I envisaged in the last chapter will allow something entirely new: computers that design cells based on the parameters we give them, in the same way that silicon compilers design microchips based on their functional specifications. The corresponding DNA can then be synthesized and inserted into a "generic" cell, transforming it into the desired one. Craig Venter, the genome pioneer, has already taken the first steps in this direction. At first we will use this power to fight disease: a new pathogen is identified, the cure is immediately found, and your immune system downloads it from the Internet. "Health problems" becomes an oxymoron. Then DNA design will let people at last have the body they want, ushering in an age of affordable beauty, in William Gibson's memorable words. And then Homo technicus will evolve into a myriad different intelligent species, each with its own niche, a whole new biosphere as different from today's as today's is from the primordial ocean.

Prologue

The need for weighting the word probabilities in speech recognition is discussed in Section 9.6 of Speech and Language Processing,* by Dan Jurafsky and James Martin (2nd ed., Prentice Hall, 2009). My paper on Naïve Bayes, with Mike Pazzani, is "On the optimality of the simple Bayesian classifier under zero-one loss"* (Machine Learning, 1997; expanded journal version of the 1996 conference paper). Judea Pearl's book,* mentioned above, discusses Markov networks along with Bayesian networks.
Markov networks in computer vision are the subject of Markov Random Fields for Vision and Image Processing,* edited by Andrew Blake, Pushmeet Kohli, and Carsten Rother (MIT Press, 2011). Markov networks that maximize conditional likelihood were introduced in "Conditional random fields: Probabilistic models for segmenting and labeling sequence data,"* by John Lafferty, Andrew McCallum, and Fernando Pereira (International Conference on Machine Learning, 2001).