Traditionally, the only way to get a computer to do something, from adding two numbers to flying an airplane, was to write down an algorithm explaining how, in painstaking detail. But machine-learning algorithms, also known as learners, are different: they figure it out on their own, by making inferences from data. And the more data they have, the better they get. Now we don't have to program computers; they program themselves.

Otherwise, if there's a move that creates two lines of two in a row, play that.

More generally, Chomsky is critical of all statistical learning. He has a list of things statistical learners can't do, but the list is fifty years out of date. Chomsky seems to equate machine learning with behaviorism, where animal behavior is reduced to associating responses with rewards. But machine learning is not behaviorism. Modern learning algorithms can learn rich internal representations, not just pairwise associations between stimuli.

Someday there'll be a robot in every house, doing the dishes, making the beds, even looking after the children while the parents work. How soon depends on how hard finding the Master Algorithm turns out to be. If the best we can do is combine many different learners, each of which solves only a small part of the AI problem, we'll soon run into the complexity wall. This piecemeal approach worked for Jeopardy!, but few believe tomorrow's housebots will be Watson's grandchildren. It's not that the Master Algorithm will single-handedly crack AI; there'll still be great feats of engineering to perform, and Watson is a good preview of them. But the 80/20 rule applies: the Master Algorithm will be 80 percent of the solution and 20 percent of the work, so it's surely the best place to start.

It's not as if big data would solve the problem.
You could be super-Casanova and have dated millions of women thousands of times each, but your master database still wouldn't answer the question of what this woman is going to say this time. Even if today is exactly like some previous occasion when she said yes - same day of week, same type of date, same weather, and same shows on TV - that still doesn't mean that this time she will say yes. For all you know, her answer is determined by some factor that you didn't think of or don't have access to. Or maybe there's no rhyme or reason to her answers: they're random, and you're just spinning your wheels trying to find a pattern in them.

Don't give up on machine learning or the Master Algorithm just yet, though. We don't care about all possible worlds, only the one we live in. If we know something about the world and incorporate it into our learner, it now has an advantage over random guessing. To this Hume would reply that such knowledge must itself have come from induction and is therefore fallible. That's true, even if the knowledge was encoded into our brains by evolution, but it's a risk we'll have to take. We can also ask whether there's a nugget of knowledge so incontestable, so fundamental, that we can build all induction on top of it. (Something like Descartes' "I think, therefore I am," although it's hard to see how to turn that one into a learning algorithm.) I think the answer is yes, and we'll see what that nugget is in Chapter 9.

As far as its neighbors are concerned, a neuron can only be in one of two states: firing or not firing. This misses an important subtlety, however. Action potentials are short-lived; the voltage spikes for a small fraction of a second and immediately returns to its resting state. And a single spike barely registers in the receiving neuron; it takes a train of spikes following closely on each other's heels to wake it up.
A typical neuron spikes occasionally in the absence of stimulation, spikes more and more frequently as stimulation builds up, and saturates at the fastest spiking rate it can muster, beyond which increased stimulation has no effect. Rather than a logic gate, a neuron is more like a voltage-to-frequency converter. The curve of frequency as a function of voltage looks like this:

[Figure: spiking frequency as a function of input voltage]

The optimal weight, where the error is lowest, is 2.0. If the network starts out with a weight of 0.75, for example, backprop will get to the optimum in a few steps, like a ball rolling downhill. But if it starts at 5.5, backprop will roll down to 7.0 and remain stuck there. Backprop, with its incremental weight changes, doesn't know how to find the global error minimum, and local ones can be arbitrarily bad, like mistaking your grandmother for a hat. With one weight, you could try every possible value at increments of 0.01 and find the optimum that way. But with thousands of weights, let alone millions or billions, this is not an option, because the number of points on the grid goes up exponentially with the number of weights. The global minimum is hidden somewhere in the unfathomable vastness of hyperspace - and good luck finding it.

Beware of attaching too much meaning to the weights backprop finds, however. Remember that there are probably many very different ones that are just as good. Learning in multilayer perceptrons is a chaotic process in the sense that starting in slightly different places can cause you to wind up at very different solutions. The phenomenon
is the same whether the slight difference is in the initial weights or the training data, and manifests itself in all powerful learners, not just backprop.

Evolution searches for good structures, and neural learning fills them in: this combination is the easiest of the steps we'll take toward the Master Algorithm. This may come as a surprise to anyone familiar with the never-ending twists and turns of the nature-versus-nurture controversy, 2,500 years old and still going strong. Seeing life through the eyes of a computer clarifies a lot of things, however. "Nature" for a computer is the program it runs, and "nurture" is the data it gets. The question of which one is more important is clearly absurd; there's no output without both program and data, and it's not as if the output is, say, 60 percent caused by the program and 40 percent by the data. That's the kind of linear thinking that a familiarity with machine learning immunizes you against.

To learn an SVM, we need to choose the support vectors and their weights. The similarity measure, which in SVM-land is called the kernel, is usually chosen a priori. One of Vapnik's key insights was that not all borders that separate the positive training examples from the negative ones are created equal. Suppose Posistan and Negaland are at war, and they're separated by a no-man's-land with minefields on either side. Your mission is to survey the no-man's-land, walking from one end of it to the other without stepping on any mines. Luckily, you have a map of where the mines are buried. Obviously, you don't just take any old path: you give the mines the widest possible berth. That's what SVMs do, with the examples as mines and the learned border as the chosen path. The closest the border ever comes to an example is its margin of safety, and the SVM chooses the support vectors and weights that yield the maximum possible margin.
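The widest-berth idea can be sketched in code. What follows is a minimal illustration rather than the quadratic-programming solver real SVM packages use: it trains a linear soft-margin SVM by subgradient descent on the hinge loss (so the "kernel" is just the plain dot product), on two invented clusters of points standing in for Posistan and Negaland.

```python
# Minimal linear soft-margin SVM via subgradient descent on the hinge loss.
# A simplification for illustration: real SVM solvers use quadratic
# programming and support nonlinear kernels. The data are made up.

def train_svm(points, labels, lam=0.01, lr=0.1, epochs=2000):
    w = [0.0, 0.0]  # weight vector defining the border
    b = 0.0         # offset of the border
    for _ in range(epochs):
        for (x, y), t in zip(points, labels):
            margin = t * (w[0] * x + w[1] * y + b)
            if margin < 1:
                # Inside the margin (too close to a "mine"): push the
                # border away from this example, minus regularization.
                w[0] += lr * (t * x - 2 * lam * w[0])
                w[1] += lr * (t * y - 2 * lam * w[1])
                b += lr * t
            else:
                # Safely outside the margin: only shrink the weights,
                # which is what widens the margin.
                w[0] -= lr * 2 * lam * w[0]
                w[1] -= lr * 2 * lam * w[1]
    return w, b

pos = [(2.0, 2.5), (3.0, 3.0), (2.5, 2.0)]        # Posistan examples (+1)
neg = [(-2.0, -2.5), (-3.0, -3.0), (-2.5, -2.0)]  # Negaland examples (-1)
points = pos + neg
labels = [1, 1, 1, -1, -1, -1]

w, b = train_svm(points, labels)
# Every training example should end up on the correct side of the border
assert all(t * (w[0] * x + w[1] * y + b) > 0
           for (x, y), t in zip(points, labels))
```

The hinge-loss update nudges the border only when an example sits inside the safety margin, so at convergence the border is determined by the closest examples alone - those are the support vectors.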
For example, the solid straight-line border in this figure is better than the dotted one:

One of the most popular algorithms for nonlinear dimensionality reduction, called Isomap, does just this. It connects each data point in a high-dimensional space (a face, say) to all nearby points (very similar faces), computes the shortest distances between all pairs of points along the resulting network, and finds the reduced coordinates that best approximate these distances. In contrast to PCA, faces' coordinates in this space are often quite meaningful: one may represent which direction the face is facing (left profile, three-quarters, head-on, etc.); another how the face looks (very sad, a little sad, neutral, happy, very happy, etc.); and so on. From understanding motion in video to detecting emotion in speech, Isomap has a surprising ability to zero in on the most important dimensions of complex data.

[Image: pic_34.jpg]

Bayesians believe that modeling uncertainty is the key to learning and use formal representations like Bayesian networks and Markov networks to do so. As we already saw, Markov networks are a special type of MLN. Bayesian networks are also easily represented using the MLN master equation, with a feature for each possible state of a variable and its parents, and the logarithm of the corresponding conditional probability as its weight. (The normalization constant Z then conveniently reduces to 1, meaning we can ignore it.) Bayesians' master algorithm is Bayes' theorem, implemented using probabilistic inference algorithms like belief propagation and MCMC. As you may have noticed, Bayes' theorem is a special case of the master equation, with P = P(A|B), Z = P(B), and features and weights corresponding to P(A) and P(B|A). The Alchemy system includes both belief propagation and MCMC for inference, generalized to handle weighted logical formulas.
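That reduction can be written out explicitly. This is a sketch assuming the log-linear form of the master equation used for MLNs, with indicator features; the choice of features here is illustrative:

```latex
% Master equation in its log-linear (MLN) form:
P(X) = \frac{1}{Z} \exp\!\Big( \sum_i w_i f_i(X) \Big)

% Take two features that both fire for the event in question, with
% weights w_1 = \log P(A) and w_2 = \log P(B \mid A), and set the
% normalization constant Z = P(B). Then
P = \frac{1}{P(B)} \exp\big( \log P(A) + \log P(B \mid A) \big)
  = \frac{P(A)\, P(B \mid A)}{P(B)} = P(A \mid B)
% which is exactly Bayes' theorem.
```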
Using probabilistic inference over the proof paths provided by logic, Alchemy weighs the evidence for and against a conclusion and outputs the probability of the conclusion. This contrasts with the "plain vanilla" logic used by symbolists, which is all or none and so falls apart when given contradictory evidence.

"Love, actuarially," by Kevin Poulsen (Wired, 2014), tells the story of how one man used machine learning to find love on the OkCupid dating site. Dataclysm, by Christian Rudder (Crown, 2014), mines OkCupid's data for sundry insights. Total Recall, by Gordon Bell and Jim Gemmell (Dutton, 2009), explores the implications of digitally recording everything we do. The Naked Future, by Patrick Tucker (Current, 2014), surveys the use and abuse of data for prediction in our world. Craig Mundie argues for a balanced approach to data collection and use in "Privacy pragmatism" (Foreign Affairs, 2014). The Second Machine Age, by Erik Brynjolfsson and Andrew McAfee (Norton, 2014), discusses how progress in AI will shape the future of work and the economy. "World War R," by Chris Baraniuk (New Scientist, 2014), reports on the debate surrounding the use of robots in battle. "Transcending complacency on superintelligent machines," by Stephen Hawking et al. (Huffington Post, 2014), argues that now is the time to worry about AI's risks. Nick Bostrom's Superintelligence (Oxford University Press, 2014) considers those dangers and what to do about them.