Your clock radio goes off at 7:00 a.m. It's playing a song you haven't heard before, but you really like it. Courtesy of Pandora, it's been learning your tastes in music, like your own personal radio jock. Perhaps the song itself was produced with the help of machine learning. You eat breakfast and read the morning paper. It came off the printing press a few hours earlier, the printing process carefully adjusted to avoid streaking using a learning algorithm. The temperature in your house is just right, and your electricity bill noticeably down, since you installed a Nest learning thermostat.

Machine learning is sometimes confused with artificial intelligence (or AI for short). Technically, machine learning is a subfield of AI, but it's grown so large and successful that it now eclipses its proud parent. The goal of AI is to teach computers to do what humans currently do better, and learning is arguably the most important of those things: without it, no computer can keep up with a human for long; with it, the rest follows.

If so few learners can do so much, the logical question is: Could one learner do everything? In other words, could a single algorithm learn all that can be learned from data? This is a very tall order, since it would ultimately include everything in an adult's brain, everything evolution has created, and the sum total of all scientific knowledge. But in fact all the major learners, including nearest-neighbor, decision trees, and Bayesian networks (a generalization of Naïve Bayes), are universal in the following sense: if you give the learner enough of the appropriate data, it can approximate any function arbitrarily closely, which is math-speak for learning anything. The catch is that "enough data" could be infinite. Learning from finite data requires making assumptions, as we'll see, and different learners make different assumptions, which makes them good for some things but not others.
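A minimal sketch of what this universality means in practice, using nearest-neighbor, one of the learners named above, to approximate a target function. The target (a sine curve), the data sizes, and the choice of k here are invented purely for illustration; the point is only that the approximation error shrinks as the training data grows:

```python
import math
import random

def knn_predict(x, data, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    neighbors = sorted(data, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in neighbors) / k

def avg_error(n, k=3, trials=200):
    """Average absolute error of k-NN trained on n points from sin(x)."""
    random.seed(0)  # fixed seed so the comparison is repeatable
    xs = [random.uniform(0, math.pi) for _ in range(n)]
    data = [(x, math.sin(x)) for x in xs]
    tests = [random.uniform(0, math.pi) for _ in range(trials)]
    return sum(abs(knn_predict(x, data, k) - math.sin(x)) for x in tests) / trials

# More data, closer approximation: the error shrinks as n grows
print(avg_error(20), avg_error(2000))
```

The same pattern holds for decision trees and Bayesian networks: with enough data, each can get arbitrarily close to the target, but how much data "enough" is depends on the learner's assumptions.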
In April 2000, a team of neuroscientists from MIT reported in Nature the results of an extraordinary experiment. They rewired the brain of a ferret, rerouting the connections from the eyes to the auditory cortex (the part of the brain responsible for processing sounds) and rerouting the connections from the ears to the visual cortex. You'd think the result would be a severely disabled ferret, but no: the auditory cortex learned to see, the visual cortex learned to hear, and the ferret was fine. In normal mammals, the visual cortex contains a map of the retina: neurons connected to nearby regions of the retina are close to each other in the cortex. Instead, the rewired ferrets developed a map of the retina in the auditory cortex. If the visual input is redirected instead to the somatosensory cortex, responsible for touch perception, it too learns to see. Other mammals also have this ability.

Of course, the Master Algorithm has at least as many skeptics as it has proponents. Doubt is in order when something looks like a silver bullet. The most determined resistance comes from machine learning's perennial foe: knowledge engineering. According to its proponents, knowledge can't be learned automatically; it must be programmed into the computer by human experts. Sure, learners can extract some things from data, but nothing you'd confuse with real knowledge. To knowledge engineers, big data is not the new oil; it's the new snake oil.

A different theory of everything

Our search for the Master Algorithm is complicated, but also enlivened, by the rival schools of thought that exist within machine learning. The main ones are the symbolists, connectionists, evolutionaries, Bayesians, and analogizers. Each tribe has a set of core beliefs, and a particular problem that it cares most about. It has found a solution to that problem, based on ideas from its allied fields of science, and it has a master algorithm that embodies it.
If you're a member of the Sierra Club and read science-fiction books, you'll like Avatar.

The symbolists' core belief is that all intelligence can be reduced to manipulating symbols. A mathematician solves equations by moving symbols around and replacing symbols by other symbols according to predefined rules. The same is true of a logician carrying out deductions. According to this hypothesis, intelligence is independent of the substrate; it doesn't matter if the symbol manipulations are done by writing on a blackboard, switching transistors on and off, firing neurons, or playing with Tinkertoys. If you have a setup with the power of a universal Turing machine, you can do anything. Software can be cleanly separated from hardware, and if your concern is figuring out how machines can learn, you (thankfully) don't need to worry about the latter beyond buying a PC or cycles on Amazon's cloud.

CHAPTER SEVEN: You Are What You Resemble

[Image: pic_24.jpg]

Generally, the fewer support vectors an SVM selects, the better it generalizes. Any training example that is not a support vector would be correctly classified if it showed up as a test example instead, because the frontier between positive and negative examples would still be in the same place. So the expected error rate of an SVM is at most the fraction of examples that are support vectors. As the number of dimensions goes up, this fraction tends to go up as well, so SVMs are not immune to the curse of dimensionality. But they're more resistant to it than most.

When you arrange books on a shelf so that books on similar topics are close to each other, you're doing a kind of dimensionality reduction, from the vast space of topics to the one-dimensional shelf. Unavoidably, some books that are closely related will wind up far apart on the shelf, but you can still order them in a way that minimizes such occurrences. That's what dimensionality reduction algorithms do.
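The bookshelf analogy can be sketched with principal component analysis (PCA), one common dimensionality-reduction algorithm: find the direction of greatest spread in the data and give each point a single coordinate along it. The toy "topic space" data below is invented for illustration, and real topic spaces have far more than two dimensions:

```python
import numpy as np

# Toy library: each book is a point in a 2-D "topic space"
books = np.array([[1.0, 0.9], [2.0, 2.1], [3.0, 2.9],
                  [4.0, 4.2], [5.0, 4.8]])

# Center the data, then find the direction of greatest variance
centered = books - books.mean(axis=0)
cov = np.cov(centered.T)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
axis = eigvecs[:, np.argmax(eigvals)]        # principal direction

# Each book's position on the one-dimensional "shelf"
shelf_position = centered @ axis
order = np.argsort(shelf_position)
print(list(order))
```

Projecting onto one axis necessarily loses information, just as the shelf does, but the principal axis is the one-dimensional arrangement that preserves as much of the spread as possible.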
Notice that the network has a separate feature for each pair of people: Alice and Bob both have the flu, Alice and Chris both have the flu, and so on. But we can't learn a separate weight for each pair, because we only have one data point per pair (whether it's infected or not), and we wouldn't be able to generalize to members of the network we haven't diagnosed yet (do Yvette and Zach both have the flu?). What we can do instead is learn a single weight for all features of the same form, based on all the instances of it that we've seen. In effect, X and Y have the flu is a template for features that can be instantiated with each pair of acquaintances (Alice and Bob, Alice and Chris, etc.). The weights for all the instances of a template are "tied together," in the sense that they all have the same value, and that's how we can generalize despite having only one example (the whole network). In nonrelational learning, the parameters of a model are tied in only one way: across all the independent examples (e.g., all the patients we've diagnosed). In relational learning, every feature template we create ties the parameters of all its instances.

Chapter Nine
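The weight-tying idea above can be sketched in a few lines: rather than one weight per pair of friends, pool the evidence from every instance of the template and estimate a single shared weight. The names, the diagnoses, and the smoothed log-odds estimate here are all invented for illustration; this is not the actual relational weight-learning procedure, only the pooling it relies on:

```python
import math

# Hypothetical flu diagnoses for some members of a social network
flu = {"Alice": True, "Bob": True, "Chris": False, "Dana": True}
friends = [("Alice", "Bob"), ("Alice", "Chris"), ("Bob", "Dana")]

# Template "X and Y both have the flu": count how often it holds
# across ALL instances, instead of learning one weight per pair
both = sum(flu[x] and flu[y] for x, y in friends)

# One tied weight for the whole template (add-one smoothed log-odds)
weight = math.log((both + 1) / (len(friends) - both + 1))
print(round(weight, 2))
```

Because every pair contributes to the same count, the template's weight is estimated from all of them at once, which is what lets the model say something about pairs it has never diagnosed.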