Once we look at machine learning this way, two things immediately jump out. The first is that the more data we have, the more we can learn. No data? Nothing to learn. Big data? Lots to learn. That's why machine learning has been turning up everywhere, driven by exponentially growing mountains of data. If machine learning were something you bought in the supermarket, its carton would say: "Just add data."

We believe that the laws of physics gave rise to evolution, but we don't know how. Instead, we can induce natural selection directly from observations, as Darwin did. Countless wrong inferences could be drawn from those observations, but most of them never occur to us, because our inferences are influenced by our broad knowledge of the world, and that knowledge is consistent with the laws of nature.

Figuring out how proteins fold into their characteristic shapes; reconstructing the evolutionary history of a set of species from their DNA; proving theorems in propositional logic; detecting arbitrage opportunities in markets with transaction costs; inferring a three-dimensional shape from two-dimensional views; compressing data on a disk; forming a stable coalition in politics; modeling turbulence in sheared flows; finding the safest portfolio of investments with a given return, the shortest route to visit a set of cities, the best layout of components on a microchip, the best placement of sensors in an ecosystem, or the lowest energy state of a spin glass; scheduling flights, classes, and factory jobs; optimizing resource allocation, urban traffic flow, social welfare, and (most important) your Tetris score: these are all NP-complete problems, meaning that if you can efficiently solve one of them, you can efficiently solve all problems in the class NP, including each other. Who would have guessed that all these problems, superficially so different, are really the same? But if they are, it makes sense that one algorithm could learn to solve all of them (or, more precisely, all efficiently solvable instances).

As Isaiah Berlin memorably noted, some thinkers are foxes (they know many small things) and some are hedgehogs (they know one big thing). The same is true of learning algorithms. I hope the Master Algorithm is a hedgehog, but even if it's a fox, we can't catch it soon enough. The biggest problem with today's learning algorithms is not that they are plural; it's that, useful as they are, they still don't do everything we'd like them to. Before we can discover deep truths with machine learning, we have to discover deep truths about machine learning.

Candidates that don't make the cut

Hume's question is also the departure point for our journey. We'll start by illustrating it with an example from daily life and meeting its modern embodiment in the famous "no free lunch" theorem. Then we'll see the symbolists' answer to Hume. This leads us to the most important problem in machine learning: overfitting, or hallucinating patterns that aren't really there. We'll see how the symbolists solve it, and how machine learning is at heart a kind of alchemy, transmuting data into knowledge with the aid of a philosopher's stone. For the symbolists, the philosopher's stone is knowledge itself. In the next four chapters we'll study the solutions of the other tribes' alchemists.

If you liked Star Wars, episodes IV-VI, you'll like Avatar.

Socrates is human.

One such rule is: If Socrates is human, then he's mortal. This does the job, but is not very useful because it's specific to Socrates. But now we apply Newton's principle and generalize the rule to all entities: If an entity is human, then it's mortal. Or, more succinctly: All humans are mortal. Of course, it would be rash to induce this rule from Socrates alone, but we know similar facts about other humans.
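To make the generalization step concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the symbolists' actual machinery: the facts table and the induce_rule helper are invented for this example, and the acceptance test simply keeps a candidate rule only if every known instance supports it.

```python
# Hypothetical mini-example of rule induction: generalize from known facts
# to "All humans are mortal," keeping the rule only when no known
# counterexample exists. Facts and names are illustrative, not from the book.

facts = {
    ("Socrates", "human"), ("Socrates", "mortal"),
    ("Plato", "human"), ("Plato", "mortal"),
    ("Aristotle", "human"), ("Aristotle", "mortal"),
}

def induce_rule(antecedent, consequent, facts):
    """Propose 'if an entity is <antecedent>, it is <consequent>' and
    accept it only if every known instance supports it."""
    entities = {entity for entity, _ in facts}
    instances = [e for e in entities if (e, antecedent) in facts]
    if instances and all((e, consequent) in facts for e in instances):
        return f"All {antecedent}s are {consequent}."
    return None  # at least one counterexample, or no supporting instances

print(induce_rule("human", "mortal", facts))  # -> All humans are mortal.
```

With Socrates as the only fact, the rule would rest on a single instance; each additional human known to be mortal is what makes the generalization less rash.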
Hebb's rule was a confluence of ideas from psychology and neuroscience, with a healthy dose of speculation thrown in. Learning by association was a favorite theme of the British empiricists, from Locke and Hume to John Stuart Mill. In his Principles of Psychology, William James enunciates a general principle of association that's remarkably similar to Hebb's rule, with neurons replaced by brain processes and firing efficiency by propagation of excitement. Around the same time, the great Spanish neuroscientist Santiago Ramón y Cajal was making the first detailed observations of the brain, staining individual neurons using the recently invented Golgi method and cataloguing what he saw like a botanist classifying new species of trees. By Hebb's time, neuroscientists had a rough understanding of how neurons work, but he was the first to propose a mechanism by which they could encode associations.

[Image: pic_10.jpg]

[Image: pic_12.jpg]

Like many other early machine-learning researchers, Holland started out working on neural networks, but his interests took a different turn when, while a graduate student at the University of Michigan, he read Ronald Fisher's classic treatise The Genetical Theory of Natural Selection. In it, Fisher, who was also the founder of modern statistics, formulated the first mathematical theory of evolution. Brilliant as it was, Holland felt that Fisher's theory left out the essence of evolution. Fisher considered each gene in isolation, but an organism's fitness is a complex function of all its genes. If genes are independent, the relative frequencies of their variants rapidly converge to the maximum fitness point and remain in equilibrium thereafter. But if genes interact, evolution, the search for maximum fitness, is vastly more complex. With one thousand genes, each with two variants, the genome has 2^1000 possible states, and no planet in the universe is remotely large or ancient enough to have tried them all out. Yet on Earth evolution has managed to come up with some remarkably fit organisms, and Darwin's theory of natural selection explains how, at least qualitatively. Holland decided to turn it into an algorithm.

[Image: pic_13.jpg]

Practical successes aside, SVMs also turned a lot of machine-learning conventional wisdom on its head. For example, they gave the lie to the notion, sometimes misidentified with Occam's razor, that simpler models are more accurate. On the contrary, an SVM can have an infinite number of parameters and still not overfit, provided it has a large enough margin.
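To ground the margin claim, here is a minimal sketch, assuming scikit-learn is available; the dataset and parameter values are arbitrary stand-ins, not from the book. An RBF-kernel SVM implicitly works in an infinite-dimensional feature space, in effect with unlimited parameters, yet on well-separated data it still generalizes.

```python
# Sketch of the large-margin story with an RBF-kernel SVM (scikit-learn).
# The data are synthetic blobs chosen so that a wide-margin separator exists.

from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two well-separated clusters: a large margin is achievable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel corresponds to infinitely many implicit features;
# C trades margin width against training errors (lower C, wider margin).
clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)

print("support vectors:", int(clf.n_support_.sum()))
print("test accuracy:", clf.score(X_test, y_test))
```

Despite the unbounded implicit parameter count, the wide margin keeps this model from overfitting; cranking C up on noisy, overlapping data narrows the margin and is exactly what would reopen that door.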