IвЂ™ve been a machine-learning researcher for more than twenty years. My interest in it was sparked by a book with an odd title I saw in a bookstore when I was a senior in college:Artificial Intelligence. It had only a short chapter on machine learning, but on reading it, I immediately became convinced that learning was the key to solving AI and that the state of the art was so primitive that maybe I could contribute something. Shelving plans for an MBA, I entered the PhD program at the University of California, Irvine. Machine learning was then a small, obscure field, and UCI had one of the few sizable research groups anywhere. Some of my classmates dropped out because they didnвЂ™t see much of a future in it, but I persisted. To me nothing could have more impact than teaching computers to learn: if we could do that, we would get a leg up on every other problem. By the time I graduated five years later, the data-mining explosion was under way, and so was my path to this book. My doctoral dissertation unified symbolic and analogical learning. IвЂ™ve spent much of the last ten years unifying symbolism and Bayesianism, and more recently those two with connectionism. ItвЂ™s time to go the next step and attempt a synthesis of all five paradigms.. In any case, if we formalize ChomskyвЂ™s вЂњpoverty of the stimulusвЂќ argument, we find that itвЂ™s demonstrably false. In 1969, J. J. Horning proved that probabilistic context-free grammars can be learned from positive examples only, and stronger results have followed. (Context-free grammars are the linguistвЂ™s bread and butter, and the probabilistic version models how likely each rule is to be used.) Besides, language learning doesnвЂ™t happen in a vacuum; children get all sorts of cues from their parents and the environment. If weвЂ™re able to learn language from a few yearsвЂ™ worth of examples, itвЂ™s partly because of the similarity between its structure and the structure of the world. This common structure is what weвЂ™re interested in, and we know from Horning and others that it suffices.. So, if the Master Algorithm exists, what is it? A seemingly obvious candidate is memorization: just remember everything youвЂ™ve seen; after a while youвЂ™ll have seen everything there is to see, and therefore know everything there is to know. The problem with this is that, as Heraclitus said, you never step in the same river twice. ThereвЂ™s far more to see than you ever could. No matter how many snowflakes youвЂ™ve examined, the next one will be different. Even if you had been present at the Big Bang and everywhere since, you would still have seen only a tiny fraction of what you could see in the future. If you had witnessed life on Earth up to ten thousand years ago, that would not have prepared you for what was to come. Someone who grew up in one city doesnвЂ™t become paralyzed when they move to another, but a robot capable only of memorization would. Besides, knowledge is not just a long list of facts. Knowledge is general, and has structure. вЂњAll humans are mortalвЂќ is much more succinct than sevenbillion statements of mortality, one for each human. Memorization gives us none of these things.. Aristotle is human. Aristotle is mortal.. Adam, the robot scientist we met in Chapter 1, gives a preview. AdamвЂ™s goal is to figure out how yeast cells work. It starts with basic knowledge of yeast genetics and metabolism and a trove of gene expression data from yeast cells. It then uses inverse deduction to hypothesize which genes are expressed as which proteins, designs microarray experiments to test them, revises its hypotheses, and repeats. Whether each gene is expressed depends on other genes and conditions in the environment, and the resulting web of interactions can be represented as a set of rules, such as:. The symbolists. As far as its neighbors are concerned, a neuron can only be in one of two states: firing or not firing. This misses an important subtlety, however. Action potentials are short lived; the voltage spikes for a small fraction of a second and immediately goes back to its resting state. And a single spike barely registers in the receiving neuron; it takes a train of spikes closely on each otherвЂ™s heels to wake it up. A typical neuron spikes occasionally in the absence of stimulation, spikes more and more frequently as stimulation builds up, and saturates at the fastest spiking rate it can muster, beyond which increased stimulation has no effect. Rather than a logic gate, a neuron is more like a voltage-to-frequency converter. The curve of frequency as a function of voltage looks like this:. When backprop first hit the streets, connectionists had visions of quickly learning larger and larger networks until, hardware permitting, they amounted to artificial brains. It didnвЂ™t turn out that way. Learning networks with one hidden layer was fine, but after that things soon got very difficult. Networks with a few layers worked only if they were carefully designed for the application (character recognition, say). Beyond that, backprop broke down. As we add layers, the error signal becomes more and more diffuse, like a river branching into smaller and smaller tributaries, until weвЂ™re down to individual raindrops that just donвЂ™t register. Learning with dozens or hundreds of hidden layers, like the brain, remained a distant dream, and by the mid-1990s, the excitement for multilayer perceptrons had petered out. A hard core of connectionists soldiered on, but by and large the attention of the machine-learning field moved elsewhere. (WeвЂ™ll survey those lands in Chapters 6 and 7.). One of the most important problems in machine learning-and life-is the exploration-exploitation dilemma. If youвЂ™ve found something that works, should you just keep doing it? Or is it better to try new things, knowing it could be a waste of time but also might lead to a better solution? Would you rather be a cowboy or a farmer? Start a company or run an existing one? Go steady or play the field? A midlife crisis is the yearning to explore after many years spent exploiting. On an impulse, you fly to Vegas, ready to gamble away your lifeвЂ™s savings on the chance of becoming a millionaire. You enter the first casino and face a row of slot machines. The one to play is the one that gives you the best payoff on average, but you donвЂ™t know which that is. You have to try each one enough times to figure it out. But if you do this for too long, you waste your money on losing machines. Conversely, if you jump the gun and pick a machine that looked good by chance on the first few turns but is in fact not the best one, you waste your money playing it for the rest of the night. ThatвЂ™s the exploration-exploitation dilemma. Each time you play, you have to choose between repeating the best move youвЂ™ve found so far, which gives you the best payoff, or trying other moves, which gather information that may lead to even better payoffs. With two slot machines, Holland showed that the optimal strategy is to flip a biased coin each time, where the coin becomes exponentially more biased as you go along. (DonвЂ™t sue me if it doesnвЂ™t work for you, though. Remember the house always wins in the end.) The better a slot machine looks, the more you should play it, but never completely give up on the other one, in case it turns out to be the best one after all.. In genetic programming, as Koza called his method, we cross over two program trees by randomly swapping two of their subtrees. For example, crossing over these two trees at the highlighted nodes yields the correct program for computingT as one of the children:. To learn is to get better with practice. You may barely remember it now, but learning to tie your shoelaces was really hard. At first you couldnвЂ™t do it at all, despite your five years of age. Then your laces probably came undone faster than you could tie them. But little by little you learned to tie them faster and better until it became completely automatic. The same happens with lots of other things, like crawling, walking, running, riding a bike, and driving a car; reading, writing, and arithmetic; playing an instrument and practicing a sport; cooking and using a computer. Ironically, you learn the most when itвЂ™s most painful: early on, when every step is difficult, you keep failing, and even when you succeed, the results arenot very pretty. After youвЂ™ve mastered your golf swing or tennis serve, you can spend years perfecting it, but all those years make less difference than the first few weeks did. You get better with practice, but not at a constant rate: at first you improve quickly, then not so quickly, then very slowly. Whether itвЂ™s playing games or the guitar, the curve of performance improvement over time-how well you do something or how long it takes you to do it-has a very specific form:. To share or not to share, and how and where. Seven for the Engineers in their halls of servers,. The third and perhaps biggest worry is that, like the proverbial genie, the machines will give us what we ask for instead of what we want. This is not a hypothetical scenario; learning algorithms do it all the time. We train a neural network to recognize horses, but it learns instead to recognize brown patches, because all the horses in its training set happened to be brown. You just bought a watch, so Amazon recommends similar items: other watches, which are now the last thing you want to buy. If you examine all the decisions that computers make today-who gets credit, for example-youвЂ™ll find that theyвЂ™re often needlessly bad. Yours would be too, if your brain was a support vector machine and all your knowledge of credit scoring came from perusing one lousy database. People worry that computers will get too smart and take over the world, but the real problem is that theyвЂ™re too stupid and theyвЂ™ve already taken over the world.. Chapter Five.