After a point, there just aren't enough programmers and consultants to do all that's needed, and the company inevitably turns to machine learning. Amazon can't neatly encode the tastes of all its customers in a computer program, and Facebook doesn't know how to write a program that will choose the best updates to show to each of its users. Walmart sells millions of products and has billions of choices to make every day; if the programmers at Walmart tried to write a program to make all of them, they would never be done. Instead, what these companies do is turn learning algorithms loose on the mountains of data they've accumulated and let them divine what customers want.

Machine learning was the kingmaker in the 2012 presidential election. The factors that usually decide presidential elections (the economy, likability of the candidates, and so on) added up to a wash, and the outcome came down to a few key swing states. Mitt Romney's campaign followed a conventional polling approach, grouping voters into broad categories and targeting each one or not. Neil Newhouse, Romney's pollster, said that "if we can win independents in Ohio, we can win this race." Romney won them by 7 percent but still lost the state and the election.

Of course, as with companies, politicians can put their machine-learned knowledge to bad uses as well as good ones. For example, they could make inconsistent promises to different voters. But voters, media, and watchdog organizations can do their own data mining and expose politicians who cross the line. The arms race is not just between candidates but among all participants in the democratic process.

This does not mean that we can't simulate a brain with a computer; after all, that's what connectionist algorithms do. Because a computer is a universal Turing machine, it can implement the brain's computations as well as any others, provided we give it enough time and memory.
In particular, the computer can use speed to make up for lack of connectivity, using the same wire a thousand times over to simulate a thousand wires. In fact, these days the main limitation of computers compared to brains is energy consumption: your brain uses only about as much power as a small lightbulb, while Watson's supply could light up a whole office building.

In Hebb's time there was no way to measure synaptic strength or change in it, let alone figure out the molecular biology of synaptic change. Today, we know that synapses do grow (or form anew) when the postsynaptic neuron fires soon after the presynaptic one. Like all cells, neurons have different concentrations of ions inside and outside, creating a voltage across their membrane. When the presynaptic neuron fires, tiny sacs release neurotransmitter molecules into the synaptic cleft. These cause channels in the postsynaptic neuron's membrane to open, letting in potassium and sodium ions and changing the voltage across the membrane as a result. If enough presynaptic neurons fire close together, the voltage suddenly spikes, and an action potential travels down the postsynaptic neuron's axon. This also causes the ion channels to become more responsive and new channels to appear, strengthening the synapse. To the best of our knowledge, this is how neurons learn.

Imagine you've been kidnapped and left blindfolded somewhere in the Himalayas. Your head is throbbing, and your memory is not too good, either. All you know is you need to get to the top of Mount Everest. What do you do? You take a step forward and nearly slide into a ravine. After catching your breath, you decide to be a bit more systematic. You carefully feel around with your foot until you find the highest point you can and step gingerly to that point. Then you do the same again. Little by little, you get higher and higher. After a while, every step you can take is down, and you stop. That's gradient ascent.
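The blindfolded climb can be sketched in a few lines of code. The toy altitude function, step size, and stopping rule below are illustrative assumptions, not anything from the text: the climber repeatedly steps in the locally steepest uphill direction until every direction is flat or down.

```python
import numpy as np

def gradient_ascent(grad, start, step=0.1, tol=1e-6, max_steps=10_000):
    """Climb by repeatedly stepping in the direction of steepest ascent.

    Stops when no step raises the altitude appreciably -- which may be
    a local peak (a foothill), not the highest mountain."""
    x = np.asarray(start, dtype=float)
    for _ in range(max_steps):
        g = grad(x)
        if np.linalg.norm(g) < tol:  # every direction is (nearly) flat or down
            break
        x = x + step * g             # step toward the locally highest point
    return x

# A toy "mountain": a single smooth peak centered at (2, -1).
altitude = lambda x: -((x[0] - 2) ** 2 + (x[1] + 1) ** 2)
grad = lambda x: np.array([-2 * (x[0] - 2), -2 * (x[1] + 1)])

summit = gradient_ascent(grad, start=[0.0, 0.0])
```

On this single-peaked surface the climber reaches the true summit near (2, -1); on a rugged surface, the same loop would stall on whichever foothill it happened to start below.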
If the Himalayas were just Mount Everest, and Everest was a perfect cone, it would work like a charm. But more likely, when you get to a place where every step is down, you're still very far from the top. You're just standing on a foothill somewhere, and you're stuck. That's what happens to backprop, except it climbs mountains in hyperspace instead of 3-D. If your network has a single neuron, just climbing to better weights one step at a time will get you to the top. But with a multilayer perceptron, the landscape is very rugged; good luck finding the highest peak.

Since our goal is to produce the best spam filter we can, as opposed to faithfully simulating real natural selection, we can cheat liberally by modifying the algorithm to fit our needs. One way in which genetic algorithms routinely cheat is by allowing immortality. (Too bad we can't do that in real life.) That way, a highly fit individual doesn't simply compete to reproduce within its own generation, but also with its children, and then its grandchildren, great-grandchildren, and so on, as long as it remains one of the fittest individuals in the population. In contrast, in the real world the best a highly fit individual can do is pass on half its genes to many children, each of which will probably be less fit because of the genes it inherited from its other parent. Immortality avoids this backsliding and, with any luck, lets the algorithm reach the desired fitness sooner. Of course, since the fittest humans in history as measured by number of descendants are the likes of Genghis Khan, ancestor to one in two hundred men alive today, perhaps it's not so bad that in real life immortality is verboten.

The problems for genetic programming do not end there. Indeed, even its successes might not be as genetic as evolutionaries would like. Take circuit design, which was genetic programming's emblematic success.
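The immortality cheat described earlier is easy to see in code: parents and children compete in a single pool, so a fit individual survives as long as it stays among the best. In this minimal sketch the spam-filter fitness is replaced by a toy bit-counting stand-in (the classic OneMax problem), and all rates and sizes are illustrative assumptions.

```python
import random

random.seed(0)

GENOME_LEN, POP_SIZE, GENERATIONS = 20, 30, 60

def fitness(genome):
    # Toy stand-in for "how well does this rule set filter spam":
    # just count the 1 bits.
    return sum(genome)

def mutate(genome, rate=0.05):
    # Flip each bit with small probability.
    return [1 - bit if random.random() < rate else bit for bit in genome]

def crossover(a, b):
    # Single-point crossover: a child takes a prefix from one parent
    # and the rest from the other.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    children = [mutate(crossover(random.choice(population),
                                 random.choice(population)))
                for _ in range(POP_SIZE)]
    # "Immortality": parents and children compete in one pool, so a fit
    # individual never dies off as long as it ranks among the fittest --
    # unlike real natural selection, where every generation is mortal.
    population = sorted(population + children, key=fitness,
                        reverse=True)[:POP_SIZE]

best = population[0]
```

Because the survivors are always the top of the combined pool, the best fitness in the population can never decrease from one generation to the next, which is exactly the backsliding-avoidance the text describes.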
As a rule, even relatively simple designs require an enormous amount of search, and it's not clear how much the results owe to brute force rather than genetic smarts. To address the growing chorus of critics, Koza included in his 1992 book Genetic Programming experiments showing that genetic programming beat randomly generating candidates on Boolean circuit synthesis problems, but the margin of victory was small. Then, at the 1995 International Conference on Machine Learning (ICML) in Lake Tahoe, California, Kevin Lang published a paper showing that hill climbing beat genetic programming on the same problems, often by a large margin. Koza and other evolutionaries had repeatedly tried to publish papers in ICML, a leading venue in the field, but to their increasing frustration they kept being rejected due to insufficient empirical validation. Seeing Lang's paper made Koza blow his top. In short order, he produced a twenty-three-page paper in two-column ICML format refuting Lang's conclusions and accusing the ICML reviewers of scientific misconduct. He then placed a copy on every seat in the conference auditorium. Depending on your point of view, either Lang's paper or Koza's response was the last straw; regardless, the Tahoe incident marked the final divorce between the evolutionaries and the rest of the machine-learning community, with the evolutionaries moving out of the house. Genetic programmers started their own conference, which merged with the genetic algorithms conference to form GECCO, the Genetic and Evolutionary Computing Conference. For its part, the machine-learning mainstream largely forgot them. A sad dénouement, but not the first time in history that sex is to blame for a breakup.

On the one hand, evolution has produced many amazing things, none more amazing than you. With or without crossover, evolving structure is an essential part of the Master Algorithm.
The brain can learn anything, but it can't evolve a brain. If we thoroughly understood its architecture, we could just implement it in hardware, but we're very far from that; getting an assist from computer-simulated evolution is a no-brainer. What's more, we also want to evolve the brains of robots, systems with arbitrary sensors, and super-AIs. There's no reason to stick with the design of the human brain if there are better ones for those tasks. On the other hand, evolution is excruciatingly slow. The entire life of an organism yields only one piece of information about its genome: its fitness, reflected in the organism's number of offspring. That's a colossal waste of information, which neural learning avoids by acquiring the information at the point of use (so to speak). As connectionists like Geoff Hinton like to point out, there's no advantage to carrying around in the genome information that we can readily acquire from the senses. When a newborn opens his eyes, the visual world comes flooding in; the brain just has to organize it. What does need to be specified in the genome, however, is the architecture of the machine that does the organizing.

Analogy was the spark that ignited many of history's greatest scientific advances. The theory of natural selection was born when Darwin, on reading Malthus's Essay on Population, was struck by the parallels between the struggle for survival in the economy and in nature. Bohr's model of the atom arose from seeing it as a miniature solar system, with electrons as the planets and the nucleus as the sun. Kekulé discovered the ring shape of the benzene molecule after daydreaming of a snake eating its own tail.

Superficially, an SVM looks a lot like weighted k-nearest-neighbor: the frontier between the positive and negative classes is defined by a set of examples and their weights, together with a similarity measure.
A test example belongs to the positive class if, on average, it looks more like the positive examples than the negative ones. The average is weighted, and the SVM remembers only the key examples required to pin down the frontier. If you look back at the Posistan/Negaland example, once we throw away all the towns that aren't on the border, all that's left is this map:

Learning an MLN means discovering formulas that are true in the world more often than random chance would predict, and figuring out the weights for those formulas that cause their predicted probabilities to match their observed frequencies. Once we've learned an MLN, we can use it to answer questions like "What is the probability that Bob has the flu, given that he's friends with Alice and she has the flu?" And guess what? It turns out that the probability is given by an S curve applied to the weighted sum of features, much as in a multilayer perceptron. And an MLN with long chains of rules can represent a deep neural network, with one layer per link in the chain.

Computational complexity is one thing, but human complexity is another. If computers are like idiot savants, learning algorithms can sometimes come across like child prodigies prone to temper tantrums. That's one reason humans who can wrangle them into submission are so highly paid. If you know how to expertly tweak the control knobs until they're just right, magic can ensue, in the form of a stream of insights beyond the learner's years. And, not unlike the Delphic oracle, interpreting the learner's pronouncements can itself require considerable skill. Turn the knobs wrong, though, and the learner may spew out a torrent of gibberish or clam up in defiance. Unfortunately, in this regard Alchemy is no better than most. Writing down what you know in logic, feeding in the data, and pushing the button is the fun part. When Alchemy returns a beautifully accurate and efficient MLN, you go down to the pub and celebrate.
When it doesn't, which is most of the time, the battle begins. Is the problem in the knowledge, the learning, or the inference? On the one hand, because of the learning and probabilistic inference, a simple MLN can do the job of a complex program. On the other, when it doesn't work, it's much harder to debug. The solution is to make it more interactive, able to introspect and explain its reasoning. That will take us another step closer to the Master Algorithm.

In the Land of Learning where the Data lies.

An early list of examples of machine learning's impact on daily life can be found in "Behind-the-scenes data mining," by George John (SIGKDD Explorations, 1999), which was also the inspiration for the "day-in-the-life" paragraphs of the prologue. Eric Siegel's book Predictive Analytics (Wiley, 2013) surveys a large number of machine-learning applications. The term big data was popularized by the McKinsey Global Institute's 2011 report Big Data: The Next Frontier for Innovation, Competition, and Productivity. Many of the issues raised by big data are discussed in Big Data: A Revolution That Will Change How We Live, Work, and Think, by Viktor Mayer-Schönberger and Kenneth Cukier (Houghton Mifflin Harcourt, 2013). The textbook I learned AI from is Artificial Intelligence, by Elaine Rich (McGraw-Hill, 1983). A current one is Artificial Intelligence: A Modern Approach, by Stuart Russell and Peter Norvig (3rd ed., Prentice Hall, 2010). Nils Nilsson's The Quest for Artificial Intelligence (Cambridge University Press, 2010) tells the story of AI from its earliest days.