In congenitally blind people, the visual cortex can take over other brain functions. In deaf people, the auditory cortex does the same. Blind people can learn to "see" with their tongues by sending video images from a head-mounted camera to an array of electrodes placed on the tongue, with high voltages corresponding to bright pixels and low voltages to dark ones. Ben Underwood was a blind kid who taught himself to use echolocation to navigate, like bats do. By clicking his tongue and listening to the echoes, he could walk around without bumping into obstacles, ride a skateboard, and even play basketball. All of this is evidence that the brain uses the same learning algorithm throughout, with the areas dedicated to the different senses distinguished only by the different inputs they are connected to (e.g., eyes, ears, nose). In turn, the associative areas acquire their function by being connected to multiple sensory regions, and the "executive" areas acquire theirs by connecting the associative areas and motor output.

Einstein's general relativity was only widely accepted once Arthur Eddington empirically confirmed its prediction that the sun bends the light of distant stars. But you don't need to wait around for new data to arrive to decide whether you can trust your learner. Rather, you take the data you have and randomly divide it into a training set, which you give to the learner, and a test set, which you hide from it and use to verify its accuracy. Accuracy on held-out data is the gold standard in machine learning. You can write a paper about a great new learning algorithm you've invented, but if your algorithm is not significantly more accurate than previous ones on held-out data, the paper is not publishable.

Even test-set accuracy is not foolproof.
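Mechanically, the split-and-score procedure just described takes only a few lines. This is a minimal sketch: the toy data (numbers labeled by parity) and the majority-vote "learner" are invented purely for illustration.

```python
import random

def train_test_split(data, test_fraction=0.3, seed=0):
    """Randomly divide labeled examples into a training and a test set."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(predict, test_set):
    """Fraction of held-out examples the learner labels correctly."""
    return sum(1 for x, y in test_set if predict(x) == y) / len(test_set)

# Invented toy data: the label is 1 when the number is even.
data = [(n, 1 if n % 2 == 0 else 0) for n in range(100)]
train, test = train_test_split(data)

# A stand-in "learner" that just predicts the most common training label;
# its test-set accuracy is what we would report.
majority = round(sum(y for _, y in train) / len(train))
print(accuracy(lambda x: majority, test))
```

The key point is that `test` is never shown to the learner: whatever the learner does, its score comes only from data it has not seen.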
According to legend, in an early military application a simple learner detected tanks with 100 percent accuracy in both the training set and the test set, each consisting of one hundred images. Amazing, or suspicious? Turns out all the tank images were lighter than the nontank ones, and that's all the learner was picking up. These days we have larger data sets, but the quality of data collection isn't necessarily better, so caveat emptor. Hard-nosed empirical evaluation played an important role in the growth of machine learning from a fledgling field into a mature one. Up to the late 1980s, researchers in each tribe mostly believed their own rhetoric, assumed their paradigm was fundamentally better, and communicated little with the other camps. Then symbolists like Ray Mooney and Jude Shavlik started to systematically compare the different algorithms on the same data sets, and, surprise, surprise, no clear winner emerged. Today the rivalry continues, but there is much more cross-pollination. Having a common experimental framework and a large repository of data sets maintained by the machine-learning group at the University of California, Irvine, did wonders for progress. And as we'll see, our best hope of creating a universal learner lies in synthesizing ideas from different paradigms.

Consider the grandmother cell, a favorite thought experiment of cognitive neuroscientists. The grandmother cell is a neuron in your brain that fires whenever you see your grandmother, and only then. Whether or not grandmother cells really exist is an open question, but let's design one for use in machine learning. A perceptron learns to recognize your grandmother as follows. The inputs to the cell are either the raw pixels in the image or various hardwired features of it, like brown eyes, which takes the value 1 if the image contains a pair of brown eyes and 0 otherwise.
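As a sketch, such a grandmother-cell perceptron, trained with the error-driven rule spelled out in the text, might look like this. The two binary features and the four tiny examples are invented for illustration, and the code is a simplification, not Rosenblatt's exact formulation.

```python
import random

def fires(weights, bias, features):
    """The perceptron outputs 1 when the weighted sum of its active
    inputs crosses the threshold, and 0 otherwise."""
    return 1 if sum(w * x for w, x in zip(weights, features)) + bias > 0 else 0

def train_perceptron(examples, n_features, rate=0.5, epochs=100, seed=0):
    """Error-driven learning: raise the weights of the active inputs when
    the neuron failed to fire on grandma, lower them when it fired on
    something else, and leave them alone when the output was right."""
    rng = random.Random(seed)
    # Small random starting weights, like synapses at birth.
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, is_grandma in examples:
            error = is_grandma - fires(weights, bias, features)
            if error != 0:  # if it ain't broke, don't fix it
                mistakes += 1
                weights = [w + rate * error * x
                           for w, x in zip(weights, features)]
                bias += rate * error
        if mistakes == 0:
            break  # fires on grandma, and only on grandma: done
    return weights, bias

# Two invented binary features: [has brown eyes, wears glasses].
# In this toy data, grandma (label 1) is the only one with both.
examples = [([1, 1], 1), ([1, 0], 0), ([0, 1], 0), ([0, 0], 0)]
weights, bias = train_perceptron(examples, n_features=2)
print([fires(weights, bias, f) for f, _ in examples])  # → [1, 0, 0, 0]
```

Because this toy concept is linearly separable, the perceptron convergence theorem guarantees the loop eventually stops making mistakes.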
In the beginning, all the connections from features to the neuron have small random weights, like the synapses in your brain at birth. Then we show the perceptron a series of images, some of your grandmother and some not. If it fires upon seeing an image of your grandmother, or doesn't fire upon seeing something else, then no learning needs to happen. (If it ain't broke, don't fix it.) But if the perceptron fails to fire when it's looking at your grandmother, that means the weighted sum of its inputs should have been higher, so we increase the weights of the inputs that are on. (For example, if your grandmother has brown eyes, the weight of that feature goes up.) Conversely, if the perceptron fires when it shouldn't, we decrease the weights of the active inputs. It's the errors that drive the learning. Over time, the features that are indicative of your grandmother acquire high weights, and the ones that aren't get low weights. Once the perceptron always fires upon seeing your grandmother, and only then, the learning is complete.

The problems for genetic programming do not end there. Indeed, even its successes might not be as genetic as evolutionaries would like. Take circuit design, which was genetic programming's emblematic success. As a rule, even relatively simple designs require an enormous amount of search, and it's not clear how much the results owe to brute force rather than genetic smarts. To address the growing chorus of critics, Koza included in his 1992 book Genetic Programming experiments showing that genetic programming beat randomly generating candidates on Boolean circuit synthesis problems, but the margin of victory was small. Then, at the 1995 International Conference on Machine Learning (ICML) in Lake Tahoe, California, Kevin Lang published a paper showing that hill climbing beat genetic programming on the same problems, often by a large margin.
Koza and other evolutionaries had repeatedly tried to publish papers in ICML, a leading venue in the field, but to their increasing frustration they kept being rejected due to insufficient empirical validation. Already frustrated by the rejections, Koza blew his top when he saw Lang's paper. In short order, he produced a twenty-three-page paper in two-column ICML format refuting Lang's conclusions and accusing the ICML reviewers of scientific misconduct. He then placed a copy on every seat in the conference auditorium. Depending on your point of view, either Lang's paper or Koza's response was the last straw; regardless, the Tahoe incident marked the final divorce between the evolutionaries and the rest of the machine-learning community, with the evolutionaries moving out of the house. Genetic programmers started their own conference, which merged with the genetic algorithms conference to form GECCO, the Genetic and Evolutionary Computing Conference. For its part, the machine-learning mainstream largely forgot them. A sad dénouement, but not the first time in history that sex is to blame for a breakup.

HMMs are good for modeling sequences of all kinds, but they're still a far cry from the flexibility of the symbolists' If… then… rules, where anything can appear as an antecedent, and a rule's consequent can in turn be an antecedent in any downstream rule. If we allow such an arbitrary structure in practice, however, the number of probabilities we need to learn blows up. For a long time no one knew
how to square this circle, and researchers resorted to ad hoc schemes, like attaching confidence estimates to rules and somehow combining them. If A implies B with confidence 0.8 and B implies C with confidence 0.7, then perhaps A implies C with confidence 0.8 × 0.7.

The inference problem

Nearest-neighbor can save lives, as Steven Johnson recounted in The Ghost Map. In 1854, London was struck by a cholera outbreak, which killed as many as one in eight people in parts of the city. The then-prevailing theory that cholera was caused by "bad air" did nothing to prevent its spread. But John Snow, a physician who was skeptical of the theory, had a better idea. He marked on a map of London the locations of all the known cases of cholera and divided the map into the regions closest to each public water pump. Eureka: nearly all deaths were in the "metro area" of one particular pump, located on Broad Street in the Soho district. Inferring that the water in that well was contaminated, Snow convinced the locals to disable the pump, and the epidemic died out. This episode gave birth to the science of epidemiology, but it's also the first success of the nearest-neighbor algorithm, almost a century before its official invention.

These examples are called support vectors because they're the vectors that "hold up" the frontier: remove one, and a section of the frontier slides to a different place. You may also notice that the frontier is a jagged line, with sudden corners that depend on the exact location of the examples. Real concepts tend to have smoother borders, which means nearest-neighbor's approximation is probably not ideal. But with SVMs, we can learn smooth frontiers instead.

Suppose the entities in Robby's world fall into five clusters (people, furniture, toys, food, and animals), but we don't know which things belong to which clusters. This is the type of problem that Robby faces when we switch him on.
One simple option for sorting entities into clusters is to pick five random objects as the cluster prototypes and then compare each entity with each prototype and assign it to the most similar prototype's cluster. (As in analogical learning, the choice of similarity measure is important. If the attributes are numeric, it can be as simple as Euclidean distance, but there are many other options.) We now need to update the prototypes. After all, a cluster's prototype is supposed to be the average of its members, and although that was necessarily the case when each cluster had only one member, it generally won't be after we have added a bunch of new members to each cluster. So for each cluster, we compute the average properties of its members and make that the new prototype. At this point, we need to update the cluster memberships again: since the prototypes have moved, the closest prototype to a given entity may also have changed. Let's imagine the prototype of one category was a teddy bear and the prototype of another was a banana. Perhaps on our first run we grouped an animal cracker with the bear, but on the second we grouped it with the banana. An animal cracker initially looked like a toy, but now it looks more like food. Once we reclassify animal crackers in the banana group, perhaps the prototypical item for that group also changes, from a banana to a cookie. This virtuous cycle, with entities assigned to better and better clusters, continues until the assignment of entities to clusters doesn't change (and therefore neither do the cluster prototypes).
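The loop just described is, in essence, the classic k-means algorithm. Here is a minimal sketch, assuming numeric attributes and Euclidean distance (just as the text allows); the two blobs of 2-D points are invented stand-ins for Robby's entities.

```python
import random

def k_means(points, k, seed=0, max_iters=100):
    """Alternate two steps until nothing changes: assign each entity to
    its most similar prototype, then move each prototype to the average
    of its current members."""
    rng = random.Random(seed)
    # Start from k random objects as the cluster prototypes.
    prototypes = [list(p) for p in rng.sample(points, k)]

    def dist2(a, b):  # squared Euclidean distance as the dissimilarity
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    assignment = None
    for _ in range(max_iters):
        new_assignment = [min(range(k), key=lambda j: dist2(p, prototypes[j]))
                          for p in points]
        if new_assignment == assignment:
            break  # memberships stopped changing, so the prototypes have too
        assignment = new_assignment
        for j in range(k):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:  # the new prototype is the average of the members
                prototypes[j] = [sum(coord) / len(members)
                                 for coord in zip(*members)]
    return assignment, prototypes

# Two obvious blobs of invented 2-D points:
points = [(0.0, 0.2), (0.1, 0.0), (0.2, 0.1),
          (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
assignment, prototypes = k_means(points, k=2)
print(assignment)
```

With well-separated blobs like these the loop settles quickly; in general, k-means only guarantees that each round of reassignment and averaging leaves the clustering no worse, not that the final clusters are the best possible ones.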
Because Alchemy reasons probabilistically, it does more: it finds multiple sequences of formulas that lead to the theorem or its negation and weighs them to compute the theorem's probability of being true. This way it can reason not just about mathematical universals, but about whether "the president" in a news story means "Barack Obama," or what folder an e-mail should be filed in. The symbolists' master algorithm, inverse deduction, postulates new logical rules needed to serve as steps between the data and a desired conclusion. Alchemy introduces new rules by hill climbing, starting with the initial rules and constructing rules that, combined with the initial ones and the data, make the conclusions more likely.

A company like this could quickly become one of the most valuable in the world. As Alexis Madrigal of the Atlantic points out, today your profile can be bought for half a cent or less, but the value of a user to the Internet advertising industry is more like $1,200 per year. Google's sliver of your data is worth about $20, Facebook's $5, and so on. Add to that all the slivers that no one has yet, and the fact that the whole is more than the sum of the parts (a model of you based on all your data is much better than a thousand models based on a thousand slivers), and we're looking at easily over a trillion dollars per year for an economy the size of the United States. It doesn't take a large cut of that to make a Fortune 500 company. If you decide to take up the challenge and wind up becoming a billionaire, remember where you first got the idea.

The trajectory we're on is not a singularity but a phase transition. Its critical point, the Turing point, will come when machine learning overtakes the natural variety. Natural learning itself has gone through three phases: evolution, the brain, and culture. Each is a product of the previous one, and each learns faster. Machine learning is the logical next stage of this progression.
Computer programs are the fastest replicators on Earth: copying them takes only a fraction of a second. But creating them is slow, if it has to be done by humans. Machine learning removes that bottleneck, leaving a final one: the speed at which humans can absorb change. This too will eventually be removed, but not because we'll decide to hand things off to our "mind children," as Hans Moravec calls them, and go gently into the good night. Humans are not a dying twig on the tree of life. On the contrary, we're about to start branching.