These seemingly magical technologies work because, at its core, machine learning is about prediction: predicting what we want, the results of our actions, how to achieve our goals, how the world will change. Once upon a time we relied on shamans and soothsayers for this, but they were much too fallible. Science's predictions are more trustworthy, but they are limited to what we can systematically observe and tractably model. Big data and machine learning greatly expand that scope. Some everyday things can be predicted by the unaided mind, from catching a ball to carrying on a conversation. Some things, try as we might, are just unpredictable. For the vast middle ground between the two, there's machine learning.

Technology trends come and go all the time. What's unusual about machine learning is that, through all these changes, through boom and bust, it just keeps growing. Its first big hit was in finance, predicting stock ups and downs, starting in the late 1980s. The next wave was mining corporate databases, which by the mid-1990s were starting to grow quite large, and in areas like direct marketing, customer relationship management, credit scoring, and fraud detection. Then came the web and e-commerce, where automated personalization quickly became de rigueur. When the dot-com bust temporarily curtailed that, the use of learning for web search and ad placement took off. For better or worse, the 9/11 attacks put machine learning in the front line of the war on terror. Web 2.0 brought a swath of new applications, from mining social networks to figuring out what bloggers are saying about your products. In parallel, scientists of all stripes were increasingly turning to large-scale modeling, with molecular biologists and astronomers leading the charge. The housing bust barely registered; its main effect was a welcome transfer of talent from Wall Street to Silicon Valley.
In 2011, the "big data" meme hit, putting machine learning squarely in the center of the global economy's future. Today, there seems to be hardly an area of human endeavor untouched by machine learning, including seemingly unlikely candidates like music, sports, and wine tasting.

[Image: pic_8.jpg]

The higher an input's weight, the stronger the corresponding synapse. The cell body adds up all the weighted inputs, and the axon applies a step function to the result. The axon's box in the diagram shows the graph of a step function: 0 for low values of the input, abruptly changing to 1 when the input reaches the threshold.

[Image: pic_9.jpg]

Perhaps connectomics is overkill. Some connectionists have been overheard claiming that backprop is the Master Algorithm and we just need to scale it up. But symbolists pour scorn on this notion. They point to a long list of things that humans can do but neural networks can't. Take commonsense reasoning. It involves combining pieces of information that may have never been seen together before. Did Mary eat a shoe for lunch? No, because Mary is a person, people only eat edible things, and shoes are not edible. Symbolic systems have no trouble with this: they just chain the relevant rules. But multilayer perceptrons can't do it; once they're done learning, they just compute the same fixed function over and over again. Neural networks are not compositional, and compositionality is a big part of human cognition. Another big issue is that humans (and symbolic models like sets of rules and decision trees) can explain their reasoning, while neural networks are big piles of numbers that no one can understand.

One solution, left as an exercise by Pearl in his book on Bayesian networks, is to pretend the graph has no loops and just keep propagating probabilities back and forth until they converge.
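The neuron described under the diagram above, a weighted sum of inputs passed through a step function, can be sketched in a few lines of code. The weights and threshold below are made-up illustrations, not values from the book:

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of inputs reaches the threshold, else 0."""
    # The "cell body": add up all the weighted inputs.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # The "axon": a step function, 0 below the threshold, 1 at or above it.
    return 1 if weighted_sum >= threshold else 0

# Two inputs with illustrative weights 0.6 and 0.4, and threshold 0.5.
print(perceptron([1, 0], [0.6, 0.4], 0.5))  # -> 1 (0.6 reaches the threshold)
print(perceptron([0, 1], [0.6, 0.4], 0.5))  # -> 0 (0.4 falls short)
```

The abrupt jump from 0 to 1 is exactly the step graph shown in the axon's box.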
This is known as loopy belief propagation, both because it works on graphs with loops and because it's a crazy idea. Surprisingly, it turns out to work quite well in many cases. For instance, it's a state-of-the-art method for wireless communication, with the random variables being the bits in the message, encoded in a clever way. But loopy belief propagation can also converge to the wrong answers or oscillate forever. Another solution, which originated in physics but was imported into machine learning and greatly extended by Michael Jordan and others, is to approximate an intractable distribution with a tractable one and optimize the latter's parameters to make it as close as possible to the former.

It gets even worse. Nearest-neighbor is based on finding similar objects, and in high dimensions the notion of similarity itself breaks down. Hyperspace is like the Twilight Zone. The intuitions we have from living in three dimensions no longer apply, and weirder and weirder things start to happen. Consider an orange: a tasty ball of pulp surrounded by a thin shell of skin. Let's say 90 percent of the radius of an orange is occupied by pulp, and the remaining 10 percent by skin. That means 73 percent of the volume of the orange is pulp (0.9³). Now consider a hyperorange: still with 90 percent of the radius occupied by pulp, but in a hundred dimensions, say. The pulp has shrunk to only about three thousandths of a percent of the hyperorange's volume (0.9¹⁰⁰). The hyperorange is all skin, and you'll never be done peeling it!

Is there anything analogy can't do? Not according to Douglas Hofstadter, cognitive scientist and author of Gödel, Escher, Bach: An Eternal Golden Braid. Hofstadter, who looks a bit like the Grinch's good twin, is probably the world's best-known analogizer. In their book Surfaces and Essences: Analogy as the Fuel and Fire of Thinking, Hofstadter and his collaborator Emmanuel Sander
argue passionately that all intelligent behavior reduces to analogy. Everything we learn or discover, from the meaning of everyday words like mother and play to the brilliant insights of geniuses like Albert Einstein and Évariste Galois, is the result of analogy in action. When little Tim sees women looking after other children like his mother looks after him, he generalizes the concept "mommy" to mean anyone's mommy, not just his. That in turn is a springboard for understanding things like "mother ship" and "Mother Nature." Einstein's "happiest thought," out of which grew the general theory of relativity, was an analogy between gravity and acceleration: if you're in an elevator, you can't tell whether your weight is due to one or the other, because their effects are the same. We swim in a vast ocean of analogies, which we both manipulate for our ends and are unwittingly manipulated by. Books have analogies on every page (like the title of this section, or the previous one's). Gödel, Escher, Bach is an extended analogy between Gödel's theorem, Escher's art, and Bach's music. If the Master Algorithm is not analogy, it must surely be something like it.

Recall that a Markov network is defined by a weighted sum of features, much like a perceptron. Suppose we have a collection of photos of people. We pick a random one and compute features of it like The person has gray hair, The person is old, The person is a woman, and so on. In a perceptron, we pass the weighted sum of these features through a threshold to decide whether, say, the person is your grandmother or not. In a Markov network, we do something very different (at least at first sight): we exponentiate the weighted sum, turning it into a product of factors, and this product is the probability of choosing that particular picture from the collection, regardless of whether your grandmother is in it. If you have many pictures of old people, the weight of that feature goes up.
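The weighted-sum-to-probability construction just described can be sketched as follows. The photos, features, and weights here are invented purely for illustration; in a real Markov network the weights would be learned from data:

```python
import math

# Hypothetical binary features of three photos, as in the text.
photos = [
    {"gray_hair": 1, "old": 1, "woman": 1},
    {"gray_hair": 1, "old": 1, "woman": 0},
    {"gray_hair": 0, "old": 0, "woman": 1},
]

# Illustrative weights, one per feature (made up, not learned).
weights = {"gray_hair": 1.0, "old": 0.5, "woman": -0.3}

def factor(photo):
    # The weighted sum of features, just as in a perceptron...
    s = sum(weights[f] * v for f, v in photo.items())
    # ...but exponentiated, turning the sum into a product of factors.
    return math.exp(s)

# Normalizing over the whole collection turns the factors into the
# probability of choosing each particular picture.
Z = sum(factor(p) for p in photos)
probs = [factor(p) / Z for p in photos]
```

Note that, unlike the perceptron, nothing here is thresholded: every photo gets a probability, and photos whose features carry high weights are simply more likely to be drawn.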
If most of them are of men, the weight of The person is a woman goes down. The features can be anything we want, making Markov networks a remarkably flexible way to represent probability distributions.

One of Alchemy's largest applications to date was to learn a semantic network (or knowledge graph, as Google calls it) from the web. A semantic network is a set of concepts (like planets and stars) and relations among those concepts (planets orbit stars). Alchemy learned over a million such patterns from facts extracted from the web (e.g., Earth orbits the sun). It discovered concepts like planet all by itself. The version we used was more advanced than the basic one I've described here, but the essential ideas are the same. Various research groups have used Alchemy or their own MLN implementations to solve problems in natural language processing, computer vision, activity recognition, social network analysis, molecular biology, and many other areas.

Of course, robot armies also raise a whole different specter. According to Hollywood, the future of humanity is to be snuffed out by a gargantuan AI and its vast army of machine minions. (Unless, of course, a plucky hero saves the day in the last five minutes of the movie.) Google already has the gargantuan hardware such an AI would need, and it's recently acquired an army of robotics startups to go with it. If we drop the Master Algorithm into its servers, is it game over for humanity? Why yes, of course. It's time to reveal my true agenda, with apologies to Tolkien:

John Koza's Genetic Programming* (MIT Press, 1992) is the key reference on this paradigm.
An evolved robot soccer team is described in "Evolving team Darwin United,"* by David Andre and Astro Teller, in RoboCup-98: Robot Soccer World Cup II, edited by Minoru Asada and Hiroaki Kitano (Springer, 1999). Genetic Programming III,* by John Koza, Forrest Bennett III, David Andre, and Martin Keane (Morgan Kaufmann, 1999), includes many examples of evolved electronic circuits. Danny Hillis argues that parasites are good for evolution in "Co-evolving parasites improve simulated evolution as an optimization procedure"* (Physica D, 1990). Adi Livnat, Christos Papadimitriou, Jonathan Dushoff, and Marcus Feldman propose that sex optimizes mixability in "A mixability theory of the role of sex in evolution"* (Proceedings of the National Academy of Sciences, 2008). Kevin Lang's paper comparing genetic programming and hill climbing is "Hill climbing beats genetic search on a Boolean circuit synthesis problem of Koza's"* (Proceedings of the Twelfth International Conference on Machine Learning, 1995). Koza's reply is "A response to the ML-95 paper entitled…"* (unpublished; online at www.genetic-programming.com/jktahoe24page.html).

"Cancer: The march on malignancy" (Nature supplement, 2014) surveys the current state of the war on cancer. "Using patient data for personalized cancer treatments," by Chris Edwards (Communications of the ACM, 2014), describes the early stages of what could grow into CanceRx. "Simulating a living cell," by Markus Covert (Scientific American, 2014), explains how his group built a computer model of a whole infectious bacterium. "Breakthrough Technologies 2015: Internet of DNA," by Antonio Regalado (MIT Technology Review, 2015), reports on the work of the Global Alliance for Genomics and Health. Cancer Commons is described in "Cancer: A Computational Disease that AI Can Cure," by Jay Tenenbaum and Jeff Shrager (AI Magazine, 2011).