Last but not least, if you have an appetite for wonder, machine learning is an intellectual feast, and you're invited. RSVP!

Learning algorithms are the matchmakers: they find producers and consumers for each other, cutting through the information overload. If they're smart enough, you get the best of both worlds: the vast choice and low cost of the large scale, with the personalized touch of the small. Learners are not perfect, and the last step of the decision is usually still for humans to make, but learners intelligently reduce the choices to something a human can manage.

If you're a member of the Sierra Club and read science-fiction books, you'll like Avatar.

The algorithm that evolved these robots was invented by Charles Darwin in the nineteenth century. He didn't think of it as an algorithm at the time, partly because a key subroutine was still missing. Once James Watson and Francis Crick provided it in 1953, the stage was set for the second coming of evolution: in silico instead of in vivo, and a billion times faster. Its prophet was a ruddy-faced, perpetually grinning midwesterner by the name of John Holland.

Stripped down to its bare essentials (no giggles, please), sexual reproduction consists of swapping material between chromosomes from the mother and father, a process called crossing over. This produces two new chromosomes, one of which consists of the mother's chromosome up to the crossover point and the father's thereafter, and the other one is the opposite.

We can get even fancier by allowing rules for intermediate concepts to evolve, and then chaining these rules at performance time. For example, we could evolve the rules "If the e-mail contains the word loan, then it's a scam" and "If the e-mail is a scam, then it's spam." Since a rule's consequent is no longer always spam, this requires introducing additional bits in rule strings to represent their consequents.
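The crossing-over operation described above is simple enough to sketch in a few lines of Python. This is only an illustration, not anyone's production genetic algorithm: chromosomes are represented as bit strings, and the function name and interface are my own invention.

```python
import random

def crossover(mother, father):
    """Single-point crossover: swap the tails of two equal-length bit strings."""
    assert len(mother) == len(father)
    point = random.randrange(1, len(mother))  # pick a crossover point
    child1 = mother[:point] + father[point:]  # mother's prefix, father's suffix
    child2 = father[:point] + mother[point:]  # and the opposite
    return child1, child2

# With one parent all zeros and one all ones, the children make the
# swap visible: each is the bitwise complement of the other.
c1, c2 = crossover("00000000", "11111111")
print(c1, c2)
```

In a full genetic algorithm this step would be followed by mutation (flipping a few random bits) and by selection, which keeps the fitter chromosomes for the next generation.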
Of course, the computer doesn't literally use the word scam; it just comes up with some arbitrary bit string to represent the concept, but that's good enough for our purposes. Sets of rules like this, which Holland called classifier systems, are one of the workhorses of the machine-learning tribe he founded: the evolutionaries. Like multilayer perceptrons, classifier systems face the credit-assignment problem (what is the fitness of rules for intermediate concepts?), and Holland devised the so-called bucket brigade algorithm to solve it. Nevertheless, classifier systems are much less widely used than multilayer perceptrons.

In contrast to the connectionists and evolutionaries, symbolists and Bayesians do not believe in emulating nature. Rather, they want to figure out from first principles what learners should do, and that includes us humans. If we want to learn to diagnose cancer, for example, it's not enough to say "this is how nature learns; let's do the same." There's too much at stake. Errors cost lives. Doctors should diagnose in the most foolproof way they can, with methods similar to those mathematicians use to prove theorems, or as close to that as they can manage, given that it's seldom possible to be that rigorous. They need to weigh the evidence to minimize the chances of a wrong diagnosis; or more precisely, so that the costlier an error is, the less likely they are to make it. (For example, failing to find a tumor that's really there is potentially much worse than inferring one that isn't.) They need to make optimal decisions, not just decisions that seem good.

There's a serpent in this Eden, of course. It's called the curse of dimensionality, and while it affects all learners to a greater or lesser degree, it's particularly bad for nearest-neighbor. In low dimensions (like two or three), nearest-neighbor usually works quite well. But as the number of dimensions goes up, things fall apart pretty quickly.
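The principle of making the costlier error the less likely one can be made concrete with a toy calculation. The numbers below are invented purely for illustration: pick the action whose expected cost is lowest, rather than simply the most probable diagnosis.

```python
def best_decision(p_tumor, cost_false_negative, cost_false_positive):
    """Pick the diagnosis minimizing expected cost.

    Diagnosing "no tumor" risks a false negative with probability p_tumor;
    diagnosing "tumor" risks a false positive with probability 1 - p_tumor.
    """
    expected_cost = {
        "no tumor": p_tumor * cost_false_negative,
        "tumor": (1 - p_tumor) * cost_false_positive,
    }
    return min(expected_cost, key=expected_cost.get)

# Even at only a 20% probability of a tumor, asymmetric costs
# (missing a tumor is 10x worse) tip the decision toward caution:
# expected cost 0.2 * 10 = 2.0 for "no tumor" vs. 0.8 * 1 = 0.8 for "tumor".
print(best_decision(0.2, cost_false_negative=10.0, cost_false_positive=1.0))  # tumor
```

A diagnosis that merely "seems good" would report the most likely state (no tumor, at 80%); the optimal decision orders further tests instead, because the rare error is the catastrophic one.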
It's not uncommon today to have thousands or even millions of attributes to learn from. For an e-commerce site trying to learn your preferences, every click you make is an attribute. So is every word on a web page, and every pixel in an image. But even with just tens or hundreds of attributes, chances are nearest-neighbor is already in trouble. The first problem is that most attributes are irrelevant: you may know a million factoids about Ken, but chances are only a few of them have anything to say about (for example) his risk of getting lung cancer. And while knowing whether he smokes is crucial for making that particular prediction, it's probably not much help in deciding whether he'll enjoy seeing Gravity. Symbolist methods, for one, are fairly good at disposing of irrelevant attributes. If an attribute has no information about the class, it's just never included in the decision tree or rule set. But nearest-neighbor is hopelessly confused by irrelevant attributes because they all contribute to the similarity between examples. With enough irrelevant attributes, accidental similarity in the irrelevant dimensions swamps out meaningful similarity in the important ones, and nearest-neighbor becomes no better than random guessing.

Like human memory, relational learning weaves a rich web of associations. It connects percepts, which a robot like Robby can acquire by clustering and dimensionality reduction, with skills, which he can learn by reinforcement and chunking, and with the higher-level knowledge that comes from reading, going to school, and interacting with humans. Relational learning is the last piece of the puzzle, the final ingredient we need for our alchemy. And now it's time to repair to the lab and transmute all these elements into the Master Algorithm.

The Master Algorithm

You rack your brains for a solution, but the more you try, the harder it gets. Perhaps unifying logic and probability is just beyond human ability.
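The swamping effect of irrelevant attributes on nearest-neighbor is easy to demonstrate with a small simulation. This is an illustrative sketch under assumptions of my own choosing: one attribute fully determines the class, every other attribute is pure noise, and we measure how often plain one-nearest-neighbor gets the class right as noise dimensions are added.

```python
import random

def nearest_neighbor_accuracy(n_irrelevant, n_train=200, n_test=200):
    """1-NN accuracy when one attribute determines the class and the rest are noise."""
    random.seed(42)  # fixed seed so the experiment is repeatable

    def make_example():
        relevant = random.random()
        label = relevant > 0.5           # class depends only on the relevant attribute
        noise = [random.random() for _ in range(n_irrelevant)]
        return [relevant] + noise, label

    train = [make_example() for _ in range(n_train)]
    correct = 0
    for _ in range(n_test):
        x, y = make_example()
        # Nearest neighbor by squared Euclidean distance over ALL attributes.
        _, prediction = min(train,
                            key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))
        correct += (prediction == y)
    return correct / n_test

print(nearest_neighbor_accuracy(0))    # no irrelevant attributes: near-perfect
print(nearest_neighbor_accuracy(100))  # 100 noise dimensions: drops toward chance
```

With no noise, the nearest neighbor is nearly always on the same side of the class boundary as the query; with a hundred noise dimensions, the relevant attribute contributes only a sliver of the total distance, so accidental similarity picks the neighbor and accuracy sinks toward coin-flipping.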
Exhausted, you fall asleep. A deep growl jolts you awake. The hydra-headed complexity monster pounces on you, jaws snapping, but you duck at the last moment. Slashing desperately at the monster with the sword of learning, the only one that can slay it, you finally succeed in cutting off all its heads. Before it can grow new ones, you run up the stairs.

Of course, don't be deceived by the simple MLN above for predicting the spread of flu. Picture instead an MLN for diagnosing and curing cancer. The MLN represents a probability distribution over the states of a cell. Every part of the cell, every organelle, every metabolic pathway, every gene and protein is an entity in the MLN, and the MLN's formulas encode the dependencies between them. We can ask the MLN, "Is this cell cancerous?" and probe it with different drugs and see what happens. We don't have an MLN like this yet, but later in this chapter I'll envisage how it might come about.

It's natural to worry about intelligent machines taking over because the only intelligent entities we know are humans and other animals, and they definitely have a will of their own. But there is no necessary connection between intelligence and autonomous will; or rather, intelligence and will may not inhabit the same body, provided there is a line of control between them. In The Extended Phenotype, Richard Dawkins shows how nature is replete with examples of an animal's genes controlling more than its own body, from cuckoo eggs to beaver dams. Technology is the extended phenotype of man. This means we can continue to control it even if it becomes far more complex than we can understand.

The term singularity comes from mathematics, where it denotes a point at which a function becomes infinite. For example, the function 1/x has a singularity when x is 0, because 1 divided by 0 is infinity.
In physics, the quintessential example of a singularity is a black hole: a point of infinite density, where a finite amount of matter is crammed into infinitesimal space. The only problem with singularities is that they don't really exist. (When did you last divide a cake among zero people, and each one got an infinite slice?) In physics, if a theory predicts something is infinite, something's wrong with the theory. Case in point: general relativity presumably predicts that black holes have infinite density because it ignores quantum effects. Likewise, intelligence cannot continue to increase forever. Kurzweil acknowledges this, but points to a series of exponential curves in technology improvement (processor speed, memory capacity, etc.) and argues that the limits to this growth are so far away that we need not concern ourselves with them.

If you'd like to learn more about machine learning in general, one good place to start is online courses. Of these, the closest in content to this book is, not coincidentally, the one I teach (www.coursera.org/course/machlearning). Two other options are Andrew Ng's course (www.coursera.org/course/ml) and Yaser Abu-Mostafa's (http://work.caltech.edu/telecourse.html). The next step is to read a textbook. The closest to this book, and one of the most accessible, is Tom Mitchell's Machine Learning* (McGraw-Hill, 1997). More up-to-date, but also more mathematical, are Kevin Murphy's Machine Learning: A Probabilistic Perspective* (MIT Press, 2012), Chris Bishop's Pattern Recognition and Machine Learning* (Springer, 2006), and An Introduction to Statistical Learning with Applications in R,* by Gareth James, Daniela Witten, Trevor Hastie, and Rob Tibshirani (Springer, 2013). My article "A few useful things to know about machine learning" (Communications of the ACM, 2012) summarizes some of the "folk knowledge" of machine learning that textbooks often leave implicit and was one of the starting points for this book.
If you know how to program and are itching to give machine learning a try, you can start from a number of open-source packages, such as Weka (www.cs.waikato.ac.nz/ml/weka). The two main machine-learning journals are Machine Learning and the Journal of Machine Learning Research. Leading machine-learning conferences, with yearly proceedings, include the International Conference on Machine Learning, the Conference on Neural Information Processing Systems, and the International Conference on Knowledge Discovery and Data Mining. A large number of machine-learning talks are available on http://videolectures.net. The www.KDnuggets.com website is a one-stop shop for machine-learning resources, and you can sign up for its newsletter to keep up-to-date with the latest developments.