If you're a student of any age, whether a high schooler wondering what to major in, a college undergraduate deciding whether to go into research, or a seasoned professional considering a career change, my hope is that this book will spark in you an interest in this fascinating field. The world has a dire shortage of machine-learning experts, and if you decide to join us, you can look forward not only to exciting times and material rewards but also to a unique opportunity to serve society. And if you're already studying machine learning, I hope the book will help you get the lay of the land; if in your travels you chance upon the Master Algorithm, that alone makes it worth writing.

Some learners learn knowledge, and some learn skills. "All humans are mortal" is a piece of knowledge. Riding a bicycle is a skill. In machine learning, knowledge is often in the form of statistical models, because most knowledge is statistical: all humans are mortal, but only 4 percent are Americans. Skills are often in the form of procedures: if the road curves left, turn the wheel left; if a deer jumps in front of you, slam on the brakes. (Unfortunately, as of this writing Google's self-driving cars still confuse windblown plastic bags with deer.) Often, the procedures are quite simple, and it's the knowledge at their core that's complex. If you can tell which e-mails are spam, you know which ones to delete. If you can tell how good a board position in chess is, you know which move to make (the one that leads to the best position).

Nevertheless, physics is unique in its simplicity. Outside physics and engineering, the track record of mathematics is more mixed. Sometimes it's only reasonably effective, and sometimes its models are too oversimplified to be useful. This tendency to oversimplify stems from the limitations of the human mind, however, not from the limitations of mathematics.
Most of the brain's hardware (or rather, wetware) is devoted to sensing and moving, and to do math we have to borrow parts of it that evolved for language. Computers have no such limitations and can easily turn big data into very complex models. Machine learning is what you get when the unreasonable effectiveness of mathematics meets the unreasonable effectiveness of data. Biology and sociology will never be as simple as physics, but the method by which we discover their truths can be.

OK, it's time to come clean: the Master Algorithm is the equation U(X) = 0. Not only does it fit on a T-shirt; it fits on a postage stamp. Huh? U(X) = 0 just says that some (possibly very complex) function U of some (possibly very complex) variable X is equal to 0. Every equation can be reduced to this form; for example, F = ma is equivalent to F - ma = 0, so if you think of F - ma as a function U of F, voilà: U(F) = 0. In general, X could be any input and U could be any algorithm, so surely the Master Algorithm can't be any more general than this; and since we're looking for the most general algorithm we can find, this must be it. I'm just kidding, of course, but this particular failed candidate points to a real danger in machine learning: coming up with a learner that's so general, it doesn't have enough content to be useful.

A genetic algorithm is like the ringleader of a group of gamblers, playing slot machines in every casino in town at the same time. Two schemas compete with each other if they include the same bits and differ in at least one of them, like *10 and *11, and n competing schemas are like n slot machines. Every set of competing schemas is a casino, and the genetic algorithm simultaneously figures out the winning machine in every casino, following the optimal strategy of playing the better-seeming machines with exponentially increasing frequency. Pretty smart.
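To make the exponential-frequency claim concrete, here is a minimal sketch, not from the book: fitness-proportionate ("roulette wheel") selection alone, with an illustrative count-the-ones fitness, makes an above-average schema such as 1******* claim a growing share of the population, generation after generation. The fitness function, population size, and string length are all made up for illustration.

```python
import random

def fitness(s):
    # toy fitness: number of 1 bits, so schemas that fix a bit to 1 are
    # "better-seeming slot machines" than their competitors that fix it to 0
    return sum(s)

def next_generation(pop):
    # fitness-proportionate selection only (no crossover or mutation),
    # to isolate the schema-growth effect the analogy describes
    weights = [fitness(s) for s in pop]
    return random.choices(pop, weights=weights, k=len(pop))

def schema_fraction(pop, bit, value):
    # fraction of the population matching a schema that fixes one bit, e.g. 1*******
    return sum(1 for s in pop if s[bit] == value) / len(pop)

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(200)]
before = schema_fraction(pop, 0, 1)   # roughly 0.5 in a random population
for _ in range(10):
    pop = next_generation(pop)
after = schema_fraction(pop, 0, 1)    # noticeably larger after selection
```

Tracking `schema_fraction` over the generations shows the compounding growth: each generation, the better schema's share is multiplied by (roughly) its fitness advantage, which is exactly the exponentially increasing play rate the gambler analogy promises.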
The reason lazy learners are a lot smarter than they seem is that their models, although implicit, can in fact be extremely sophisticated. Consider the extreme case where we have only one example of each class. For instance, we'd like to guess where the border between two countries is, but all we know is their capitals' locations. Most learners would be stumped, but nearest-neighbor happily guesses that the border is a straight line lying halfway between the two cities.

The same idea of forming a local model rather than a global one applies beyond classification. Scientists routinely use linear regression to predict continuous variables, but most phenomena are not linear. Luckily, they're locally linear, because smooth curves are locally well approximated by straight lines. So if instead of trying to fit a straight line to all the data, you just fit it to the points near the query point, you now have a very powerful nonlinear regression algorithm. Laziness pays. If Kennedy had needed a complete theory of international relations to decide what to do about the Soviet missiles in Cuba, he would have been in trouble. Instead, he saw an analogy between that crisis and the outbreak of World War I, and that analogy guided him to the right decisions.

Social networks aside, the killer app of relational learning is understanding how living cells work. A cell is a complex metabolic network, with genes coding for proteins that regulate other genes, long interlocking chains of chemical reactions, and products migrating from one organelle to another. Independent entities, doing their work in isolation, are nowhere to be seen. A cancer drug must disrupt cancer cells' workings without interfering with normal ones'. If we have an accurate relational model of both, we can try many different drugs in silico, letting the model infer their good and bad effects and keeping only the best ones to try in vitro and finally in vivo.

Out of many models, one.

Your head is spinning.
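Before moving on, the lazy local-regression trick described earlier can be sketched in a few lines. This is an illustrative toy, not the book's code: a hypothetical helper `local_linear_predict` fits a straight line only to the k training points nearest the query, then evaluates that line at the query. On a smooth nonlinear curve (here y = x²) the local line tracks the curve closely.

```python
def local_linear_predict(xs, ys, x0, k=5):
    # lazy, local regression: find the k training points nearest the query x0,
    # fit an ordinary least-squares line to just those points, evaluate at x0
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    px = [xs[i] for i in nearest]
    py = [ys[i] for i in nearest]
    n = len(px)
    mx, my = sum(px) / n, sum(py) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(px, py))
             / sum((x - mx) ** 2 for x in px))
    return my + slope * (x0 - mx)

# a smooth but nonlinear phenomenon, sampled on a grid: y = x^2
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]
print(round(local_linear_predict(xs, ys, 1.0), 2))  # prints 1.02 (true value is 1.0)
```

A single straight line fit to all the data would miss badly here; fitting it only near the query turns the same humble linear-regression machinery into a nonlinear predictor.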
You go outside to the balcony. The sun has risen over the city. You gaze out over the rooftops to the countryside beyond. Forests of servers stretch away in all directions, humming quietly, waiting for the Master Algorithm. Convoys move along the roads, carrying gold from the data mines. Far to the west, the land gives way to a sea of information, dotted with ships. You look up at the flag of the Master Algorithm. You can now clearly see the inscription inside the five-pointed star.

An MLN is just a set of logical formulas and their weights. When applied to a particular set of entities, it defines a Markov network over their possible states. For example, if the entities are Alice and Bob, a possible state is that Alice and Bob are friends, Alice has the flu, and so does Bob. Let's suppose the MLN has two formulas: Everyone has the flu and If someone has the flu, so do their friends. In standard logic, this would be a pretty useless pair of statements: the first would rule out any state with even a single healthy person, and the second would be redundant. But in an MLN, the first formula just means that there's a feature X has the flu for every person X, with the same weight as the formula. If people are likely to have the flu, the formula will have a high weight, and so will the corresponding features. A state with many healthy people is less probable than one with few, but not impossible. And because of the second formula, a state where someone has the flu and their friends don't is less probable than one where healthy and infected people fall into separate clusters of friends.

The sobering (or perhaps reassuring) thought is that no learner in the world today has access to all this data (not even the NSA), and even if it did, it wouldn't know how to turn it into a real likeness of you. But suppose you took all your data and gave it to the real, future Master Algorithm, already seeded with everything we could teach it about human life.
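The flu example above can be put into numbers. In a Markov logic network, a state's probability is proportional to exp(sum over formulas of weight times number of true groundings). A minimal sketch, with made-up weights and Alice and Bob assumed to be each other's only friends:

```python
import itertools
import math

w_flu = 0.5     # illustrative weight for "Everyone has the flu"
w_spread = 2.0  # illustrative weight for "If someone has the flu, so do their friends"

def score(flu_a, flu_b):
    # unnormalized probability: exp(sum of weight * number of true groundings),
    # assuming Alice and Bob are (mutual) friends
    n_flu = flu_a + flu_b                                    # groundings of "X has the flu"
    n_spread = int(not flu_a or flu_b) + int(not flu_b or flu_a)  # satisfied implications
    return math.exp(w_flu * n_flu + w_spread * n_spread)

states = list(itertools.product([0, 1], repeat=2))  # (Alice sick?, Bob sick?)
z = sum(score(a, b) for a, b in states)             # partition function
probs = {s: score(*s) / z for s in states}
```

Printing `probs` shows exactly the behavior the text describes: both-sick and both-healthy states dominate, the all-healthy state is improbable but not impossible, and the mixed states, where the flu fails to spread between friends, are the least probable of all.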
It would learn a model of you, and you could carry that model on a thumb drive in your pocket, inspect it at will, and use it for anything you pleased. It would surely be a wonderful tool for introspection, like looking at yourself in the mirror, but it would be a digital mirror that showed not just your looks but all things observable about you, a mirror that could come alive and converse with you. What would you ask it? Some of the answers you might not like, but that would be all the more reason to ponder them. And some would give you new ideas, new directions. The Master Algorithm's model of you might even help you become a better person.

It's not hard to state general principles like military necessity, proportionality, and sparing civilians. But there's a gulf between them and concrete actions, which the soldier's judgment has to bridge. Asimov's three laws of robotics quickly run into trouble when robots try to apply them in practice, as his stories memorably illustrate. General principles are usually contradictory, if not self-contradictory, and they have to be, lest they turn all shades of gray into black and white. When does military necessity outweigh sparing civilians? There is no universal answer, and no way to program a computer with all the eventualities. Machine learning, however, provides an alternative. First, teach the robot to recognize the relevant concepts, for example with data sets of situations where civilians were and were not spared, armed response was and was not proportional, and so on. Then give it a code of conduct in the form of rules involving these concepts. Finally, let the robot learn how to apply the code by observing humans: the soldier opened fire in this case but not in that one. By generalizing from these examples, the robot can learn an end-to-end model of ethical decision making, in the form of, say, a large MLN.
Once the robot's decisions agree with a human's as often as one human agrees with another, the training is complete, meaning the model is ready for download into thousands of robot brains. Unlike humans, robots don't lose their heads in the heat of combat. If a robot malfunctions, the manufacturer is responsible. If it makes a wrong call, its teachers are.

One for the Dark AI on its dark throne,

Modeling and Reasoning with Bayesian Networks,* by Adnan Darwiche (Cambridge University Press, 2009), explains the main algorithms for inference in Bayesian networks. The January/February 2000 issue* of Computing in Science and Engineering, edited by Jack Dongarra and Francis Sullivan, has articles on the top ten algorithms of the twentieth century, including MCMC. "Stanley: The robot that won the DARPA Grand Challenge," by Sebastian Thrun et al. (Journal of Field Robotics, 2006), explains how the eponymous self-driving car works. "Bayesian networks for data mining,"* by David Heckerman (Data Mining and Knowledge Discovery, 1997), summarizes the Bayesian approach to learning and explains how to learn Bayesian networks from data. "Gaussian processes: A replacement for supervised neural networks?,"* by David MacKay (NIPS tutorial notes, 1997; online at www.inference.eng.cam.ac.uk/mackay/gp.pdf), gives a flavor of how the Bayesians co-opted NIPS.