Paradoxically, even as they open new windows on nature and human behavior, learning algorithms themselves have remained shrouded in mystery. Hardly a day goes by without a story in the media involving machine learning, whether it's Apple's launch of the Siri personal assistant, IBM's Watson beating the human Jeopardy! champion, Target finding out a teenager is pregnant before her parents do, or the NSA looking for dots to connect. But in each case the learning algorithm driving the story is a black box. Even books on big data skirt around what really happens when the computer swallows all those terabytes and magically comes up with new insights. At best, we're left with the impression that learning algorithms just find correlations between pairs of events, such as googling "flu medicine" and having the flu. But finding correlations is to machine learning no more than bricks are to houses, and people don't live in bricks.

All knowledge (past, present, and future) can be derived from data by a single, universal learning algorithm.

Of course, the Master Algorithm has at least as many skeptics as it has proponents. Doubt is in order when something looks like a silver bullet. The most determined resistance comes from machine learning's perennial foe: knowledge engineering. According to its proponents, knowledge can't be learned automatically; it must be programmed into the computer by human experts. Sure, learners can extract some things from data, but nothing you'd confuse with real knowledge. To knowledge engineers, big data is not the new oil; it's the new snake oil.

Accuracy you can believe in

The first statement is a fact about Socrates, and the second is a general rule about humans. What follows? That Socrates is mortal, of course, by applying the rule to Socrates. In inductive reasoning we start instead with the initial and derived facts, and look for a rule that would allow us to infer the latter from the former:
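The contrast between deduction and induction can be sketched as a toy program. This is only an illustrative sketch, not anyone's actual algorithm: the facts, the rule representation, and the brute-force search below are assumptions made for the Socrates example, where rules take the form "everything with property A also has property B."

```python
# Toy illustration of deduction vs. induction, using the Socrates example.
# Facts are (entity, property) pairs; a rule is a (premise, conclusion) pair
# read as "everything with the premise property has the conclusion property."

facts = {("Socrates", "human"), ("Socrates", "mortal"),
         ("Plato", "human"), ("Plato", "mortal")}

def deduce(known_facts, rule):
    """Deduction: apply a known rule to known facts to derive new facts."""
    premise, conclusion = rule
    return {(e, conclusion) for (e, p) in known_facts if p == premise}

def induce(known_facts):
    """Induction: search for rules that hold for every entity in the data
    (here, by brute-force enumeration over property pairs)."""
    entities = {e for (e, _) in known_facts}
    props = {p for (_, p) in known_facts}
    rules = []
    for premise in props:
        for conclusion in props - {premise}:
            holders = {e for e in entities if (e, premise) in known_facts}
            if holders and all((e, conclusion) in known_facts for e in holders):
                rules.append((premise, conclusion))
    return rules

# Deduction: from "Socrates is human" and the rule "humans are mortal",
# conclude that Socrates is mortal.
print(deduce({("Socrates", "human")}, ("human", "mortal")))
# -> {('Socrates', 'mortal')}

# Induction: from the facts alone, recover the rule human -> mortal
# (on this tiny dataset its converse also happens to hold).
print(induce(facts))
```

Going from two facts to a general rule is the inductive leap; the sketch makes plain that induction is a search over candidate rules, checked against the data.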
Robotic Park is a massive robot factory surrounded by ten thousand square miles of jungle, urban and otherwise. Ringing that jungle is the tallest, thickest wall ever built, bristling with sentry posts, searchlights, and gun turrets. The wall has two purposes: to keep trespassers out and the park's inhabitants (millions of robots battling for survival and control of the factory) within. The winning robots get to spawn, their reproduction accomplished by programming the banks of 3-D printers inside. Step-by-step, the robots become smarter, faster, and deadlier. Robotic Park is run by the US Army, and its purpose is to evolve the ultimate soldier.

In this regard, genetic algorithms are a lot like selective breeding. Darwin opened The Origin of Species with a discussion of it, as a stepping-stone to the more difficult concept of natural selection. All the domesticated plants and animals we take for granted today are the result of selecting and mating, generation after generation, the organisms that best served our purposes: the corn with the largest corncobs, the sweetest fruit trees, the shaggiest sheep, the hardiest horses. Genetic algorithms do the same, except they breed programs instead of living creatures, and a generation is a few seconds of computer time instead of a creature's lifetime.

Of all the possible genomes, very few correspond to viable organisms. The typical fitness landscape thus consists of vast flatlands with occasional sharp peaks, making evolution very hard. If you start out blindfolded in Kansas, you have no idea which way the Rockies lie, and you'll wander around for a long time before you bump into their foothills and start climbing. But if you combine evolution with neural learning, something interesting happens. If you're on flat ground, but not too far from the foothills, neural learning can get you there, and the closer you are to the foothills, the more likely it is to.
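The breeding loop described above (select the fittest, mate them, mutate the offspring, repeat) can be sketched in a few lines. Everything here is an illustrative assumption rather than any particular system: genomes are bit strings, fitness simply counts the ones, and the population size, mutation rate, and number of generations are arbitrary toy values.

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

# Minimal genetic algorithm: evolve 20-bit genomes toward all ones.
GENOME_LEN, POP_SIZE, GENERATIONS = 20, 50, 60

def fitness(genome):
    return sum(genome)  # toy fitness: the more ones, the fitter

def crossover(a, b):
    point = random.randrange(1, GENOME_LEN)  # single-point crossover
    return a[:point] + b[point:]

def mutate(genome, rate=0.01):
    return [bit ^ 1 if random.random() < rate else bit for bit in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    # Selection: the fitter half survives and breeds, as in selective breeding.
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

print(max(fitness(g) for g in population))  # best fitness after evolving
```

Each pass through the loop is one "generation": a few milliseconds of computer time standing in for a creature's lifetime.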
It's like being able to scan the horizon: it won't help you in Wichita, but in Denver you'll see the Rockies in the distance and head that way. Denver now looks a lot fitter than it did when you were blindfolded. The net effect is to widen the fitness peaks, making it possible for you to find your way to them from previously very tough places, like point A in this graph:

Here's the crucial point: Bob calling depends on Burglary and Earthquake, but only through Alarm. Bob's call is conditionally independent of Burglary and Earthquake given Alarm, and so is Claire's. If the alarm doesn't go off, your neighbors sleep soundly, and the burglar proceeds undisturbed. Also, Bob and Claire are independent given Alarm. Without this independence structure, you'd need to learn 2⁵ = 32 probabilities, one for each possible state of the five variables. (Or 31, if you're a stickler for details, since the last one can be left implicit.) With the conditional independencies, all you need is 1 + 1 + 4 + 2 + 2 = 10, a savings of 68 percent. And that's just in this tiny example; with hundreds or thousands of variables, the savings would be very close to 100 percent.

Markov weighs the evidence

Whether it's data pouring into Robby's brain through his senses or the click streams of millions of Amazon customers, grouping a large number of entities into a smaller number of clusters is only half the battle. The other half is shortening the description of each entity. The very first picture of Mom that Robby sees comprises perhaps a million pixels, each with its own color, but you hardly need a million variables to describe a face. Likewise, each thing you click on at Amazon provides an atom of information about you, but what Amazon would really like to know is your likes and dislikes, not your clicks. The former, which are fairly stable, are somehow immanent in the latter, which grow without limit as you use the site. Little by little, all those clicks should add up to a picture of your taste, in the same way that all those pixels add up to a picture of your face. The question is how to do the adding.

Suppose we zoom out from Palo Alto, and I give you the GPS coordinates of the main cities in the Bay Area:

The world has parts, and parts belong to classes: combining these two gives us most of what we need to make inference in Alchemy tractable.
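Stepping back to the burglar-alarm network a few paragraphs above, its parameter count can be checked with a short sketch. The five variables and their dependency structure come from that example; the code below only formalizes the arithmetic already given (each binary variable needs one probability per joint state of its parents).

```python
# Parameter count for the burglar-alarm Bayesian network:
# Burglary and Earthquake have no parents, Alarm depends on both,
# and Bob's and Claire's calls each depend only on Alarm.
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Bob": ["Alarm"],
    "Claire": ["Alarm"],
}

# Each binary variable needs one probability per joint state of its parents.
params = {v: 2 ** len(ps) for v, ps in parents.items()}
total = sum(params.values())
print(params)  # {'Burglary': 1, 'Earthquake': 1, 'Alarm': 4, 'Bob': 2, 'Claire': 2}
print(total)   # 1 + 1 + 4 + 2 + 2 = 10

full_table = 2 ** len(parents)  # 32 entries without any independence structure
print(1 - total / full_table)   # fraction of parameters saved: about 68 percent
```

With hundreds of variables the full table grows as 2 to the number of variables, while the structured count grows only with the number of parents per variable, which is why the savings approach 100 percent.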
We can learn the world's MLN by breaking it into parts and subparts, such that most interactions are between subparts of the same part, and then grouping the parts into classes and subclasses. If the world is a Lego toy, we can break it up into individual bricks, remembering which attaches to which, and group the bricks by shape and color. If the world is Wikipedia, we can extract the entities it talks about, group them into classes, and learn how classes relate to each other. Then if someone asks us "Is Arnold Schwarzenegger an action star?" we can answer yes, because he's a star and he's in action movies. Step-by-step, we can learn larger and larger MLNs, until we're doing what a friend of mine at Google calls "planetary-scale machine learning": modeling everyone in the world at once, with data continually streaming in and answers streaming out.

Clearly, without machine learning (programs that design programs), the Singularity cannot happen. We also need sufficiently powerful hardware, but that's coming along nicely. We'll reach the Turing point soon after we invent the Master Algorithm. (I'm willing to bet Kurzweil a bottle of Dom Pérignon that this will happen before we reverse engineer the brain, his method of choice for bringing about human-level AI.) Pace Kurzweil, this will not, however, lead to the Singularity. It will lead to something much more interesting.

If this book whetted your appetite for machine learning and the issues surrounding it, you'll find many suggestions in this section. Its aim is not to be comprehensive but to provide an entrance to machine learning's garden of forking paths (as Borges put it). Wherever possible, I chose books and articles appropriate for the general reader. Technical publications, which require at least some computational, statistical, or mathematical background, are marked with an asterisk (*). Even these, however, often have large sections accessible to the general reader.
I didn't list volume, issue, or page numbers, since the web renders them superfluous; likewise for publishers' locations.