Machine learning plays a part in every stage of your life. If you studied online for the SAT college admission exam, a learning algorithm graded your practice essays. And if you applied to business school and took the GMAT exam recently, one of your essay graders was a learning system. Perhaps when you applied for your job, a learning algorithm picked your résumé from the virtual pile and told your prospective employer: here's a strong candidate; take a look. Your latest raise may have come courtesy of another learning algorithm. If you're looking to buy a house, Zillow.com will estimate what each one you're considering is worth. When you've settled on one, you apply for a home loan, and a learning algorithm studies your application and recommends accepting it (or not). Perhaps most important, if you've used an online dating service, machine learning may even have helped you find the love of your life.

By combining many such operations, we can carry out very elaborate chains of logical reasoning. People often think computers are all about numbers, but they're not. Computers are all about logic. Numbers and arithmetic are made of logic, and so is everything else in a computer. Want to add two numbers? There's a combination of transistors that does that. Want to beat the human Jeopardy! champion? There's a combination of transistors for that too (much bigger, naturally).

What makes this possible? How do learning algorithms work? What can't they currently do, and what will the next generation look like? How will the machine-learning revolution unfold? And what opportunities and dangers should you look out for? That's what this book is about. Read on!

The key is to realize that induction is just the inverse of deduction, in the same way that subtraction is the inverse of addition, or integration the inverse of differentiation. This idea was first proposed by William Stanley Jevons in the late 1800s.
Steve Muggleton and Wray Buntine, an English Australian team, designed the first practical algorithm based on it in 1988. The strategy of taking a well-known operation and figuring out its inverse has a storied history in mathematics. Applying it to addition led to the invention of the integers, because without negative numbers, addition doesn't always have an inverse (3 - 4 = -1). Similarly, applying it to multiplication led to the rationals, and applying it to squaring led to complex numbers. Let's see if we can apply it to deduction. A classic example of deductive reasoning is:

And this jungle crackles with electricity. Sparks run along tree trunks and set off more sparks in neighboring trees. Every now and then, a whole area of the jungle whips itself into a frenzy before settling down again. When you wiggle your toe, a series of electric discharges, called action potentials, runs all the way down your spinal cord and leg until it reaches your toe muscles and tells them to move. Your brain at work is a symphony of these electric sparks. If you could sit inside it and watch what happens as you read this page, the scene you'd see would make even the busiest science-fiction metropolis look laid back by comparison. The end result of this phenomenally complex pattern of neuron firings is your consciousness.

Imagine you've been kidnapped and left blindfolded somewhere in the Himalayas. Your head is throbbing, and your memory is not too good, either. All you know is you need to get to the top of Mount Everest. What do you do? You take a step forward and nearly slide into a ravine. After catching your breath, you decide to be a bit more systematic. You carefully feel around with your foot until you find the highest point you can and step gingerly to that point. Then you do the same again. Little by little, you get higher and higher. After a while, every step you can take is down, and you stop. That's gradient ascent.
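The foot-by-foot climb just described can be sketched in a few lines of code: probe a small step in each direction, move to the highest neighbor, and stop when every step leads downhill. The surface `f` and the step size here are illustrative assumptions, not anything from the text.

```python
# Hill climbing ("gradient ascent" by feel): from the current point, try a
# small step in each direction and move to the highest neighbor, stopping
# when every possible step goes down. The surface f is an illustrative
# single-peak "mountain", not the rugged landscape discussed later.

def f(x, y):
    # One smooth peak centered at (1, 2).
    return -((x - 1.0) ** 2 + (y - 2.0) ** 2)

def climb(x, y, step=0.1, max_iters=1000):
    for _ in range(max_iters):
        # Candidate moves: one step in each of four directions.
        neighbors = [(x + step, y), (x - step, y), (x, y + step), (x, y - step)]
        best = max(neighbors, key=lambda p: f(*p))
        if f(*best) <= f(x, y):   # every step is down: we're at a (local) top
            break
        x, y = best
    return x, y

top = climb(0.0, 0.0)
print(top)  # ends near the peak at (1, 2)
```

On a single smooth peak like this one, the climb always reaches the summit; the next paragraph explains why that stops being true when the landscape has many foothills.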
If the Himalayas were just Mount Everest, and Everest was a perfect cone, it would work like a charm. But more likely, when you get to a place where every step is down, you're still very far from the top. You're just standing on a foothill somewhere, and you're stuck. That's what happens to backprop, except it climbs mountains in hyperspace instead of 3-D. If your network has a single neuron, just climbing to better weights one step at a time will get you to the top. But with a multilayer perceptron, the landscape is very rugged; good luck finding the highest peak.

Among the many ironies of the history of the perceptron, perhaps the saddest is that Frank Rosenblatt died in a boating accident in Chesapeake Bay in 1971 and never lived to see the second act of his creation.

We haven't seen any deep learning yet, though. The next clever idea is to stack sparse autoencoders on top of each other like a club sandwich. The hidden layer of the first autoencoder becomes the input/output layer of the second one, and so on. Because the neurons are nonlinear, each hidden layer learns a more sophisticated representation of the input, building on the previous one. Given a large set of face images, the first autoencoder learns to encode local features like corners and spots, the second uses those to encode facial features like the tip of a nose or the iris of an eye, the third one learns whole noses and eyes, and so on. Finally, the top layer can be a conventional perceptron that learns to recognize your grandmother from the high-level features provided by the layer below it, much easier than using only the crude information provided by a single hidden layer or than trying to backpropagate through all the layers at once. The Google Brain network of New York Times fame is a nine-layer sandwich of autoencoders and other ingredients that learns to recognize cats from YouTube videos. At one billion connections, it was at the time the largest network ever learned.
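The club-sandwich stacking can be sketched with NumPy: train one autoencoder on the raw inputs, then train a second on the first one's hidden codes. This is a minimal greedy layer-wise sketch under illustrative assumptions (random stand-in data, tiny layer sizes, a plain-gradient-descent trainer, and no sparsity penalty), not the Google Brain recipe.

```python
import numpy as np

# Greedy layer-wise stacking: autoencoder 1 learns to reconstruct the raw
# inputs; its hidden layer then becomes the input to autoencoder 2, and the
# top code could feed a conventional perceptron. All sizes, the learning
# rate, and the random data are illustrative assumptions.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=200):
    """One autoencoder: nonlinear (sigmoid) hidden layer, linear reconstruction."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)            # hidden code
        Xhat = H @ W2 + b2                  # reconstruction of the input
        E = Xhat - X                        # reconstruction error
        # Backprop through this one autoencoder only (mean squared error).
        dW2 = H.T @ E / n; db2 = E.mean(axis=0)
        dH = (E @ W2.T) * H * (1 - H)       # sigmoid derivative
        dW1 = X.T @ dH / n; db1 = dH.mean(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    loss = float((E ** 2).mean())           # error from the last pass
    return (lambda Z: sigmoid(Z @ W1 + b1)), loss

X = rng.normal(size=(100, 16))              # stand-in for raw inputs (e.g. pixels)
encode1, loss1 = train_autoencoder(X, 8)    # first layer: low-level features
H1 = encode1(X)                             # its hidden layer becomes...
encode2, loss2 = train_autoencoder(H1, 4)   # ...the next autoencoder's input
H2 = encode2(H1)                            # high-level code for a final perceptron
print(H2.shape)
```

The point of the sketch is the wiring, not the accuracy: each `train_autoencoder` call sees only one layer's reconstruction problem, which is what makes the scheme easier than backpropagating through the whole stack at once.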
It's no surprise that Andrew Ng, one of the project's principals, is also one of the leading proponents of the idea that human intelligence boils down to a single algorithm, and all we need to do is figure it out. Ng, whose affability belies a fierce ambition, believes that stacked sparse autoencoders can take us closer to solving AI than anything that came before.

If we want to evolve a whole set of spam-filtering rules, not just one, we can represent a candidate set of n rules by a string of n × 20,000 bits (20,000 for each rule, assuming ten thousand different words in the data, as before). Rules containing 00 for some word effectively disappear from the rule set, since they don't match any e-mails, as we saw before. If an e-mail matches any rule in the set, it's classified as spam; otherwise it's legit. We can still let fitness be the percentage of correctly classified e-mails, but to combat overfitting, we'll probably want to subtract from it a penalty proportional to the total number of active conditions in the rule set.

One of the most important problems in machine learning, and in life, is the exploration-exploitation dilemma. If you've found something that works, should you just keep doing it? Or is it better to try new things, knowing it could be a waste of time but also might lead to a better solution? Would you rather be a cowboy or a farmer? Start a company or run an existing one? Go steady or play the field? A midlife crisis is the yearning to explore after many years spent exploiting. On an impulse, you fly to Vegas, ready to gamble away your life's savings on the chance of becoming a millionaire. You enter the first casino and face a row of slot machines. The one to play is the one that gives you the best payoff on average, but you don't know which that is. You have to try each one enough times to figure it out. But if you do this for too long, you waste your money on losing machines.
Conversely, if you jump the gun and pick a machine that looked good by chance on the first few turns but is in fact not the best one, you waste your money playing it for the rest of the night. That's the exploration-exploitation dilemma. Each time you play, you have to choose between repeating the best move you've found so far, which gives you the best payoff, or trying other moves, which gather information that may lead to even better payoffs. With two slot machines, Holland showed that the optimal strategy is to flip a biased coin each time, where the coin becomes exponentially more biased as you go along. (Don't sue me if it doesn't work for you, though. Remember the house always wins in the end.) The better a slot machine looks, the more you should play it, but never completely give up on the other one, in case it turns out to be the best one after all.

The single most surprising property of SVMs, however, is that no matter how curvy the frontiers they form, those frontiers are always just straight lines (or hyperplanes, in general). The reason that's not a contradiction is that the straight lines are in a different space. Suppose the examples live on the (x, y) plane, and the boundary between the positive and negative regions is the parabola y = x². There's no way to represent it with a straight line, but if we add a third coordinate z, meaning the data now lives in (x, y, z) space, and we set each example's z coordinate to the square of its x coordinate, the frontier is now just the diagonal plane defined by y = z. In effect, the data points rise up into the third dimension, some rise more than others by just the right amount, and presto: in this new dimension the positive and negative examples can be separated by a plane. It turns out that we can view what SVMs do with kernels, support vectors, and weights as mapping the data to a higher-dimensional space and finding a maximum-margin hyperplane in that space.
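The parabola-to-plane trick above is easy to check directly: lift each 2-D point into 3-D by adding z = x², and a single linear rule separates what no line in the plane could. The random sample points are an illustrative assumption; the lift itself is exactly the one described in the text.

```python
import random

# In (x, y), the boundary y = x^2 is curved and no line can represent it.
# After adding the coordinate z = x^2, the same data is split perfectly by
# the plane y = z. The sample points below are an illustrative assumption.

random.seed(1)
points = [(random.uniform(-2, 2), random.uniform(-1, 4)) for _ in range(200)]
labels = [1 if y > x ** 2 else -1 for x, y in points]  # positive above the parabola

# Lift each point into (x, y, z) space with z = x^2.
lifted = [(x, y, x ** 2) for x, y in points]

# One linear rule in the lifted space: sign(y - z). Since z = x^2,
# y - z > 0 exactly when y > x^2, so the plane y = z separates perfectly.
predictions = [1 if y - z > 0 else -1 for x, y, z in lifted]
print(predictions == labels)  # True: linearly separable after the lift
```

An SVM with a suitable kernel performs this kind of lift implicitly; the hand-built coordinate here just makes the geometry visible.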
For some kernels, the derived space has infinite dimensions, but SVMs are completely unfazed by that. Hyperspace may be the Twilight Zone, but SVMs have figured out how to navigate it.

The sobering (or perhaps reassuring) thought is that no learner in the world today has access to all this data (not even the NSA), and even if it did, it wouldn't know how to turn it into a real likeness of you. But suppose you took all your data and gave it to the (real, future) Master Algorithm, already seeded with everything we could teach it about human life. It would learn a model of you, and you could carry that model in a thumb drive in your pocket, inspect it at will, and use it for everything you pleased. It would surely be a wonderful tool for introspection, like looking at yourself in the mirror, but it would be a digital mirror that showed not just your looks but all things observable about you, a mirror that could come alive and converse with you. What would you ask it? Some of the answers you might not like, but that would be all the more reason to ponder them. And some would give you new ideas, new directions. The Master Algorithm's model of you might even help you become a better person.

Soldiering is harder to automate than science, but it will be as well. One of the prime uses of robots is to do things that are too dangerous for humans, and fighting wars is about as dangerous as it gets. Robots already defuse bombs, and drones allow a platoon to see over the hill. Self-driving supply trucks and robotic mules are on the way. Soon we will need to decide whether robots are allowed to pull the trigger on their own. The argument for doing this is that we want to get humans out of harm's way, and remote control is not viable in fast-moving, shoot-or-be-shot situations. The argument against is that robots don't understand ethics, and so can't be entrusted with life-or-death decisions. But we can teach them. The deeper question is whether we're ready to.
To sidestep the problem that infinitely dense points don't exist, Kurzweil proposes to instead equate the Singularity with a black hole's event horizon, the region within which gravity is so strong that not even light can escape. Similarly, he says, the Singularity is the point beyond which technological evolution is so fast that humans cannot predict or understand what will happen. If that's what the Singularity is, then we're already inside it. We can't predict in advance what a learner will come up with, and often we can't even understand it in retrospect. As a matter of fact, we've always lived in a world that we only partly understood. The main difference is that our world is now partly created by us, which is surely an improvement. The world beyond the Turing point will not be incomprehensible to us, any more than the Pleistocene was. We'll focus on what we can understand, as we always have, and call the rest random (or divine).

The history of attempts to combine probability and logic is surveyed in a 2003 special issue of the Journal of Applied Logic devoted to the subject, edited by Jon Williamson and Dov Gabbay. "From knowledge bases to decision models," by Michael Wellman, John Breese, and Robert Goldman (Knowledge Engineering Review, 1992), discusses some of the early AI approaches to the problem.