Machine learning plays a part in every stage of your life. If you studied online for the SAT college admission exam, a learning algorithm graded your practice essays. And if you applied to business school and took the GMAT exam recently, one of your essay graders was a learning system. Perhaps when you applied for your job, a learning algorithm picked your résumé from the virtual pile and told your prospective employer: here's a strong candidate; take a look. Your latest raise may have come courtesy of another learning algorithm. If you're looking to buy a house, Zillow.com will estimate what each one you're considering is worth. When you've settled on one, you apply for a home loan, and a learning algorithm studies your application and recommends accepting it (or not). Perhaps most important, if you've used an online dating service, machine learning may even have helped you find the love of your life.

I had a number of different but overlapping audiences in mind when writing this book.

Designing an algorithm is not easy. Pitfalls abound, and nothing can be taken for granted. Some of your intuitions will turn out to have been wrong, and you'll have to find another way. On top of designing the algorithm, you have to write it down in a language computers can understand, like Java or Python (at which point it's called a program). Then you have to debug it: find every error and fix it until the computer runs your program without screwing up. But once you have a program that does what you want, you can really go to town. Computers will do your bidding millions of times, at ultrahigh speed, without complaint. Everyone in the world can use your creation. The cost can be zero, if you so choose, or enough to make you a billionaire, if the problem you solved is important enough. A programmer, someone who creates algorithms and codes them up, is a minor god, creating universes at will.
You could even say that the God of Genesis himself is a programmer: language, not manipulation, is his tool of creation. Words become worlds. Today, sitting on the couch with your laptop, you too can be a god. Imagine a universe and make it real. The laws of physics are optional.

Figuring out how proteins fold into their characteristic shapes; reconstructing the evolutionary history of a set of species from their DNA; proving theorems in propositional logic; detecting arbitrage opportunities in markets with transaction costs; inferring a three-dimensional shape from two-dimensional views; compressing data on a disk; forming a stable coalition in politics; modeling turbulence in sheared flows; finding the safest portfolio of investments with a given return, the shortest route to visit a set of cities, the best layout of components on a microchip, the best placement of sensors in an ecosystem, or the lowest energy state of a spin glass; scheduling flights, classes, and factory jobs; optimizing resource allocation, urban traffic flow, social welfare, and (most important) your Tetris score: these are all NP-complete problems, meaning that if you can efficiently solve one of them, you can efficiently solve all problems in the class NP, including all the others on this list. Who would have guessed that all these problems, superficially so different, are really the same? But if they are, it makes sense that one algorithm could learn to solve all of them (or, more precisely, all efficiently solvable instances).

Machine learners versus knowledge engineers

The second problem is that, even if we had complete knowledge of the world at some point in time, the laws of physics would still not allow us to determine its past and future. This is because the sheer amount of computation required to make those predictions would be beyond the capabilities of any imaginable computer. In effect, to perfectly simulate the universe we would need another, identical universe.
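To make the earlier claim about NP-complete problems concrete: what they share is that a proposed solution is quick to check, while the only known general way to find one is to search an exponentially large space of candidates. Here is a toy brute-force solver for Boolean satisfiability, the canonical NP-complete problem; the particular formula and the variable names are my own illustrative choices.

```python
from itertools import product

# A formula in conjunctive normal form: each clause is a list of
# (variable, wanted_value) pairs, and at least one pair per clause must
# hold. This encodes (x or y) and (not x or z) and (not y or not z).
clauses = [[("x", True), ("y", True)],
           [("x", False), ("z", True)],
           [("y", False), ("z", False)]]
variables = ["x", "y", "z"]

def satisfies(assignment):
    """Checking a candidate is cheap: linear in the formula size."""
    return all(any(assignment[v] == want for v, want in clause)
               for clause in clauses)

def brute_force_sat():
    """Finding a candidate is the hard part: 2**n assignments to try."""
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(assignment):
            return assignment
    return None  # the formula is unsatisfiable
```

If you could always replace the exponential loop in `brute_force_sat` with a polynomial-time search, every problem in the list above would fall with it, since each reduces to satisfiability and back.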
This is why string theory is mostly irrelevant outside of physics. The theories we have in biology, psychology, sociology, or economics are not corollaries of the laws of physics; they had to be created from scratch. We assume that they are approximations of what the laws of physics would predict when applied at the scale of cells, brains, and societies, but there's no way to know.

A game of twenty questions

Because of its origins and guiding principles, symbolist machine learning is still closer to the rest of AI than the other schools. If computer science were a continent, symbolist learning would share a long border with knowledge engineering. Knowledge is traded in both directions (manually entered knowledge for use in learners, induced knowledge for addition to knowledge bases), but at the end of the day the rationalist-empiricist fault line runs right down that border, and crossing it is not easy.

Inverse deduction is like having a superscientist systematically looking at the evidence, considering possible inductions, collating the strongest, and using those along with other evidence to construct yet further hypotheses, all at the speed of computers. It's clean and beautiful, at least to the symbolist taste. On the other hand, it has some serious shortcomings. The number of possible inductions is vast, and unless we stay close to our initial knowledge, it's easy to get lost in space. Inverse deduction is easily confused by noise: how do we figure out what the missing deductive steps are, if the premises or conclusions are themselves wrong? Most seriously, real concepts can seldom be concisely defined by a set of rules. They're not black and white: there's a large gray area between, say, spam and nonspam. Recognizing them requires weighing and accumulating weak evidence until a clear picture emerges. Diagnosing an illness involves giving more weight to some symptoms than others, and being OK with incomplete evidence.
No one has ever succeeded in learning a set of rules that will recognize a cat by looking at the pixels in an image, and probably no one ever will.

Today, however, connectionism is resurgent. We're learning deeper networks than ever before, and they're setting new standards in vision, speech recognition, drug discovery, and other areas. The new field of deep learning is on the front page of the New York Times. Look under the hood, and… surprise: it's the trusty old backprop engine, still humming. What changed? Nothing much, say the critics: just faster computers and bigger data. To which Hinton and others reply: exactly, we were right all along!

In many cases we can do this and avoid the exponential blowup. Suppose you're leading a platoon in single file through enemy territory in the dead of night, and you want to make sure that all your soldiers are still with you. You could stop and count them yourself, but that wastes too much time. A cleverer solution is to just ask the first soldier behind you: "How many soldiers are behind you?" Each soldier asks the next the same question, until the last one says "None." The next-to-last soldier can now say "One," and so on all the way back to the first soldier, with each soldier adding one to the number of soldiers behind him. Now you know how many soldiers are still with you, and you didn't even have to stop.

You can probably tell just by looking at this plot that the main street in Palo Alto runs southwest-northeast. You didn't draw a street, but you can intuit that it's there from the fact that all the points fall along a straight line (or close to it; they can be on different sides of the street). Indeed, the street is University Avenue, and if you want to shop or eat out in Palo Alto, that's the place to go.
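The soldier-counting trick above is plain recursion: each soldier's answer is one more than the answer of the soldier behind him, and the question reaches the end of the line before the answers flow back. A minimal sketch, where the platoon list and function name are my own:

```python
def soldiers_behind(column, i=0):
    """Ask soldier i of a nonempty column how many soldiers follow them.

    The question propagates down the line; the last soldier answers
    zero, and each soldier on the way back adds one for the person
    who just answered.
    """
    if i == len(column) - 1:   # the last soldier: no one behind
        return 0
    return 1 + soldiers_behind(column, i + 1)

platoon = ["you", "Ada", "Ben", "Cy", "Dee"]
count = soldiers_behind(platoon)  # everyone behind the leader: 4
```

One pass down the line and one pass back, so the cost grows linearly with the platoon rather than requiring the leader to walk it himself.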
As a bonus, once you know that the shops are on University Avenue, you don't need two numbers to locate them, just one: the street number (or, if you wanted to be really precise, the distance from the shop to the Caltrain station, on the southwest corner, which is where University Avenue begins).

Suppose you're moving along a tunnel, Indiana Jones-like, and you come to a fork. Your map says the left tunnel leads to a treasure and the right one to a snake pit. The value of where you're standing, right before the fork, is the value of the treasure, because you'll choose to go left. If you always choose the best possible action, then the value of a state differs from the value of the succeeding state only by the immediate reward (if any) that you'll get by performing that action. If we know each state's immediate reward, we can use this observation to update the values of neighboring states, and so on, until all states have consistent values. The treasure's value propagates backward along the tunnel until it reaches the fork and beyond. Once you know the value of each state, you also know which action to choose in each state (the one that maximizes the combination of immediate reward and value of the resulting state). This much was worked out in the 1950s by the control theorist Richard Bellman. But the real problem in reinforcement learning is when you don't have a map of the territory. Then your only choice is to explore and discover what rewards are where. Sometimes you'll discover a treasure, and other times you'll fall into a snake pit. Every time you take an action, you note the immediate reward and the resulting state. That much could be done by supervised learning. But you also update the value of the state you just came from to bring it into line with the value you just observed, namely the reward you got plus the value of the new state you're in.
Of course, that value may not yet be the correct one, but if you wander around doing this for long enough, you'll eventually settle on the right values for all the states and the corresponding actions. That's reinforcement learning in a nutshell.

After an arduous climb, you reach the top. A wedding is in progress. Praedicatus, First Lord of Logic, ruler of the symbolic realm and Protector of the Programs, says to Markovia, Princess of Probability, Empress of Networks: "Let us unite our realms. To my rules thou shalt add weights, begetting a new representation that will spread far across the land." The princess says, "And we shall call our progeny Markov logic networks."

Another property of the world that makes learning and inference easier is that the entities in it don't come in arbitrary forms. Rather, they fall into classes and subclasses, with members of the same class being more alike than members of different ones. Alive or inanimate, animal or plant, bird or mammal, human or not: if we know all the distinctions relevant to the question at hand, we can lump together all the entities that lack them, and that can save a lot of time. As before, the MLN doesn't have to know a priori what the classes in the world are; it can learn them from data by hierarchical clustering.
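The map-free wandering described above, where each state's value is repeatedly nudged toward the observed reward plus the value of the state you land in, can be sketched as a small learning loop on the tunnel-and-fork world. Everything here (state numbering, reward sizes, the learning rate, the exploration probability) is my own illustrative choice, not from the text; the update rule is the standard one from Q-learning, a common way to do this kind of value learning.

```python
import random

random.seed(0)  # fixed seed so the wandering is reproducible

# The tunnel: states 0, 1, 2 lead forward to the fork (state 3).
# At the fork, "left" finds the treasure (+1), "right" the snake pit (-1).
ACTIONS = {0: ["forward"], 1: ["forward"], 2: ["forward"],
           3: ["left", "right"]}
Q = {(s, a): 0.0 for s, acts in ACTIONS.items() for a in acts}

def step(s, a):
    """Take action a in state s; return (next state, immediate reward)."""
    if s == 3:
        return "done", (1.0 if a == "left" else -1.0)
    return s + 1, 0.0  # walking down the tunnel earns nothing

alpha, eps = 0.5, 0.2  # learning rate and exploration probability
for _ in range(500):   # wander through the tunnel many times
    s = 0
    while s != "done":
        acts = ACTIONS[s]
        if random.random() < eps:                  # sometimes explore...
            a = random.choice(acts)
        else:                                      # ...otherwise act greedily
            a = max(acts, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        best_next = 0.0 if s2 == "done" else max(Q[(s2, x)] for x in ACTIONS[s2])
        # Bring the old estimate into line with what was just observed:
        # the reward plus the value of the state we ended up in.
        Q[(s, a)] += alpha * (r + best_next - Q[(s, a)])
        s = s2
```

After enough trips, the treasure's value has propagated backward through the tunnel: the value of moving forward at the entrance approaches 1, and the fork's "right" action is correctly valued below zero, even though the agent never saw a map.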