Homo sapiens is the species that adapts the world to itself instead of adapting itself to the world. Machine learning is the newest chapter in this million-year saga: with it, the world senses what you want and changes accordingly, without you having to lift a finger. Like a magic forest, your surroundings, virtual today and physical tomorrow, rearrange themselves as you move through them. The path you picked out between the trees and bushes grows into a road. Signs pointing the way spring up in the places where you got lost.

This book's first goal is to let you in on the secrets of machine learning. Only engineers and mechanics need to know how a car's engine works, but every driver needs to know that turning the steering wheel changes the car's direction and stepping on the brake brings it to a stop. Few people today know what the corresponding elements of a learner even are, let alone how to use them. The psychologist Don Norman coined the term conceptual model to refer to the rough knowledge of a technology we need to have in order to use it effectively. This book provides you with a conceptual model of machine learning.

To date or not to date?

You're not the only one in dire straits; so are we. We've only just set out on our road to the Master Algorithm and already we seem to have run into an insurmountable obstacle. Is there any way to learn something from the past that we can be confident will apply in the future? And if there isn't, isn't machine learning a hopeless enterprise? For that matter, isn't all of science, even all of human knowledge, on rather shaky ground?

A decision tree is like playing a game of twenty questions with an instance. Starting at the root, each node asks about the value of one attribute, and depending on the answer, we follow one or another branch. When we arrive at a leaf, we read off the predicted concept. Each path from the root to a leaf corresponds to a rule.
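The twenty-questions traversal just described can be sketched in a few lines of code. The tree, its attribute names, and its values below are invented for illustration; a real learner would induce them from data.

```python
# A minimal sketch of classifying an instance with a decision tree.
# Each internal node asks about one attribute; each leaf holds the
# predicted concept. The attributes and values here are made up.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {
                "high": {"leaf": "don't play"},
                "normal": {"leaf": "play"},
            },
        },
        "rainy": {"leaf": "don't play"},
        "overcast": {"leaf": "play"},
    },
}

def classify(node, instance):
    """Follow one branch per question until a leaf is reached."""
    while "leaf" not in node:
        answer = instance[node["attribute"]]
        node = node["branches"][answer]
    return node["leaf"]

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # play
```

The sequence of answers that leads to a leaf, read top to bottom, is exactly the rule that path encodes: if outlook is sunny and humidity is normal, then play.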
If this reminds you of those annoying phone menus you have to get through when you call customer service, it's not an accident: a phone menu is a decision tree. The computer on the other end of the line is playing a game of twenty questions with you to figure out what you want, and each menu is a question.

Another notable early success of neural networks was learning to drive a car. Driverless cars first broke into the public consciousness with the DARPA Grand Challenges in 2004 and 2005, but over a decade earlier, researchers at Carnegie Mellon had already successfully trained a multilayer perceptron to drive a car by detecting the road in video images and appropriately turning the steering wheel. Carnegie Mellon's car managed to drive coast to coast across America with very blurry vision (thirty by thirty-two pixels), a brain smaller than a worm's, and only a few assists from the human copilot. (The project was dubbed "No Hands Across America.") It may not have been the first truly self-driving car, but it did compare favorably with most teenage drivers.

One consequence of crossing over program trees instead of bit strings is that the resulting programs can have any size, making the learning more flexible. The overall tendency is for bloat, however, with larger and larger trees growing as evolution goes on longer (also known as "survival of the fattest"). Evolutionaries can take comfort from the fact that human-written programs are no different (Microsoft Windows: forty-five million lines of code and counting), and that human-made code doesn't allow a solution as simple as adding a complexity penalty to the fitness function.

One of the greatest mathematicians of all time, Laplace is perhaps best known for his dream of Newtonian determinism:

Bayes' theorem is useful because what we usually know is the probability of the effects given the causes, but what we want to know is the probability of the causes given the effects.
For example, we know what percentage of flu patients have a fever, but what we really want to know is how likely a patient with a fever is to have the flu. Bayes' theorem lets us go from one to the other. Its significance extends far beyond that, however. For Bayesians, this innocent-looking formula is the F = ma of machine learning, the foundation from which a vast number of results and applications flow. And whatever the Master Algorithm is, it must be "just" a computational implementation of Bayes' theorem. I put just in quotes because implementing Bayes' theorem on a computer turns out to be fiendishly hard for all but the simplest problems, for reasons that we're about to see.

There's an interesting twist, though. Suppose Lee and Ken have very similar tastes, but Lee is grumpier than Ken. Whenever Ken gives a movie five stars, Lee gives three; when Ken gives three, Lee gives one; and so on. We'd like to use Lee's ratings to predict Ken's, but if we just do it directly, we'll always be off by two stars. Instead, what we need to do is predict how much Ken's ratings will be above or below his average, based on how much Lee's are. And now, since Ken is always two stars above his average when Lee is two stars above his, and so on, our predictions will be spot on.

Whether it's data pouring into Robby's brain through his senses or the click streams of millions of Amazon customers, grouping a large number of entities into a smaller number of clusters is only half the battle. The other half is shortening the description of each entity. The very first picture of Mom that Robby sees comprises perhaps a million pixels, each with its own color, but you hardly need a million variables to describe a face. Likewise, each thing you click on at Amazon provides an atom of information about you, but what Amazon would really like to know is your likes and dislikes, not your clicks.
The former, which are fairly stable, are somehow immanent in the latter, which grow without limit as you use the site. Little by little, all those clicks should add up to a picture of your taste, in the same way that all those pixels add up to a picture of your face. The question is how to do the adding.

[Figure: pic_29.jpg]

An important precursor of reinforcement learning was a checkers-playing program created by Arthur Samuel, an IBM researcher, in the 1950s. Board games are a great example of a reinforcement learning problem: you have to make a long series of moves without any feedback, and the whole reward or punishment comes at the very end, in the form of a win or a loss. Yet Samuel's program was able to teach itself to play as well as most humans. It did not directly learn which move to make in each board position, because that would have been too difficult. Rather, it learned how to evaluate each board position (how likely am I to win starting from this position?) and chose the move that led to the best position. Initially, the only positions it knew how to evaluate were the final ones: a win, a tie, or a loss. But once it knew that a certain position was a win, it also knew that positions from which it could move to it were good, and so on. Thomas J. Watson Sr., IBM's president, predicted that when the program was demonstrated IBM stock would go up by fifteen points. It did. The lesson was not lost on IBM, which went on to build a chess champion and a Jeopardy! one.

Our unified learner is perhaps best introduced through an extended allegory. If machine learning is a continent divided into the territories of the five tribes, the Master Algorithm is its capital city, standing on the unique spot where the five territories meet. As you approach it from a distance, you can see that the city is made up of three concentric circles, each bounded by a wall. The outer and by far widest circle is Optimization Town.
Each house here is an algorithm, and they come in all shapes and sizes. Some are under construction, the locals busy around them; some are gleaming new; and some look old and abandoned. Higher up the hill lies the Citadel of Evaluation. From its mansions and palaces, orders issue continuously to the algorithms below. Above all, silhouetted against the sky, rise the Towers of Representation. Here live the rulers of the city. Their immutable laws set forth what can and cannot be done, not just in the city but throughout the continent. Atop the central, tallest tower flies the flag of the Master Algorithm, red and black, with a five-pointed star surrounding an inscription that you cannot yet make out.

The dense ranks of instances end abruptly, and you find yourself in the inverse deduction district, a place of broad avenues and ancient stone buildings. The architecture here is geometric and austere, made of straight lines and right angles. Even the severely pruned trees have rectangular trunks, and their leaves are meticulously labeled with class predictions. The denizens of this district seem to build their houses in a peculiar way: they start with the roof, which they label "Conclusions," and gradually fill in the gaps between it and the ground, which they label "Premises." One by one, they find a stone block that's the right shape to fill in a particular gap and hoist it up to its place. But, you notice, many gaps have the same shape, and it would be faster to cut and combine blocks until they form that shape, and then repeat the process as many times as necessary. In other words, you could use genetic search to do inverse deduction. Neat. It looks like you've boiled down the five optimizers to a simple recipe: genetic search for structure and gradient descent for parameters. And even that may be overkill.
For a lot of problems, you can whittle genetic search down to hill climbing if you do three things: leave out crossover, try all possible point mutations in each generation, and always select the single best hypothesis to seed the next generation.
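The three-step recipe above can be sketched concretely. As a stand-in problem, this sketch uses a bit-string hypothesis scored by counting its 1s (the toy "OneMax" fitness); the hypothesis representation and fitness function are illustrative assumptions, not anything prescribed by the text.

```python
# Hill climbing as degenerate genetic search: no crossover, every point
# mutation tried each generation, only the single best hypothesis kept.
# The bit-string hypothesis and OneMax fitness are toy stand-ins.

def fitness(bits):
    """Toy fitness: number of 1s in the bit string ('OneMax')."""
    return sum(bits)

def hill_climb(bits):
    while True:
        # "Try all possible point mutations": flip each bit in turn.
        neighbors = [bits[:i] + [1 - bits[i]] + bits[i + 1:]
                     for i in range(len(bits))]
        # "Select the single best hypothesis to seed the next generation."
        best = max(neighbors, key=fitness)
        if fitness(best) <= fitness(bits):
            return bits  # no mutation improves fitness: a local optimum
        bits = best

print(hill_climb([0, 1, 0, 0, 1]))  # climbs to [1, 1, 1, 1, 1]
```

Because each generation keeps only the best single-bit change, the search walks uphill one mutation at a time and stops at the first peak it reaches, which is exactly what distinguishes hill climbing from a population-based genetic search.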