Once we look at machine learning this way, two things immediately jump out. The first is that the more data we have, the more we can learn. No data? Nothing to learn. Big data? Lots to learn. That’s why machine learning has been turning up everywhere, driven by exponentially growing mountains of data. If machine learning was something you bought in the supermarket, its carton would say: “Just add data.”

The other reason machine learners are the über-geeks is that the world has far fewer of them than it needs, even by the already dire standards of computer science. According to tech guru Tim O’Reilly, “data scientist” is the hottest job title in Silicon Valley. The McKinsey Global Institute estimates that by 2018 the United States alone will need 140,000 to 190,000 more machine-learning experts than will be available, and 1.5 million more data-savvy managers. Machine learning’s applications have exploded too suddenly for education to keep up, and it has a reputation for being a difficult subject. Textbooks are liable to give you math indigestion. This difficulty is more apparent than real, however. All of the important ideas in machine learning can be expressed math-free. As you read this book, you may even find yourself inventing your own learning algorithms, with nary an equation in sight.

With big data and machine learning, you can understand much more complex phenomena than before. In most fields, scientists have traditionally used only very limited kinds of models, like linear regression, where the curve you fit to the data is always a straight line. Unfortunately, most phenomena in the world are nonlinear. (Or fortunately, since otherwise life would be very boring; in fact, there would be no life.) Machine learning opens up a vast new world of nonlinear models. It’s like turning on the lights in a room where only a sliver of moonlight filtered before.

Another line of argument for the unity of the cortex comes from what might be called the poverty of the genome. The number of connections in your brain is over a million times the number of letters in your genome, so it’s not physically possible for the genome to specify in detail how the brain is wired.

The algorithm we’ll arrive at is not yet the Master Algorithm, for reasons we’ll see, but it’s the closest anyone has come. And we’ll gather enough riches along the way to make Croesus envious. Nevertheless, this book is only part one of the Master Algorithm saga. Part two’s protagonist is you, dear reader. Your mission, should you choose to accept it, is to go the rest of the way and bring back the prize. I will be your humble guide in part one, from here to the edge of the known world. Do I hear you protest that you don’t know enough, or algorithms are not your forte? Fear not. Computer science is still young, and unlike in physics or biology, you don’t need a PhD to start a revolution. (Just ask Bill Gates, Messrs. Sergey Brin and Larry Page, or Mark Zuckerberg.) Insight and persistence are what counts.

To date or not to date?

(If you’re wondering about the last rule, credit-card thieves used to routinely buy one dollar of gas to check that a stolen credit card was good before data miners caught on to the tactic.)

Einstein’s general relativity was only widely accepted once Arthur Eddington empirically confirmed its prediction that the sun bends the light of distant stars. But you don’t need to wait around for new data to arrive to decide whether you can trust your learner. Rather, you take the data you have and randomly divide it into a training set, which you give to the learner, and a test set, which you hide from it and use to verify its accuracy. Accuracy on held-out data is the gold standard in machine learning.
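
The random division into a training set and a hidden test set can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 25 percent test fraction, the toy data, and the "majority" learner are all hypothetical choices made for the example, not anything prescribed by the text.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Randomly divide examples into a training set and a held-out test set."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(examples)          # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def majority_learner(train):
    """Toy learner (an assumption): always predict the most common training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def accuracy(model, data):
    """Fraction of examples whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

# Hypothetical data: the label says whether the number is a multiple of 3.
data = [(i, i % 3 == 0) for i in range(100)]
train, test = train_test_split(data)

model = majority_learner(train)        # the learner only ever sees the training set
held_out_accuracy = accuracy(model, test)
print(f"held-out accuracy: {held_out_accuracy:.2f}")
```

The point of the design is that `test` never reaches the learner, so the reported number estimates how the model would fare on data it has not seen.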
You can write a paper about a great new learning algorithm you’ve invented, but if your algorithm is not significantly more accurate than previous ones on held-out data, the paper is not publishable.

Spin glasses are not actually glasses, although they have some glass-like properties. Rather, they are magnetic materials. Every electron is a tiny magnet by virtue of its spin, which can point “up” or “down.” In materials like iron, electrons’ spins tend to line up: if an electron with down spin is surrounded by electrons with up spins, it will probably flip to up. When most of the spins in a chunk of iron line up, it turns into a magnet. In ordinary magnets, the strength of interaction between adjacent spins is the same for all pairs, but in a spin glass it can vary; it may even be negative, causing nearby spins to point in opposite directions. The energy of an ordinary magnet is lowest when all its spins align, but in a spin glass, it’s not so simple. Indeed, finding the lowest-energy state of a spin glass is an NP-complete problem, meaning that just about every other difficult optimization problem can be reduced to it. Because of this, a spin glass doesn’t necessarily settle into its overall lowest energy state; much like rainwater may flow downhill into a lake instead of reaching the ocean, a spin glass may get stuck in a local minimum, a state with lower energy than all the states that can be reached from it by flipping a spin, rather than evolve to the global one.

[Figure: pic_12.jpg]

Pearl realized that it’s OK to have a complex network of dependencies among random variables, provided each variable depends directly on only a few others. We can represent these dependencies with a graph like the ones we saw for Markov chains and HMMs, except now the graph can have any structure (as long as the arrows don’t form closed loops). One of Pearl’s favorite examples is burglar alarms.
The alarm at your house should go off if a burglar attempts to break in, but it could also be triggered by an earthquake. (In Los Angeles, where Pearl lives, earthquakes are almost as frequent as burglaries.) If you’re working late one night and your neighbor Bob calls to say he just heard your alarm go off, but your neighbor Claire doesn’t, should you call the police? Here’s the graph of dependencies:

The single most surprising property of SVMs, however, is that no matter how curvy the frontiers they form, those frontiers are always just straight lines (or hyperplanes, in general). The reason that’s not a contradiction is that the straight lines are in a different space. Suppose the examples live on the (x, y) plane, and the boundary between the positive and negative regions is the parabola y = x². There’s no way to represent it with a straight line, but if we add a third coordinate z, meaning the data now lives in (x, y, z) space, and we set each example’s z coordinate to the square of its x coordinate, the frontier is now just the diagonal plane defined by y = z. In effect, the data points rise up into the third dimension, some rise more than others by just the right amount, and presto: in this new dimension the positive and negative examples can be separated by a plane. It turns out that we can view what SVMs do with kernels, support vectors, and weights as mapping the data to a higher-dimensional space and finding a maximum-margin hyperplane in that space. For some kernels, the derived space has infinite dimensions, but SVMs are completely unfazed by that. Hyperspace may be the Twilight Zone, but SVMs have figured out how to navigate it.

Evolutionaries use genetic algorithms to simulate natural selection. A genetic algorithm maintains a population of hypotheses and in each generation crosses over and mutates the fittest ones to produce the next generation.
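
To make the loop of selection, crossover, and mutation concrete, here is a minimal sketch of a genetic algorithm in Python. Everything specific in it is an assumption made for illustration: hypotheses are plain bit strings, fitness is simply the number of 1 bits, the fitter half of the population survives unchanged into the next generation, and the parameter values are arbitrary. It is not Alchemy’s procedure or any particular library’s implementation.

```python
import random

def evolve(fitness, length=20, pop_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    """Evolve bit strings toward higher fitness by crossover and mutation."""
    rng = random.Random(seed)
    # Start from a random population of hypotheses (bit strings).
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half become parents (and survive as-is).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            mom, dad = rng.sample(parents, 2)
            # Crossover: splice the two parents at a random cut point.
            cut = rng.randrange(1, length)
            child = mom[:cut] + dad[cut:]
            # Mutation: flip each bit with small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Hypothetical fitness function: count the 1 bits in the string.
best = evolve(fitness=sum)
print(best, sum(best))
```

Keeping the parents in the next generation means the best fitness found so far never decreases; dropping them would be closer to a strictly generational scheme and is an equally reasonable choice.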
Alchemy maintains a population of hypotheses in the form of weighted formulas, modifies them in various ways at each step, and keeps the variations that most increase the posterior probability of the data (or some other score function). If the population is a single hypothesis, this reduces to hill climbing. The current open-source implementation of Alchemy does not include crossover, but this would be a straightforward addition. The evolutionaries’ master algorithm is genetic programming, which applies crossover and mutation to computer programs represented as trees of subroutines. Trees of subroutines can be represented by sets of logical rules, and the Prolog programming language does just that. In Prolog, each rule corresponds to a subroutine, and its antecedents are the subroutines it calls. So we can think of Alchemy with crossover as genetic programming using a Prolog-like programming language, with the added advantage that the rules can be probabilistic.

The kind of company I’m envisaging would do several things in return for a subscription fee. It would anonymize your online interactions, routing them through its servers and aggregating them with its other users’. It would store all the data from all your life in one place, down to your 24/7 Google Glass video stream, if you ever get one. It would learn a complete model of you and your world and continually update it. And it would use the model on your behalf, always doing exactly what you would, to the best of the model’s ability. The company’s basic commitment to you is that your data and your model will never be used against your interests. Such a guarantee can never be foolproof; you yourself are not guaranteed to never do anything against your interests, after all. But the company’s life would depend on it as much as a bank’s depends on the guarantee that it won’t lose your money, so you should be able to trust it as much as you trust your bank.

A neural network stole my job.