Otherwise, play on any empty square.. In the information-processing ecosystem, learners are the superpredators. Databases, crawlers, indexers, and so on are the herbivores, patiently munging on endless fields of data. Statistical algorithms, online analytical processing, and so on are the predators. Herbivores are necessary, since without them the others couldnвЂ™t exist, but superpredators have a more exciting life. A crawler is like a cow, the web is its worldwide meadow, each page is a blade of grass. When the crawler is done munging, a copy of the web is sitting on its hard disks. An indexer then makes a list of the pages where each word appears, much like the index at the end of a book. Databases, like elephants, are big and heavy and never forget. Among these patient beasts dart statistical and analytical algorithms, compacting and selecting, turning data into information. Learners eat up this information, digest it, and turn it into knowledge.. An example of this closer to home is whatвЂ™s known as predictive policing. By forecasting crime trends and strategically focusing patrols where theyвЂ™re most likely to be needed, as well as taking other preventive measures, a cityвЂ™s police force can effectively do the job of a much larger one. In many ways, law enforcement is similar to asymmetric warfare, and many of the same learning techniques apply, whether itвЂ™s in fraud detection, uncovering criminal networks, or plain old beat policing.. Our search for the Master Algorithm is complicated, but also enlivened, by the rival schools of thought that exist within machine learning. The main ones are the symbolists, connectionists, evolutionaries, Bayesians, and analogizers. Each tribe has a set of core beliefs, and a particular problem that it cares most about. It has found a solution to that problem, based on ideas from its allied fields of science, and it has a master algorithm that embodies it.. Socrates is human.. Adam, the robot scientist we met in Chapter 1, gives a preview. AdamвЂ™s goal is to figure out how yeast cells work. It starts with basic knowledge of yeast genetics and metabolism and a trove of gene expression data from yeast cells. It then uses inverse deduction to hypothesize which genes are expressed as which proteins, designs microarray experiments to test them, revises its hypotheses, and repeats. Whether each gene is expressed depends on other genes and conditions in the environment, and the resulting web of interactions can be represented as a set of rules, such as:. These can be organized into the following decision tree:. Spin glasses are not actually glasses, although they have some glass-like properties. Rather, they are magnetic materials. Every electron is a tiny magnet by virtue of its spin, which can pointвЂњupвЂќ or вЂњdown.вЂќ In materials like iron, electronsвЂ™ spins tend to line up: if an electron with down spin is surrounded by electrons with up spins, it will probably flip to up. When most of the spins in a chunk of iron line up, it turns into a magnet. In ordinary magnets, the strength of interaction between adjacent spins is the same for all pairs, but in a spin glass it can vary; it may even be negative, causing nearby spins to point in opposite directions. The energy of an ordinary magnet is lowest when all its spins align, but in a spin glass, itвЂ™s not so simple. Indeed, finding the lowest-energy state of a spin glass is an NP-complete problem, meaning that just about every other difficult optimization problem can be reduced to it. Because of this, a spin glass doesnвЂ™t necessarily settle into its overall lowest energy state; much like rainwater may flow downhill into a lake instead of reaching the ocean, a spin glass may get stuck in a local minimum, a state with lower energy than all the states that can be reached from it by flipping a spin, rather than evolve to the global one.. A genetic algorithm works by mimicking this process. In each generation, it mates the fittest individuals, producing two offspring from each pair of parents by crossing over their bit strings at a random point. After applying point mutations to the new strings, it lets them loose in its virtual world. Each one returns with a fitness score, and the process repeats. Each generation is fitter than the previous one, and the process terminates when the desired fitness is reached or time runs out.. One of the most important problems in machine learning-and life-is the exploration-exploitation dilemma. If youвЂ™ve found something that works, should you just keep doing it? Or is it better to try new things, knowing it could be a waste of time but also might lead to a better solution? Would you rather be a cowboy or a farmer? Start a company or run an existing one? Go steady or play the field? A midlife crisis is the yearning to explore after many years spent exploiting. On an impulse, you fly to Vegas, ready to gamble away your lifeвЂ™s savings on the chance of becoming a millionaire. You enter the first casino and face a row of slot machines. The one to play is the one that gives you the best payoff on average, but you donвЂ™t know which that is. You have to try each one enough times to figure it out. But if you do this for too long, you waste your money on losing machines. Conversely, if you jump the gun and pick a machine that looked good by chance on the first few turns but is in fact not the best one, you waste your money playing it for the rest of the night. ThatвЂ™s the exploration-exploitation dilemma. Each time you play, you have to choose between repeating the best move youвЂ™ve found so far, which gives you the best payoff, or trying other moves, which gather information that may lead to even better payoffs. With two slot machines, Holland showed that the optimal strategy is to flip a biased coin each time, where the coin becomes exponentially more biased as you go along. (DonвЂ™t sue me if it doesnвЂ™t work for you, though. Remember the house always wins in the end.) The better a slot machine looks, the more you should play it, but never completely give up on the other one, in case it turns out to be the best one after all.. Evolutionaries and connectionists have something important in common: they both design learning algorithms inspired by nature. But then they part ways. Evolutionaries focus on learning structure; to them, fine-tuning an evolved structure by optimizing parameters is of secondary importance. In contrast, connectionists prefer to take a simple, hand-coded structure with lots of connections and let weight learning do all the work. This is machine learningвЂ™s version of the nature versus nurture controversy, and there are good arguments on both sides.. [РљР°СЂС‚РёРЅРєР°: pic_16.jpg]. As you stare uncomprehendingly at it, your Google Glass helpfully flashes:вЂњBayesвЂ™ theorem.вЂќ Now the crowd starts to chant вЂњMore data! More data!вЂќ A stream of sacrificial victims is being inexorably pushed toward the altar. Suddenly, you realize that youвЂ™re in the middle of it-too late. As the crank looms over you, you scream, вЂњNo! I donвЂ™t want to be a data point! Let me gooooo!вЂќ. WhatвЂ™s that statue up ahead? Aristotle, looking rather disapprovingly toward the tangled mess of the gradient descent quarter. YouвЂ™ve come full circle. You have the unified optimizer you need for the Master Algorithm, but this is no time to congratulate yourself. Night has fallen, and you still have much to do. You enter the Citadel of Evaluation through the imposing but rather narrow Accuracy Gate. The inscription above it says вЂњAbandon all hope of overfitting, ye who enter here.вЂќ As you circle past the palaces of the five tribesвЂ™ evaluators, you mentally snap the pieces into place. You use accuracy to evaluate yes-or-no predictions and squared error for continuous ones. Fitness is just the evolutionariesвЂ™ name for the scoring function; you can make it anything you want, including accuracy and squared error. Posterior probability reduces to squared error if you ignore the priorprobability and the errors follow a normal distribution. The margin, if you allow it to be violated for a price, becomes a softer version of accuracy: instead of paying no penalty for a correct prediction and a penalty of one for an incorrect prediction, the penalty is zero until you get inside themargin, at which point it starts to steadily go up. Whew! Combining the evaluators was a lot easier than combining the optimizers. But the Towers of Representation, looming above you, fill you with a sense of foreboding.. Chapter Two.