Rationalists and empiricists have probably been around since the dawn of Homo sapiens. Before setting out on a hunt, Caveman Bob spent a long time sitting in his cave figuring out where the game would be. In the meantime, Cavewoman Alice was out systematically surveying the territory. Since both kinds are still with us, it's probably safe to say that neither approach was better. You might think that machine learning is the final triumph of the empiricists, but the truth is more subtle, as we'll soon see.

Sets of rules are vastly more powerful than conjunctive concepts. They're so powerful, in fact, that you can represent any concept using them. It's not hard to see why. If you give me a complete list of all the instances of a concept, I can just turn each instance into a rule that specifies all attributes of that instance, and the set of all those rules is the definition of the concept. Going back to the dating example, one rule would be: If it's a warm weekend night, there's nothing good on TV, and you propose going to a club, she'll say yes. The table only contains a few examples, but if it contained all 2 × 2 × 2 × 2 = 16 possible ones, with each labeled "Date" or "No date," turning each positive example into a rule in this way would do the trick.

Accuracy you can believe in

Hebb's rule was a confluence of ideas from psychology and neuroscience, with a healthy dose of speculation thrown in. Learning by association was a favorite theme of the British empiricists, from Locke and Hume to John Stuart Mill. In his Principles of Psychology, William James enunciates a general principle of association that's remarkably similar to Hebb's rule, with neurons replaced by brain processes and firing efficiency by propagation of excitement.
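The core of Hebb's rule (a connection between two neurons strengthens when both are active together) can be sketched as a one-line weight update. Everything below, including the learning rate and the activity values, is a made-up illustration, not anything from Hebb:

```python
# Minimal sketch of Hebb's rule: the connection between two neurons
# strengthens in proportion to their joint activity.
def hebb_update(weight, pre, post, learning_rate=0.1):
    """Return the weight after one coincident-firing event."""
    return weight + learning_rate * pre * post

w = 0.0
# Present the same paired activity five times: the weight grows,
# encoding the association between the two neurons.
for _ in range(5):
    w = hebb_update(w, pre=1.0, post=1.0)
print(w)  # grows to about 0.5
```

If the two neurons never fire together (pre or post is zero), the product vanishes and the weight stays put, which is exactly the "fire together, wire together" behavior the rule is meant to capture.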
Around the same time, the great Spanish neuroscientist Santiago Ramón y Cajal was making the first detailed observations of the brain, staining individual neurons using the recently invented Golgi method and cataloguing what he saw like a botanist classifying new species of trees. By Hebb's time, neuroscientists had a rough understanding of how neurons work, but he was the first to propose a mechanism by which they could encode associations.

In Hebb's time there was no way to measure synaptic strength or change in it, let alone figure out the molecular biology of synaptic change. Today, we know that synapses do grow (or form anew) when the postsynaptic neuron fires soon after the presynaptic one. Like all cells, neurons have different concentrations of ions inside and outside, creating a voltage across their membrane. When the presynaptic neuron fires, tiny sacs release neurotransmitter molecules into the synaptic cleft. These cause channels in the postsynaptic neuron's membrane to open, letting in potassium and sodium ions and changing the voltage across the membrane as a result. If enough presynaptic neurons fire close together, the voltage suddenly spikes, and an action potential travels down the postsynaptic neuron's axon. This also causes the ion channels to become more responsive and new channels to appear, strengthening the synapse. To the best of our knowledge, this is how neurons learn.

Pearl realized that it's OK to have a complex network of dependencies among random variables, provided each variable depends directly on only a few others. We can represent these dependencies with a graph like the ones we saw for Markov chains and HMMs, except now the graph can have any structure (as long as the arrows don't form closed loops). One of Pearl's favorite examples is burglar alarms. The alarm at your house should go off if a burglar attempts to break in, but it could also be triggered by an earthquake.
(In Los Angeles, where Pearl lives, earthquakes are almost as frequent as burglaries.) If you're working late one night and your neighbor Bob calls to say he just heard your alarm go off, but your neighbor Claire doesn't, should you call the police? The graph of dependencies is simple: burglary and earthquake are both direct causes of the alarm going off, and the alarm in turn is a direct cause of Bob calling and of Claire calling.

This is a radical departure from the way science is usually done. It's like saying, "Actually, neither Copernicus nor Ptolemy was right; let's just predict the planets' future trajectories assuming Earth goes round the sun and vice versa and average the results."

Match me if you can

Clearly, if Robby did know them, it would be smooth sailing: as in Naïve Bayes, each cluster would be defined by its probability (17 percent of the objects generated were toys) and by the probability distribution of each attribute among the cluster's members (for example, 80 percent of the toys are brown). Robby could estimate these probabilities just by counting the number of toys in the data, the number of brown toys, and so on. But in order to do that, we would need to know which objects are toys. This seems like a tough nut to crack, but it turns out we already know how to do it. If Robby has a Naïve Bayes classifier and needs to figure out the class of a new object, all he needs to do is apply the classifier and compute the probability of each class given the object's attributes. (Small, fluffy, brown, bear-like, with big eyes, and a bow tie? Probably a toy, but possibly an animal.)

Discovering the shape of the data

Principal-component analysis (PCA), as this process is known, is one of the key tools in the scientist's toolkit. You could say PCA is to unsupervised learning what linear regression is to the supervised variety. The famous hockey-stick curve of global warming, for example, is the result of finding the principal component of various temperature-related data series (tree rings, ice cores, etc.) and assuming it's the temperature.
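That trend-extraction idea can be sketched in a few lines of NumPy: make two invented "proxy" series that share a hidden common trend, take the principal component of their covariance, and check that projecting onto it recovers the trend. All the data and numbers here are fabricated for illustration:

```python
import numpy as np

# Toy data: two correlated "temperature proxies" sharing a hidden trend.
rng = np.random.default_rng(0)
t = rng.normal(size=200)                       # the hidden common trend
proxies = np.column_stack([t + 0.1 * rng.normal(size=200),
                           t + 0.1 * rng.normal(size=200)])

# PCA: center the data, then take the top eigenvector of the covariance.
centered = proxies - proxies.mean(axis=0)
cov = centered.T @ centered / len(centered)
eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                           # direction of greatest variance

# Projecting onto pc1 recovers (a rescaled copy of) the common trend.
scores = centered @ pc1
print(np.corrcoef(scores, t)[0, 1])            # close to 1 in magnitude
```

The sign of an eigenvector is arbitrary, so the recovered series may come out flipped; what PCA guarantees is the direction of greatest shared variance, which here is the trend the two proxies have in common.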
Biologists use PCA to summarize the expression levels of thousands of different genes into a few pathways. Psychologists have found that personality boils down to five dimensions (extroversion, agreeableness, conscientiousness, neuroticism, and openness to experience), which they can infer from your tweets and blog posts. (Chimps supposedly have one more dimension, reactivity, but Twitter data for them is not available.) Applying PCA to congressional votes and poll data shows that, contrary to popular belief, politics is not mainly about liberals versus conservatives. Rather, people differ along two main dimensions: one for economic issues and one for social ones. Collapsing these into a single axis mixes together populists and libertarians, who are polar opposites, and creates the illusion of lots of moderates in the middle. Trying to appeal to them is an unlikely winning strategy. On the other hand, if liberals and libertarians overcame their mutual aversion, they could ally themselves on social issues, where both favor individual freedom.

The notion that not all states have rewards (positive or negative) but every state has a value is central to reinforcement learning. In board games, only final positions have a reward (1, 0, or −1 for a win, tie, or loss, say). Other positions give no immediate reward, but they have value in that they can lead to rewards later. A chess position from which you can force checkmate in some number of moves is practically as good as a win and therefore has high value. We can propagate this kind of reasoning all the way to good and bad opening moves, even if at that distance the connection is far from obvious. In video games, the rewards are usually points, and the value of a state is the number of points you can accumulate starting from that state. In real life, a reward now is better than a reward later, so future rewards can be discounted by some rate of return, like investments.
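How value propagates backward from a reward, shrinking by the discount at each step, can be sketched on a made-up three-state chain; the state names, discount, and rewards are all invented for illustration:

```python
# Sketch of value propagation on a toy chain: start -> middle -> win,
# where only the final "win" state carries a reward.
GAMMA = 0.9          # discount: a reward now beats the same reward later
states = ["start", "middle", "win"]
next_state = {"start": "middle", "middle": "win"}   # "win" is terminal
reward = {"start": 0.0, "middle": 0.0, "win": 1.0}

value = {s: 0.0 for s in states}
for _ in range(100):  # repeat the update until the values settle
    for s in states:
        future = GAMMA * value[next_state[s]] if s in next_state else 0.0
        value[s] = reward[s] + future

print(value)  # start: 0.81, middle: 0.9, win: 1.0
```

The reward sits only at the end, yet every state ends up with a value: each step back from the win multiplies by the discount, which is exactly how a strong chess position far from checkmate can still be worth almost as much as the win itself.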
Of course, the rewards depend on what actions you choose, and the goal of reinforcement learning is to always choose the action that leads to the greatest rewards. Should you pick up the phone and ask your friend for a date? It could be the start of a beautiful relationship or just the route to a painful rejection. Even if your friend agrees to go on a date, that date may turn out well or not. Somehow, you have to abstract over all the infinite paths the future could take and make a decision now. Reinforcement learning does that by estimating the value of each state (the sum total of the rewards you can expect to get starting from that state) and choosing the actions that maximize it.

Rosenbloom and Newell set their chunking program to work on a series of problems, measured the time it took in each trial, and lo and behold, out popped a series of power law curves. But that was only the beginning. Next they incorporated chunking into Soar, a general theory of cognition that Newell had been working on with John Laird, another one of his students. Instead of working only within a predefined hierarchy of goals, the Soar program could define and solve a new subproblem every time it hit a snag. Once it formed a new chunk, Soar generalized it to apply to similar problems, in a manner similar to inverse deduction. Chunking in Soar turned out to be a good model of lots of learning phenomena besides the power law of practice. It could even be applied to learning new knowledge by chunking data and analogies. This led Newell, Rosenbloom, and Laird to hypothesize that chunking is the only mechanism needed for learning: in other words, the Master Algorithm.

The cure for cancer is a program that inputs the cancer's genome and outputs the drug to kill it with. We can now picture what such a program (let's call it CanceRx) will look like.
Despite its outward simplicity, CanceRx is one of the largest and most complex programs ever built; indeed, so large and complex that it could only have been built with the help of machine learning. It is based on a detailed model of how living cells work, with a subclass for each type of cell in the human body and an overarching model of how they interact. This model, in the form of an MLN or something akin to it, combines knowledge of molecular biology with vast amounts of data from DNA sequencers, microarrays, and many other sources. Some of the knowledge was manually encoded, but most was automatically extracted from the biomedical literature. The model is continually evolving, incorporating the results of new experiments, data sources, and patient histories. Ultimately, it will know every pathway, regulatory mechanism, and chemical reaction in every type of human cell: the sum total of human molecular biology.

In the world of the Master Algorithm, "my people will call your people" becomes "my program will call your program." Everyone has an entourage of bots, smoothing his or her way through the world. Deals get pitched, terms negotiated, arrangements made, all before you lift a finger. Today, drug companies target your doctor, because he decides what drugs to prescribe to you. Tomorrow, the purveyors of every product and service you use, or might use, will target your model, because your model will screen them for you. Their bots' job is to get your bot to buy. Your bot's job is to see through their claims, just as you see through TV commercials, but at a much finer level of detail, one that you'd never have the time or patience for. Before you buy a car, the digital you will go over every one of its specs, discuss them with the manufacturer, and study everything anyone in the world has said about that car and its alternatives. Your digital half will be like power steering for your life: it goes where you want to go but with less effort from you.
This does not mean that you'll end up in a "filter bubble," seeing only what you reliably like, with no room for the unexpected; the digital you knows better than that. Part of its brief is to leave some things open to chance, to expose you to new experiences, and to look for serendipity.