Otherwise, if there's an empty corner, play there.

Big data is no use if you can't turn it into knowledge, however, and there aren't enough scientists in the world for the task. Edwin Hubble discovered new galaxies by poring over photographic plates, but you can bet the half-billion sky objects in the Sloan Digital Sky Survey weren't identified that way. It would be like trying to count the grains of sand on a beach by hand. You can write rules to distinguish galaxies from stars from noise objects (such as birds, planes, Superman), but they're not very accurate. Instead, the SKICAT (sky image cataloging and analysis tool) project used a learning algorithm. Starting from plates where objects were labeled with the correct categories, it figured out what characterizes each one and applied the result to all the unlabeled plates. Even better, it could classify objects that were too faint for humans to label, and these comprise the majority of the survey.

Each tribe's solution to its central problem is a brilliant, hard-won advance. But the true Master Algorithm must solve all five problems, not just one. For example, to cure cancer we need to understand the metabolic networks in the cell: which genes regulate which others, which chemical reactions the resulting proteins control, and how adding a new molecule to the mix would affect the network. It would be silly to try to learn all of this from scratch, ignoring all the knowledge that biologists have painstakingly accumulated over the decades. Symbolists know how to combine this knowledge with data from DNA sequencers, gene expression microarrays, and so on, to produce results that you couldn't get with either alone. But the knowledge we obtain by inverse deduction is purely qualitative; we need to learn not just who interacts with whom, but how much, and backpropagation can do that.
Nevertheless, both inverse deduction and backpropagation would be lost in space without some basic structure on which to hang the interactions and parameters they find, and genetic programming can discover it. At this point, if we had complete knowledge of the metabolism and all the data relevant to a given patient, we could figure out a treatment for her. But in reality the information we have is always very incomplete, and even incorrect in places; we need to make headway despite that, and that's what probabilistic inference is for. In the hardest cases, the patient's cancer looks very different from previous ones, and all our learned knowledge fails. Similarity-based algorithms can save the day by seeing analogies between superficially very different situations, zeroing in on their essential similarities and ignoring the rest.

The deeper problem, however, is that most learners start out knowing too little, and no amount of knob-twiddling will get them to the finish line. Without the guidance of an adult brain's worth of knowledge, they can easily go astray. Even though it's what most learners do, just assuming you know the form of the truth (for example, that it's a small set of rules) is not much to hang your hat on. A strict empiricist would say that that's all a newborn has, encoded in her brain's architecture, and indeed children overfit more than adults do, but we would like to learn faster than a child does. (Eighteen years is a long time, and that's not counting college.) The Master Algorithm should be able to start with a large body of knowledge, whether it was provided by humans or learned in previous runs, and use it to guide new generalizations from data. That's what scientists do, and it's as far as it gets from a blank slate. The "divide and conquer" rule induction algorithm can't do it, but there's another way to learn rules that can.

If gene C is expressed, gene D is not.
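The similarity-based idea described above — classify a new case by finding the previously seen cases it most resembles — can be sketched as a nearest-neighbor classifier. The data, distance function, and labels below are invented for illustration; they are not from the text.

```python
# Minimal nearest-neighbor classifier: label a new case by majority vote
# among the k most similar known cases (all data here is illustrative).
from collections import Counter

def nearest_neighbors(known, query, k=3):
    """known: list of (feature_vector, label) pairs; query: feature_vector."""
    def distance(a, b):
        # Squared Euclidean distance as a simple measure of dissimilarity.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    closest = sorted(known, key=lambda item: distance(item[0], query))[:k]
    # Majority vote among the k nearest cases.
    return Counter(label for _, label in closest).most_common(1)[0][0]

cases = [((0.9, 0.1), "type A"), ((0.8, 0.2), "type A"),
         ((0.1, 0.9), "type B"), ((0.2, 0.8), "type B"),
         ((0.15, 0.85), "type B")]
print(nearest_neighbors(cases, (0.85, 0.15)))  # nearest cases are "type A"
```

The point of the sketch is that nothing about the *form* of the answer is assumed in advance: the learner simply zeroes in on whatever stored cases are most similar and ignores the rest.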
Symbolist machine learning is an offshoot of the knowledge engineering school of AI. In the 1970s, so-called knowledge-based systems scored some impressive successes, and in the 1980s they spread rapidly, but then they died out. The main reason they did was the infamous knowledge acquisition bottleneck: extracting knowledge from experts and encoding it as rules is just too difficult, labor-intensive, and failure-prone to be viable for most problems. Letting the computer automatically learn to, say, diagnose diseases by looking at databases of past patients' symptoms and the corresponding outcomes turned out to be much easier than endlessly interviewing doctors. Suddenly, the work of pioneers like Ryszard Michalski, Tom Mitchell, and Ross Quinlan had a new relevance, and the field hasn't stopped growing since. (Another important problem was that knowledge-based systems had trouble dealing with uncertainty, of which more in Chapter 6.)

So does backprop solve the machine-learning problem? Can we just throw together a big pile of neurons, wait for it to do its magic, and on the way to the bank collect a Nobel Prize for figuring out how the brain works? Alas, life is not that easy. Suppose your network
has only one weight, and this is the graph of the error as a function of it:

[graph of the error as a function of the weight not reproduced]

This may change as our understanding of the brain improves. Inspired by the human genome project, the new field of connectomics seeks to map every synapse in the brain. The European Union is investing a billion euros to build a soup-to-nuts model of it. America's BRAIN initiative, with $100 million in funding in 2014 alone, has similar aims. Nevertheless, symbolists are very skeptical of this path to the Master Algorithm. Even if we can image the whole brain at the level of individual synapses, we (ironically) need better machine-learning algorithms to turn those images into wiring diagrams; doing it by hand is out of the question. Worse than that, even if we had a complete map of the brain, we would still be at a loss to figure out what it does. The nervous system of the C. elegans worm consists of only 302 neurons and was completely mapped in 1986, but we still have only a fragmentary understanding of what it does. We need higher-level concepts to make sense of the morass of low-level details, weeding out the ones that are specific to wetware or just quirks of evolution. We don't build airplanes by reverse engineering feathers, and airplanes don't flap their wings. Rather, airplane designs are based on the principles of aerodynamics, which all flying objects must obey. We still do not understand those analogous principles of thought.

We can think of a Bayesian network as a "generative model," a recipe for probabilistically generating a state of the world: first decide independently whether there's a burglary and/or an earthquake, then based on that decide whether the alarm goes off, and then based on that whether Bob and Claire call. A Bayesian network tells a story: A happened, and it led to B; at the same time, C also happened, and B and C together caused D. To compute the probability of a particular story, we just multiply the probabilities of all of its different strands.
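The "multiply the strands" recipe can be sketched directly for the burglary network above: each event contributes its probability given its parents, and the product is the probability of the whole story. All the numbers below are invented for illustration; the text itself gives none.

```python
# Probability of one complete "story" in the burglary network: the product
# of each event's probability given its parents. All numbers are invented.
p_burglary = 0.001
p_earthquake = 0.002
p_alarm = {  # P(alarm goes off | burglary, earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
p_bob_calls = {True: 0.90, False: 0.05}     # P(Bob calls | alarm)
p_claire_calls = {True: 0.70, False: 0.01}  # P(Claire calls | alarm)

def story_probability(burglary, earthquake, alarm, bob, claire):
    p = p_burglary if burglary else 1 - p_burglary
    p *= p_earthquake if earthquake else 1 - p_earthquake
    pa = p_alarm[(burglary, earthquake)]
    p *= pa if alarm else 1 - pa
    pb = p_bob_calls[alarm]
    p *= pb if bob else 1 - pb
    pc = p_claire_calls[alarm]
    p *= pc if claire else 1 - pc
    return p

# One story: a burglary (no earthquake) sets off the alarm and both call.
print(story_probability(True, False, True, True, True))
```

Because every story's probability is built this way, the probabilities of all thirty-two possible stories sum to one — which is what makes the network a legitimate recipe for generating a state of the world.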
This is a radical departure from the way science is usually done. It's like saying, "Actually, neither Copernicus nor Ptolemy was right; let's just predict the planets' future trajectories assuming Earth goes round the sun and vice versa and average the results."

Connectionists' models are inspired by the brain, with networks of S curves that correspond to neurons and weighted connections between them corresponding to synapses. In Alchemy, two variables are connected if they appear together in some formula, and the probability of a variable given its neighbors is an S curve. (Although I won't show why, it's a direct consequence of the master equation we saw in the previous section.) The connectionists' master algorithm is backpropagation, which they use to figure out which neurons are responsible for which errors and adjust their weights accordingly. Backpropagation is a form of gradient descent, which Alchemy uses to optimize the weights of a Markov logic network.

The world has parts, and parts belong to classes: combining these two gives us most of what we need to make inference in Alchemy tractable. We can learn the world's MLN by breaking it into parts and subparts, such that most interactions are between subparts of the same part, and then grouping the parts into classes and subclasses. If the world is a Lego toy, we can break it up into individual bricks, remembering which attaches to which, and group the bricks by shape and color. If the world is Wikipedia, we can extract the entities it talks about, group them into classes, and learn how classes relate to each other. Then if someone asks us "Is Arnold Schwarzenegger an action star?" we can answer yes, because he's a star and he's in action movies.
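The S curve that gives the probability of a variable given its neighbors can be sketched as a logistic function of a weighted sum of the neighbors' values. The weights, neighbor values, and bias below are invented for illustration; this is a sketch of the general shape of such a conditional, not Alchemy's actual equations.

```python
import math

def s_curve(x):
    """Logistic (sigmoid) function: squashes any real number into (0, 1)."""
    return 1 / (1 + math.exp(-x))

def prob_given_neighbors(weights, neighbor_values, bias=0.0):
    """P(variable is true | neighbors), as an S curve of a weighted sum.
    Weights and neighbor values here are illustrative only."""
    total = bias + sum(w * v for w, v in zip(weights, neighbor_values))
    return s_curve(total)

# The more strongly the active neighbors "vote" for the variable, the
# closer its probability gets to 1; negative weights push it toward 0.
print(prob_given_neighbors([2.0, -1.0], [1, 1]))  # above 0.5
print(prob_given_neighbors([2.0, -1.0], [0, 1]))  # below 0.5
```

The same shape is what a neuron computes from its weighted inputs, which is why the connectionist picture maps so directly onto this corner of Alchemy.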
Step by step, we can learn larger and larger MLNs, until we're doing what a friend of mine at Google calls "planetary-scale machine learning": modeling everyone in the world at once, with data continually streaming in and answers streaming out.

Three Algorithms for the Scientists under the sky,

Chapter Four

Neural Networks in Finance and Investing, edited by Robert Trippi and Efraim Turban (McGraw-Hill, 1992), is a collection of articles on financial applications of neural networks. "Life in the fast lane: The evolution of an adaptive vehicle control system," by Todd Jochem and Dean Pomerleau (AI Magazine, 1996), describes the ALVINN self-driving car project. Paul Werbos's PhD thesis is Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences (Harvard University, 1974). Arthur Bryson and Yu-Chi Ho describe their early version of backprop in Applied Optimal Control (Blaisdell, 1969).