Enter the learner. Out in cyberspace, learning algorithms man the nation's ramparts. Every day, foreign attackers attempt to break into computers at the Pentagon, defense contractors, and other companies and government agencies. Their tactics change continually; what worked against yesterday's attacks is powerless against today's. Writing code to detect and block each one would be as effective as the Maginot Line, and the Pentagon's Cyber Command knows it. But machine learning runs into a problem if an attack is the first of its kind and there aren't any previous examples of it to learn from. Instead, learners build models of normal behavior, of which there's plenty, and flag anomalies. Then they call in the cavalry (aka system administrators). If cyberwar ever comes to pass, the generals will be human, but the foot soldiers will be algorithms. Humans are too slow and too few and would be quickly swamped by an army of bots. We need our own bot army, and machine learning is like West Point for bots.

Some may say that seeking a universal learner is the epitome of techno-hubris. But dreaming is not hubris. Maybe the Master Algorithm will take its place among the great chimeras, alongside the philosopher's stone and the perpetual motion machine. Or perhaps it will be more like finding the longitude at sea, given up as too difficult until a lone genius solved it. More likely, it will be the work of generations, raised stone by stone like a cathedral. The only way to find out is to get up early one day and set out on the journey.

In practice, Valiant-style analysis tends to be very pessimistic and to call for more data than you have. So how do you decide whether to believe what a learner tells you? Simple: you don't believe anything until you've verified it on data that the learner didn't see. If the patterns the learner hypothesized also hold true on new data, you can be pretty confident that they're real. Otherwise you know the learner overfit.
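This train-and-verify discipline, usually called a holdout split, can be sketched in a few lines. The toy data and the simple threshold "learner" below are illustrative inventions, not anything from the text:

```python
import random

def holdout_split(examples, test_fraction=0.3, seed=0):
    """Shuffle the examples and hold some out, unseen, for verification."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(model, examples):
    """Fraction of examples whose label the model gets right."""
    return sum(model(x) == y for x, y in examples) / len(examples)

# Toy concept: a number is positive (label 1) exactly when it exceeds 50.
data = [(x, int(x > 50)) for x in range(100)]
train, test = holdout_split(data)

# A crude "learner": memorize the smallest positive training example
# and predict positive for anything at or above it.
threshold = min(x for x, y in train if y == 1)
model = lambda x: int(x >= threshold)

# Perfect on the data it saw; believe it only if it also holds up on test.
print(accuracy(model, train), accuracy(model, test))
```

The model scores perfectly on the data it was fit to; only its score on the held-out examples tells you whether the pattern is real or overfit.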
This is just the scientific method applied to machine learning: it's not enough for a new theory to explain past evidence, because it's easy to concoct a theory that does that; the theory must also make new predictions, and you only accept it after they've been experimentally verified. (And even then only provisionally, because future evidence could still falsify it.)

Socrates is human.

Symbolist machine learning is an offshoot of the knowledge engineering school of AI. In the 1970s, so-called knowledge-based systems scored some impressive successes, and in the 1980s they spread rapidly, but then they died out. The main reason they did was the infamous knowledge acquisition bottleneck: extracting knowledge from experts and encoding it as rules is just too difficult, labor-intensive, and failure-prone to be viable for most problems. Letting the computer automatically learn to, say, diagnose diseases by looking at databases of past patients' symptoms and the corresponding outcomes turned out to be much easier than endlessly interviewing doctors. Suddenly, the work of pioneers like Ryszard Michalski, Tom Mitchell, and Ross Quinlan had a new relevance, and the field hasn't stopped growing since. (Another important problem was that knowledge-based systems had trouble dealing with uncertainty, of which more in Chapter 6.)

The number of transistors in a computer is catching up with the number of neurons in a human brain, but the brain wins hands down in the number of connections. In a microprocessor, a typical transistor is directly connected to only a few others, and the planar semiconductor technology used severely limits how much better a computer can do. In contrast, a neuron has thousands of synapses. If you're walking down the street and come across an acquaintance, it takes you only about a tenth of a second to recognize her.
At neuron switching speeds, this is barely enough time for a hundred processing steps, but in those hundred steps your brain manages to scan your entire memory, find the best match, and adapt it to the new context (different clothes, different lighting, and so on). In a brain, each processing step can be very complex and involve a lot of information, consonant with a distributed representation.

The theorem that runs the world

In 1913, on the eve of World War I, the Russian mathematician Andrei Markov published a paper applying probability to, of all things, poetry. In it, he modeled a classic of Russian literature, Pushkin's Eugene Onegin, using what we now call a Markov chain. Rather than assume that each letter was generated at random independently of the rest, he introduced a bare minimum of sequential structure: he let the probability of each letter depend on the letter immediately preceding it. He showed that, for example, vowels and consonants tend to alternate, so if you see a consonant, the next letter (ignoring punctuation and white space) is much more likely to be a vowel than it would be if letters were independent. This may not seem like much, but in the days before computers, it required spending hours manually counting characters, and Markov's idea was quite new. If Vowel_i is a Boolean variable that's true if the ith letter of Eugene Onegin is a vowel and false if it's a consonant, we can represent Markov's model with a chain-like graph, with an arrow between two nodes indicating a direct dependency between the corresponding variables:

Vowel_1 → Vowel_2 → Vowel_3 → ...

Of course, frequentists are aware of this issue, and their answer is to, for example,
multiply the likelihood by a factor that penalizes more complex networks. But at this point frequentism and Bayesianism have become indistinguishable, and whether you call the scoring function "penalized likelihood" or "posterior probability" is really just a matter of taste.

Imagine for a moment trying to pull off such a stunt. You sneak into an absent doctor's office, and before long a patient comes in and tells you all his symptoms. Now you have to diagnose him, except you know nothing about medicine. All you have is a cabinet full of patient files: their symptoms, diagnoses, treatments undergone, and so on. What do you do? The easiest way out is to look in the files for the patient whose symptoms most closely resemble your current one's and make the same diagnosis. If your bedside manner is as convincing as Abagnale's, that might just do the trick. The same idea applies well beyond medicine. If you're a young president faced with a world crisis, as Kennedy was when a US spy plane revealed Soviet nuclear missiles being deployed in Cuba, chances are there's no script ready to follow. Instead, you look for historical analogs of the current situation and try to learn from them. The Joint Chiefs of Staff urged an attack on Cuba, but Kennedy, having just read The Guns of August, a best-selling account of the outbreak of World War I, was keenly aware of how easily that could escalate into all-out war. So he opted for a naval blockade instead, perhaps saving the world from nuclear war.

Decision trees are not immune to the curse of dimensionality either. Let's say the concept you're trying to learn is a sphere: points inside it are positive, and points outside it are negative. A decision tree can approximate a sphere by the smallest cube it fits inside. Not perfect, but not too bad either: only the corners of the cube get misclassified. But in high dimensions, almost the entire volume of the hypercube lies outside the hypersphere.
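The corner effect is easy to quantify with a quick Monte Carlo estimate (this sketch and its sample sizes are illustrative, not from the text): draw points uniformly from the hypercube and count how many land inside the inscribed hypersphere.

```python
import random

def fraction_inside_sphere(dim, trials=100_000, seed=42):
    """Estimate what fraction of the hypercube [-1, 1]^dim lies
    inside the inscribed unit hypersphere (radius 1, same center)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        point = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        if sum(x * x for x in point) <= 1.0:  # inside the sphere?
            hits += 1
    return hits / trials

for dim in (2, 3, 5, 10):
    print(dim, fraction_inside_sphere(dim))
```

In two dimensions the estimate comes out near pi/4 (about 0.79); by ten dimensions it drops to roughly a quarter of one percent, so a cube that tightly encloses a sphere is, volume-wise, almost entirely corners.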
For every example you correctly classify as positive, you incorrectly classify many negative ones as positive, causing your accuracy to plummet.

The effort to build what will ultimately become CanceRx is already under way. Researchers in the new field of systems biology model whole metabolic networks rather than individual genes and proteins. One group at Stanford has built a model of a whole cell. The Global Alliance for Genomics and Health promotes data sharing among researchers and oncologists, with a view to large-scale analysis. CancerCommons.org assembles cancer models and lets patients pool their histories and learn from similar cases. Foundation Medicine pinpoints the mutations in a patient's tumor cells and suggests the most appropriate drugs. A decade ago, it wasn't clear if, or how, cancer would ever be cured. Now we can see how to get there. The road is long, but we have found it.

Online dating is in fact a tough example because chemistry is hard to predict. Two people who hit it off on a date may wind up falling in love and believing passionately that they were made for each other, but if their initial conversation takes a different turn, they might instead find each other annoying and never want to meet again. What a really sophisticated learner would do is run a thousand Monte Carlo simulations of a date between each pair of plausible matches and rank the matches by the fraction of dates that turned out well. Short of that, dating sites can organize parties and invite people who are each a likely match for many of the others, letting them accomplish in a few hours what would otherwise take weeks.

Picture two strands of DNA going for a swim in their private pool, aka a bacterium's cytoplasm, two billion years ago. They're pondering a momentous decision. "I'm worried, Diana," says one. "If we start making multicellular creatures, will they take over?" Fast-forward to the twenty-first century, and DNA is still alive and well.
Better than ever, in fact, with an increasing fraction living safely in bipedal organisms comprising trillions of cells. It's been quite a ride for our tiny double-stranded friends since they made their momentous decision. Humans are their trickiest creation yet; we've invented things like contraception that let us have fun without spreading our DNA, and we have (or seem to have) free will. But it's still DNA that shapes our notions of fun, and we use our free will to pursue pleasure and avoid pain, which, for the most part, still coincides with what's best for our DNA's survival. We may yet be DNA's demise if we choose to transmute ourselves into silicon, but even then, it's been a great two billion years. The decision we face today is similar: if we start making AIs (vast, interconnected, superhuman, unfathomable AIs), will they take over? Not any more than multicellular organisms took over from genes, vast and unfathomable as we may be to them. AIs are our survival machines, in the same way that we are our genes'.
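Markov's vowel-consonant experiment from earlier in this section is simple enough to reproduce by hand. The sketch below estimates, for a short English passage (a rough rendering of Onegin's opening used purely as illustration, with y counted as a consonant for simplicity), the unconditional probability that a letter is a vowel and the probability that it is a vowel given that the previous letter was a consonant:

```python
VOWELS = set("aeiou")  # simplification: y counts as a consonant

def markov_vowel_stats(text):
    """Return (P(vowel), P(vowel | previous letter was a consonant)),
    ignoring punctuation and white space, as Markov did."""
    letters = [c for c in text.lower() if c.isalpha()]
    is_vowel = [c in VOWELS for c in letters]
    p_vowel = sum(is_vowel) / len(is_vowel)
    after_consonant = [cur for prev, cur in zip(is_vowel, is_vowel[1:])
                       if not prev]
    p_after_consonant = sum(after_consonant) / len(after_consonant)
    return p_vowel, p_after_consonant

sample = ("my uncle man of firm convictions by falling gravely ill "
          "has won a due respect for his afflictions")
p, p_after_c = markov_vowel_stats(sample)
print(p, p_after_c)  # seeing a consonant makes a vowel likelier next
```

The second number comes out higher than the first: conditioning on the previous letter carries real information, which is exactly the dependency the chain-like graph encodes.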