Every computer scientist does battle with the complexity monster every day. When computer scientists lose the battle, complexity seeps into our lives. You've probably noticed that many a battle has been lost. Nevertheless, we continue to build our tower of algorithms, with greater and greater difficulty. Each new generation of algorithms has to be built on top of the previous ones and has to deal with their complexities in addition to its own. The tower grows taller and taller, and it covers the whole world, but it's also increasingly fragile, like a house of cards waiting to collapse. One tiny error in an algorithm and a billion-dollar rocket explodes, or the power goes out for millions. Algorithms interact in unexpected ways, and the stock market crashes.

Enter the learner.

The argument from physics.

To this Chomsky might reply that engineering successes are not proof of scientific validity. On the other hand, if your buildings collapse and your engines don't run, perhaps something is wrong with your theory of physics. Chomsky thinks linguists should focus on "ideal" speaker-listeners, as defined by him, and this gives him license to ignore things like the need for statistics in language learning. Perhaps it's not surprising, then, that few experimentalists take his theories seriously anymore.

Induction is the inverse of deduction.

Inverting an operation is often difficult because the inverse is not unique. For example, a positive number has two square roots, one positive and one negative (2² = (-2)² = 4). Most famously, integrating the derivative of a function only recovers the function up to a constant. The derivative of a function tells us how much that function goes up or down at each point. Adding up all those changes gives us the function back, except we don't know where it started; we can "slide" the integrated function up or down without changing the derivative.
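The lost-constant problem is easy to see numerically. Here is a minimal sketch (the function f(x) = x² + 5 and the grid spacing are illustrative choices, not from the text): we differentiate by finite differences, re-integrate by a cumulative sum starting at zero, and find the result is the original function shifted down by exactly the constant we discarded.

```python
# Sketch: integrating a derivative recovers the function only up to a constant.
# Sample f(x) = x^2 + 5 on a grid (both choices are illustrative).
N = 1000
h = 0.01
xs = [i * h for i in range(N)]
f = [x * x + 5 for x in xs]               # original function; its constant is 5

# Forward-difference approximation of the derivative.
df = [(f[i + 1] - f[i]) / h for i in range(N - 1)]

# Re-integrate, "clamping" the unknown constant to zero as in the text.
g = [0.0]
for d in df:
    g.append(g[-1] + d * h)

# g matches f only after sliding it up by the constant we lost.
offset = f[0] - g[0]                      # 5.0
print(offset)
```

Sliding `g` up by `offset` reproduces `f` point for point; any other vertical shift has the same derivative, which is exactly the non-uniqueness the text describes.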
To make life easy, we can "clamp down" the function by assuming the additive constant is zero. Inverse deduction has a similar problem, and Newton's principle is one solution. For example, from All Greek philosophers are human and All Greek philosophers are mortal we can induce that All humans are mortal, or just that All Greeks are mortal. But why settle for the more modest generalization? Instead, we can assume that all humans are mortal until we meet an exception. (Which, according to Ray Kurzweil, will be soon.)

More generally, inverse deduction is a great way to discover new knowledge in biology, and doing that is the first step in curing cancer. According to the Central Dogma, everything that happens in a living cell is ultimately controlled by its genes, via the proteins whose synthesis they initiate. In effect, a cell is like a tiny computer, and DNA is the program running on it: change the DNA, and a skin cell can become a neuron, or a mouse cell can turn into a human one. In a computer program, all bugs are the programmer's fault. But in a cell, bugs can arise spontaneously: radiation or a copying error changes a gene into a different one, a gene is accidentally copied twice, and so on. Most of the time these mutations cause the cell to die silently, but sometimes the cell starts to grow and divide uncontrollably, and a cancer is born.

Koza's confidence stands out even in a field not known for its shrinking violets. He sees genetic programming as an invention machine, a silicon Edison for the twenty-first century. He and other evolutionaries believe it can learn any program, making it their entry in the Master Algorithm sweepstakes. In 2004, they instituted the annual Humie Awards to recognize "human-competitive" genetic creations; thirty-nine have been awarded to date.

The theorem that runs the world.

Climbing the ladder.
The first difficulty we face is that, when the data is all one big network, we no longer seem to have many examples to learn from, just one, and that's not enough. Naïve Bayes learns that a fever is a symptom of the flu by counting the number of fever-stricken flu patients. If it could only see one patient, it would either conclude that flu always causes fever or that it never does, both of which are wrong. We would like to learn that the flu is contagious by looking at the pattern of infections in a social network (a clump of infected people here, a clump of uninfected ones there), but we only have one pattern to look at, even if it's in a network of seven billion people, so it's not clear how to generalize. The key is to notice that, embedded in that big network, we have many examples of pairs of people. If acquaintances are more likely to both have the flu
than pairs of people who have never met, then being acquainted with a flu patient makes you more likely to be one as well. Unfortunately, however, we can't just count how many pairs of acquaintances in the data both have the flu and turn those counts into probabilities. This is because a person has many acquaintances, and all the pairwise probabilities don't add up to a coherent model that lets us, for example, compute how likely someone is to have the flu given which of their acquaintances do. We didn't have this problem when the examples were all separate, and we wouldn't have it in, say, a society of childless couples, each living on their own desert island. But that's not the real world, and there wouldn't be any epidemics in it, anyway.

Notice that the network has a separate feature for each pair of people: Alice and Bob both have the flu, Alice and Chris both have the flu, and so on. But we can't learn a separate weight for each pair, because we only have one data point per pair (whether it's infected or not), and we wouldn't be able to generalize to members of the network we haven't diagnosed yet (do Yvette and Zach both have the flu?). What we can do instead is learn a single weight for all features of the same form, based on all the instances of it that we've seen. In effect, X and Y have the flu is a template for features that can be instantiated with each pair of acquaintances (Alice and Bob, Alice and Chris, etc.). The weights for all the instances of a template are "tied together," in the sense that they all have the same value, and that's how we can generalize despite having only one example (the whole network). In nonrelational learning, the parameters of a model are tied in only one way: across all the independent examples (e.g., all the patients we've diagnosed). In relational learning, every feature template we create ties the parameters of all its instances.
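Weight tying can be sketched on a toy network. In this minimal example (the three people, the acquaintance edges, and the weight value are all illustrative, not learned), one shared weight w scores every instance of the template "acquaintances X and Y both have the flu," and the network is small enough to compute exact probabilities by enumerating all assignments:

```python
import itertools
import math

# Toy network: one tied weight for every instance of the template
# "X and Y are acquainted and both have the flu".
# People, edges, and the weight value are illustrative assumptions.
people = ["alice", "bob", "chris"]
edges = [("alice", "bob"), ("bob", "chris")]   # acquaintance pairs
w = 1.5                                        # single shared (tied) weight

def log_score(world):
    """Unnormalized log-score: w times the number of acquainted
    pairs in which both people have the flu."""
    return w * sum(world[a] and world[b] for a, b in edges)

# Enumerate all 2^3 flu assignments to compute exact probabilities.
worlds = [dict(zip(people, bits))
          for bits in itertools.product([0, 1], repeat=len(people))]
Z = sum(math.exp(log_score(wld)) for wld in worlds)   # partition function

def prob(condition):
    return sum(math.exp(log_score(wld))
               for wld in worlds if condition(wld)) / Z

# Learning that Bob has the flu raises the probability that Alice does too.
p_alice = prob(lambda wld: wld["alice"])
p_alice_given_bob = (prob(lambda wld: wld["alice"] and wld["bob"])
                     / prob(lambda wld: wld["bob"]))
print(p_alice < p_alice_given_bob)             # True
```

Because the single weight is shared across every pair, evidence about any one pair informs all the others, which is how the model generalizes from the one network it sees; with a separate weight per pair, each would have only its own single data point.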
Computational complexity is one thing, but human complexity is another. If computers are like idiot savants, learning algorithms can sometimes come across like child prodigies prone to temper tantrums. That's one reason humans who can wrangle them into submission are so highly paid. If you know how to expertly tweak the control knobs until they're just right, magic can ensue, in the form of a stream of insights beyond the learner's years. And, not unlike the Delphic oracle, interpreting the learner's pronouncements can itself require considerable skill. Turn the knobs wrong, though, and the learner may spew out a torrent of gibberish or clam up in defiance. Unfortunately, in this regard Alchemy is no better than most. Writing down what you know in logic, feeding in the data, and pushing the button is the fun part. When Alchemy returns a beautifully accurate and efficient MLN, you go down to the pub and celebrate. When it doesn't (which is most of the time), the battle begins. Is the problem in the knowledge, the learning, or the inference? On the one hand, because of the learning and probabilistic inference, a simple MLN can do the job of a complex program. On the other, when it doesn't work, it's much harder to debug. The solution is to make it more interactive, able to introspect and explain its reasoning. That will take us another step closer to the Master Algorithm.

Hahahaha! Seriously, though, should we worry that machines will take over? The signs seem ominous. With every passing year, computers don't just do more of the world's work; they make more of the decisions. Who gets credit, who buys what, who gets what job and what raise, which stocks will go up and down, how much insurance costs, where police officers patrol and therefore who gets arrested, how long their prison terms will be, who dates whom and therefore who will be born: machine-learned models already play a part in all of these.
The point where we could turn off all our computers without causing the collapse of modern civilization has long passed. Machine learning is the last straw: if computers can start programming themselves, all hope of controlling them is surely lost. Distinguished scientists like Stephen Hawking have called for urgent research on this issue before it's too late.