Every computer scientist does battle with the complexity monster every day. When computer scientists lose the battle, complexity seeps into our lives. You've probably noticed that many a battle has been lost. Nevertheless, we continue to build our tower of algorithms, with greater and greater difficulty. Each new generation of algorithms has to be built on top of the previous ones and has to deal with their complexities in addition to its own. The tower grows taller and taller, and it covers the whole world, but it's also increasingly fragile, like a house of cards waiting to collapse. One tiny error in an algorithm and a billion-dollar rocket explodes, or the power goes out for millions. Algorithms interact in unexpected ways, and the stock market crashes.

Machine learning also has a growing role on the battlefield. Learners can help dissipate the fog of war, sifting through reconnaissance imagery, processing after-action reports, and piecing together a picture of the situation for the commander. Learning powers the brains of military robots, helping them keep their bearings, adapt to the terrain, distinguish enemy vehicles from civilian ones, and home in on their targets. DARPA's AlphaDog carries soldiers' gear for them. Drones can fly autonomously with the help of learning algorithms; although they are still partly controlled by human pilots, the trend is for one pilot to oversee larger and larger swarms. In the army of the future, learners will greatly outnumber soldiers, saving countless lives.

It's not an exaggeration to say that this innocuous-sounding statement is at the heart of the Newtonian revolution and of modern science. Kepler's laws applied to exactly six entities: the planets of the solar system known in his time. Newton's laws apply to every last speck of matter in the universe. The leap in generality between the two is staggering, and it's a direct consequence of Newton's principle.
This one principle is all by itself a knowledge pump of phenomenal power. Without it there would be no laws of nature, only a forever incomplete patchwork of small regularities.

An example of a useless rule set is one that just covers the exact positive examples you've seen and nothing else. This rule set looks like it's 100 percent accurate, but that's an illusion: it will predict that every new example is negative, and therefore get every positive one wrong. If there are more positive than negative examples overall, this will be even worse than flipping coins. Imagine a spam filter that decides an e-mail is spam only if it's an exact copy of a previously labeled spam message. It's easy to learn and looks great on the labeled data, but you might as well have no spam filter at all. Unfortunately, our "divide and conquer" algorithm could easily learn a rule set like that.

In practice, Valiant-style analysis tends to be very pessimistic and to call for more data than you have. So how do you decide whether to believe what a learner tells you? Simple: you don't believe anything until you've verified it on data that the learner didn't see. If the patterns the learner hypothesized also hold true on new data, you can be pretty confident that they're real. Otherwise you know the learner overfit. This is just the scientific method applied to machine learning: it's not enough for a new theory to explain past evidence because it's easy to concoct a theory that does that; the theory must also make new predictions, and you only accept it after they've been experimentally verified. (And even then only provisionally, because future evidence could still falsify it.)

The molecular biology of living cells is such a mess that molecular biologists often quip that only people who don't know any of it could believe in intelligent design.
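The memorizing spam filter described above, and the way held-out data exposes it, can be sketched in a few lines of Python. The messages and labels here are invented purely for illustration:

```python
# A hypothetical "memorizing" spam filter: it flags a message as spam
# only if it exactly matches a previously labeled spam message.
train = [
    ("win a free prize now", True),
    ("lunch at noon?", False),
    ("cheap meds online", True),
]

spam_memory = {text for text, is_spam in train if is_spam}

def predict(text):
    return text in spam_memory

# Looks perfect on the data it was "trained" on...
train_acc = sum(predict(t) == y for t, y in train) / len(train)

# ...but on data the learner didn't see, every new spam message
# (even one word away from a memorized one) is called ham.
holdout = [
    ("win a free prize today", True),   # new spam: misclassified
    ("meeting moved to 3pm", False),
]
holdout_acc = sum(predict(t) == y for t, y in holdout) / len(holdout)

print(train_acc, holdout_acc)  # 1.0 vs 0.5
```

The gap between the two accuracies is exactly the overfitting the holdout test is designed to reveal.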
The architecture of the brain may well have similar faults: the brain has many constraints that computers don't, like very limited short-term memory, and there's no reason to stay within them. Moreover, we know of many situations where humans seem to consistently do the wrong thing, as Daniel Kahneman illustrates at length in his book Thinking, Fast and Slow.

The problem is worse than it seems, because Bayesian networks in effect have "invisible" arrows to go along with the visible ones. Burglary and Earthquake are a priori independent, but the alarm going off entangles them: the alarm makes you suspect a burglary, but if now you hear on the radio that there's been an earthquake, you assume that's what caused the alarm. The earthquake has explained away the alarm, making a burglary less likely, and the two are therefore dependent. In a Bayesian network, all parents of the same variable are interdependent in this way, and this in turn introduces further dependencies, making the resulting graph often much denser than the original one.

Clearly, we need both logic and probability. Curing cancer is a good example. A Bayesian network can model a single aspect of how cells function, like gene regulation or protein folding, but only logic can put all the pieces together into a coherent picture. On the other hand, logic can't deal with incomplete or noisy information, which is pervasive in experimental biology, but Bayesian networks can handle it with aplomb.

Analogy was the spark that ignited many of history's greatest scientific advances. The theory of natural selection was born when Darwin, on reading Malthus's Essay on Population, was struck by the parallels between the struggle for survival in the economy and in nature. Bohr's model of the atom arose from seeing it as a miniature solar system, with electrons as the planets and the nucleus as the sun.
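The explaining-away effect in the burglary/earthquake/alarm network above can be reproduced by brute-force enumeration. The probability tables below are illustrative assumptions (values commonly used in AI textbooks), not numbers from the text:

```python
from itertools import product

P_B = 0.001   # P(Burglary) -- assumed for illustration
P_E = 0.002   # P(Earthquake)
P_A = {       # P(Alarm=True | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}

def joint(b, e, a):
    """Joint probability P(B=b, E=e, A=a) under the network."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    return p * (P_A[(b, e)] if a else 1 - P_A[(b, e)])

def p_burglary_given(evidence):
    """P(Burglary=True | evidence); evidence maps 'B'/'E'/'A' to booleans."""
    num = den = 0.0
    for b, e, a in product([True, False], repeat=3):
        world = {"B": b, "E": e, "A": a}
        if any(world[v] != val for v, val in evidence.items()):
            continue
        p = joint(b, e, a)
        den += p
        if b:
            num += p
    return num / den

print(p_burglary_given({"A": True}))             # alarm alone: burglary plausible
print(p_burglary_given({"A": True, "E": True}))  # earthquake explains it away
```

With these numbers, hearing the alarm raises the probability of a burglary to roughly a third, but learning of the earthquake collapses it to well under 1 percent, even though Burglary and Earthquake have no arrow between them.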
Kekulé discovered the ring shape of the benzene molecule after daydreaming of a snake eating its own tail.

Nearest-neighbor is the simplest and fastest learning algorithm ever invented. In fact, you could even say it's the fastest algorithm of any kind that could ever be invented. It consists of doing exactly nothing, and therefore takes zero time to run. Can't beat that. If you want to learn to recognize faces and have a vast database of images labeled face/not face, just let it sit there. Don't worry, be happy. Without knowing it, those images already implicitly form a model of what a face is. Suppose you're Facebook and you want to automatically identify faces in photos people upload as a prelude to tagging them with their friends' names. It's nice to not have to do anything, given that Facebook users upload upward of three hundred million photos per day. Applying any of the learners we've seen so far to them, with the possible exception of Naïve Bayes, would take a truckload of computers. And Naïve Bayes is not smart enough to recognize faces.

Decision trees are not immune to the curse of dimensionality either. Let's say the concept you're trying to learn is a sphere: points inside it are positive, and points outside it are negative. A decision tree can approximate a sphere by the smallest cube it fits inside. Not perfect, but not too bad either: only the corners of the cube get misclassified. But in high dimensions, almost the entire volume of the hypercube lies outside the hypersphere. For every example you correctly classify as positive, you incorrectly classify many negative ones as positive, causing your accuracy to plummet.

A society of models

Another objection to robot armies is that they make war too easy. But if we unilaterally relinquish them, that could cost us the next war.
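The curse-of-dimensionality claim a few paragraphs back, that in high dimensions almost all of a hypercube's volume lies outside the inscribed hypersphere, can be checked directly with the closed-form volume formula for a d-dimensional ball:

```python
import math

def sphere_fraction(d):
    """Fraction of a d-dimensional hypercube of side 2 occupied by the
    inscribed unit hypersphere: V_sphere / V_cube, where
    V_sphere = pi^(d/2) / Gamma(d/2 + 1) and V_cube = 2^d."""
    v_sphere = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return v_sphere / 2 ** d

for d in (2, 3, 10, 20):
    print(d, sphere_fraction(d))
```

In two dimensions the circle fills about 79 percent of the square, in three the sphere fills about 52 percent of the cube, but by twenty dimensions the fraction is negligible; a cube-shaped approximation of a sphere is then almost entirely false positives.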
The logical response, advocated by the United Nations and Human Rights Watch, is a treaty banning robot warfare, similar to the Geneva Protocol of 1925 banning chemical and biological warfare. This misses a crucial distinction, however. Chemical and biological warfare can only increase human suffering, but robot warfare can greatly decrease it. If a war is fought by machines, with humans only in command positions, no one is killed or wounded. Perhaps, then, what we should do, instead of outlawing robot soldiers, is to outlaw human soldiers, once we're ready.

Of course, robot armies also raise a whole different specter. According to Hollywood, the future of humanity is to be snuffed out by a gargantuan AI and its vast army of machine minions. (Unless, of course, a plucky hero saves the day in the last five minutes of the movie.) Google already has the gargantuan hardware such an AI would need, and it's recently acquired an army of robotics startups to go with it. If we drop the Master Algorithm into its servers, is it game over for humanity? Why yes, of course. It's time to reveal my true agenda, with apologies to Tolkien:

Judea Pearl's pioneering work on Bayesian networks appears in his book Probabilistic Reasoning in Intelligent Systems* (Morgan Kaufmann, 1988). "Bayesian networks without tears,"* by Eugene Charniak (AI Magazine, 1991), is a largely nonmathematical introduction to them. "Probabilistic interpretation for MYCIN's certainty factors,"* by David Heckerman (Proceedings of the Second Conference on Uncertainty in Artificial Intelligence, 1986), explains when sets of rules with confidence estimates are and aren't a reasonable approximation to Bayesian networks. "Module networks: Identifying regulatory modules and their condition-specific regulators from gene expression data," by Eran Segal et al.
(Nature Genetics, 2003), is an example of using Bayesian networks to model gene regulation. "Microsoft virus fighter: Spam may be more difficult to stop than HIV," by Ben Paynter (Fast Company, 2012), tells how David Heckerman took inspiration from spam filters and used Bayesian networks to design a potential AIDS vaccine. The probabilistic or "noisy" OR is explained in Pearl's book.* "Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base," by M. A. Shwe et al. (Parts I and II, Methods of Information in Medicine, 1991), describes a noisy-OR Bayesian network for medical diagnosis. Google's Bayesian network for ad placement is described in Section 26.5.4 of Kevin Murphy's Machine Learning* (MIT Press, 2012). Microsoft's player rating system is described in "TrueSkill™: A Bayesian skill rating system,"* by Ralf Herbrich, Tom Minka, and Thore Graepel (Advances in Neural Information Processing Systems 19, 2007).