Why is Google worth so much more than Yahoo? They both make their money from showing ads on the web, and they're both top destinations. Both use auctions to sell ads and machine learning to predict how likely a user is to click on an ad (the higher the probability, the more valuable the ad). But Google's learning algorithms are much better than Yahoo's. This is not the only reason for the difference in their market caps, of course, but it's a big one. Every predicted click that doesn't happen is a wasted opportunity for the advertiser and lost revenue for the website. With Google's annual revenue of $50 billion, every 1 percent improvement in click prediction potentially means another half billion dollars in the bank, every year, for the company. No wonder Google is a big fan of machine learning, and Yahoo and others are trying hard to catch up.

You're not the only one in dire straits; so are we. We've only just set out on our road to the Master Algorithm and already we seem to have run into an insurmountable obstacle. Is there any way to learn something from the past that we can be confident will apply in the future? And if there isn't, isn't machine learning a hopeless enterprise? For that matter, isn't all of science, even all of human knowledge, on rather shaky ground?

Starting with restrictive assumptions and gradually relaxing them if they fail to explain the data is typical of machine learning, and the process is usually carried out automatically by the learner, without any help from you. First, it tries all single factors, then all conjunctions of two factors, then all conjunctions of three, and so on. But now we run into a problem: there are a lot of conjunctive concepts and not enough time to try them all out.

The number of transistors in a computer is catching up with the number of neurons in a human brain, but the brain wins hands down in the number of connections.
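The search for conjunctive concepts described above — single factors first, then ever-larger conjunctions — can be sketched in a few lines of Python. The attribute names and toy data here are invented for illustration; they are not from the text:

```python
from itertools import combinations

def learn_conjunction(examples, attributes, max_size=3):
    """Try all single attributes, then all conjunctions of two,
    then three, and so on; return the first conjunction that is
    true of every positive example and false of every negative one.
    The number of candidates grows combinatorially with max_size,
    which is exactly the "not enough time" problem in the text."""
    for size in range(1, max_size + 1):
        for conj in combinations(attributes, size):
            if all(all(ex[a] for a in conj) == label
                   for ex, label in examples):
                return conj
    return None  # no conjunction of up to max_size factors fits the data

# Hypothetical customer data: each example pairs attribute values
# with whether the customer bought (True) or not (False).
examples = [
    ({"young": True,  "urban": True},  True),
    ({"young": True,  "urban": False}, False),
    ({"young": False, "urban": True},  False),
]
print(learn_conjunction(examples, ["young", "urban"]))
# -> ('young', 'urban'): neither factor alone fits, so the learner
#    falls through to conjunctions of two.
```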
In a microprocessor, a typical transistor is directly connected to only a few others, and the planar semiconductor technology used severely limits how much better a computer can do. In contrast, a neuron has thousands of synapses. If you're walking down the street and come across an acquaintance, it takes you only about a tenth of a second to recognize her. At neuron switching speeds, this is barely enough time for a hundred processing steps, but in those hundred steps your brain manages to scan your entire memory, find the best match, and adapt it to the new context (different clothes, different lighting, and so on). In a brain, each processing step can be very complex and involve a lot of information, consonant with a distributed representation.

In a perceptron, a positive weight represents an excitatory connection, and a negative weight an inhibitory one. The perceptron outputs 1 if the weighted sum of its inputs is above threshold, and 0 if it's below. By varying the weights and threshold, we can change the function that the perceptron computes. This ignores a lot of the details of how neurons work, of course, but we want to keep things as simple as possible; our goal is to develop a general-purpose learning algorithm, not to build a realistic model of the brain. If some of the details we ignored turn out to be important, we can always add them in later. Despite our simplifying abstractions, however, we can still see how each part of this model corresponds to a part of the neuron.

But then the perceptron hit a brick wall. The knowledge engineers were irritated by Rosenblatt's claims and envious of all the attention and funding that neural networks, and perceptrons in particular, were getting. One of them was Marvin Minsky, a former classmate of Rosenblatt's at the Bronx High School of Science and by then the leader of the AI group at MIT. (Ironically, his PhD had been on neural networks, but he had grown disillusioned with them.)
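A minimal version of the perceptron described above — a weighted sum compared against a threshold — fits in a few lines of Python. The weights here are set by hand rather than learned, purely to show how varying them changes the function the unit computes:

```python
def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs is above the
    threshold, 0 otherwise. Positive weights are excitatory,
    negative weights inhibitory."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# With both weights at 1 and a threshold of 1.5, the unit computes
# AND; lower the threshold to 0.5 and the same unit computes OR.
AND = [perceptron([a, b], [1, 1], 1.5) for a in (0, 1) for b in (0, 1)]
OR  = [perceptron([a, b], [1, 1], 0.5) for a in (0, 1) for b in (0, 1)]
print(AND)  # [0, 0, 0, 1]
print(OR)   # [0, 1, 1, 1]
```

No choice of weights and threshold, however, lets a single such unit compute XOR — the limitation that figures in what follows.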
In 1969, Minsky and his colleague Seymour Papert published Perceptrons, a book detailing the shortcomings of the eponymous algorithm, with example after example of simple things it couldn't learn. The simplest one, and therefore the most damning, was the exclusive-OR function, or XOR for short, which is true if one of its inputs is true but not both. For example, Nike's two most loyal demographics are supposedly teenage boys and middle-aged women. In other words, you're likely to buy Nike shoes if you're young XOR female. Young is good, female is good, but both is not. You're also an unpromising target for Nike advertising if you're neither young nor female. The problem with XOR is that there is no straight line capable of separating the positive from the negative examples. This figure shows two failed candidates.

Physicist makes brain out of glass

Climbing mountains in hyperspace

A complete model of a cell

As in the nature versus nurture debate, neither side has the whole answer; the key is figuring out how to combine the two. The Master Algorithm is neither genetic programming nor backprop, but it has to include the key elements of both: structure learning and weight learning. In the conventional view, nature does its part first, evolving a brain, and then nurture takes it from there, filling the brain with information. We can easily reproduce this in learning algorithms. First, learn the structure of the network, using (for example) hill climbing to decide which neurons connect to which: try adding each possible new connection to the network, keep the one that most improves performance, and repeat. Then learn the connection weights using backprop, and your brand-new brain is ready to use.

Markov networks can be trained to maximize either the likelihood of the whole data or the conditional likelihood of what we want to predict given what we know.
For Siri, the likelihood of the whole data is P(words, sounds), and the conditional likelihood we're interested in is P(words | sounds). By optimizing the latter, we can ignore P(sounds), which is only a distraction from our goal. And since we ignore it, it can be arbitrarily complex. This is much better than HMMs' unrealistic assumption that sounds depend solely on the corresponding words, without any influence from the surroundings. In fact, if all Siri cares about is figuring out which words you just spoke, perhaps it doesn't even need to worry about probabilities; it just needs to make sure the correct words score higher than incorrect ones when it tots up the weights of their features — ideally a lot higher, just to be safe.

Rise and shine

Machine learners call this process dimensionality reduction because it reduces a large number of visible dimensions (the pixels) to a few implicit ones (expression, facial features). Dimensionality reduction is essential for coping with big data, like the data coming in through your senses every second. A picture may be worth a thousand words, but it's also a million times more costly to process and remember. Yet somehow your visual cortex does a pretty good job of whittling it down to a manageable amount of information, enough to navigate the world, recognize people and things, and remember what you saw. It's one of the great miracles of cognition and so natural you're not even conscious of doing it.

Hahahaha! Seriously, though, should we worry that machines will take over? The signs seem ominous. With every passing year, computers don't just do more of the world's work; they make more of the decisions.
Who gets credit, who buys what, who gets what job and what raise, which stocks will go up and down, how much insurance costs, where police officers patrol and therefore who gets arrested, how long their prison terms will be, who dates whom and therefore who will be born: machine-learned models already play a part in all of these. The point where we could turn off all our computers without causing the collapse of modern civilization has long passed. Machine learning is the last straw: if computers can start programming themselves, all hope of controlling them is surely lost. Distinguished scientists like Stephen Hawking have called for urgent research on this issue before it's too late.

"Love, actuarially," by Kevin Poulsen (Wired, 2014), tells the story of how one man used machine learning to find love on the OkCupid dating site. Dataclysm, by Christian Rudder (Crown, 2014), mines OkCupid's data for sundry insights. Total Recall, by Gordon Bell and Jim Gemmell (Dutton, 2009), explores the implications of digitally recording everything we do. The Naked Future, by Patrick Tucker (Current, 2014), surveys the use and abuse of data for prediction in our world. Craig Mundie argues for a balanced approach to data collection and use in "Privacy pragmatism" (Foreign Affairs, 2014). The Second Machine Age, by Erik Brynjolfsson and Andrew McAfee (Norton, 2014), discusses how progress in AI will shape the future of work and the economy. "World War R," by Chris Baraniuk (New Scientist, 2014), reports on the debate surrounding the use of robots in battle. "Transcending complacency on superintelligent machines," by Stephen Hawking et al. (Huffington Post, 2014), argues that now is the time to worry about AI's risks. Nick Bostrom's Superintelligence (Oxford University Press, 2014) considers those dangers and what to do about them.