Why businesses embrace machine learning.

An even more extreme candidate is the humble NOR gate: a logic switch whose output is 1 only if its inputs are both 0. Recall that all computers are made of logic gates built out of transistors, and all computations can be reduced to combinations of AND, OR, and NOT gates. A NOR gate is just an OR gate followed by a NOT gate: the negation of a disjunction, as in “I’m happy as long as I’m not starving or sick.” AND, OR, and NOT can all be implemented using NOR gates, so NOR can do everything, and in fact it’s all some microprocessors use. So why can’t it be the Master Algorithm? It’s certainly unbeatable for simplicity. Unfortunately, a NOR gate is not the Master Algorithm any more than a Lego brick is the universal toy. It can certainly be a universal building block for toys, but a pile of Legos doesn’t spontaneously assemble itself into a toy. The same applies to other simple computation schemes, like Petri nets or cellular automata.

An example of a useless rule set is one that just covers the exact positive examples you’ve seen and nothing else. This rule set looks like it’s 100 percent accurate, but that’s an illusion: it will predict that every new example is negative, and therefore get every positive one wrong. If there are more positive than negative examples overall, this will be even worse than flipping coins. Imagine a spam filter that decides an e-mail is spam only if it’s an exact copy of a previously labeled spam message. It’s easy to learn and looks great on the labeled data, but you might as well have no spam filter at all. Unfortunately, our “divide and conquer” algorithm could easily learn a rule set like that.

Spin glasses are not actually glasses, although they have some glass-like properties. Rather, they are magnetic materials.
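The reduction of AND, OR, and NOT to NOR gates described earlier can be checked in a few lines of Python. This is a toy illustration of the universality argument, not code from the book:

```python
# Hedged sketch: deriving NOT, OR, and AND from NOR alone,
# which is why NOR by itself is a universal logic gate.

def NOR(a, b):
    return int(not (a or b))

def NOT(a):
    # Feed a signal into both inputs: NOR(a, a) = not (a or a) = not a
    return NOR(a, a)

def OR(a, b):
    # An OR is a NOR followed by a NOT (undoing the negation)
    return NOT(NOR(a, b))

def AND(a, b):
    # De Morgan: a AND b = NOT(NOT a OR NOT b) = NOR(NOT a, NOT b)
    return NOR(NOT(a), NOT(b))

# The derived gates match the ordinary definitions on all inputs
for a in (0, 1):
    for b in (0, 1):
        assert NOT(a) == int(not a)
        assert OR(a, b) == int(a or b)
        assert AND(a, b) == int(a and b)
```

The pile-of-Legos point still stands: these definitions show NOR can express any gate, but nothing in them says how gates should be wired together to compute something useful.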
Every electron is a tiny magnet by virtue of its spin, which can point “up” or “down.” In materials like iron, electrons’ spins tend to line up: if an electron with down spin is surrounded by electrons with up spins, it will probably flip to up. When most of the spins in a chunk of iron line up, it turns into a magnet. In ordinary magnets, the strength of interaction between adjacent spins is the same for all pairs, but in a spin glass it can vary; it may even be negative, causing nearby spins to point in opposite directions. The energy of an ordinary magnet is lowest when all its spins align, but in a spin glass, it’s not so simple. Indeed, finding the lowest-energy state of a spin glass is an NP-complete problem, meaning that just about every other difficult optimization problem can be reduced to it. Because of this, a spin glass doesn’t necessarily settle into its overall lowest-energy state; much like rainwater may flow downhill into a lake instead of reaching the ocean, a spin glass may get stuck in a local minimum, a state with lower energy than all the states that can be reached from it by flipping a spin, rather than evolving to the global one.

[Image: pic_12.jpg]

We could do away with the problem of local optima by taking out the S curves and just letting each neuron output the weighted sum of its inputs. That would make the error surface very smooth, leaving only one minimum: the global one. The problem, though, is that a linear function of linear functions is still just a linear function, so a network of linear neurons is no better than a single neuron. A linear brain, no matter how large, is dumber than a roundworm. S curves are a nice halfway house between the dumbness of linear functions and the hardness of step functions.

The breakthrough came in the early 1980s, when Judea Pearl, a professor of computer science at the University of California, Los Angeles, invented a new representation: Bayesian networks.
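The collapse of linear layers mentioned above, that a linear function of linear functions is still just a linear function, can be verified numerically. Here is a minimal sketch with random toy weights, assuming NumPy; it is an illustration, not anything from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" of linear neurons: each layer is just a weight matrix.
W1 = rng.normal(size=(4, 3))   # first layer: 3 inputs -> 4 outputs
W2 = rng.normal(size=(2, 4))   # second layer: 4 inputs -> 2 outputs

x = rng.normal(size=3)          # an arbitrary input

# Running the input through both linear layers...
two_layer = W2 @ (W1 @ x)

# ...gives exactly the same answer as one linear layer
# whose weights are the matrix product W2 @ W1:
one_layer = (W2 @ W1) @ x

assert np.allclose(two_layer, one_layer)
```

No matter how many linear layers are stacked, the whole network multiplies out to a single matrix, which is why the nonlinearity of the S curve is what gives a multilayer network its extra power.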
Pearl is one of the most distinguished computer scientists in the world, his methods having swept through machine learning, AI, and many other fields. He won the Turing Award, the Nobel Prize of computer science, in 2012.

Driverless cars and other robots are a prime example of probabilistic inference in action. As the car drives around, it simultaneously builds up a map of the territory and figures out its location on it with increasing certainty. According to a recent study, London taxi drivers grow a larger posterior hippocampus, a brain region involved in memory and map making, as they learn the layout of the city. Perhaps they use similar probabilistic inference algorithms, with the notable difference that in the case of humans, drinking doesn’t seem to help.

Generally, the fewer support vectors an SVM selects, the better it generalizes. Any training example that is not a support vector would be correctly classified if it showed up as a test example instead, because the frontier between positive and negative examples would still be in the same place. So the expected error rate of an SVM is at most the fraction of examples that are support vectors. As the number of dimensions goes up, this fraction tends to go up as well, so SVMs are not immune to the curse of dimensionality. But they’re more resistant to it than most.

However, that doesn’t mean their chemical behavior is similar. Methane is a gas, while methanol is an alcohol. The second part of analogical reasoning is figuring out what we can infer about the new object based on similar ones we’ve found. This can be very simple or very complex. In nearest-neighbor methods or SVMs, it just consists of predicting the new object’s class based on the classes of the nearest neighbors or support vectors. But in case-based reasoning, another type of analogical learning, the output can be a complex structure formed by composing parts of the retrieved objects.
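The simple end of that spectrum, predicting a new object’s class by a vote among its nearest labeled neighbors, fits in a few lines. The data and function name below are made up for illustration:

```python
import math
from collections import Counter

def nearest_neighbor_predict(query, examples, k=3):
    """Predict a class by majority vote among the k nearest labeled examples.

    `examples` is a list of (point, label) pairs; distance is Euclidean.
    """
    by_distance = sorted(examples, key=lambda e: math.dist(query, e[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy data: two clusters, labeled "negative" and "positive"
examples = [((0, 0), "negative"), ((0, 1), "negative"), ((1, 0), "negative"),
            ((5, 5), "positive"), ((5, 6), "positive"), ((6, 5), "positive")]

print(nearest_neighbor_predict((5.5, 5.5), examples))  # prints "positive"
```

Case-based reasoning starts the same way, by retrieving the most similar stored cases, but instead of a single vote it assembles an answer out of pieces of them, as the help-desk example that follows illustrates.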
Suppose your HP printer is spewing out gibberish, and you call up their help desk. Chances are they’ve seen your problem many times before, so a good strategy is to find those records and piece together a potential solution for your problem from them. This is not just a matter of finding complaints with many attributes similar to yours: for example, whether you’re using your printer with Windows or Mac OS X may cause very different settings of the system and the printer to become relevant. And once you’ve found the most relevant cases, the sequence of steps needed to solve your problem may be a combination of steps from different cases, with some further tweaks specific to yours.

You rack your brains for a solution, but the more you try, the harder it gets. Perhaps unifying logic and probability is just beyond human ability. Exhausted, you fall asleep. A deep growl jolts you awake. The hydra-headed complexity monster pounces on you, jaws snapping, but you duck at the last moment. Slashing desperately at the monster with the sword of learning, the only one that can slay it, you finally succeed in cutting off all its heads. Before it can grow new ones, you run up the stairs.

Of course, don’t be deceived by the simple MLN above for predicting the spread of flu. Picture instead an MLN for diagnosing and curing cancer. The MLN represents a probability distribution over the states of a cell. Every part of the cell, every organelle, every metabolic pathway, every gene and protein is an entity in the MLN, and the MLN’s formulas encode the dependencies between them. We can ask the MLN, “Is this cell cancerous?” and probe it with different drugs and see what happens. We don’t have an MLN like this yet, but later in this chapter I’ll envisage how it might come about.

Sex, lies, and machine learning.
The distinction between descriptive and normative theories was articulated by John Neville Keynes in The Scope and Method of Political Economy (Macmillan, 1891).

The Naïve Bayes algorithm is first mentioned in Pattern Classification and Scene Analysis,* by Richard Duda and Peter Hart (Wiley, 1973). Milton Friedman argues for oversimplified theories in “The methodology of positive economics,” which appears in Essays in Positive Economics (University of Chicago Press, 1966). The use of Naïve Bayes in spam filtering is described in “Stopping spam,” by Joshua Goodman, David Heckerman, and Robert Rounthwaite (Scientific American, 2005). “Relevance weighting of search terms,”* by Stephen Robertson and Karen Sparck Jones (Journal of the American Society for Information Science, 1976), explains the use of Naïve Bayes-like methods in information retrieval.