Inevitably, however, there is a serpent in this Eden. It’s called the complexity monster. Like the Hydra, the complexity monster has many heads. One of them is space complexity: the number of bits of information an algorithm needs to store in the computer’s memory. If the algorithm needs more memory than the computer can provide, it’s useless and must be discarded. Then there’s the evil sister, time complexity: how long the algorithm takes to run, that is, how many steps of using and reusing the transistors it has to go through before it produces the desired results. If it’s longer than we can wait, the algorithm is again useless. But the scariest face of the complexity monster is human complexity. When algorithms become too intricate for our poor human brains to understand, when the interactions between different parts of the algorithm are too many and too involved, errors creep in, we can’t find them and fix them, and the algorithm doesn’t do what we want. Even if we somehow make it work, it winds up being needlessly complicated for the people using it and doesn’t play well with other algorithms, storing up trouble for later.

The same dynamic happens in any market where there’s lots of choice and lots of data. The race is on, and whoever learns fastest wins. It doesn’t stop with understanding customers better: companies can apply machine learning to every aspect of their operations, provided data is available, and data is pouring in from computers, communication devices, and ever-cheaper and more ubiquitous sensors. “Data is the new oil” is a popular refrain, and as with oil, refining it is big business. IBM, as well plugged into the corporate world as anyone, has organized its growth strategy around providing analytics to companies. Businesses look at data as a strategic asset: What data do I have that my competitors don’t? How can I take advantage of it? What data do my competitors have that I don’t?
In a perceptron, a positive weight represents an excitatory connection, and a negative weight an inhibitory one. The perceptron outputs 1 if the weighted sum of its inputs is above threshold, and 0 if it’s below. By varying the weights and threshold, we can change the function that the perceptron computes. This ignores a lot of the details of how neurons work, of course, but we want to keep things as simple as possible; our goal is to develop a general-purpose learning algorithm, not to build a realistic model of the brain. If some of the details we ignored turn out to be important, we can always add them in later. Despite our simplifying abstractions, however, we can still see how each part of this model corresponds to a part of the neuron.

In the perceptron algorithm, the error signal is all or none: you got it either right or wrong. That’s not much to go on, particularly if you have a network of many neurons. You may know that the output neuron is wrong (oops, that wasn’t your grandmother), but what about some neuron deep inside the brain? What does it even mean for such a neuron to be right or wrong? If the neurons’ output is continuous instead of binary, the picture changes. For starters, we now know how much the output neuron is wrong by: the difference between it and the desired output. If the neuron should be firing away (“Oh hi, Grandma!”) and is firing a little, that’s better than if it’s not firing at all. More importantly, we can now propagate that error to the hidden neurons: if the output neuron should fire more and neuron A connects to it, then the more A is firing, the more we should strengthen their connection; but if A is inhibited by another neuron B, then B should fire less, and so on. Based on the feedback from all the neurons it’s connected to, each neuron decides how much more or less to fire. Based on that and the activity of its input neurons, it strengthens or weakens its connections to them.
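The contrast between the all-or-none perceptron and a continuous-output neuron can be sketched in a few lines of Python. This is a toy illustration of the ideas above, not code from any actual system; the function names, the sigmoid choice, and the learning rate are my own assumptions:

```python
import math

def perceptron(inputs, weights, threshold):
    # All-or-none output: 1 if the weighted sum is above threshold, else 0.
    # Positive weights act as excitatory connections, negative as inhibitory.
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

def continuous_neuron(inputs, weights):
    # Graded output between 0 and 1 instead of a hard yes/no,
    # so we can measure *how much* the neuron is wrong by.
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-total))

def update_weights(inputs, weights, target, rate=0.5):
    # The error (desired output minus actual output) is fed back to each
    # connection: a weight is strengthened or weakened in proportion to
    # how active its input neuron was.
    error = target - continuous_neuron(inputs, weights)
    return [w + rate * error * x for w, x in zip(weights, inputs)]

# A perceptron computing AND: both inputs must fire to cross the threshold.
print(perceptron([1, 1], [0.6, 0.6], 1.0))  # 1
print(perceptron([1, 0], [0.6, 0.6], 1.0))  # 0
```

Repeated calls to `update_weights` drive the continuous neuron’s output toward the target; in a multilayer network, backpropagation applies the same idea layer by layer.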
I need to fire more, and neuron B is inhibiting me? Lower its weight. And neuron C is firing away, but its connection to me is weak? Strengthen it. My “customer” neurons, downstream in the network, will tell me how well I’m doing in the next round.

Today, however, connectionism is resurgent. We’re learning deeper networks than ever before, and they’re setting new standards in vision, speech recognition, drug discovery, and other areas. The new field of deep learning is on the front page of the New York Times. Look under the hood, and… surprise: it’s the trusty old backprop engine, still humming. What changed? Nothing much, say the critics: just faster computers and bigger data. To which Hinton and others reply: exactly, we were right all along!

In contrast to the connectionists and evolutionaries, symbolists and Bayesians do not believe in emulating nature. Rather, they want to figure out from first principles what learners should do, and that includes us humans. If we want to learn to diagnose cancer, for example, it’s not enough to say “this is how nature learns; let’s do the same.” There’s too much at stake. Errors cost lives. Doctors should diagnose in the most foolproof way they can, with methods similar to those mathematicians use to prove theorems, or as close to that as they can manage, given that it’s seldom possible to be that rigorous. They need to weigh the evidence to minimize the chances of a wrong diagnosis; or more precisely, so that the costlier an error is, the less likely they are to make it. (For example,
failing to find a tumor that’s really there is potentially much worse than inferring one that isn’t.) They need to make optimal decisions, not just decisions that seem good.

Recommender systems, as they’re also called, are big business: a third of Amazon’s business comes from its recommendations, as does three-quarters of Netflix’s. It’s a far cry from the early days of nearest-neighbor, when it was considered impractical because of its memory requirements. Back then, computer memories were made of small iron rings, one per bit, and storing even a few thousand examples was taxing. How times have changed. Nevertheless, it’s not necessarily smart to remember all the examples you’ve seen and then have to search through them, particularly since most are probably irrelevant. If you look back at the map of Posistan and Negaland, you may notice that if Positiville disappeared, nothing would change. The metro areas of nearby cities would expand into the land formerly occupied by Positiville, but since they’re all Posistan cities, the border with Negaland would stay the same. The only cities that really matter are the ones across the border from a city in the other country; all others we can omit. So a simple way to make nearest-neighbor more efficient is to delete all the examples that are correctly classified by their neighbors. This and other tricks enable nearest-neighbor methods to be used in some surprising areas, like controlling robot arms in real time. But needless to say, they’re still not the first choice for things like high-frequency trading, where computers buy and sell stocks in fractions of a second. In a race between a neural network, which can be applied to an example with only a fixed number of additions, multiplications, and sigmoids, and an algorithm that needs to search a large database for the example’s nearest neighbors, the neural network is sure to win.

[Image: pic_28.jpg]
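The pruning trick just described (delete every example that its neighbors already classify correctly, keeping only the border) can be sketched in a few lines. This is a toy one-dimensional version with made-up data points; the function names are my own:

```python
def nearest_label(point, examples):
    # examples is a list of (position, label) pairs; classify the point
    # by the label of its single nearest neighbor.
    return min(examples, key=lambda e: abs(e[0] - point))[1]

def prune(examples):
    # Delete every example that the remaining ones already classify
    # correctly; only the "border" examples survive.
    kept = list(examples)
    for ex in examples:
        rest = [e for e in kept if e is not ex]
        if rest and nearest_label(ex[0], rest) == ex[1]:
            kept = rest
    return kept

cities = [(1, 'pos'), (2, 'pos'), (3, 'pos'), (8, 'neg'), (9, 'neg')]
print(prune(cities))  # only the border pair (3, 'pos') and (9, 'neg') remain
```

The pruned set still classifies every original point correctly, but it is smaller, so searching it at prediction time is cheaper.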
One of the most popular algorithms for nonlinear dimensionality reduction, called Isomap, does just this. It connects each data point in a high-dimensional space (a face, say) to all nearby points (very similar faces), computes the shortest distances between all pairs of points along the resulting network, and finds the reduced coordinates that best approximate these distances. In contrast to PCA, faces’ coordinates in this space are often quite meaningful: one may represent which direction the face is facing (left profile, three quarters, head on, etc.); another how the face looks (very sad, a little sad, neutral, happy, very happy, etc.); and so on. From understanding motion in video to detecting emotion in speech, Isomap has a surprising ability to zero in on the most important dimensions of complex data.

The evaluation component is a scoring function that says how good a model is. Symbolists use accuracy or information gain. Connectionists use a continuous error measure, such as squared error, which is the sum of the squares of the differences between the predicted values and the true ones. Bayesians use the posterior probability. Analogizers (at least of the SVM stripe) use the margin. In addition to how well the model fits the data, all tribes take into account other desirable properties, such as the model’s simplicity.

Planetary-scale machine learning

The last type of data, the data you don’t share, also has a problem, which is that maybe you should share it. Maybe it hasn’t occurred to you to do so, maybe there’s no easy way to, or maybe you just don’t want to. In the latter case, you should consider whether you have an ethical responsibility to share. One example we’ve seen is cancer patients, who can contribute to curing cancer by sharing their tumors’ genomes and treatment histories. But it goes well beyond that.
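Isomap’s key step, as described earlier, is measuring distance along the neighborhood network rather than straight through space. A toy sketch (my own code, omitting Isomap’s final embedding step): points on a semicircle, where the shortest path through the graph recovers the arc length while the straight-line distance cuts across the ambient space.

```python
import math

# Eleven points along a semicircle: a one-dimensional manifold
# curled up inside two-dimensional space.
points = [(math.cos(i * math.pi / 10), math.sin(i * math.pi / 10))
          for i in range(11)]

def euclid(p, q):
    return math.dist(p, q)

n = len(points)
INF = float('inf')
# Connect each point only to points within a small radius
# (its "very similar faces", in the text's analogy).
graph = [[euclid(points[i], points[j])
          if 0 < euclid(points[i], points[j]) < 0.4 else INF
          for j in range(n)] for i in range(n)]
for i in range(n):
    graph[i][i] = 0.0

# Floyd-Warshall: shortest distances between all pairs along the network.
for k in range(n):
    for i in range(n):
        for j in range(n):
            if graph[i][k] + graph[k][j] < graph[i][j]:
                graph[i][j] = graph[i][k] + graph[k][j]

geodesic = graph[0][n - 1]                 # distance along the manifold
straight = euclid(points[0], points[-1])   # distance through ambient space
print(geodesic > straight)                 # True: the graph follows the curve
```

Here `geodesic` comes out near pi (the arc length of the semicircle) while `straight` is 2 (the diameter); Isomap would then look for low-dimensional coordinates that preserve these graph distances.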
All sorts of questions about society and policy can potentially be answered by learning from the data we generate in our daily lives. Social science is entering a golden age, where it finally has data commensurate with the complexity of the phenomena it studies, and the benefits to all of us could be enormous, provided the data is accessible to researchers, policy makers, and citizens. This does not mean letting others peek into your private life; it means letting them see the learned models, which should contain only statistical information. So between you and them there needs to be an honest data broker that guarantees your data won’t be misused, but also that no free riders share the benefits without sharing the data.

One Algorithm to rule them all, One Algorithm to find them,

It’s natural to worry about intelligent machines taking over because the only intelligent entities we know are humans and other animals, and they definitely have a will of their own. But there is no necessary connection between intelligence and autonomous will; or rather, intelligence and will may not inhabit the same body, provided there is a line of control between them. In The Extended Phenotype, Richard Dawkins shows how nature is replete with examples of an animal’s genes controlling more than its own body, from cuckoo eggs to beaver dams. Technology is the extended phenotype of man. This means we can continue to control it even if it becomes far more complex than we can understand.

Chapter One