Your clock radio goes off at 7:00 a.m. It's playing a song you haven't heard before, but you really like it. Courtesy of Pandora, it's been learning your tastes in music, like your own personal radio jock. Perhaps the song itself was produced with the help of machine learning. You eat breakfast and read the morning paper. It came off the printing press a few hours earlier, the printing process carefully adjusted to avoid streaking using a learning algorithm. The temperature in your house is just right, and your electricity bill noticeably down, since you installed a Nest learning thermostat.

Machine learning is sometimes confused with artificial intelligence (or AI for short). Technically, machine learning is a subfield of AI, but it's grown so large and successful that it now eclipses its proud parent. The goal of AI is to teach computers to do what humans currently do better, and learning is arguably the most important of those things: without it, no computer can keep up with a human for long; with it, the rest follows.

The larger outcome is that democracy works better because the bandwidth of communication between voters and politicians increases enormously. In these days of high-speed Internet, the amount of information your elected representatives get from you is still decidedly nineteenth century: a hundred bits or so every two years, as much as fits on a ballot. This is supplemented by polling and perhaps the occasional e-mail or town-hall meeting, but that's still precious little. Big data and machine learning change the equation. In the future, provided voter models are accurate, elected officials will be able to ask voters what they want a thousand times a day and act accordingly, without having to pester the actual flesh-and-blood citizens.

Nevertheless, physics is unique in its simplicity. Outside physics and engineering, the track record of mathematics is more mixed.
Sometimes it's only reasonably effective, and sometimes its models are too oversimplified to be useful. This tendency to oversimplify stems from the limitations of the human mind, however, not from the limitations of mathematics. Most of the brain's hardware (or rather, wetware) is devoted to sensing and moving, and to do math we have to borrow parts of it that evolved for language. Computers have no such limitations and can easily turn big data into very complex models. Machine learning is what you get when the unreasonable effectiveness of mathematics meets the unreasonable effectiveness of data. Biology and sociology will never be as simple as physics, but the method by which we discover their truths can be.

Most of all, we have to worry about what the Master Algorithm could do in the wrong hands. The first line of defense is to make sure the good guys get it first, or, if it's not clear who the good guys are, to make sure it's open-sourced. The second is to realize that, no matter how good the learning algorithm is, it's only as good as the data it gets. He who controls the data controls the learner. Your reaction to the datafication of life should not be to retreat to a log cabin (the woods, too, are full of sensors) but to aggressively seek control of the data that matters to you. It's good to have recommenders that find what you want and bring it to you; you'd feel lost without them. But they should bring you what you want, not what someone else wants you to have. Control of data and ownership of the models learned from it is what many of the twenty-first century's battles will be about, between governments, corporations, unions, and individuals. But you also have an ethical duty to share data for the common good. Machine learning alone will not cure cancer; cancer patients will, by sharing their data for the benefit of future patients.

The power of a theory lies in how much it simplifies our description of the world.
Armed with Newton's laws, we only need to know the masses, positions, and velocities of all objects at one point in time; their positions and velocities at all times follow. So Newton's laws reduce our description of the world by a factor of the number of distinguishable instants in the history of the universe, past and future. Pretty amazing! Of course, Newton's laws are only an approximation of the true laws of physics, so let's replace them with string theory, ignoring all its problems and the question of whether it can ever be empirically validated. Can we do better? Yes, for two reasons.

Aristotle said that there is nothing in the intellect that was not first in the senses. Leibniz added, "Except the intellect itself." The human brain is not a blank slate because it's not a slate. A slate is passive, something you write on, but the brain actively processes the information it receives. Memory is the slate it writes on, and it does start out blank. On the other hand, a computer is a blank slate until you program it; the active process itself has to be written into memory before anything can happen. Our goal is to figure out the simplest program we can write such that it will continue to write itself by reading data, without limit, until it knows everything there is to know.

Hebb's rule, as it has come to be known, is the cornerstone of connectionism. Indeed, the field derives its name from the belief that knowledge is stored in the connections between neurons. Donald Hebb, a Canadian psychologist, stated it this way in his 1949 book The Organization of Behavior: "When an axon of cell A is near enough cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." It's often paraphrased as "Neurons that fire together wire together."
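Hebb's rule amounts to a single line of arithmetic. As a minimal sketch (the three-unit network, learning rate, and activity pattern here are all invented for illustration), the weight between two units grows in proportion to their joint activity:

```python
def hebbian_update(w, x, lr=0.1):
    """Hebb's rule: strengthen the connection w[i][j] between units i and j
    in proportion to their joint activity x[i] * x[j]."""
    n = len(x)
    return [[w[i][j] + lr * x[i] * x[j] for j in range(n)] for i in range(n)]

# Toy run: units 0 and 1 repeatedly fire together while unit 2 stays silent,
# so only the connection between units 0 and 1 gets strengthened.
w = [[0.0] * 3 for _ in range(3)]
for _ in range(5):
    w = hebbian_update(w, [1.0, 1.0, 0.0])

print(w[0][1], w[0][2])  # the 0-1 weight has grown; weights to unit 2 are still zero
```

"Fire together, wire together" falls out directly: the update is symmetric in i and j, and any unit that never fires never changes its connections.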
Hopfield noticed an interesting similarity between spin glasses and neural networks: an electron's spin responds to the behavior of its neighbors much like a neuron does. In the electron's case, it flips up if the weighted sum of the neighbors exceeds a threshold and flips (or stays) down otherwise. Inspired by this, he defined a type of neural network that evolves over time in the same way that a spin glass does and postulated that the network's minimum energy states are its memories. Each such state has a "basin of attraction" of initial states that converge to it, and in this way the network can do pattern recognition: for example, if one of the memories is the pattern of black-and-white pixels formed by the digit nine and the network sees a distorted nine, it will converge to the "ideal" one and thereby recognize it. Suddenly, a vast body of physical theory was applicable to machine learning, and a flood of statistical physicists poured into the field, helping it break out of the local minimum it had been stuck in.

Of all the possible genomes, very few correspond to viable organisms. The typical fitness landscape thus consists of vast flatlands with occasional sharp peaks, making evolution very hard. If you start out blindfolded in Kansas, you have no idea which way the Rockies lie, and you'll wander around for a long time before you bump into their foothills and start climbing. But if you combine evolution with neural learning, something interesting happens. If you're on flat ground, but not too far from the foothills, neural learning can get you there, and the closer you are to the foothills, the more likely it will. It's like being able to scan the horizon: it won't help you in Wichita, but in Denver you'll see the Rockies in the distance and head that way. Denver now looks a lot fitter than it did when you were blindfolded.
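Hopfield's basin-of-attraction idea from the passage above can be sketched in a few lines. This is a toy, not his formulation: the six-unit pattern, the simple Hebbian storage rule, and the fixed update order are all invented for illustration. A state with one flipped unit settles back into the stored memory:

```python
def recall(weights, state, steps=10):
    """Repeatedly update each unit to match the sign of its weighted input,
    descending in energy until the state settles into a stored memory."""
    n = len(state)
    for _ in range(steps):
        for i in range(n):
            s = sum(weights[i][j] * state[j] for j in range(n))
            state[i] = 1 if s >= 0 else -1
    return state

pattern = [1, -1, 1, -1, 1, -1]  # the stored "memory" (+1/-1 units)
n = len(pattern)
# Hebbian storage: w[i][j] = pattern[i] * pattern[j], no self-connections.
weights = [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
           for i in range(n)]

noisy = [1, 1, 1, -1, 1, -1]  # a "distorted nine": one unit flipped
print(recall(weights, noisy) == pattern)  # the basin of attraction pulls it back
```

The stored pattern is a minimum-energy state, so it is a fixed point of the update; nearby states lie in its basin and converge to it, which is exactly the pattern-recognition behavior described above.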
The net effect is to widen the fitness peaks, making it possible for you to find your way to them from previously very tough places, like point A in this graph.

The dark hulk of the cathedral rises from the night. Light pours from its stained-glass windows, projecting intricate equations onto the streets and buildings beyond. As you approach, you can hear chanting inside. It seems to be Latin, or perhaps math, but the Babel fish in your ear translates it into English: "Turn the crank! Turn the crank!" Just as you enter, the chant dissolves into an "Aaaah!" of satisfaction, and a murmur of "The posterior! The posterior!" You peek through the crowd. A massive stone tablet towers above the altar with a formula engraved on it in ten-foot letters:

P(A|B) = P(A) P(B|A) / P(B)

After pioneering the application of machine learning to spam filtering, David Heckerman turned to using Bayesian networks in the fight against AIDS. The AIDS virus is a tough adversary because it mutates rapidly, making it difficult for any one vaccine or drug to pin it down for long. Heckerman noticed that this is the same cat-and-mouse game that spam filters play with spam and decided to apply a lesson he had learned there: attack the weakest link. In the case of spam, weak links include the URLs you have to use to take payment from the customer. In the case of HIV, they're small regions of the virus protein that can't change without hurting the virus. If he could train the immune system to recognize these regions and attack the cells displaying them, he just might have an AIDS vaccine. Heckerman and coworkers used a Bayesian network to help identify the vulnerable regions and developed a vaccine delivery mechanism that could teach the immune system to attack just those regions. The delivery mechanism worked in mice, and clinical trials are now in preparation.

The crucial question for inference is whether you can make the filled-in graph "look like a tree" without the trunk getting too thick.
If the megavariable in the trunk has too many possible values, the tree grows out of control until it covers the whole planet, like the baobabs in The Little Prince. In the tree of life, each species is a branch, but inside each branch is a graph, with each creature having two parents, four grandparents, some number of offspring, and so on. The "thickness" of a branch is the size of the species' population. When the branches are too thick, our only choice is to resort to approximate inference.

We're not limited to pairwise or individual features. Facebook wants to predict who your friends are so it can recommend them to you. It can use the rule "Friends of friends are likely to be friends" for that, but each instance of it involves three people: if Alice and Bob are friends, and Bob and Chris are also friends, then Alice and Chris are potential friends. H. L. Mencken's quip that a man is wealthy if he makes more than his wife's sister's husband involves four people. Each of these rules can be turned into a feature template in a relational model, and a weight for it can be learned based on how often the feature occurs in the data. As in Markov networks, the features themselves can also be learned from the data.
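Grounding the friends-of-friends template is mechanical. A sketch (the names and the tiny friendship graph are invented; in a relational model such as a Markov logic network, each grounding of the rule would contribute its learned weight rather than a hard conclusion):

```python
from itertools import permutations

# Toy friendship graph, undirected, so each pair is stored as a frozenset.
friends = {frozenset(p) for p in [("Alice", "Bob"), ("Bob", "Chris"), ("Alice", "Dana")]}
people = {"Alice", "Bob", "Chris", "Dana"}

def are_friends(a, b):
    return frozenset((a, b)) in friends

def candidate_friends():
    """Ground the template: for every triple (a, b, c), if a-b and b-c are
    friends but a-c are not, propose (a, c) as potential friends."""
    proposals = set()
    for a, b, c in permutations(people, 3):
        if are_friends(a, b) and are_friends(b, c) and not are_friends(a, c):
            proposals.add(frozenset((a, c)))
    return proposals

print(sorted(sorted(p) for p in candidate_friends()))
# → [['Alice', 'Chris'], ['Bob', 'Dana']]
```

Counting how often such groundings hold in the data is exactly what learning the template's weight requires: the more friend-of-friend pairs that really are friends, the larger the weight the rule earns.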