If so few learners can do so much, the logical question is: Could one learner do everything? In other words, could a single algorithm learn all that can be learned from data? This is a very tall order, since it would ultimately include everything in an adult's brain, everything evolution has created, and the sum total of all scientific knowledge. But in fact all the major learners, including nearest-neighbor, decision trees, and Bayesian networks (a generalization of Naïve Bayes), are universal in the following sense: if you give the learner enough of the appropriate data, it can approximate any function arbitrarily closely, which is math-speak for learning anything. The catch is that "enough data" could be infinite. Learning from finite data requires making assumptions, as we'll see, and different learners make different assumptions, which makes them good for some things but not others.

The argument from physics

If you're a member of the Sierra Club and read science-fiction books, you'll like Avatar.

The power of rule sets is a double-edged sword. On the upside, you know you can always find a rule set that perfectly matches the data. But before you start feeling lucky, realize that you're at severe risk of finding a completely meaningless one. Remember the "no free lunch" theorem: you can't learn without knowledge. And assuming that the concept can be defined by a set of rules is tantamount to assuming nothing.

That's only the beginning, however. Most cancers involve a combination of mutations, or can only be cured by drugs that haven't been discovered yet. The next step is to learn rules with more complex conditions, involving the cancer's genome, the patient's genome and medical history, known side effects of drugs, and so on.
But ultimately what we need is a model of how the entire cell works, enabling us to simulate on the computer the effect of a specific patient's mutations, as well as the effect of different combinations of drugs, existing or speculative. Our main sources of information for building such models are DNA sequencers, gene expression microarrays, and the biological literature. Combining these is where inverse deduction can shine.

Because of its origins and guiding principles, symbolist machine learning is still closer to the rest of AI than the other schools. If computer science were a continent, symbolist learning would share a long border with knowledge engineering. Knowledge is traded in both directions (manually entered knowledge for use in learners, induced knowledge for addition to knowledge bases), but at the end of the day the rationalist-empiricist fault line runs right down that border, and crossing it is not easy.

But if humans have all these abilities that their brains didn't learn by tweaking synapses, where did they come from? Unless you believe in magic, the answer must be evolution. If you're a connectionism skeptic and you have the courage of your convictions, it behooves you to figure out how evolution learned everything a baby knows at birth, and the more you think is innate, the taller the order. But if you can figure it out and program a computer to do it, it would be churlish to deny that you've invented at least one version of the Master Algorithm.

Naïve Bayes is a good conceptual model of a learner to use when reading the press: it captures the pairwise correlation between each input and the output, which is often all that's needed to understand references to
learning algorithms in news stories. But machine learning is not just pairwise correlations, of course, any more than the brain is just one neuron. The real action begins when we look for more complex patterns.

[Figure: pic_19.jpg]

Inference in Bayesian networks is not limited to computing probabilities. It also includes finding the most probable explanation for the evidence, such as the disease that best explains the symptoms or the words that best explain the sounds Siri heard. This is not the same as just picking the most probable word at each step, because words that are individually likely given their sounds may be unlikely to occur together, as in the "Call the please" example. However, similar kinds of algorithms also work for this task (and they are, in fact, what most speech recognizers use). Most importantly, inference includes making the best decisions, guided not just by the probabilities of different outcomes but also by the corresponding costs (or utilities, to use the technical term). The cost of ignoring an e-mail from your boss asking you to do something by tomorrow is much greater than the cost of seeing a piece of spam, so often it's better to let an e-mail through even if it does seem fairly likely to be spam.

Clearly, we need both logic and probability. Curing cancer is a good example. A Bayesian network can model a single aspect of how cells function, like gene regulation or protein folding, but only logic can put all the pieces together into a coherent picture. On the other hand, logic can't deal with incomplete or noisy information, which is pervasive in experimental biology, but Bayesian networks can handle it with aplomb.

You can probably tell just by looking at this plot that the main street in Palo Alto runs southwest-northeast. You didn't draw a street, but you can intuit that it's there from the fact that all the points fall along a straight line (or close to it; they can be on different sides of the street).
Indeed, the street is University Avenue, and if you want to shop or eat out in Palo Alto, that's the place to go. As a bonus, once you know that the shops are on University Avenue, you don't need two numbers to locate them, just one: the street number (or, if you wanted to be really precise, the distance from the shop to the Caltrain station, on the southwest corner, which is where University Avenue begins).

Here's an interesting experiment. Take the video stream from Robby's eyes, treat each frame as a point in the space of images, and reduce that set of images to a single dimension. What will you discover? Time. Like a librarian arranging books on a shelf, time places each image next to its most similar ones. Perhaps our perception of it is just a natural result of our brains' dimensionality reduction prowess. In the road network of memory, time is the main thoroughfare, and we soon find it. Time, in other words, is the principal component of memory.

To share or not to share, and how and where
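The spam decision described earlier, where an e-mail is let through even when it seems fairly likely to be spam, is just a comparison of expected costs. A minimal sketch, with made-up illustrative cost numbers (the values 100 and 1 are assumptions for the sake of the example, not figures from the text):

```python
# Cost-sensitive decision: block an e-mail only when the expected cost of
# blocking it (possibly losing a legitimate message from your boss) is lower
# than the expected cost of letting it through (seeing one piece of spam).
# Both cost values are invented for illustration.
def best_action(p_spam, cost_lost_legit=100.0, cost_seen_spam=1.0):
    expected_cost_block = (1.0 - p_spam) * cost_lost_legit
    expected_cost_pass = p_spam * cost_seen_spam
    return "block" if expected_cost_block < expected_cost_pass else "let through"

# Even at 95% spam probability, blocking risks 0.05 * 100 = 5 units of
# expected cost versus 0.95 * 1 for passing, so the message gets through.
print(best_action(0.95))    # let through
print(best_action(0.999))   # block
```

With these costs, the break-even point is a spam probability of about 0.99: asymmetric costs push the decision threshold far away from the naive 0.5.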
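The Palo Alto shops and Robby's video frames are both instances of the same operation: projecting points onto their dominant direction of variation, the first principal component. A minimal sketch with synthetic, made-up "shop" coordinates (not real Palo Alto data), using NumPy's SVD:

```python
# One-dimensional reduction via the first principal component.
# The "shops" below are synthetic points scattered along a fictitious
# southwest-northeast street; the noise puts them on either side of it.
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0, 10, 50)                      # true position along the street
street = np.array([1.0, 1.0]) / np.sqrt(2.0)    # southwest-northeast direction
points = np.outer(t, street) + rng.normal(0, 0.05, (50, 2))

centered = points - points.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coord = centered @ vt[0]   # one number per shop instead of two

# The single coordinate tracks position along the street almost perfectly
# (up to a sign flip: a principal component has no preferred direction).
print(abs(np.corrcoef(coord, t)[0, 1]))   # close to 1.0
```

Run the same reduction on video frames instead of shop locations and, as the text suggests, the recovered coordinate tends to order the frames by time.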