In 1943, while the world’s brightest physicists split atoms for the Manhattan Project, the American psychologist B.F. Skinner led his own secret government project to win World War II.
Skinner did not aim to build a new class of larger, more destructive weapons. Rather, he wanted to make conventional bombs more precise. The idea struck him as he gazed out the window of his train on the way to an academic conference. “I saw a flock of birds lifting and wheeling in formation as they flew alongside the train,” he wrote. “Suddenly I saw them as ‘devices’ with excellent vision and maneuverability. Could they not guide a missile?”
Skinner started his missile research with crows, but the brainy black birds proved intractable. So he went to a local shop that sold pigeons to Chinese restaurants, and “Project Pigeon” was born. Though ordinary pigeons, Columba livia, were no one’s idea of clever animals, they proved remarkably cooperative subjects in the lab. Skinner rewarded the birds with food for pecking at the right target on aerial photographs, and he eventually planned to strap the birds into a device in the nose of a warhead, which they would steer by pecking at the target on a live image projected through a lens onto a screen.
The military never deployed Skinner’s kamikaze pigeons, but his experiments convinced him that the pigeon was “an extremely reliable instrument” for studying the underlying processes of learning. “We have used pigeons, not because the pigeon is an intelligent bird, but because it is a practical one and can be made into a machine,” he said in 1944.
People looking for precursors to artificial intelligence often point to science fiction by authors like Isaac Asimov or to thought experiments like the Turing test. But an equally important, if more surprising and less appreciated, forerunner is Skinner’s research with pigeons in the middle of the 20th century. Skinner believed that association (learning, through trial and error, to link an action with a punishment or reward) was the building block of every behavior, not just in pigeons but in all living organisms, including human beings. His “behaviorist” theories fell out of favor with psychologists and animal researchers in the 1960s but were taken up by computer scientists whose work eventually provided the foundation for many of the artificial-intelligence tools from leading firms like Google and OpenAI.
These companies’ programs increasingly incorporate a kind of machine learning whose core concept, reinforcement, is taken directly from Skinner’s school of psychology, and whose main architects, the computer scientists Richard Sutton and Andrew Barto, won the 2024 Turing Award, an honor widely considered the Nobel Prize of computer science. Reinforcement learning has helped enable computers to drive cars, solve complex math problems, and defeat grandmasters in games like chess and Go, but it has not done so by emulating the complex workings of the human mind. Rather, it has supercharged the simple associative processes of the pigeon brain.
It’s a “bitter lesson” of 70 years of AI research, Sutton has written: human intelligence has not worked as a model for machine learning. Instead, the lowly principles of associative learning are what power the algorithms that can now simulate or outperform humans on a variety of tasks. If artificial intelligence really is close to throwing off the yoke of its creators, as many people fear, then our computer overlords may be less like ourselves than like “rats with wings,” only with planet-size brains. And even if it’s not, the pigeon brain can at least help demystify a technology that many worry (or rejoice) is “becoming human.”
In turn, the recent accomplishments of AI are prompting some animal researchers to rethink the evolution of natural intelligence. Johan Lind, a biologist at Stockholm University, has written about the “associative learning paradox”: the same process that biologists largely dismiss as too simplistic to produce complex behaviors in animals is celebrated for producing humanlike behaviors in computers. The research suggests not only a greater role for associative learning in the lives of intelligent animals like chimpanzees and crows, but also far greater complexity in the lives of animals we’ve long dismissed as simple-minded, like the ordinary Columba livia.
When Sutton began working in AI, he felt as if he had a “secret weapon,” he told me: he had studied psychology as an undergrad. “I was mining the psychological literature for animals,” he says.
Ivan Pavlov began to uncover the mechanics of associative learning at the end of the 19th century in his famous experiments on “classical conditioning,” which showed that dogs would salivate at a neutral stimulus, like a bell or a flashing light, if it was paired predictably with the presentation of food. In the middle of the 20th century, Skinner took Pavlov’s principles of conditioning and extended them from an animal’s involuntary reflexes to its overall behavior.
Skinner wrote that “behavior is shaped and maintained by its consequences”: a random action with desirable results, like pressing a lever that releases a food pellet, will be “reinforced” so that the animal is likely to repeat it. Skinner reinforced his lab animals’ behavior step by step, teaching rats to manipulate marbles and pigeons to play simple tunes on four-key pianos. The animals learned chains of behavior, through trial and error, in order to maximize long-term rewards. Skinner argued that this type of associative learning, which he called “operant conditioning” (and which other psychologists had called “instrumental learning”), was the building block of all behavior. He believed that psychology should study only behaviors that could be observed and measured, without ever making reference to an “inner agent” in the mind.
Skinner thought that even human language developed through operant conditioning, with children learning the meanings of words through reinforcement. But his 1957 book on the subject, Verbal Behavior, provoked a brutal review from Noam Chomsky, and psychology’s focus started to swing from observable behavior to innate “cognitive” abilities of the human mind, like logic and symbolic thinking. Biologists soon rebelled against behaviorism as well, attacking psychologists’ quest to explain the diversity of animal behavior through an elementary and universal mechanism. They argued that each species evolved specific behaviors suited to its habitat and lifestyle, and that most behaviors were inherited, not learned.
By the ’70s, when Sutton started reading about Skinner’s experiments and others like them, many psychologists and researchers interested in intelligence had moved on from pea-brained pigeons, which learn mostly by association, to large-brained animals whose more sophisticated behaviors suggested potential cognitive abilities. “This was clearly old stuff that was not exciting to people anymore,” he told me. Still, Sutton found these old experiments instructive for machine learning: “I was coming to AI with an animal-learning-theorist mindset and seeing the big lack of anything like instrumental learning in engineering.”
Many engineers in the second half of the 20th century tried to model AI on human intelligence, writing convoluted programs that attempted to mimic human thinking and implement rules that govern human response and behavior. This approach, commonly called “symbolic AI,” was severely limited; the programs stumbled over tasks that were easy for people, like recognizing objects and words. It just wasn’t possible to write into code the myriad classification rules human beings use to, say, separate apples from oranges or cats from dogs, and without pattern recognition, breakthroughs in more complex tasks like problem solving, game playing, and language translation seemed unlikely too. These computer scientists, the AI skeptic Hubert Dreyfus wrote in 1972, accomplished nothing more than “a small engineering triumph, an ad hoc solution of a specific problem, without general applicability.”
Pigeon research, however, suggested another route. A 1964 study showed that pigeons could learn to discriminate between photographs with people and photographs without people. Researchers simply presented the birds with a series of images and rewarded them with a food pellet for pecking an image showing a person. The birds pecked randomly at first but quickly learned to identify the right images, including photos where people were partially obscured. The results suggested that you didn’t need rules to sort objects; it was possible to learn concepts and use categories through associative learning alone.
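The logic of that experiment translates almost directly into code. Below is a toy sketch of such a rule-free learner: each “photo” is reduced to two invented numeric features (a real pigeon’s visual system obviously extracts far richer signals), and the only feedback is whether each peck paid off.

```python
import random

# Hypothetical training set: two made-up features per photo, and a 1
# if a person is present. No sorting rule is ever written down.
random.seed(0)
photos = [((0.9, 0.2), 1), ((0.8, 0.4), 1), ((0.1, 0.7), 0), ((0.2, 0.9), 0)]
w = [0.0, 0.0]  # learned weights
bias = 0.0

for _ in range(200):
    (f1, f2), person = random.choice(photos)
    peck = 1 if w[0] * f1 + w[1] * f2 + bias > 0 else 0
    # Feedback is +1 when a peck was withheld that would have earned food,
    # -1 when a peck went unrewarded, and 0 when the choice was right.
    feedback = person - peck
    w[0] += 0.1 * feedback * f1
    w[1] += 0.1 * feedback * f2
    bias += 0.1 * feedback

print(w, bias)  # weights that separate "person" from "no person" photos
```

After a few hundred trials the learner comes to classify the toy photos correctly, though, like the pigeons, it could never state the rule it follows.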
When Sutton began working with Barto on AI in the late ’70s, they wanted to create a “complete, interactive goal-seeking agent” that could explore and influence its environment like a pigeon or rat. “We always felt the problems we were studying were closer to what animals had to face in evolution to actually survive,” Barto told me. The agent needed two main functions: search, to try out and choose from many actions in a situation, and memory, to associate an action with the situation where it resulted in a reward. Sutton and Barto called their approach “reinforcement learning”; as Sutton said, “It’s basically instrumental learning.” In 1998, they published the definitive exploration of the concept in a book, Reinforcement Learning: An Introduction.
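Those two functions are enough to specify a working learner. Here is a minimal sketch of tabular Q-learning, one simple algorithm in the tradition Sutton and Barto formalized; the environment object, with its reset() and step() methods, is an assumption of this sketch (a common convention today) rather than anything prescribed by their book.

```python
import random
from collections import defaultdict

def train(env, actions, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    q = defaultdict(float)  # memory: the learned value of each (state, action) pair
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Search: usually exploit the best-known action, occasionally explore.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Memory: nudge the stored value toward the reward just received
            # plus the discounted value of the best action available next.
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```

Nothing in the loop models reasoning; the table of values simply grows more accurate with every trial, like a pigeon’s sense of which peck earns the pellet.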
Over the following two decades, as computing power grew exponentially, it became possible to train AI on increasingly complex tasks: that is, essentially, to run the AI “pigeon” through millions more trials.
Programs trained with a mix of human input and reinforcement learning defeated human experts at chess and at Atari video games. Then, in 2017, engineers at Google DeepMind built the AI program AlphaGo Zero entirely through reinforcement learning, giving it a numerical reward of +1 for every game of Go that it won and −1 for every game that it lost. Programmed to seek the maximum reward, it began without any knowledge of Go but improved over 40 days until it attained what its creators called “superhuman performance.” Not only could it defeat the world’s best human players at Go, a game considered even more complicated than chess, but it actually pioneered new strategies that professional players now use.
“Humankind has accumulated Go knowledge from millions of games played over thousands of years,” the program’s builders wrote in Nature in 2017. “In the space of a few days, starting tabula rasa, AlphaGo Zero was able to rediscover much of this Go knowledge, as well as novel strategies that provide new insights into the oldest of games.” The team’s lead researcher was David Silver, who studied reinforcement learning under Sutton at the University of Alberta.
Today, more and more tech companies have turned to reinforcement learning in products such as consumer-facing chatbots and agents. The first generation of generative AI, including large language models like OpenAI’s GPT-2 and GPT-3, tapped into a simpler form of associative learning, “self-supervised” pretraining, in which a model absorbs the statistical patterns of language by predicting the next word across enormous sets of human-written text. Programmers often used reinforcement to fine-tune the results, asking people to rate a program’s performance and then feeding those ratings back to the program as rewards to pursue. (Researchers call this “reinforcement learning from human feedback.”)
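In spirit, the loop is simple, even though real systems adjust billions of neural-network weights rather than entries in a score table. A deliberately toy sketch, with invented answers and a stand-in function where the human raters would be:

```python
import random

candidates = ["answer A", "answer B", "answer C"]
scores = {c: 0.0 for c in candidates}  # the program's learned value of each answer
counts = {c: 0 for c in candidates}

def human_rating(answer):
    # Stand-in for a person rating the answer between 0 and 1.
    return {"answer A": 0.2, "answer B": 0.9, "answer C": 0.5}[answer]

for _ in range(200):
    # Mostly offer the best-rated answer so far; sometimes try another.
    if random.random() < 0.2:
        answer = random.choice(candidates)
    else:
        answer = max(scores, key=scores.get)
    reward = human_rating(answer)
    counts[answer] += 1
    scores[answer] += (reward - scores[answer]) / counts[answer]  # running average

print(max(scores, key=scores.get))  # the program gravitates toward "answer B"
```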
Then, last fall, OpenAI revealed its o-series of large language models, which it classifies as “reasoning” models. The pioneering AI firm boasted that they are “trained with reinforcement learning to perform reasoning” and claimed they are capable of “a long internal chain of thought.” The Chinese startup DeepSeek also used reinforcement learning to train its attention-grabbing “reasoning” LLM, R1. “Rather than explicitly teaching the model how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies,” the company explained.
These descriptions might impress users, but psychologically speaking, at least, they are confused. A computer trained with reinforcement learning needs only search and memory, not reasoning or any other cognitive mechanism, in order to form associations and maximize rewards. Some computer scientists have criticized the tendency to anthropomorphize these models’ “thinking,” and a team of Apple engineers recently published a paper noting their failure at certain complex tasks and “raising crucial questions about their true reasoning capabilities.”
Sutton, too, dismissed the claims of reasoning as “marketing” in an email, adding that “no serious scholar of mind would use ‘reasoning’ to describe what is going on in LLMs.” Still, he has argued, with Silver and other coauthors, that the pigeons’ method (learning, through trial and error, which actions will yield rewards) is “enough to drive behavior that exhibits most if not all abilities that are studied in natural and artificial intelligence,” including human language “in its full richness.”
In a paper published in April, Sutton and Silver stated that “today’s technology, with appropriately chosen algorithms, already provides a sufficiently powerful foundation to … rapidly progress AI towards truly superhuman agents.” The key, they argue, is building AI agents that depend less than LLMs do on human dialogue and prejudgments to inform their behavior.
“Powerful agents should have their own stream of experience that progresses, like humans, over a long time-scale,” they wrote. “Ultimately, experiential data will eclipse the scale and quality of human generated data. This paradigm shift, accompanied by algorithmic advancements in RL, will unlock in many domains new capabilities that surpass those possessed by any human.”
If computers can do all that with just a pigeonlike brain, some animal researchers are now wondering if actual pigeons deserve more credit than they’re commonly given.
“When considered in light of the accomplishments of AI, the extension of associative learning to purportedly more complicated forms of cognitive performance offers fresh prospects for understanding how biological systems may have evolved,” Ed Wasserman, a psychologist at the University of Iowa, wrote in a recent study in the journal Current Biology.
In one experiment, Wasserman trained pigeons to succeed at a complex categorization task that several undergraduate students failed. The students tried, in vain, to find a rule that would help them sort discs marked with parallel black lines of various widths and tilts; the pigeons simply developed a sense, through practice and association, for the group to which any given disc belonged.
Like Sutton, Wasserman became interested in behaviorist psychology when Skinner’s theories were out of fashion. He didn’t switch to computer science, however: he stuck with pigeons. “The pigeon lives or dies by these really rudimentary learning rules,” Wasserman told me recently, “but they are powerful enough to have succeeded colossally in object recognition.” In his most famous experiments, Wasserman trained pigeons to detect cancerous tissue and symptoms of heart disease in medical scans as accurately as experienced doctors with framed diplomas behind their desks. Given his results, Wasserman found it odd that so many psychologists and ethologists regarded associative learning as a crude, mechanical process, incapable of producing the intelligence of clever animals like apes, elephants, dolphins, parrots, and crows.
Other researchers also began to reconsider the role of associative learning in animal behavior after AI started besting human professionals in complex games. “With the progress of artificial intelligence, which in essence is built upon associative processes, it is increasingly ironic that associative learning is considered too simple and insufficient for generating biological intelligence,” Lind, the biologist from Stockholm University, wrote in 2023. He often cites Sutton and Barto’s computer science in his biological research, and he believes it’s human beings’ symbolic language and cumulative cultures that really put them in a cognitive category of their own.
Ethologists generally propose cognitive mechanisms, like theory of mind (that is, the ability to attribute mental states to others), to explain remarkable animal behaviors like social learning and tool use. But Lind has built models showing that these flexible behaviors could have developed through associative learning, suggesting that there may be no need to invoke cognitive mechanisms at all. If animals learn to associate a behavior with a reward, then the behavior itself will come to approximate the value of the reward. A new behavior can then become associated with the first behavior, allowing the animal to learn chains of actions that ultimately lead to the reward. In Lind’s view, studies demonstrating self-control and planning in chimpanzees and ravens are probably describing behaviors acquired through experience rather than innate mechanisms of the mind.
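The arithmetic behind such chaining is modest. The following toy illustration uses the standard error-correcting update from reinforcement learning (the two-step chain and all the numbers are invented, not taken from Lind’s models): behavior B earns food directly, while behavior A is never fed and merely leads to the chance to perform B.

```python
alpha = 0.5                    # learning rate
values = {"A": 0.0, "B": 0.0}  # learned value of each behavior

for _ in range(50):
    # B is followed directly by food, worth 1.0.
    values["B"] += alpha * (1.0 - values["B"])
    # A is followed only by the opportunity to do B, so A's value
    # is pulled toward B's current value.
    values["A"] += alpha * (values["B"] - values["A"])

print(values)  # both approach 1.0: the chain is learned back to front
```

The first behavior ends up nearly as valuable as the food itself, which is how a long sequence can be acquired without any plan of the whole.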
Lind has been frustrated with what he calls the “low standard that is accepted in animal cognition studies.” As he wrote in an email, “Many researchers in this field do not seem to worry about excluding alternative hypotheses and they seem happy to neglect a lot of current and historical knowledge.” There are some signs, though, that his arguments are catching on. A group of psychologists not affiliated with Lind referenced his “associative learning paradox” last year in a criticism of a Current Biology study, which purported to show that crows used “true statistical inference” and not “low-level associative learning strategies” in an experiment. The psychologists found that they could explain the crows’ performance with a simple reinforcement-learning model: “exactly the kind of low-level associative learning process that [the original authors] ruled out.”
Skinner might have felt vindicated by such arguments. He lamented psychology’s cognitive turn until his death in 1990, maintaining that it was scientifically irresponsible to probe the minds of living beings. After “Project Pigeon,” he became increasingly obsessed with “behaviorist” solutions to societal problems. He went from training pigeons for war to inventions like the “Air Crib,” which aimed to “simplify” baby care by keeping the infant behind glass in a climate-controlled chamber, eliminating the need for clothing and bedding. Skinner rejected free will, arguing that human behavior is determined by environmental variables, and wrote a novel, Walden Two, about a utopian community founded on his ideas.
People who care about animals might feel uneasy about a revival of behaviorist theory. The “cognitive revolution” broke with centuries of Western thinking that had emphasized human supremacy over animals and treated other creatures like stimulus-response machines. But arguing that animals learn by association is not the same as arguing that they are simple-minded. Scientists like Lind and Wasserman do not deny that internal forces like instinct and emotion also influence animal behavior. Sutton, too, believes that animals develop models of the world through their experiences and use them to plan actions. Their point is not that intelligent animals are empty-headed but that associative learning is a much more powerful (indeed, “cognitive”) mechanism than many of their peers believe. The psychologists who recently criticized the study on crows and statistical inference did not conclude that the birds were stupid. Rather, they argued “that a reinforcement learning model can produce complex, flexible behaviour.”
This is largely in line with the thinking of another psychologist, Robert Rescorla, whose work in the ’70s and ’80s influenced both Wasserman and Sutton. Rescorla encouraged people to think of association not as a “low-level mechanical process” but as “the learning that results from exposure to relations among events in the environment” and “a primary means by which the organism represents the structure of its world.”
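Rescorla’s view has a concrete mathematical form. The model he published with Allan Wagner in 1972, his best-known work (not quoted in this article), holds that a cue gains associative strength only insofar as the outcome is surprising. A short sketch with invented parameters reproduces the model’s famous “blocking” effect:

```python
alpha, lam = 0.3, 1.0            # learning rate; maximum strength the food supports
v = {"light": 0.0, "tone": 0.0}  # associative strength of each cue

# Phase 1: the light alone predicts food.
# Rescorla-Wagner update: each cue changes by alpha * (lam - total prediction).
for _ in range(30):
    v["light"] += alpha * (lam - v["light"])

# Phase 2: the light and a new tone appear together before food.
for _ in range(30):
    total = v["light"] + v["tone"]
    v["light"] += alpha * (lam - total)
    v["tone"] += alpha * (lam - total)

print(v)  # the tone gains almost nothing: the food was already fully predicted
```

The tone is paired with food on every trial yet learns nothing, because it adds no predictive information; association here tracks the relations among events, not mere contiguity.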
This is true even of a laboratory pigeon pecking at screens and buttons in a small experimental box, where scientists carefully control and measure stimuli and rewards. But the pigeon’s learning extends outside the box. Wasserman’s students transport the birds between the aviary and the laboratory in buckets, and experienced pigeons jump immediately into the buckets whenever the students open the doors. Much as Rescorla suggested, they are learning the structure of their world inside the laboratory and the relation of its parts, like the bucket and the box, even though they do not always know the specific task they will face inside.
The same associative mechanisms through which the pigeon learns the structure of its world can open a window onto the kind of inner life that Skinner and many earlier psychologists said did not exist. Pharmaceutical researchers have long used pigeons in drug-discrimination tasks, in which the birds are given, say, an amphetamine or a sedative and rewarded with a food pellet for correctly identifying which drug they took. The birds’ success suggests that they both experience and discriminate between internal states. “Is that not tantamount to introspection?” Wasserman asked.
It is hard to imagine AI matching a pigeon on this specific task, a reminder that, though AI and animals share associative mechanisms, there is more to life than behavior and learning. A pigeon deserves ethical consideration as a living creature not because of how it learns but because of what it feels. A pigeon can experience pain and suffer, while an AI chatbot cannot, even if some large language models, trained on corpora that include descriptions of human suffering and sci-fi stories of sentient computers, can trick people into believing otherwise.
“The intensive public and private investments into AI research in recent years have resulted in the very technologies that are forcing us to confront the question of AI sentience today,” two philosophers of science wrote in Aeon in 2023. “To answer these current questions, we need a similar degree of investment into research on animal cognition and behavior.” Indeed, comparative psychologists and animal researchers have long grappled with questions that suddenly seem urgent because of AI: How do we attribute sentience to other living beings? How can we distinguish true sentience from a very convincing performance of sentience?
Such an undertaking would yield knowledge not only about technology and animals but also about ourselves. Most psychologists probably wouldn’t go as far as Sutton in arguing that reward is enough to explain most if not all human behavior, but no one would dispute that people often learn by association too. In fact, most of Wasserman’s undergraduate students eventually succeeded at his recent experiment with the striped discs, but only after they gave up searching for rules. They resorted, like the pigeons, to association and couldn’t easily explain afterwards what they’d learned. It was just that, with enough practice, they started to get a feel for the categories.
It is another irony of associative learning: what has long been considered the most complex form of intelligence, a cognitive ability like rule-based learning, may make us human, but we also call on it for the easiest of tasks, like sorting objects by color or size. Meanwhile, some of the most refined feats of human learning, like a sommelier learning to taste the difference between grapes, are acquired not through rules but only through experience.
Learning through experience relies on ancient associative mechanisms that we share with pigeons and countless other creatures, from honeybees to fish. The laboratory pigeon is not only in our computers but in our brains, and it is the engine behind some of humankind’s most impressive feats.
Ben Crair is a science and travel writer based in Berlin.