
A new study led by researchers at the University of Oxford and the Allen Institute for AI (Ai2) has found that large language models (LLMs)—the AI systems behind chatbots like ChatGPT—generalize language patterns in a surprisingly human-like way: through analogy rather than by strict grammatical rules.
The work is published in the journal Proceedings of the National Academy of Sciences.
The research challenges a widespread assumption about LLMs: that they learn how to generate language primarily by inferring rules from their training data. Instead, the models rely heavily on stored examples and draw analogies when dealing with unfamiliar words, much as people do.
To explore how LLMs generate language, the study compared judgments made by humans with those made by GPT-J (an open-source large language model developed by EleutherAI in 2021) on a very common word-formation pattern in English: turning adjectives into nouns by adding the suffix “-ness” or “-ity.” For instance, “happy” becomes “happiness,” and “available” becomes “availability.”
The research team generated 200 made-up English adjectives that the LLM had never encountered before—words such as “cormasive” and “friquish.” GPT-J was asked to turn each one into a noun by choosing between -ness and -ity (for example, deciding between “cormasivity” and “cormasiveness”). The LLM’s responses were compared with the choices made by people and with the predictions of two well-established cognitive models: one that generalizes using rules, and one that reasons by analogy, based on similarity to stored examples.
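To make the setup concrete, here is a minimal sketch of how such a forced-choice test might be run against GPT-J, using the Hugging Face transformers library and the public EleutherAI/gpt-j-6B checkpoint. The prompt frame and scoring below are illustrative assumptions rather than the paper's exact protocol: the idea is simply to compare the total log-probability the model assigns to each candidate noun.

```python
# Minimal sketch (not the authors' code): ask which nominalization
# GPT-J finds more probable by comparing summed token log-probabilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/gpt-j-6B"  # open-source LLM used in the study (~6B params)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sequence_logprob(text: str) -> float:
    """Sum of log-probabilities the model assigns to the tokens of `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Shift: the logits at position i predict the token at position i + 1.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

# Hypothetical prompt frame; the study's actual prompts may differ.
for adj, ity_form, ness_form in [("cormasive", "cormasivity", "cormasiveness"),
                                 ("friquish", "friquishity", "friquishness")]:
    frame = f"The adjective is {adj}. The corresponding noun is "
    lp_ity = sequence_logprob(frame + ity_form + ".")
    lp_ness = sequence_logprob(frame + ness_form + ".")
    print(adj, "->", ity_form if lp_ity > lp_ness else ness_form)
```

Scoring the two full strings rather than sampling makes the comparison deterministic, and the same loop extends directly to all 200 nonce adjectives.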
The results revealed that the LLM’s behavior resembled human analogical reasoning. Rather than applying rules, it based its answers on similarities to real words it had encountered during training—much as people do when thinking about new words. For instance, “friquish” becomes “friquishness” on the basis of its similarity to words like “selfish,” whereas the outcome for “cormasive” is influenced by word pairs such as “sensitive”/“sensitivity.”
The study also found pervasive and subtle influences of how often word forms had appeared in the training data. The LLM’s responses on nearly 50,000 real English adjectives were probed, and its predictions matched the statistical patterns in its training data with striking precision. The LLM behaved as if it had formed a memory trace from every individual example of every word it had encountered during training. Drawing on these stored memories to make linguistic decisions, it appeared to handle anything new by asking itself: “What does this remind me of?”
The study also revealed a key difference between how human beings and LLMs form analogies over examples. Humans acquire a mental dictionary—a mental store of all the word forms that they consider to be meaningful words in their language, regardless of how often they occur. They easily recognize that forms like “friquish” and “cormasive” are not currently words of English. To deal with these potential neologisms, they make analogical generalizations based on the variety of known words in their mental dictionaries.
LLMs, in contrast, generalize directly over all the individual instances of words in the training set, without unifying instances of the same word into a single dictionary entry.
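The difference can be sketched with a toy analogical model over invented data: a type-based analogizer (human-like) counts each known adjective once, while a token-based one (LLM-like) weights every neighbor by how often it occurred, so a single high-frequency neighbor can flip the decision. Everything here, including the crude shared-ending similarity measure and the frequencies, is a made-up illustration, not the paper's model.

```python
# Toy illustration (invented data): analogizing over word types vs. tokens.
from collections import defaultdict

# (adjective, attested noun suffix, made-up corpus frequency)
LEXICON = [
    ("selfish",   "ness", 900),
    ("boyish",    "ness", 50),
    ("sensitive", "ity",  700),
    ("active",    "ity",  400),
    ("massive",   "ness", 3000),
]

def shared_ending(a: str, b: str) -> int:
    """Length of the longest ending the two words share (crude similarity)."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def analogize(new_word: str, by_tokens: bool) -> str:
    """Vote for a suffix; weight each neighbor by frequency iff by_tokens."""
    votes = defaultdict(float)
    for word, suffix, freq in LEXICON:
        votes[suffix] += shared_ending(new_word, word) * (freq if by_tokens else 1)
    return max(votes, key=votes.get)

for w in ["friquish", "cormasive"]:
    print(w, "| type-based:", analogize(w, by_tokens=False),
             "| token-based:", analogize(w, by_tokens=True))
```

With these made-up numbers, both models pick “-ness” for “friquish,” but they diverge on “cormasive”: the type-based analogizer follows “sensitive” and “active” to “-ity,” while the high-frequency neighbor “massive” pulls the token-based one to “-ness.” The point is only the mechanism: instance-level, frequency-weighted analogy is sensitive to how often forms occur, which is the token-level behavior the study reports for GPT-J.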
Senior author Janet Pierrehumbert, Professor of Language Modelling at Oxford University, said, “Although LLMs can generate language in a very impressive manner, it turns out that they do not think as abstractly as humans do. This probably contributes to the fact that their training requires so much more language data than humans need to learn a language.”
Co-lead author Dr. Valentin Hofmann (Ai2 and University of Washington) said, “This study is a great example of synergy between linguistics and AI as research areas. The findings give us a clearer picture of what’s going on inside LLMs when they generate language, and will support future advances in robust, efficient, and explainable AI.”
The study also involved researchers from LMU Munich and Carnegie Mellon University.
More information:
Valentin Hofmann et al., Derivational morphology reveals analogical generalization in large language models, Proceedings of the National Academy of Sciences (2025). DOI: 10.1073/pnas.2423232122