
Having humans participate in a study can be time-consuming and expensive for researchers stretching limited budgets under tight deadlines. Sophisticated generative large language models (LLMs) can complete many tasks, so some researchers and companies have explored using them in studies instead of human participants.
Researchers from Carnegie Mellon University’s School of Computer Science identified fundamental limitations to using LLMs in qualitative research focused on a human’s perspective, including the ways information is gathered and aggregated and issues surrounding consent and data collection.
“We looked into this question of if LLM-based agents can replace human participation in qualitative research, and the high-level answer was no,” said Hoda Heidari, the K&L Gates Career Development Assistant Professor in Ethics and Computational Technologies in CMU’s Software and Societal Systems Department (S3D) and Machine Learning Department.
“There are all sorts of nuances that human participants contribute that you cannot possibly get out of LLM-based agents, no matter how good the technology is.”
The team’s paper, “Simulacrum of Stories: Examining Large Language Models as Qualitative Research Participants,” received an honorable mention award at the Association for Computing Machinery’s Conference on Human Factors in Computing Systems (CHI 2025) last week in Yokohama, Japan.
Team members from SCS included Heidari; Shivani Kapania, a doctoral student in the Human-Computer Interaction Institute (HCII); William Agnew, the Carnegie Bosch Postdoctoral Fellow in the HCII; Motahhare Eslami, an assistant professor in the HCII and S3D; and Sarah Fox, an assistant professor in the HCII.
The paper is available in the Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems.
LLMs are used as tools in training across a variety of fields. In the medical and legal professions, these tools allow professionals to simulate and practice real-life scenarios, such as a therapist training to identify mental health crises. In qualitative research, which is often interview-based, LLMs are being trained to mimic human behavior in their responses to questions and prompts.
In the study, the CMU team interviewed 19 humans with experience in qualitative research. Participants interacted with an LLM chatbot-style tool, typing messages back and forth. The tool allowed researchers to compare LLM-generated data with human-generated data and reflect on ethical concerns.
The researchers identified several ways that using LLMs as study participants limits scientific inquiry, including how the models gather and interpret knowledge. Study participants noted that the LLM tool often compiled its answers from multiple sources and fit them, sometimes unnaturally, into a single response.
For example, in a study about factory working conditions, a worker on the floor and a manager would likely have different responses about a variety of aspects of the work and workplace. Yet an LLM participant generating responses might combine these two perspectives into one answer—conflating attitudes in ways not reflective of reality.
Using an LLM as a responder also raised problems around consent. In the paper, the researchers note that LLMs trained on publicly available data from social media platforms raise questions about informed consent and whether the people whose data the models were trained on had the option to opt out.
Overall, the study raises doubts about using LLMs as study participants, noting ethical concerns and questions about the validity of these tools.
“These models are encoded with the biases, assumptions and power dynamics of model producers and the data and contexts from which they are derived,” the researchers wrote. “As such, their use in research reshapes the nature of the knowledge produced, often in ways that reinforce existing hierarchies and exclusions.”
More information:
Shivani Kapania et al, Simulacrum of Stories: Examining Large Language Models as Qualitative Research Participants, Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (2025). DOI: 10.1145/3706598.3713220