Guarding Data Integrity—Tackling Challenges in Online Surveys
Issue 18, January 2025

In this month's Digital Health Research Roundup, guest editor Jeffrey Hardesty delves into the critical topic of data integrity in online surveys. With a research methods focus, this edition explores the challenges and innovations in ensuring reliable, high-quality data collection in the digital age.
Guest Editor's Remarks:
Web-based surveys are an essential tool for health research, providing fast, cost-effective methods for data collection. However, ensuring data integrity in 2025 requires researchers to understand potential threats and vulnerabilities in their methodologies—and implement effective mitigation strategies. Key concerns include inattentive participants who provide low-quality data (e.g., straightlined data), fraudulent submissions from bots and other survey takers, and the emerging issue of large language models (LLMs) like ChatGPT, which can generate inauthentic and homogenized open-ended responses. Employing appropriate recruitment decisions and processes can help minimize the impact of these threats—and their secondary effects—on the data collected, preserving the integrity of study results.
Crowdsourcing sites and online panels such as Mechanical Turk (MTurk) and Prolific are popular options for recruiting both general population samples and populations with a higher prevalence of specific characteristics, but data integrity can vary significantly between platforms. For instance, at the time of assessment, Douglas et al. (2023) found that Prolific (67.9% of participants submitted high-quality data) and CloudResearch (62.0%) outperformed MTurk (26.4%) and Qualtrics (53.2%) in terms of attentiveness and reducing multiple submissions. Such variation is likely due to differences in the vulnerabilities of these sites and the mitigation strategies they have in place, so researchers may need to augment those strategies based on study objectives. For example, although MTurk has been criticized for yielding studies with poor data integrity, Kennedy et al. (2020) showed that the application of fraud detection tools by researchers, such as filtering foreign IP addresses and virtual private networks (VPNs), can enhance MTurk data integrity.
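To make this concrete, a post-hoc screen over a survey export might look like the minimal sketch below. The record shapes, field names, and the IP-intelligence lookup are illustrative assumptions standing in for whatever geolocation or proxy-detection service a study team already uses; they are not the R and Stata tools released by Kennedy et al.

```typescript
// Illustrative sketch only: flag submissions from outside the eligible
// country or from proxy/VPS/VPN addresses for review before analysis.
// All data shapes here are assumptions, not a published package's API.

interface IpInfo {
  country: string;   // ISO country code from a geolocation lookup
  isProxy: boolean;  // VPN / virtual private server / data-center flag
}

interface Submission {
  responseId: string;
  ip: string;
}

function flagSuspectSubmissions(
  submissions: Submission[],
  ipInfo: Map<string, IpInfo>,  // results of a prior IP-intelligence lookup
  eligibleCountry = "US",
): Submission[] {
  return submissions.filter((s) => {
    const info = ipInfo.get(s.ip);
    if (!info) return true; // unknown IPs go to manual review, not silently kept
    return info.country !== eligibleCountry || info.isProxy;
  });
}
```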
Bespoke solutions for recruiting an original sample via social media or Craigslist offer researchers the potential for larger samples of populations with a lower prevalence of specific characteristics, as well as more representative samples; however, such solutions require deeper risk-mitigation expertise than working with the crowdsourcing sites and online panels mentioned above. Essential steps include careful consideration of threats (e.g., fraudulent submissions from bots and other survey takers) and vulnerabilities (e.g., anonymity), the implementation of an appropriate array of mitigation strategies (e.g., authenticating personal information and requiring photo submission of relevant items unique to the population), and effective monitoring procedures (e.g., reviewing open-ended responses and data quality on a regular basis during data collection). Our study, “From Doubt to Confidence,” demonstrates that original samples with high data integrity can be achieved, although doing so is more labor-intensive than crowdsourcing. Most of the strategies we describe can also be applied to surveys on crowdsourcing sites and online panels when vulnerabilities are identified.
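As an example of what routine monitoring during data collection can look like, the sketch below flags two common low-quality patterns: straightlined grid answers and implausibly fast completions. The field names and duration threshold are assumptions chosen for illustration, not the criteria used in our study.

```typescript
// Illustrative monitoring sketch: flag responses that straightline a
// Likert grid (no variation across items) or finish implausibly fast.

interface WaveResponse {
  responseId: string;
  gridAnswers: number[];   // answers to one matrix/Likert battery
  durationSeconds: number; // total time spent on the survey
}

function flagLowQualityResponses(
  responses: WaveResponse[],
  minDurationSeconds = 180, // illustrative cutoff; calibrate to survey length
): WaveResponse[] {
  return responses.filter((r) => {
    const straightlined =
      r.gridAnswers.length > 1 && new Set(r.gridAnswers).size === 1;
    const tooFast = r.durationSeconds < minDurationSeconds;
    return straightlined || tooFast;
  });
}
```

Flagged records can then be set aside for manual review, for example of open-ended answers, before incentives are sent.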
A growing concern for all web-based survey approaches is the use of LLMs by participants to complete open-ended survey questions. Zhang et al. (2024) found that over one-third of participants in a Prolific survey admitted to using LLMs, citing reasons like improved writing quality and speed. Unfortunately, participants’ prompts rarely incorporated their pre-existing views and social positions—suggesting a potential lack of original thought in survey responses. Furthermore, LLM-generated responses tend to be less emotional, more analytical, and less varied than human responses, suggesting participant responses written or aided by LLMs may not capture the full scope of human expression. To mitigate this, researchers can request that participants refrain from using LLMs and implement measures to prevent copy-pasting (e.g., JavaScript code for surveys built on Qualtrics). As LLMs evolve and become more commonplace, more advanced detection tools may be needed to identify and address LLM-generated responses.
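As a concrete illustration of the copy-paste countermeasure, the snippet below blocks paste events on an open-ended text box. It is written as TypeScript for consistency with the other sketches in this issue, although it would be added as plain JavaScript in the Qualtrics question editor; Qualtrics.SurveyEngine.addOnload is the platform's standard per-question hook, but the exact selector may vary by question type.

```typescript
// Illustrative sketch: discard paste events on an open-ended question so
// participants must type their answers. The declare line only keeps this
// TypeScript snippet self-contained; in Qualtrics the global already exists.

declare const Qualtrics: any;

Qualtrics.SurveyEngine.addOnload(function (this: any) {
  const box: HTMLTextAreaElement | null =
    this.getQuestionContainer().querySelector("textarea");
  box?.addEventListener("paste", (event: ClipboardEvent) => {
    event.preventDefault(); // ignore pasted content
  });
});
```

Note that determined participants can still retype LLM output, so measures like this raise the effort required rather than eliminate the risk.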
Improving data integrity in online surveys
CGDHI key takeaways and comments on the research articles hand-picked by our guest editor:
Data quality in human subjects research
B. D. Douglas et al., Data quality in online human-subjects research: Comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA, PLoS ONE, 2023
This study examines the quality of data collected from five popular online survey platforms: MTurk, Prolific, CloudResearch, Qualtrics, and SONA (used in university surveys). It evaluates data quality based on participant attention and response consistency, comparing the platforms in terms of reliability and cost-effectiveness.
Key Takeaways:
- At the time platforms were assessed, Prolific and CloudResearch provided the highest-quality data, outperforming MTurk, Qualtrics, and SONA on key metrics such as attention check passage and meaningful response rates. Prolific and CloudResearch had 67.9% and 61.9% high-quality respondents, respectively, compared to MTurk's 26.4%.
- In terms of value for money, Prolific ($1.90) and CloudResearch ($2.00) offered the best cost per high-quality respondent, significantly outperforming MTurk ($4.36) and Qualtrics ($8.17). While SONA participants were unpaid, data collection took significantly longer.
- MTurk demonstrated lower-quality responses and higher conspiracy belief scores, while Qualtrics faced demographic and time-to-collection limitations. SONA's high attrition and duplicate submissions posed additional challenges.
Comment from the Center for Global Digital Health Innovation: This comprehensive analysis provides valuable insights into optimizing online data collection strategies. Variation in data quality between platforms highlights the importance of integrity features such as robust participant vetting and attention checks. However, the study could benefit from exploring how platform-specific demographic quotas (predefined participant selection criteria set by survey platforms) impact data quality and generalizability. Future research should examine the scalability of these findings across diverse study designs, while also integrating privacy-preserving measures to address ethical concerns associated with participant tracking. Additionally, investigating the impact of various payment incentives on data quality across platforms could offer practical guidance for researchers managing tight budgets.
Investigating a quality crisis
R. Kennedy et al., The Shape of and Solutions to the MTurk Quality Crisis, Political Science Research and Methods, 2020
This study investigates the quality crisis on Amazon’s Mechanical Turk (MTurk), driven by fraudulent responses from users employing virtual private servers (VPS) or participating from non-eligible locations. It explores the origins, scale, and impact of fraudulent responses and introduces tools for detection and prevention.
Key Takeaways:
- An analysis of 38 studies revealed a significant increase in fraudulent respondents since 2018, with many using VPS and/or being located outside the U.S., impacting data quality through low attention and nonsensical responses.
- Fraudulent respondents diluted experimental treatment effects and contributed to lower-quality data, failing attention checks and cultural knowledge questions at much higher rates than legitimate respondents.
- The authors introduced tools, including R and Stata packages as well as a Qualtrics protocol, to detect and block fraudulent respondents, improving data integrity with minimal disruption to legitimate users.
Comment from the Center for Global Digital Health Innovation: This article provides an insightful exploration of data quality challenges on MTurk and offers practical tools for addressing them. Its strength lies in the dual approach of retrospective analysis and proactive prevention. However, the reliance on IP-based detection may raise ethical concerns for privacy-conscious users and risks inadvertently excluding valid participants. For example, individuals using virtual private servers for privacy reasons may be classified as fraudulent despite their legitimate intent. Future research should evaluate the effectiveness of these solutions across diverse demographic and geographic contexts and, where reliable and robust tools are available, explore alternative, non-IP-based fraud detection methods that maintain participant privacy while ensuring data quality. Additionally, further studies could assess the long-term viability of MTurk as a research platform in light of these challenges.
Overcoming fraudulent survey submissions
J. Hardesty et al., From Doubt to Confidence: How We Overcame Fraudulent Survey Submissions from Bots and Other Survey Takers of a Web-based Survey, Journal of Medical Internet Research, 2023
This study chronicles the challenges and solutions encountered during a longitudinal web-based survey on e-cigarette use. It focuses on overcoming issues with fraudulent submissions, including bots and individuals exploiting the survey system. The research details key vulnerabilities, mitigation strategies, and lessons learned from the successful completion of five survey waves.
Key Takeaways:
- The study's first survey wave experienced significant fraudulent submissions due to reliance on anonymity, lack of robust fraud detection, and vulnerabilities such as cookie clearing and VPN use. Only 22.4% of the 1,624 responses were deemed “likely valid”.
- Future survey waves implemented robust measures, including personal identity verification, CAPTCHA, shortened data collection windows, and mandatory photo submissions of e-cigarette devices, leading to high data integrity across five waves.
- Even with enhanced protocols, new threats, such as coordinated fraudulent submissions, required adaptive solutions, including manual photo verification and open-ended response reviews before incentives were sent.
Comment from the Center for Global Digital Health Innovation: This study highlights a comprehensive approach to safeguarding data integrity in web-based surveys. The detailed account of mitigation strategies offers a valuable blueprint for researchers designing similar studies. However, the reliance on personal data for identity verification raises ethical concerns about participant privacy. Additionally, while effective for a specific population (e-cigarette users), the scalability of such methods across diverse populations and studies is uncertain. Future research should explore cost-effective, privacy-preserving solutions and assess the long-term impact of such mitigation strategies on participant recruitment and engagement.
Generative AI and homogenization
S. Zhang et al., Generative AI Meets Open-Ended Survey Responses: Participant Use of AI and Homogenization, Journal of Medical Internet Research, 2024
This study examines how participants use generative AI tools, such as ChatGPT, to assist with answering open-ended survey questions. It explores the implications of AI-generated responses for data quality, highlighting concerns about homogenization and loss of authentic variability in participant input.
Key Takeaways:
- The study found that 34% of online survey participants reported using AI tools to answer open-ended survey questions, citing speed, writing quality, and assistance in expressing thoughts as primary motivations.
- AI-generated responses were significantly more homogeneous and positive compared to human responses. This raised concerns about masking social variation, particularly in sensitive topics such as intergroup attitudes.
- While many participants refrained from using AI due to ethical concerns about authenticity and cheating, the study noted challenges in balancing AI utility with the need for genuine, unbiased responses.
Comment from the Center for Global Digital Health Innovation: This paper provides a timely analysis of the challenges posed by generative AI in survey research. Its innovative approach to simulating AI responses based on participant-generated prompts is a key strength. However, the study could delve deeper into real-world scenarios where participants revise or selectively use AI-generated content. Additionally, future research should address strategies for integrating AI responsibly, for example by assessing the use of explicit ethical guidelines. Adaptive survey designs, such as collecting responses from participants through chatbots or voice interviews, are another strategy worth exploring, as they may counter homogenization while leveraging AI's benefits. A critical next step involves examining the longitudinal effects of AI integration on data integrity and representativeness in digital health studies.
Click the links below to read previous editions of Research Roundup and to receive the latest updates in global digital health!
Meet Our Guest Editor

Jeffrey Hardesty is an Assistant Scientist on Faculty at the Johns Hopkins Bloomberg School of Public Health. His work at the Institute for Global Tobacco Control focuses on the benefits and unintended consequences of e-cigarette and tobacco policies. His current research interests include India’s e-cigarette ban, web-based survey data integrity, and understanding the role of AI models in tobacco control research and policy.