An interview with Jason Radford: Lessons Learned about Bots in Online Research
We interviewed Dr. Radford about his experience with bots and bad actors in internet research. Jason Radford, PhD, is the Managing Director of the National Internet Observatory and founder of the Social Design Lab at Northeastern. After earning his PhD in sociology from the University of Chicago, he went on to help establish Volunteer Science, a platform for conducting online behavioral studies. In this interview, Jason draws on his firsthand experience to discuss how researchers can identify and address fraudulent responses in online studies, from recognizing the warning signs to building practical data quality checks.
How do researchers realize they have a bot problem?
One way is you start seeing participation shoot up. The National Internet Observatory had a major issue with fraud in early 2025. Some people had discovered they could get paid for doing surveys without using our apps. They were creating a lot of accounts and simulating devices to get into our software just to do our $1 survey. We only found out because, after we stopped spending money on online ads, we were still getting 100 new participants each day.
How did you figure out who was a bot?
There’s no single way. When we started looking at the data from these participants, nothing seemed out of place at first. But when we looked closer, a couple of things stuck out. There were a lot of conservative Democrats and liberal Republicans. All of the email addresses had a similar format. None of these was conclusive by itself, but you start seeing patterns. We implemented multi-feature data quality checks. We use instruction and attention checks. We also look for markers of inconsistency, like the conservative Democrats. We also ask the same question across surveys to see if people’s answers change. Sometimes people really are conservative Democrats, and everyone checks the wrong box every once in a while. But if you’re a conservative Democrat who can’t remember your mother’s name, then you’re probably fraudulent.
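The multi-feature approach described above can be sketched as a small function that accumulates flags rather than making a call on any single signal. All field names, values, and thresholds here are hypothetical illustrations, not the Observatory's actual checks.

```python
def quality_flags(resp):
    """Return a list of data-quality flags for one respondent dict.

    Each flag is weak evidence on its own; several flags together
    suggest a fraudulent or inattentive respondent.
    """
    flags = []

    # Inconsistency marker: statistically rare party/ideology combinations.
    rare_combos = {("democrat", "conservative"), ("republican", "liberal")}
    if (resp.get("party"), resp.get("ideology")) in rare_combos:
        flags.append("rare_party_ideology")

    # Cross-survey consistency: the same factual question asked in two waves
    # should get the same answer.
    a, b = resp.get("mothers_name_wave1"), resp.get("mothers_name_wave2")
    if a and b and a.strip().lower() != b.strip().lower():
        flags.append("inconsistent_mothers_name")

    # Instructed-response (attention) check: the item told respondents
    # exactly which option to pick.
    if resp.get("attention_check") != "strongly agree":
        flags.append("failed_attention_check")

    return flags

# One odd answer is not conclusive; a respondent tripping several checks is.
suspect = {"party": "democrat", "ideology": "conservative",
           "mothers_name_wave1": "Mary", "mothers_name_wave2": "Susan",
           "attention_check": "agree"}
print(quality_flags(suspect))
# → ['rare_party_ideology', 'inconsistent_mothers_name', 'failed_attention_check']
```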
We’ve also tried a lot of things that haven’t really helped. We looked at duplicate IP addresses. We’ve had the same IP address do a survey 100 times with few of those responses failing our data quality checks. And that’s because IP address sharing is a lot more common than you realize, especially on mobile networks. In contrast, IP location lookups tend to be much better. VPNs can mask a user’s location, but many fraudulent users do not use them. Unfortunately, a lot of the fraud-detection services out there, like reCAPTCHA, just aren’t reliable either. There’s some signal there, but it’s very weak and, because we don’t know what signal it sends, we can’t use it in making decisions about what data is legitimate.
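The distinction drawn above, that duplicate IPs are weak evidence while IP-location mismatches are stronger, can be sketched as follows. A real deployment would call a geolocation service; here `geolocate()` is a hypothetical stand-in backed by a toy lookup table, and the addresses are from documentation-reserved ranges.

```python
# Toy stand-in for a real IP geolocation database/service.
TOY_GEO_DB = {
    "203.0.113.5": "US",
    "198.51.100.7": "NG",
}

def geolocate(ip):
    """Return an ISO country code for an IP, or "UNKNOWN" if not found."""
    return TOY_GEO_DB.get(ip, "UNKNOWN")

def location_mismatch(participant):
    """Flag participants whose IP country differs from their stated country.

    Duplicate IPs alone are not used as evidence: carrier-grade NAT makes
    IP sharing common on mobile networks, so only the location comparison
    contributes a flag here.
    """
    ip_country = geolocate(participant["ip"])
    return ip_country != "UNKNOWN" and ip_country != participant["stated_country"]

print(location_mismatch({"ip": "198.51.100.7", "stated_country": "US"}))  # → True
```

Treating an unknown lookup as a non-flag (rather than a mismatch) mirrors the benefit-of-the-doubt stance discussed later in the interview.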
How do you ensure data integrity?
You have two basic questions: when do you exclude data from the analysis, and when do you not pay a participant because they’re not providing authentic data? For payment, you have to give people the benefit of the doubt, because we know people from disadvantaged backgrounds are more likely to make these mistakes or exhibit irregular data. You have to come up with policies for what types of errors you will and won’t tolerate. Failing easy attention checks or instruction checks, as well as clear violations of consistency checks, are all good rules for treating a participant as inauthentic. When it comes to including or excluding data, we use a scoring system for data quality and do robustness checking on our results. So, we’ll run our analysis only using data with high quality scores and then run it again adding in data with low quality scores, to see whether data quality may affect our analysis.
