Auditing Chatbots’ Responses to Conspiracist Questions

Snurb — Friday 17 October 2025 23:15

The final speaker in this session at the AoIR 2025 conference is my QUT colleague Dan Angus, presenting our work on AI chatbots’ responses to conspiracist ideation. Ai chatbots are now widely used by everyday users; this is leading to a range of problematic outcomes, as people are being drawn into deep emotional relationships with such chatbots, for instance. Chatbots are also increasingly manipulated to represent distinct ideological perspectives.

Here are our slides:

just-asking-questions-doing-our-own-research-on-conspiratorial-ideation-by-generative-ai-chatbotsfrom Axel Bruns

What happens, then, when chatbots are asked specifically about conspiracy theories? What guardrails and safety mechanisms, if any, are in place in leading chatbots as users ask for problematic information? We build on work by Joanne Kuai and others, and in this paper perform a platform policy implementation audit to test how different chatbots respond to questions about conspiracy theories – ranging from established topics like the JFK assassination and chemtrails all the way through to at the time very recent claims that Haitian migrants in the US were eating their neighbours’ pets, or that Donald Trump manipulated the 2024 US election. None of these conspiracy theories are based in fact, of course.

We formulated some 5-15 questions per topic; some of these are neutral and some indicate pre-existing belief in the conspiracy theory. We prompted seven chatbots with these questions in November 2024, just after the US Presidential elections that year. We categorised responses across a range of typical types, from neutral responses (avoiding a response, describing the conspiracy theory, non-committal response, empathy) through constructive (countering with factual statements, offering verified sources, disapproving of the user’s line of questioning) to problematic (downplaying the severity of the issue, bothsiding rhetoric, or encouraging further investigation of problematic sources).

Response styles varied widely between chatbots and conspiracy theories. In some cases, they provided well-referenced factual information; in others, they were more open to potential alternative views; elsewhere, they actively engaged in speculation about alternative explanations. In some cases, through, they refused to entertain the question, or declared that they were not trained on recent enough data and therefore could not respond.

Perplexity and ChatGPT 3.5 Turbo were strongest on constructive responses; ChatGPT models also most frequently avoided inconclusive responses; Grok 2 Mini, and especially the exceptionally unfunny Grok 2 Mini “Fun Mode”, was outstanding in the volume of its problematic responses. Grok 2 Mini “Fun Mode” was deeply problematic in downplaying severity and bothsidesism; Perplexity was most consistent in providing factual statements and verified sourcing. Gemini most frequently avoided responses, perhaps showing Google’s risk-averse approach to chatbots.

There were also substantial differences between conspiracy theories, though. All chatbots tended to entertain some doubt about the JFK assassination; almost all pushed back strongly on 9/11 conspiracy theories, perhaps because of the continuing sensitivities in the US. There are clearly some guardrails around conspiracy theories, then, but they are very unevenly distributed.

There is also a question about how we want chatbots to respond. Clearly bothsidesing is a problem; hard pushback might also produce greater commitment to conspiracist beliefs, however. And of course the chatbots themselves have changed considerably again since we undertook this study, so there is an urgent need to repeat audits like this on an ongoing basis in order to hold AI companies to their account.

11 views