BBC Study Finds “AI” Chatbots Routinely Incapable Of Basic News Synopses
Automation can be helpful, yes. But the story told to date by large tech companies like OpenAI has been that these new large language models would be utterly transformative, utterly world-changing, and quickly approaching some kind of sentient superintelligence. Yet time and time again, the data shows they’re failing to accomplish even the bare basics.
Case in point: Last December, Apple faced widespread criticism after its Apple Intelligence “AI” feature was found to be sending inaccurate news synopses to phone owners. And these weren’t minor errors: At one point Apple’s “AI” falsely told millions of people that Luigi Mangione, the man arrested following the murder of UnitedHealthcare CEO Brian Thompson in New York, had shot himself.
Now the BBC has done a follow-up study of the top AI assistants (ChatGPT, Perplexity, Microsoft Copilot, and Google Gemini) and found that they routinely can’t be relied on to communicate even basic news synopses.
The BBC gave all four assistants access to the BBC website, then asked them relatively basic questions based on its reporting. The team found “significant issues” with just over half of the answers the assistants generated, and clear factual errors in around a fifth of them. One in ten responses either altered real quotations or made them up completely.
Microsoft’s Copilot and Google’s Gemini had more significant problems than OpenAI’s ChatGPT and Perplexity, but they all “struggled to differentiate between opinion and fact, editorialised, and often failed to include essential context,” the BBC researchers found.
BBC News CEO Deborah Turness had this to say:
“This new phenomenon of distortion – an unwelcome sibling to disinformation – threatens to undermine people’s ability to trust any information whatsoever. So I’ll end with a question: how can we work urgently together to ensure that this nascent technology is designed to help people find trusted information, rather than add to the chaos and confusion?”
Large language models are useful and will improve. But this is not what we were sold. These energy-sucking products are dangerously undercooked, and they shouldn’t have been rushed into journalism, much less into mental health care support systems or automated Medicare rejection systems. We once again prioritized making money over ethics and common sense.
The undercooked tech is one thing, but the folks in charge of dictating its implementation and trajectory without any sort of ethical guardrails are something else entirely.
As a result, “AI’s” rushed deployment in journalism has been a Keystone Cops-esque mess. The fail-upward brunchlords in charge of most media companies were so excited to get to work undermining unionized workers, cutting corners, and obtaining funding that they immediately implemented the technology without making sure it actually worked. The result: plagiarism, bullshit, a lower-quality product, and chaos.
Automation is obviously useful and large language models have great potential. But the rushed implementation of undercooked and overhyped technology by a rotating crop of people with hugely questionable judgment is creating almost as many problems as it purports to fix. And when the bubble pops (and it is going to pop), the scurrying to defend shaky executive leadership will be a real treat.