
Can AI Think Without Us? Lessons from Deloitte’s Costly Experiment



The promise of artificial intelligence (AI) to revolutionise consulting is undeniable. Sleek dashboards, instant insights, and rapid analysis make AI appear as the ultimate efficiency tool. However, Deloitte Australia's recent experience serves as a stark warning: AI is far from infallible, and its limitations can have serious consequences. 

When Australia's Department of Employment and Workplace Relations discovered that its AU$440,000 assurance review contained fabricated academic references, misattributed research, and an invented quote attributed to the Federal Court judgment in Deanna Amato v Commonwealth, it marked more than just another AI hallucination incident.

 

The Australian government commissioned Deloitte to review the Targeted Compliance Framework, a key part of the welfare system. The review assessed the integrity of welfare payments affecting thousands of citizens. Deloitte relied heavily on Azure OpenAI GPT-4o to generate the final report, a fact initially omitted but later disclosed in a revised version. 

 

The resulting errors were not minor oversights but serious breaches of research integrity. Such inaccuracies reflect a known limitation of large language models, which generate text based on patterns and probability rather than verified facts. 

 

This incident occurs amid a surge in AI consulting services, with the sector growing at a compound annual growth rate (CAGR) of 26.2% as enterprises rush to integrate AI into their operations. Research reveals that even the most advanced language models hallucinate in anywhere from 0.7% to 29.9% of responses, with legal information particularly vulnerable at a 6.4% error rate. When applied to high-stakes government contracts worth hundreds of thousands of dollars, these percentages translate into significant financial and reputational risk.

 

This is not the first time the limitations of generative AI have surfaced. Similar cautionary examples have appeared across industries, reaffirming a point explored in our earlier piece, "From What to Why: Why AI Alone Can't Decode Market Research": technology can accelerate discovery, but it cannot replace human reasoning, interpretation, and judgment.

 

A Growing Pattern: AI Hallucinations Across Industries 

The Deloitte case is not an isolated misstep. Across industries, AI-generated content has repeatedly exposed the risks of relying on automation without human verification. 

 

  • Anthropic's Claude and the Legal Filing Error  

Anthropic's flagship AI, Claude, introduced false citations in a legal filing, modifying the real source's title, author, and page numbers. These alterations made it appear as though the referenced document did not exist. The company later admitted it was an "embarrassing and unintentional" error due to a lack of manual oversight. 

 

  • ChatGPT and the New York Court Failure 

In one of the most widely reported cases, attorneys in New York submitted court filings that included fake legal precedents created entirely by ChatGPT. The attorneys were later fined when the cited cases were revealed to be fictitious.

 

  • Deepfake and Hallucination Challenges in Minnesota (2024) 


In Minnesota, researchers and lawmakers working on deepfake legislation found that AI tools used for evidence analysis generated false attributions and misidentified media sources. The confusion complicated legislative efforts and underscored how AI hallucinations can undermine well-intentioned governance.

 

  • Cursor AI's Fabricated Policy Incident (2025) 


A customer support chatbot powered by Cursor AI gained widespread attention after it fabricated a nonexistent company policy during a live interaction with a user. The made-up policy incited user outrage, prompting the company to issue a public apology and reevaluate its AI deployment strategies. 

 

  • JP Morgan's AI Trading Model Glitch (2024) 

An AI-based trading model at JP Morgan produced incorrect stock recommendations due to misreading market signals. This error led to short-term financial setbacks, underscoring the importance of ongoing human oversight of automated financial systems. 

 

  • IBM Watson's Oncology Recommendations (2018–2019) 

IBM Watson for Oncology, designed to offer treatment suggestions, was discovered to sometimes recommend unsafe or inaccurate cancer therapies. Internal reviews indicated that some recommendations were based on synthetic training cases rather than validated clinical evidence.

 

Across these examples, one truth stands out: AI hallucinations are not random glitches; they are inherent to how generative systems work. Without structured validation, every automated output carries a non-zero risk of error. 

Why These Failures Matter 

From Deloitte Australia's flawed assurance report to Anthropic's citation mix-up, ChatGPT's bogus court filings, and Cursor AI's fabricated company policy, each case highlights a critical truth: automation without oversight undermines trust.

 

  • Erosion of Trust and Credibility: When AI-generated reports or briefs contain errors, the reputational fallout is instant. Clients, regulators, and citizens lose confidence in institutions that rely on AI without human review. 

 

  • High Stakes in Real Decisions: Such mistakes have real-world consequences in consulting, law, and public policy. A fabricated precedent or welfare case can mislead judges, policymakers, and entire systems. 

 

  • Lack of Transparency: Many failures stem from undisclosed or poorly explained AI involvement. Stakeholders deserve to know when AI is used, how it's applied, and who ensures its accuracy. 

 

  • The Human Gap Remains Essential: AI cannot verify sources, assess legal weight, or understand nuance. Human expertise remains indispensable for validating facts, interpreting context, and ensuring ethical rigour. 

 

AI can accelerate analysis, but it cannot replace discernment or accountability. These incidents collectively show that hallucination risk is inherent to generative models. Only skilled professionals can identify such lapses and uphold credibility.

 

Transparency is non-negotiable. Disclosing when and how AI is used, and where human oversight intervenes, is the foundation of trust in an increasingly automated world.

 

Guiding Principles for Responsible AI in Consulting 

 

As AI becomes more embedded in research, policy, and consulting workflows, firms must establish clear principles that balance innovation with integrity. Responsible AI use is not about restriction. It is about ensuring accountability, transparency, and human oversight at every stage. 

 

  • Full Disclosure: Clarify to clients exactly where and how AI contributed. 

  • Layered Validation: Every AI output, whether citations, quotations, or data interpretation, should be reviewed and approved by domain specialists (a minimal sketch of this workflow follows the list below).

  • Reframe the Role of AI: Treat AI as a tool, not a replacement. It should function as assistive intelligence, not an autonomous author. 

  • Industry Standards and Best Practices: The consulting community must collaborate to define ethical norms, auditing mechanisms, and transparency standards. 
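
To make the layered-validation principle concrete, here is a minimal Python sketch of what such a gate might look like in practice: every citation an LLM proposes passes an automated lookup against a trusted registry, and then a named specialist signs it off before the report can be released. The class names, the registry lookup, and the sample data are purely illustrative assumptions, not a real Deloitte workflow or any particular vendor's API.

```python
# A minimal sketch of "layered validation": AI-generated citations are never
# published directly; each one passes an automated check and then explicit
# human sign-off. All names here (Citation, verify_against_registry, the
# sample registry) are hypothetical, used only to illustrate the idea.

from dataclasses import dataclass


@dataclass
class Citation:
    source: str                    # e.g. a case name, paper title, or report
    claimed_by_ai: bool            # True if the reference originated from an LLM
    auto_verified: bool = False    # found in a trusted registry or database
    reviewer: str | None = None    # domain specialist who signed off


# Stand-in for a lookup against a law report series or journal index.
VERIFIED_REGISTRY = {
    "Amato v Commonwealth (2019)",
}


def verify_against_registry(citation: Citation) -> Citation:
    """Layer 1: automated check against a trusted source list."""
    citation.auto_verified = citation.source in VERIFIED_REGISTRY
    return citation


def require_human_signoff(citation: Citation, reviewer: str) -> Citation:
    """Layer 2: a named specialist approves anything the AI produced."""
    citation.reviewer = reviewer
    return citation


def release_ready(citations: list[Citation]) -> bool:
    """A report is releasable only if every AI-sourced citation cleared both layers."""
    return all(
        (c.auto_verified and c.reviewer is not None) or not c.claimed_by_ai
        for c in citations
    )


if __name__ == "__main__":
    draft = [
        Citation("Amato v Commonwealth (2019)", claimed_by_ai=True),
        Citation("Smith v Fictional Agency (2021)", claimed_by_ai=True),  # hallucinated
    ]
    checked = [verify_against_registry(c) for c in draft]
    checked[0] = require_human_signoff(checked[0], reviewer="policy.lead")
    print(release_ready(checked))  # False: the hallucinated case never clears review
```

The design point is deliberate: nothing the model produced can reach publication on the automated check alone, and the absence of a named reviewer blocks release entirely.
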

 

Balancing Speed and Integrity: The Future of AI in Consulting 

 

If a leading consulting firm can be obliged to issue a partial refund after its AI-assisted report contains falsehoods, and if legal tools can invent citations, we must face reality: complete trust in AI is premature. The allure of efficiency is undeniable, yet speed cannot substitute for discernment. Human judgment remains irreplaceable in domains where integrity, accuracy, and accountability are non-negotiable.

 

The challenge is not to abandon AI, but to integrate it responsibly. AI should augment expertise, providing insights, pattern recognition, and analytical power, while humans remain the ultimate arbiters of truth, context, and ethical responsibility.  

 

In a world increasingly dominated by automated intelligence, this balance between the speed of AI and the wisdom of human judgment will define the future of responsible consulting. 

 
