Confidently Wrong: How People Rank AI’s Strengths and Weaknesses by Model

Daniel Anderson

Would you trust AI more than your own doctor? AI has officially transcended being a simple chatbot. People are even trying it as an on-demand therapist, with headlines and studies describing chatbot “empathy.” On Reddit, users say daily conversations with ChatGPT have helped them “more than 15 years of therapy,” while researchers at the University of Toronto claim chatbots can appear more empathetic than humans. 

Against this backdrop, we asked a straightforward question: which AI models do people actually trust for everyday tasks?

Phrasly.ai surveyed 1,000 U.S. adults who regularly use tools like ChatGPT, Gemini, Meta AI, Claude, Grok, Perplexity, and Phrasly. Participants compared model outputs across writing, coding, schoolwork, medical information, travel planning, financial advice, and companionship. Then they told us which tools they trust, where they feel comfortable using AI, and where it goes very wrong.

Key Takeaways

  • 58% of users have believed an AI answer was correct, only to learn later it was wrong.
  • 1 in 4 users say ChatGPT makes the most mistakes, yet it is still the most trusted tool overall.
  • 28.8% of users do not believe AI tools make mistakes or hallucinate at all, a dangerous assumption.
  • 13% rarely or never fact-check AI outputs.
  • AI for companionship is the least trusted use overall.

The People’s AI Power Rankings

When it comes to trust, there’s one clear winner. ChatGPT ranks #1 across every category tested, from writing and work tasks to coding and travel. On average, more than one in four users (27%) say it’s the AI they rely on most, regardless of task.

Its strongest showing is in writing and communication, where 38.9% of users selected it, nearly double the share of the next-closest competitor, Gemini (19.9%). Gemini consistently ranks second, earning between 16% and 24% of user preference across tasks, particularly in work and travel, where people describe it as practical, structured, and analytical.

In technical tasks like coding and troubleshooting, the field tightens. ChatGPT still leads with 25%, but Gemini (18.2%) remains close behind. 

Meta AI holds steady at roughly 8% to 12% in all categories, with slightly higher trust in companionship (12.3%) and travel (11.0%), but it never cracks the top three overall. Smaller models like Phrasly and Perplexity mostly stay under 14%, though Phrasly stands out in academic and creative niches like studying (18.1%), suggesting emerging specialization.

ChatGPT, it seems, has become the default AI. It may not always be the most precise, but it is the most trusted and mainstream. ChatGPT took a clear early lead in market share with its consumer chatbot, and familiarity seems to drive trust. As the first AI most people ever used, it feels approachable, reliable, and culturally dominant. Its confidence and humanlike tone can create an illusion of authority that users instinctively believe. OpenAI is not alone in producing overconfident models. These tools may not always be the smartest or give the most accurate answers, but that doesn’t stop them from sounding confident, luring users into believing that what they have just read is the truth.

Comfort Zones: Where People Feel Safe Using AI

Most users are at ease letting AI help with creative or communicative tasks, but their confidence drops sharply when real-world stakes enter the picture.

Writing and communication top the list, with 83.5% of respondents saying they’re very or somewhat comfortable using AI to craft emails, reports, or creative projects. Work and career tasks follow close behind at 78.4%, reflecting how mainstream AI has become in professional productivity. About three-quarters feel comfortable using AI for travel planning (75.3%) and image generation (76%), showing that people are happy to let machines handle logistics and creativity alike.

But the comfort fades when accuracy or ethics matter most. Only 30% of users feel very comfortable using AI for coding, and more than a quarter (27.9%) say they’re neutral, unsure whether they can rely on it. Financial and medical topics are even shakier ground: just 27% feel very comfortable taking financial advice from AI, and 24% say the same for health information.

The most polarizing category? AI companionship. Roughly two-thirds (63.5%) are open to chatting with AI for advice or entertainment, but 16.6% admit they’re uneasy, the highest discomfort level of any task. Even as users experiment with AI as a sounding board, there’s lingering hesitation about how far that relationship should go.

Confidently Wrong: The Hallucination Report

For all its charm, AI still has a habit of confidently faking it. Nearly 3 in 10 users (28.8%) believe no AI tools make mistakes or hallucinate, an optimistic assumption that says more about user perception than machine precision. But among those who’ve noticed the cracks, ChatGPT (20.5%), Meta AI (19.6%), and Gemini (11.3%) top the list of models named as most error-prone.

Ironically, these same tools are also the most trusted. The result? A paradox worthy of therapy itself: users know their favorite AI sometimes lies, but they keep coming back for reassurance (talk about toxic attachment). 

More than half (58%) of respondents have believed an AI answer was correct, only to later find out it wasn’t. A third (33.4%) say it’s happened “more than once.” 

Still, fewer than half (48%) regularly verify AI outputs, and 13% admit they rarely or never fact-check. Call it what you will: laziness, digital denial, or just good old-fashioned blissful ignorance.

Conclusion

AI trust, it turns out, is complicated. Users will gladly let a chatbot write their emails or plan their vacations, but the moment money, health, or emotion enters the equation, confidence starts to crack. ChatGPT may be the people’s favorite, but even its biggest fans know it sometimes gets things wrong, and they come back anyway. In the end, the “best” AI for many users may not be the one that is most accurate, but the one they are most comfortable believing.

Methodology

The data for this study were collected through an online survey administered to a sample of 1,000 adults aged 18 years and older residing in the United States. All participants reported regular use of artificial intelligence (AI) tools, including ChatGPT, Claude, Gemini, Meta AI, Phrasly, Grok, and Perplexity. Frequency of use among respondents was distributed as follows: 50.5% reported daily use, 39.1% weekly use, and 10.4% monthly use. To enhance representativeness, select survey items were normalized to align with the broader population of active AI users. The final sample consisted of 57.5% male and 42.5% female respondents.
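For readers who want a rough sense of the sampling uncertainty behind these percentages, the sketch below computes a normal-approximation margin of error for a survey proportion. It is an illustration only, not part of the study's method: it assumes a simple random sample of 1,000 and a 95% confidence level (z = 1.96), neither of which is stated above, and it ignores the normalization applied to select items. The function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes simple random sampling, which an online panel may not satisfy.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    n = 1000  # sample size reported in the methodology
    # Two headline figures from the article, expressed as proportions.
    findings = [
        ("believed a wrong AI answer", 0.58),
        ("rarely or never fact-check", 0.13),
    ]
    for label, p in findings:
        moe = margin_of_error(p, n)
        print(f"{label}: {p:.1%} +/- {moe:.1%}")
```

Under these assumptions, the margin is about three percentage points for figures near 50% and narrower for smaller shares, so gaps of a point or two between tools should be read with caution.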

Fair Use Statement: This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.