Ever since OpenAI unveiled GPT-5, I’ve been looking for ways to challenge it and compare it to the rest of the AI field. After exploring math theorems, SAT questions, and brain teasers, I settled on a tough science concept and the mind of a five-year-old.
What I found surprised me, and it illustrated the strengths and weaknesses of GPT-5, Gemini 2.5 Flash, Claude Sonnet 4, and Microsoft Copilot.
As I mentioned, I started by trying to solve the hardest questions I could find. I scoured the web until I discovered a list of the 12 hardest SAT questions and asked all the AI chatbots to fill in the blank in this sentence using the multiple-choice options below:
“In assessing the films of Japanese director Akira Kurosawa, ______ have missed his equally deep engagement with Japanese artistic traditions such as Noh theater.
“Which choice completes the text so that it conforms to the conventions of Standard English?
- A. many critics have focused on Kurosawa’s use of Western literary sources but
- B. Kurosawa’s use of Western literary sources has been the focus of many critics, who
- C. there are many critics who have focused on Kurosawa’s use of Western literary sources, but they
- D. the focus of many critics has been on Kurosawa’s use of Western literary sources; they”
As you probably guessed, all the AI chatbots quickly selected A. This English challenge was no challenge at all.
When I dropped in a single sentence from an unproven math conjecture, each one instantly identified it.
I was running out of ideas, but wondered if I could stump any of the AI systems on a classic brain teaser:
“19 people get off the train at the first stop. 17 people get on the train. Now there are 63 people on the train. How many people were on the train to begin with?”
[Spoiler ahead]
Each AI showed its work and instantly provided the correct answer: 65 (did you get it right?).
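The trick is simply to run the events in reverse: undo the 17 boardings, then restore the 19 departures. A minimal sketch of that backward arithmetic (variable names are mine, not from any of the chatbots):

```python
# Work the teaser backward from the final headcount:
# subtract the 17 who boarded, add back the 19 who got off.
riders_now = 63
boarded = 17
got_off = 19

original = riders_now - boarded + got_off
print(original)  # 65
```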
A complex idea for kids
Unsatisfied, I racked my brain for a topic that might help me compare the AI models and reveal OpenAI’s GPT-5 breakthroughs.
Then it hit me like a blast of hot sunlight: Cold fusion, now there’s a challenging topic. However, if I asked ChatGPT, Gemini, and others for an explanation, I worried each would dive deep into the science details, which wasn’t what I wanted. I decided to have each of them “Explain it to me like I’m a five-year-old.” Is there any better way to understand complex information than when it’s simplified to a level any grade schooler could digest?
As an added wrinkle, I asked for kid-friendly illustrations to accompany the explanation.
Here’s the prompt:
“Explain cold fusion to me like I’m a five-year-old. Also, please include kid-friendly illustrations.”
ChatGPT running GPT-5, Claude AI, Gemini, and Copilot each took only a few seconds to respond. The information was accurate, but their approaches were wildly different.
Let’s start with ChatGPT:
Not gonna lie, this was a little disappointing. The image arrived without any other context, and though the text is accurate and a five-year-old might smile at the image, they might also be confused by some of the concepts, like “atoms”, “hydrogen”, and “helium” (okay, maybe they’re familiar with that last one).
With GPT-5, ChatGPT is supposed to be a better and perhaps more concise conversationalist, with a stronger understanding of the emotion behind a prompt. It’s also supposed to avoid filling knowledge gaps with nonsense. This distillation does illustrate some of that, but I think it fell far short of the mark.
Gemini
Gemini’s explanation is generally excellent, though I think it’s designed to be read out loud to a five-year-old. Still, I appreciate how it started by explaining hot fusion before delving into cold fusion.
It would’ve been nice if Gemini had also explained what atoms are, but at least it created an adorable illustration of two atoms hugging.
Copilot
Copilot is an interesting case since it’s based on OpenAI’s GPT models, and I’m pretty sure it does not yet have access to GPT-5. In other words, its answer was probably crafted by GPT-4.
It did a much better job than GPT-5 of explaining all the core concepts and how cold fusion might work. Copilot also gets points for an excellent cold fusion analogy: “It’s like trying to bake cookies without turning on the oven. 🍪”
Unfortunately, it failed to deliver inline illustrations.
Claude AI
I saved the best for last. Claude AI far outperformed GPT-5, Gemini, and Copilot, not necessarily because its text explanation is better than Gemini’s or Copilot’s (though it is), but because Claude AI automatically created an Artifact.
Claude AI Artifacts are instantly shareable apps, tools, and content. I didn’t ask Claude to create one, but next to the text was a “Cold Fusion for Kids!” interactive artifact, and it’s kind of brilliant.
If you publish the artifact, it produces a publicly shareable URL that’s live until you unpublish it. I made the guide live so you can see it.
Just look at it. It’s so simple, so clear, so much fun.
Claude AI smartly starts the guide by explaining and illustrating atoms. It then dives into the Sun and how it handles atoms and fusion. Next up is cold fusion, accompanied by a fun illustration of a bubbling scientist’s beaker.
There is a bit of depth here. The guide explores whether cold fusion works and finally talks about why we’d want it: “It would be like having a tiny Sun in a jar!”
The only thing missing is one cute illustration of how cold fusion might work (perhaps Claude could borrow one from GPT-5).
Even though I have some experience with Claude AI Artifacts, I didn’t expect it to go this way. I have high hopes for GPT-5, but I think Claude AI, and to a lesser extent Gemini, understand that being concise is not the same as being clear.
I’m certain there are other areas where ChatGPT running GPT-5 outstrips them all, but in this instance, Claude AI knew the best way to explain cold fusion to a five-year-old – and me.