I compared ChatGPT 4.1 to o3 and 4o to find the most logical AI model – the result seems almost irrational

OpenAI's release of GPT-4.1 for ChatGPT arrived quietly, but it represents an impressive upgrade, albeit one focused specifically on logical reasoning and coding. Its enormous context window and grasp of structured thinking could open doors for a lot of new programming and puzzle-solving. But OpenAI often brags about the coding abilities of its models in ways that the not-so-technically-minded find tedious at best.

I decided it might be more interesting to apply that same logical, code-oriented skill to more human interests – specifically, riddles and logic puzzles. Rather than simply see how GPT-4.1 performed on its own, I decided to run it against a couple of other ChatGPT models. I picked GPT-4o, the default choice available to every ChatGPT user, as well as o3, OpenAI's high-octane reasoning model designed to chew through math, code, and puzzles, wielding logic like a scalpel. This Logic Olympics is not particularly scientific, but it would show at least a flavor of how the models stack up.

Cat in a box


