I thought the AI hype train was beginning to cool down. However, a few days ago, I was proven catastrophically wrong when my news and social media feeds were flooded with articles about OpenAI’s new Large Language Model (LLM), code-named “Strawberry,” or o1. According to OpenAI, this model doesn’t just squeeze out demonstrably false knee-jerk responses to your prompts, like its predecessor ChatGPT-4o; instead, it can “think” or “reason” before responding. OpenAI claims this makes “Strawberry” far more accurate than any other LLM. In fact, they claim the model has greatly improved performance in mathematics, science, and programming, and can even score 83% on a qualifying exam for the International Mathematics Olympiad, compared with 13% for ChatGPT-4o. Surely this represents a significant leap forward in AI technology? After all, the machine can think, reason, and produce more accurate responses than most other AI systems. Well, unfortunately, no. “Strawberry” is closer in nature to parlour tricks and marketing gimmicks than actual revolutionary technology.
“Strawberry,” or o1 (which are both equally stupid names), wasn’t birthed from some fresh revolutionary algorithm or an entirely new approach to AI that allowed the system to develop heightened cognitive abilities. Instead, it simply automates a relatively successful pre-existing prompting technique, known as “chain-of-thought” prompting, on top of an AI training method that has existed for even longer.
Chain-of-thought prompting is when you ask an LLM to explain, step by step, what it is doing. Why is this useful? Well, it helps you, the user, understand what the AI is doing, letting you spot where it goes wrong and write better prompts that address those issues. It also gives the AI a chance to correct itself. The model generates each step in the chain based on your prompt and the steps before it, so if one step contains an incorrect fact or a fault in its logic, the AI gets another chance to put it right in the steps that follow. Research shows this prompting technique makes AI models slightly more accurate.
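To make this concrete, here is a minimal sketch of the technique using OpenAI’s Python client. The model name, the example question, and the step-by-step wording are all illustrative; there is nothing “Strawberry”-specific here:

```python
# A minimal sketch of chain-of-thought prompting with OpenAI's Python
# client. The model name, question, and wording are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)

# Plain prompt: the model answers in one shot, with no visible working.
plain = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought prompt: asking for step-by-step working gives the
# model room to catch its own mistakes, and lets you see where it errs.
cot = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": question + " Explain your reasoning step by step, "
                              "then give the final answer.",
    }],
)

print(plain.choices[0].message.content)
print(cot.choices[0].message.content)
```

The only difference is the added instruction, yet the second prompt produces visible working you can audit, which is the whole point of the technique.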
All “Strawberry” does is automate this prompting technique. It uses an AI to take your prompt and break it down into simple steps. It then feeds those steps into the main AI and has it process each one in turn, mimicking a manual “chain-of-thought” prompt. OpenAI claims this is similar to the AI “thinking” about your prompt or applying reasoning, but that is just marketing bumph. The AI still doesn’t cognitively understand what is going on; it’s just using statistics. “Strawberry’s” main AI is doing exactly the same thing as every other model out there; what makes it more accurate is a slightly more refined front-end.
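OpenAI hasn’t published how o1 actually works under the hood, so take this as a crude, hypothetical sketch of the two-stage pipeline just described. The `ask` helper and the decomposition prompts are stand-ins of my own invention:

```python
# A crude, hypothetical sketch of automating chain-of-thought prompting.
# The `ask` helper and the prompts are my own stand-ins; OpenAI has not
# published how o1 does this internally.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Send one prompt to the model and return its text reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in model; o1's internals are not public
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def automated_chain_of_thought(user_prompt: str) -> str:
    # Stage 1: have a model break the task into short, numbered steps.
    plan = ask(f"Break this task into short, numbered steps:\n{user_prompt}")
    steps = [line for line in plan.splitlines() if line.strip()]

    # Stage 2: feed each step to the main model in turn, carrying the
    # working so far, which mimics a manual chain-of-thought prompt.
    working = ""
    for step in steps:
        working += "\n" + ask(
            f"Task: {user_prompt}\nWorking so far:{working}\n"
            f"Now carry out this step: {step}"
        )

    # Final pass: condense the accumulated working into one answer.
    return ask(f"Task: {user_prompt}\nWorking:{working}\nGive the final answer.")
```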
Well, to be fair, that isn’t entirely true. “Strawberry” isn’t just a shiny front-end slapped onto OpenAI’s previous flagship AI system, ChatGPT-4o. Instead, the AI powering “Strawberry” was trained differently, and this helped it solve more complex problems. But there is a catch.
ChatGPT-4o was trained to mimic patterns from its training data (which OpenAI stole from newspapers, books, and social media). In fact, most LLMs are trained this way, as it is the best way to mimic human-like text. For “Strawberry,” however, OpenAI used a different training method known as reinforcement learning, which teaches the system through rewards and penalties rather than examples to copy. You have likely seen this method before in those animations of an AI learning to walk over thousands of successive generations. In theory, this should make the AI better at solving text-based problems than the likes of ChatGPT-4o.
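To see the difference in miniature: imitation-style training nudges a model towards whatever its training text says, while reinforcement learning only ever sees a reward or penalty. Here is a toy reward-and-penalty loop, a simple bandit learner that has nothing to do with OpenAI’s actual training code:

```python
# A toy reward-and-penalty loop: a one-state bandit learner, nothing to
# do with OpenAI's actual training code. The learner never sees example
# text to copy; it only sees which actions earn reward.
import random

ACTIONS = ["answer correctly", "answer incorrectly"]
values = {action: 0.0 for action in ACTIONS}  # estimated value per action

def reward(action: str) -> float:
    # The designer's reward function: +1 for a correct answer, -1 otherwise.
    return 1.0 if action == "answer correctly" else -1.0

learning_rate = 0.1
for _ in range(1000):
    if random.random() < 0.1:              # occasionally explore at random
        action = random.choice(ACTIONS)
    else:                                   # otherwise exploit the best guess
        action = max(values, key=values.get)
    # Nudge the estimate for the chosen action towards the reward received.
    values[action] += learning_rate * (reward(action) - values[action])

print(values)  # "answer correctly" ends up with much the higher value
```

Notice that the learner will converge on whatever the reward function pays for, which is exactly where the trouble starts.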
But only in theory. Often with AI, as its ability to solve one problem improves, its ability to solve others degrades. What’s more, training an AI this way can erode its grasp of text, weaken its ability to mimic human-like speech, or massively damage its “alignment.” “Alignment” simply means the AI tries to solve the problem the user actually posed, in the way the user intended. For example, people have used reinforcement learning to make AI systems play video games, and the resulting agents end up exploiting loopholes, like glitches or hacks, to win rather than playing within the rules. AIs trained the same way as ChatGPT-4o, by contrast, tend to have better alignment, as they try to solve problems the way their training data does, and that data was generated by us humans. As such, “Strawberry” will undoubtedly suffer from this alignment issue, taking shortcuts to answer questions and producing errors along the way.
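Reward hacking is easy to demonstrate in miniature. In this made-up example, the designer wants an agent to finish a race, but the reward function only pays per checkpoint, so looping through one checkpoint forever beats playing properly. OpenAI’s own CoastRunners boat-racing demo famously failed in exactly this way, with the boat circling to collect respawning targets instead of finishing:

```python
# A made-up miniature of reward hacking. The designer wants the agent to
# finish the race, but the reward function only pays per checkpoint, so a
# reward-maximising agent learns to circle one checkpoint forever.

def episode_return(strategy: str, steps: int = 100) -> float:
    if strategy == "finish the race":
        return 10 * 1.0 + 5.0  # 10 checkpoints, once each, plus a finish bonus
    if strategy == "loop one checkpoint":
        return steps * 1.0     # the same checkpoint re-triggers every step
    raise ValueError(strategy)

for strategy in ("finish the race", "loop one checkpoint"):
    print(strategy, "->", episode_return(strategy))
# "loop one checkpoint" scores 100.0 against 15.0 for playing properly,
# so the agent "wins" without ever doing what its designers intended.
```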
In fact, OpenAI admits “Strawberry” has these problems. Their own account of the AI states that “Strawberry” answers questions involving science, programming, and mathematics better than ChatGPT-4o, but that on many other kinds of queries, ChatGPT-4o does better.
You might think this means OpenAI could combine the two models to get the best of both worlds, but because they are trained in totally different ways, this is practically impossible.
There is also the giant murderous elephant in the room. Not only is “Strawberry” not the great leap forward OpenAI’s misleading marketing would like you to believe it is, it also doesn’t solve the pernicious problems OpenAI is actually facing. OpenAI’s models like ChatGPT-4o are already good enough to be used by millions; they don’t need to become more capable, they need to become more efficient! OpenAI is haemorrhaging money and is set to post an annual loss of $5 billion by the end of the year, thanks to the cost of training and running its AI models. As such, “Strawberry” is a huge misstep in completely the wrong direction. Rather than actually solving its problems, OpenAI seems focused on using misleading marketing and parlour tricks to drum up more investment and keep its hollow hype train running.
Thanks for reading! Content like this doesn’t happen without your support. So, if you want to see more like this, don’t forget to Subscribe and help get the word out by hitting the share button below.
Sources: Vox, The Guardian, The Verge, Forbes