AI promises to revolutionise almost every aspect of our lives. But one industry that appears particularly susceptible to an AI takeover is programming. There’s a virtual tsunami of articles out there on how models like ChatGPT can create vast amounts of intricate, complex code from basic prompts. As such, some AI advocates are claiming programming will soon be an obsolete career, with the industry entirely replaced by AI. Even the most sceptical predictions suggest that the global number of programmers will shrink dramatically in the coming years as these models make the profession far more efficient. But all of these forecasts are wrong. They don’t consider a major flaw inherent to AI that it simply cannot overcome. Let me explain.
So, what is the problem with AI-generated code?
Well, one of the internet’s favourite developers, Jason Thor Hall of Pirate Software fame, described it best in a recent short. He said, “We have talked to people who’re using AI-generated code, and they are like, hey, it would take me about an hour to produce this code and like 15 minutes to debug. And then they are like, oh, the AI could produce it in like 1 minute, and then it would take me like 3 hours to debug it. And they are like, yeah, but it produced it really fast.”
In other words, even though AI can write code far faster than a human programmer, it does such a poor job that the time spent making that code usable leaves you far less efficient than if a qualified human had simply written it in the first place.
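To put that in perspective, here’s a quick back-of-the-envelope comparison using the rough figures from Thor’s anecdote. These minutes are illustrative, not measured benchmarks:

```python
# Rough comparison of the two workflows, using the numbers from Thor's anecdote.
# These figures are anecdotal illustrations, not measured benchmarks.

human_write_min = 60    # ~1 hour to write the code by hand
human_debug_min = 15    # ~15 minutes to debug it

ai_generate_min = 1     # ~1 minute for the AI to generate the code
ai_debug_min = 180      # ~3 hours to debug the AI's output

human_total = human_write_min + human_debug_min   # 75 minutes
ai_total = ai_generate_min + ai_debug_min         # 181 minutes

print(f"Human workflow: {human_total} min")
print(f"AI workflow:    {ai_total} min")
print(f"The AI workflow is {ai_total / human_total:.1f}x slower end to end")
```

Even with the AI’s near-instant generation step, the end-to-end process comes out roughly two and a half times slower.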
And Thor’s take is far from unfounded. In fact, there’s a recent study that backs him up.
Researchers at Princeton and the University of Chicago recently found that generative AIs such as ChatGPT, and even coding-specific generative AIs, are functionally useless at real-world software engineering. They took 2,300 common software engineering problems from real GitHub issues, mostly bug reports or feature requests, and evaluated how well these AIs could resolve them. What they found was telling. On average, the AIs generated working solutions only around 4% of the time, and the vast majority of those successes were straightforward engineering problems.
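The researchers’ own benchmark is more sophisticated, but the core scoring loop can be sketched in a few lines. Everything below (`Issue`, `generate_patch`, `patch_resolves_issue`) is a hypothetical stand-in for illustration, not the study’s actual harness:

```python
# Minimal sketch of how a benchmark like this can score a model: for each
# real GitHub issue, ask the model for a fix, apply it to the repository,
# run the project's test suite, and count the issue as resolved only if
# the tests pass. The functions here are hypothetical stand-ins.

from dataclasses import dataclass

@dataclass
class Issue:
    repo: str          # repository the issue comes from
    description: str   # the bug report or feature request text

def generate_patch(issue: Issue) -> str:
    """Stand-in for a call to a code-generating model."""
    return ""  # a real harness would return the model's proposed diff here

def patch_resolves_issue(issue: Issue, patch: str) -> bool:
    """Stand-in for applying the patch and running the repo's tests."""
    return False  # a real harness would return True only if the tests pass

def resolution_rate(issues: list[Issue]) -> float:
    resolved = sum(
        patch_resolves_issue(issue, generate_patch(issue)) for issue in issues
    )
    return resolved / len(issues)

# Toy usage: with the stubbed-out functions above, the rate is simply 0%.
issues = [Issue(repo="example/repo", description="Fix crash on empty input")]
print(f"Resolved: {resolution_rate(issues):.1%}")
```

The point of scoring this way is that a “solution” only counts if it actually works against the project’s tests, which is exactly the bar the AI models kept failing to clear.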
But dig a little deeper, and it gets so much worse. The AI model that fared the best was Claude 2, which provided a good solution 4.8% of the time. But GPT-4, by far the most complex and popular generative AI in the world and the one being used to generate the most code, only provided a good solution 1.7% of the time.
It’s no wonder Thor found that using AI to code is incredibly inefficient. Imagine having to debug and rewrite over 95% of the code your tool generates for you.
So, why is AI like this?
Well, AI doesn’t actually understand what it is doing. These generative AI models are basically overgrown predictive-text programs. They use statistics drawn from a stupidly large pool of training data to figure out what the next character or word should be. No AI actually ‘knows’ how to code. It isn’t cognitively trying to solve the problem; it is simply finding an output that matches the statistics of the data it was trained on. That’s why it gets things massively wrong so often: the AI isn’t actually trying to solve the problem you think it is. Even when the coding problem you give it is well-represented in its training data, it can still fail to generate a usable solution, simply because it doesn’t understand the rules of the programming language. And it gets even worse when you ask it to solve a coding problem it has never seen before, as its statistical model simply can’t extrapolate that far, causing the AI to produce absolute nonsense.
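To make the “overgrown predictive text” point concrete, here’s a toy sketch of next-word prediction built from nothing but word-pair counts. Real models are enormously more sophisticated neural networks working over tokens, but the underlying principle is the same: continue the statistical pattern, with zero understanding of what the code has to do.

```python
# Toy illustration of "predict the next word from statistics": count which
# word follows which in a tiny corpus, then always pick the most common
# successor. It continues patterns it has seen; it never reasons about
# what the code actually has to do.

from collections import Counter, defaultdict

corpus = (
    "for i in range ( 10 ) : print ( i ) "
    "for item in items : print ( item )"
).split()

# Count successors for every word (a simple bigram model).
successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the statistically most likely next word, or '?' if unseen."""
    if word not in successors:
        return "?"  # outside the training data, the model has nothing to go on
    return successors[word].most_common(1)[0][0]

print(predict_next("print"))   # '(' -- a pattern it has seen before
print(predict_next("while"))   # '?' -- no statistics, no answer
```

Notice the failure mode: the moment the input falls outside the patterns in its data, the “model” has nothing sensible to say, which is the same cliff edge the big generative models fall off, just on a vastly larger scale.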
This isn’t just a problem with AI-generated code, but with every AI product, such as self-driving cars. Moreover, this isn’t a problem that can be easily solved. You can’t just shove more training data into these AIs, as we are already starting to hit a point of diminishing returns when it comes to AI training (read more here). So, what is the solution?
Well, when we treat AI as what it actually is, a statistical model, we can have tremendous success. For example, AI-generated structural designs, such as those in the Czinger hypercar, are incredibly efficient and effective. But it all falls apart when we treat AI as a replacement for human workers. Despite its name, AI isn’t intelligent, and we shouldn’t treat it as such.
Thanks for reading! Content like this doesn’t happen without your support. So, if you want to see more like this, don’t forget to Subscribe and help get the word out by hitting the share button below.
Sources: Wired, LeadDev, arXiv, Pirate Software