AI Models from Google, OpenAI, Anthropic Solve 0% of ‘Hard’ Coding Problems

cm0002@lemmy.world · 3 months ago

atzanteol@sh.itjust.works · 3 months ago

The claims that AI will be surpassing humans in programming are pretty ridiculous. But let’s be honest - most programming is rather mundane.

ulterno@programming.dev · 3 months ago

Never have I had to implement any kind of ridiculous algorithm to pass tests with huge amounts of data in the least amount of memory, as the competitive websites show.

It has been mostly about:

Finding the correct library for a job and understanding it well, to prevent footguns and blocking future features
Design patterns for better build times
Making sane UI options and deciding resource alloc/dealloc points that would match user interaction expectations
cmake

But then again, I haven’t worked in FinTech or Big Data companies, nor have I written an SQL server.

technocrit@lemmy.dbzer0.com · 3 months ago

Pretty sure that autocomplete would be terrible at these tasks too.

ulterno@programming.dev · 3 months ago

There are some times when I wish I were better at regexp and scripting.
Times when I am writing a similar kind of thing again and again, which is just different enough (and repeated few enough times) that writing a script doesn’t seem worthwhile.

At those times, I tend to think - maybe Cursor would have done this part well - but have no real idea since I have never used it.

On the other hand, if I had a scripting endpoint from clang[1], I would have used that to make a batch processor for even a repetition count as small as 10, and wouldn’t have thought once about AI.

[1] One that would tag parts of code (in the same sense as “parts of speech”), such as function declarations, return types, function names, type qualifiers, etc.
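
To give a rough idea of what such tagging could look like, here is a toy Python sketch. It is only an assumption about the kind of output meant here: it uses a regular expression rather than clang, it handles only simple one-line C declarations, and the names `DECL` and `tag_declaration` are made up for this example.

```python
import re

# Toy pattern for simple one-line C declarations such as
# "static const int add(int a, int b);". This is NOT a real C parser
# and has nothing to do with clang's actual tooling.
DECL = re.compile(
    r"^\s*(?P<qualifiers>(?:(?:static|inline|extern|const|volatile)\s+)*)"
    r"(?P<return_type>[A-Za-z_]\w*(?:\s*\*+)?)\s*"
    r"(?P<name>[A-Za-z_]\w*)\s*"
    r"\((?P<params>[^)]*)\)\s*;?\s*$"
)


def tag_declaration(line):
    """Return a dict of tagged parts, or None if the line does not match."""
    m = DECL.match(line)
    if m is None:
        return None
    params = [p.strip() for p in m.group("params").split(",") if p.strip()]
    return {
        "qualifiers": m.group("qualifiers").split(),
        "return_type": m.group("return_type"),
        "name": m.group("name"),
        "params": params,
    }


if __name__ == "__main__":
    print(tag_declaration("static const int add(int a, int b);"))
    print(tag_declaration("char *strdup(const char *s);"))
```

With tags like these, a batch rewrite over ten near-identical declarations becomes a short loop over the dicts instead of hand editing.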

wetbeardhairs@lemmy.dbzer0.com · 3 months ago

Well, this kind of AI won’t ever be useful as a programmer. It doesn’t think. It doesn’t reason. It cannot make decisions besides using a ton of computational power and enormous deep neural networks to shit out a series of words that seem like they should follow your prompt. An LLM is just a really, really good next-word guesser.

So when you ask it to solve the Tower of Hanoi problem, great, it can do that. Because it saw someone else’s answer. But if you ask it to solve it for a tower that is 20 disks high, it will flounder and fail, because no one ever talks about going that far. It’s not actually reasoning to solve the problem - it’s regurgitating answers it has ingested from stolen internet conversations. It’s not even attempting to solve the general case because it’s not trying to solve the problem, it’s responding to your prompt.
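
For reference, the general case is a few lines of recursion. This is the standard textbook solution as a minimal Python sketch; the function names and peg labels are arbitrary choices for this example. The same code handles 3 disks or 20, and the move count is always 2^n - 1.

```python
def hanoi(n, src="A", dst="C", via="B"):
    """Yield the moves (from_peg, to_peg) that transfer n disks from src to dst."""
    if n == 0:
        return
    # Move the top n-1 disks out of the way, onto the spare peg.
    yield from hanoi(n - 1, src, via, dst)
    # Move the largest remaining disk to its destination.
    yield (src, dst)
    # Move the n-1 disks from the spare peg onto the largest disk.
    yield from hanoi(n - 1, via, dst, src)


def count_moves(n):
    """Count the moves without storing them all in memory."""
    return sum(1 for _ in hanoi(n))


if __name__ == "__main__":
    print(list(hanoi(2)))    # the three moves for a 2-disk tower
    print(count_moves(20))   # equals 2**20 - 1
```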

That said - an LLM is also great as an interface to allow natural language and code as prompts for other tools. This is where the actually productive advancements will be made. Those tools are garbage today but they’ll certainly improve.

atzanteol@sh.itjust.works · 3 months ago

Well, this kind of AI won’t ever be useful as a programmer

It already is.

childOfMagenta@jlai.lu · 3 months ago

You mean useful to a programmer, or as useful as a programmer?

atzanteol@sh.itjust.works · 3 months ago

Ah - yeah I read that wrong. It’s useful to a programmer.

Ledivin@lemmy.world · 3 months ago

My productivity has at least tripled since I started using Cursor. People are actually underestimating the effects that AI will have in the industry

AlecSadler@lemmy.blahaj.zone · 3 months ago

Tripled is an understatement for me. Cursor and Claude Code are a godsend for OE for me.

technocrit@lemmy.dbzer0.com · 3 months ago

People are actually underestimating the effects that AI autocomplete will have in the industry

PushButton@lemmy.world · 3 months ago

It means the AI is very helpful to you. This also means you are as good as 1/3 of an AI in coding skills…

Which is not great news for you, mate.

atzanteol@sh.itjust.works · 3 months ago

Ah knock it off. Jesus you sound like people in the '90s mocking “intellisense” in the IDE as somehow making programmers “less real programmers”.

It’s all needless gatekeeping and purity test BS. Use tools that are useful. Don’t worry if it makes you less of a man.

Feyd@programming.dev · 3 months ago

It’s not gatekeeping, it’s true. I know devs who say AI tools are useful, but all the ones who say it makes them several times more productive are actually doing negative work, because I have to deal with their terrible code that they don’t even understand.

atzanteol@sh.itjust.works · 3 months ago

The devs I know use it as a tool and check their work and fully understand the code they’ve produced.

So your experience vs. mine. I suspect you just work with shitty developers who would be producing shitty work whether they were using AI or not.

Endmaker@ani.social · edit-2 3 months ago

In the ‘Medium’ difficulty category, OpenAI’s o4-mini-high model scored the highest at 53.5%.

This fits my observation of such models. o4-mini-high is able to help me with 80-90% of the problems at work. For the remaining problems, it would come up with a nonsensical solution and, no matter how much I prompted it, it would tunnel-vision on that specific approach. It could never second-guess itself, realise that its initial solution was completely off the mark, and try an entirely different approach. That’s where I usually step in and do the work myself.

It still saves me time with the trivial stuff though.

I can’t say the same for the rest of the LLMs. They are simply no good at coding and just waste my time.

yogsototh@programming.dev · 3 months ago

I didn’t see Claude 4 Sonnet in the tests, and this is the one I use. From my experience, it is in about the same category as o4-mini.

It is a nice tool to have in my belt. But these LLM-based agents are still very far from being able to do advanced and hard tasks. To me, it is probably more important to communicate and learn about the limitations of these tools, so as not to lose time instead of gaining it.

In fact, I am not even sure they are good enough to be used to really generate production-ready code. But they are nice for pre-reviewing, building simple scripts that don’t need to be highly reliable, analysing a project, asking specific questions, etc. The game changer for me was to use Clojure-MCP. Having a REPL at its disposal really enhances the quality of most answers.

daniskarma@lemmy.dbzer0.com · 3 months ago

They have their uses. For instance, the other day I needed to read some assembly and decompiled C, and you know how fun that can be. The LLM proved quite good at translating it to English, and it really sped up the process.

Writing it back wasn’t that good though. It was just good enough to point in a direction, but I still ended up writing the patcher mostly by myself.

Lemminary@lemmy.world · 3 months ago

the other day I needed to read some assembly and decompiled C

As one casually does lol. Jokes aside, that’s pretty cool. I wish I had the technical know-how and, most importantly, the patience for it.

FizzyOrange@programming.dev · edit-2 3 months ago

Assembly is very simple (at least RISC-V assembly, which I mostly work with, is) but also very tedious to read. It doesn’t help that the people who choose the instruction mnemonics have extremely poor taste - e.g. lb, lh, lw, ld instead of load8, load16, load32, load64. Or j instead of jump. Who needs to save characters that much?
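
To make the comparison concrete, here is a small Python sketch that rewrites those mnemonics into the longer names. The long names (load8, jump, and so on) are just the hypothetical spellings suggested above, not real RISC-V mnemonics, and `VERBOSE` and `expand_mnemonic` are names invented for this example.

```python
# Real RISC-V mnemonics mapped to the hypothetical verbose names.
# lb/lh/lw/ld load a byte, halfword, word and doubleword
# (8, 16, 32 and 64 bits respectively); j is an unconditional jump.
VERBOSE = {
    "lb": "load8",
    "lh": "load16",
    "lw": "load32",
    "ld": "load64",
    "j": "jump",
}


def expand_mnemonic(line):
    """Replace the leading mnemonic of one assembly line, if it is known.

    Leading whitespace is dropped; labels and comments are not handled.
    """
    parts = line.split(None, 1)
    if not parts:
        return line
    mnemonic, rest = parts[0], parts[1:]
    return " ".join([VERBOSE.get(mnemonic, mnemonic)] + rest)


if __name__ == "__main__":
    for asm in ["lw a0, 8(sp)", "ld t1, 0(a0)", "j loop", "add a0, a0, a1"]:
        print(expand_mnemonic(asm))
```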

The over-abbreviation is some kind of weird flaw that hardware guys all have. I wondered if it comes from labelling pins on PCB silkscreens (MISO, CLK etc)… Or maybe they just have bad taste.

I once worked on a chip that had nested acronyms.

Lemminary@lemmy.world · 3 months ago

The over-abbreviation is some kind of weird flaw that hardware guys all have

My bet is on the teaching methods in uni. From what I’ve seen, older teaching methods use terrible variable names for a production environment. I think it unfortunately sticks because students get used to it and find it easier & faster than typing things out.

amorpheus@lemmy.world · 3 months ago

Who needs to save characters that much?

Do you realize how old assembly language is?

It predates hard disks by ten years and coincided with the invention of the transistor.

Outsider9042@lemmynsfw.com · 3 months ago

About all they are good for is generating boilerplate code. Just far less efficiently than a snippet library.

Tony Bark@pawb.social · 3 months ago

Yup. All that effort just to be good at basic code scaffolding.

Glitchvid@lemmy.world · 3 months ago

I keep getting told that AI is good at boilerplate code, and like, so is Eclipse – if you know the keyboard shortcuts to autogenerate method stubs, classes, etc.

MTK@lemmy.world · 3 months ago

Please babe! Just one more parameter, then it will be AGI!

katy ✨@piefed.blahaj.zone · 3 months ago

ai is basically just the worst answer on stackexchange

danzania@infosec.pub · 3 months ago

Funny how I never see articles on Lemmy about improvements in LLM capabilities.

Rayquetzalcoatl@lemmy.world · 3 months ago

Probably because nobody really wants to read absolute nonsense.

Nullagon@ani.social · 3 months ago

i would guess a lot of the pro ai stuff is from corpos given the fact good press is money to them.

Modern_medicine_isnt@lemmy.world · 3 months ago

Fortunately, 90% of coding is not hard problems. We write the same crap over and over. How many different create-an-account and sign-in flows do we really need? Yet there seem to be an infinite number, each with its own bugs.

AI Models from Google, OpenAI, Anthropic Solve 0% of ‘Hard’ Coding Problems | AIM