AI agents wrong ~70% of time: Carnegie Mellon study

Jaden Norman@lemmy.world · 5 months ago

Shayeta@feddit.org · 5 months ago

It doesn’t matter if you need a human to review. AI has no way of distinguishing between success and failure. Either way, a human will have to review 100% of those tasks.

jsomae@lemmy.ml · 5 months ago

Right, so this is really only useful in cases where either it’s vastly easier to verify an answer than to posit one, or a conventional program can verify the AI’s output.
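A minimal sketch of the "conventional program verifies the AI's output" case, assuming the task is subset sum (pick numbers from a list that add up to a target). Subset sum is NP-complete, so finding an answer is hard in general, but checking a proposed answer takes one linear pass. The function name `verify_subset_sum` is hypothetical and only illustrates the idea.

```python
from collections import Counter


def verify_subset_sum(numbers: list[int], target: int, claimed: list[int]) -> bool:
    """Check a (hypothetical) AI-proposed subset against the original input.

    Finding a valid subset is hard in general; checking one is cheap.
    """
    available = Counter(numbers)
    used = Counter(claimed)
    # Every claimed element must come from the input, respecting multiplicity.
    if any(used[x] > available[x] for x in used):
        return False
    # The subset must actually hit the target.
    return sum(claimed) == target


numbers = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(numbers, 9, [4, 5]))     # valid answer
print(verify_subset_sum(numbers, 9, [4, 4, 1]))  # reuses 4 and invents 1
```

The checker never has to know how the answer was produced, which is exactly what makes this pattern usable with an unreliable generator.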

MangoCats@feddit.it · 5 months ago

It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.

I’m envisioning a world where multiple AI engines create and check each others’ work… the first thing they’ll need to make that scenario work is probably fusion power.

zbyte64@awful.systems · 5 months ago

> It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.

I usually write 3x the code to test the code itself. Verification is often harder than implementation.
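A toy illustration of how test code can outgrow the code it checks, assuming a hypothetical `clamp` helper. The implementation is a few lines; covering normal values, inclusive boundaries, a degenerate range, and invalid input already takes several times as much.

```python
def clamp(x: int, lo: int, hi: int) -> int:
    """Hypothetical function under test: limit x to the range [lo, hi]."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    return max(lo, min(x, hi))


def test_clamp() -> None:
    # Values inside the range pass through unchanged.
    assert clamp(5, 0, 10) == 5
    # Boundaries are inclusive.
    assert clamp(0, 0, 10) == 0
    assert clamp(10, 0, 10) == 10
    # Out-of-range values are pinned to the nearest bound.
    assert clamp(-3, 0, 10) == 0
    assert clamp(42, 0, 10) == 10
    # A degenerate range collapses to a single value.
    assert clamp(7, 3, 3) == 3
    # An inverted range must be rejected, not silently "work".
    try:
        clamp(1, 10, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for lo > hi")


test_clamp()
```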

MangoCats@feddit.it · 5 months ago

Yes, but the test code “writes itself”: the path is clear and you just have to fill in the blanks.

Writing the proper product code in the first place, that’s the valuable challenge.

zbyte64@awful.systems · 5 months ago

Maybe it is because I started out in QA, but I have to strongly disagree. You should assume the code doesn’t work until proven otherwise, AI or not.