• jsomae@lemmy.ml · 5 months ago

    yes, that’s generally useless. It should not be shoved down people’s throats. 30% accuracy still has its uses, especially if the result can be programmatically verified.

    • Knock_Knock_Lemmy_In@lemmy.world · 5 months ago

      Run something with a 70% failure rate 10x and you get to a cumulative pass rate of about 97% (1 − 0.7^10). LLMs don’t get tired and they can be run in parallel.

      • jsomae@lemmy.ml · 5 months ago

        The problem is that the attempts are not i.i.d. (independent and identically distributed), so this doesn’t really work. It helps a little, which in my opinion is why chain-of-thought is effective (it gives the LLM a chance to posit a couple of answers first). However, we’re already looking at “agents,” so they’re probably already doing chain-of-thought.
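A toy simulation of that point (all parameters here are made up for illustration, not measured from any model): if some fraction of tasks are simply beyond the model, its failures on them are perfectly correlated across retries, and the cumulative pass rate caps well below what the independence math predicts.

```python
import random

random.seed(0)

def run_trials(n_tasks=100_000, retries=10, p_hard=0.2, p_easy_success=0.3):
    """Non-i.i.d. retry model: a 'hard' task fails every retry;
    on an 'easy' task each retry succeeds independently."""
    solved = 0
    for _ in range(n_tasks):
        if random.random() < p_hard:
            continue  # hard task: retries are perfectly correlated failures
        if any(random.random() < p_easy_success for _ in range(retries)):
            solved += 1
    return solved / n_tasks

# Treating all attempts as i.i.d. with overall per-attempt success
# 0.8 * 0.3 = 0.24 would predict 1 - 0.76**10 ≈ 0.94, but the
# simulation caps out near 1 - p_hard = 0.8.
print(run_trials())  # ≈ 0.78
```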

        • Knock_Knock_Lemmy_In@lemmy.world · 5 months ago

          Very fair comment. In my experience, even when increasing the temperature, you get stuck in local minima.

          I was just trying to illustrate how 70% failure rates can still be useful.

      • MangoCats@feddit.it · 5 months ago

        I have actually been doing this lately: iteratively prompting AI to write software and fix its errors until something useful comes out. It’s a lot like machine translation. I speak fluent C++ but not Rust, yet I can hammer away at the AI (with English-language prompts) until it produces passable Rust for something I could have written myself in C++ in half the time and effort.

        I also don’t speak Finnish, but Google Translate can take what I say in English and put it into at least somewhat comprehensible Finnish without egregious translation errors most of the time.

        Is this useful? When C++ is getting banned for “security concerns” and Rust is the required language, it’s at least a little helpful.

        • jsomae@lemmy.ml · 5 months ago

          I’m impressed you can make strides with Rust with AI. I am in a similar boat, except I’ve found LLMs are terrible with Rust.

              • Log in | Sign up@lemmy.world · 5 months ago

                Ah, my bad, you’re right: for being consistently correct, I should have done 0.3^10 = 0.0000059049

                so the chances of it being right ten times in a row are less than one thousandth of a percent.

                No wonder I couldn’t get it to summarise my list of data right and it was always lying by the 7th row.
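The same compounding explains the seventh-row failures. Assuming each row of the summary is an independent step with the thread’s 30% accuracy, an error somewhere in the first seven rows is near-certain:

```python
p = 0.3  # per-step accuracy from the example above

print(p ** 10)     # ≈ 5.9049e-06 -- all ten steps right in a row
print(1 - p ** 7)  # ≈ 0.99978   -- at least one error by the 7th row
```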

                • Knock_Knock_Lemmy_In@lemmy.world · 5 months ago

                  That looks better. Even with a fair coin, 10 heads in a row is a roughly 1-in-1,000 event (0.5^10 ≈ 0.001).

                  And if you are feeding the output back into a new instance of a model then the quality is highly likely to degrade.

                  • Log in | Sign up@lemmy.world · 5 months ago

                    Whereas if you ask a human to do the same thing ten times, the probability that they get all ten right is astronomically higher than 0.0000059049.
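To put a rough number on it (the 99% per-item figure is an assumption for illustration, not from the thread): even a human who is only 99% accurate per item still gets all ten right about nine times out of ten, roughly 150,000x the 30%-accuracy figure above.

```python
print(0.99 ** 10)  # ≈ 0.904     -- ten in a row at an assumed 99% human accuracy
print(0.3 ** 10)   # ≈ 5.9e-06   -- ten in a row at 30% accuracy
```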