A screenshot of this question was making the rounds last week, but this article covers testing it against all the well-known models out there.

Also includes outtakes on the ‘reasoning’ models.

  • Slashme@lemmy.world · ↑ 40 · ↓ 1 · 7 hours ago

    The most common pushback on the car wash test: “Humans would fail this too.”

    Fair point. We didn’t have data either way. So we partnered with Rapidata to find out. They ran the exact same question with the same forced choice between “drive” and “walk,” no additional context, past 10,000 real people through their human feedback platform.

    71.5% said drive.

    So people do better than most AI models. Yay. But seriously, almost 3 in 10 people get this wrong‽‽

    • bluesheep@sh.itjust.works · ↑ 5 · 2 hours ago

      I saw that and hoped it's because of the dead Internet theory. At least I hope so, because I'll lose my last bit of faith in humanity if it isn't.

    • T156@lemmy.world · ↑ 19 · 6 hours ago

      It is an online poll. You also have to consider that some people don’t care/want to be funny, and so either choose randomly, or choose the most nonsensical answer.

      • Brave Little Hitachi Wand@feddit.uk · ↑ 1 · ↓ 1 · 4 hours ago

        I wonder… If humans were all super serious, direct, and not funny, would LLMs trained on their stolen data actually function as intended? Maybe. But such people do not use LLMs.

    • masterofn001@lemmy.ca · ↑ 12 · ↓ 5 · 7 hours ago (edited)

      Without reading the article, the title just says wash the car.

      I could go for a walk and wash my car in my driveway.

      Reading the article… That is exactly the question asked. It is a very ambiguous question.

      • Geth@lemmy.dbzer0.com · ↑ 1 · 1 hour ago

        Mentioning the car wash and washing the car plus the possibility of driving the car in the same context pretty much eliminates any ambiguity. All of the puzzle pieces are there already.

        I guess this is an unintended autism test as well, if this is not enough context for someone to understand the question.

      • bluesheep@sh.itjust.works · ↑ 3 · 2 hours ago

        “Without reading the article, the title just says wash the car.”

        No it doesn’t? It says:

        “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?”

        In which world is that an ambiguous question?