Lemmy hates AI.

I’m fully supportive of the accessibility for persons with disabilities, to be clear. It’s ironic though. Does Lemmy’s open source code make it easier for bots to scrape it?

  • Jankatarch@lemmy.world · 3 points · 12 hours ago

    All technicality aside, models trained on images and their text descriptions for blind people fall under one of the few good use cases for machine learning.

  • ramble81@lemmy.zip · 28 points · 1 day ago

    Something I mention every time this comes up: AI doesn’t need to scrape Lemmy. All someone has to do is set up their own federated instance, and ActivityPub will wrap everything up in a nice JSON format for them to parse however they want. And there’s fundamentally nothing a person can do about it.

    It’s just best to realize anything and everything on Lemmy is publicly available for any use, good or bad.
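    For the curious, you don’t even need a full instance to see this. Any public fediverse post can be fetched as ActivityPub JSON with a plain HTTP request; a minimal sketch with the standard library (the URL and the sample object below are placeholders for illustration, not real data):

```python
import json
import urllib.request

def activitypub_request(url: str) -> urllib.request.Request:
    """Build a request that asks a fediverse server for the ActivityPub
    JSON representation of a post instead of the HTML page."""
    return urllib.request.Request(
        url, headers={"Accept": "application/activity+json"}
    )

# What comes back is an ordinary JSON object. A shortened, made-up
# example of a Lemmy post ("Page" object) for illustration:
sample = json.loads("""{
  "type": "Page",
  "id": "https://lemmy.example/post/12345",
  "name": "Example post title",
  "content": "<p>Post body as HTML</p>"
}""")

print(sample["type"], sample["name"])

# A real fetch (network required) would be:
#   with urllib.request.urlopen(activitypub_request(sample["id"])) as resp:
#       post = json.load(resp)
```

    No scraping, no HTML parsing: the protocol hands over the structured data by design.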

    • Pearl@lemmy.ml · 3 points · 13 hours ago

      This.

      Every nation state with internet surveillance spun up a server to collect and archive ActivityPub posts.

      In some ways it’s more freeing. No need to worry what a corporation will see fit to disclose to a gov agency.

  • FriendOfDeSoto@startrek.website · 48 up, 2 down · 1 day ago

    It’s not perfect training data. Being encouraged to add alt text and actually doing it are two different things, and writing good alt text is another matter altogether. Anything that’s on the internet is training data whether people want it to be or not. The only ethical difference is whether the scraper accepts and respects something like robots.txt, i.e. “do not scrape,” which communicates the data holders’ intentions. And if they torrent books, you can guess how respectful they are.
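    Checking those stated intentions takes a few lines of standard-library Python; whether a scraper bothers to run them is exactly the ethical difference. A minimal sketch (the bot name and URLs are made up for illustration):

```python
import urllib.robotparser

# Example robots.txt of the kind a site might publish to say "do not
# scrape": one made-up training bot is banned, everyone else is allowed.
rules = """
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("ExampleTrainingBot", "https://example.com/post/1"))  # False
print(rp.can_fetch("SomeBrowser", "https://example.com/post/1"))         # True
```

    Note that robots.txt is purely advisory: nothing technically stops a crawler from ignoring the answer.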

    • errer@lemmy.world · 3 points · 1 day ago

      At this point all the imagery data they need is already out there. Not like your picture of a cat you post to Lemmy is gonna help these companies make a better model.

  • Rhaedas@fedia.io · 32 points · 1 day ago

    Web content should always strive to be more accessible; it’s AI that should be better regulated instead. I think we’ve missed the boat on a big part of that, though. We should have legally clamped down on these activities a long time ago.

    • tourist@lemmy.world · 4 points · edited · 24 hours ago

      Also, the alt texts vary in descriptiveness for exactly that reason: they’re meant to be useful to humans, not to serve as training data.

      What would a blind person rather have as the alt text:

      (there are no photos here, for the blind people listening)

      1:

      A cute Alsatian puppy looking into the camera with a dog toy in its mouth

      2:

      A 14 week old black/brown dog sitting on a tiled floor with a synthetic-rubber cuboid-cylindrical-shaped, blue-green-gradient chew toy in its mouth with its eyes and nose poised at a 30° angle towards the photographer’s origin. Each tile on the floor is approximately 1.47m^2 and is a pearlescent shade of off-white. There is an unidentifiable black speck on the first tile in the top left quadrant of the image. The cameraman’s fat finger is covering 1.97% of the bottom right quadrant. Focal length is set to 100mm. Exposure settings appear to be increased. The dog’s genitals are not visible.
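      The joke has a real kernel: accessibility guidance generally favors option 1, short and purposeful. A toy heuristic in that spirit (the ~125-character limit is a commonly cited screen-reader rule of thumb, not a formal spec, and the checks here are illustrative, not exhaustive):

```python
def alt_text_warnings(alt: str, limit: int = 125) -> list[str]:
    """Flag common alt-text problems. The length limit is a rule of
    thumb often cited for screen readers, not a formal requirement."""
    warnings = []
    if not alt.strip():
        warnings.append("empty alt text")
    if len(alt) > limit:
        warnings.append(f"longer than {limit} chars; consider trimming")
    if alt.lower().startswith(("image of", "picture of", "photo of")):
        # Screen readers already announce that an image is present.
        warnings.append("redundant 'image of' prefix")
    return warnings

print(alt_text_warnings(
    "A cute Alsatian puppy looking into the camera with a dog toy in its mouth"
))
# → []
```

      Option 2 would trip the length check several times over, which is the point.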

    • ethaver@kbin.earth · 4 points · 1 day ago

      And I actually really like that particular use case of AI, because requiring less human interaction gives the blind user more independence. The remaining issue of corporatization and private ownership of something that should be a publicly owned resource (as with many other assistive technologies) is a society-wide issue, and framing it as a futurists-vs-Luddites debate is a powerful misdirection.

  • badgermurphy@lemmy.world · 9 points · 1 day ago

    At first I was concerned about these huge tech companies stealing all of human knowledge and using it to make a fortune and drive everyone that created the knowledge into poverty.

    Now I see that they are stealing all of human knowledge to make LLMs: giant digital babbling talkers. It can’t work the way they want with the approach they’re taking, so it doesn’t matter what data they consume. They seem to lose money on every LLM query, even if you’re paying for the highest tier.

    When they stop subsidizing the cost to cash in, the already lukewarm interest in LLMs will cool further as costs rise.

    Shower thought: I don’t like that they’re gobbling up my data, but at least they’re choking on it.

  • ikt@aussie.zone · 9 points · 1 day ago

    AI is already amazing at image recognition; training or no training, it’s already here.