• Reygle@lemmy.world
    8 hours ago

    AITA for understanding that to mean that, in order to “summarize” the data, the AI read it entirely and will never be instructed to “forget” it?

    • TRBoom@lemmy.zip
      8 hours ago

      Unless someone has released something new while I haven’t been paying attention, all the gen AIs are essentially frozen. Your use of them can’t impact the actual weights inside of the model.

      If it seems like it remembers things, that’s because the actual input to the LLM is larger than the input you usually give it.
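
      A minimal Python sketch of the point above, under stated assumptions: `echo_model` is a hypothetical stand-in for an LLM (a pure function from input text to reply), and `ChatSession` is an illustrative client class, not any vendor’s API. In this sketch the only “memory” is the transcript the client keeps and re-sends on every turn; nothing is written back into the model.

```python
from typing import Callable

# Assumption: a "model" is any pure function from the full input text to a reply.
# Calling it never modifies it, mirroring frozen weights.
Model = Callable[[str], str]


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: reports how many lines of input it received."""
    return f"I can see {len(prompt.splitlines())} lines of input."


class ChatSession:
    """Hypothetical client: the transcript lives here, outside the model."""

    def __init__(self, model: Model, system_prompt: str) -> None:
        self.model = model
        self.history: list[str] = [f"system: {system_prompt}"]

    def send(self, user_text: str) -> str:
        self.history.append(f"user: {user_text}")
        # The whole transcript is re-sent every turn; this is the only "memory".
        reply = self.model("\n".join(self.history))
        self.history.append(f"assistant: {reply}")
        return reply


session = ChatSession(echo_model, "Be helpful.")
print(session.send("hi"))     # I can see 2 lines of input.
print(session.send("again"))  # I can see 4 lines of input.

# A fresh session using the same model starts from nothing.
print(ChatSession(echo_model, "Be helpful.").send("again"))  # I can see 2 lines of input.
```

      The model function is identical across both sessions; the difference in what it “knows” comes entirely from the input it is handed.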

      For instance, let’s say the max input for a particular LLM is 9096 tokens. The first part of that will be instructions from the owners of the LLM to prevent their model from being used for things they don’t like; let’s say that’s the first 2000 tokens. That leaves 7k or so for a conversation that will be ‘remembered’.
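
      The budget arithmetic above, as a rough Python sketch. Assumptions: `count_tokens` fakes tokenization by counting whitespace-separated words (real tokenizers split text differently), and `fit_to_window` is an illustrative helper that drops the oldest messages first; actual products may truncate or prioritize differently.

```python
MAX_TOKENS = 9096     # hypothetical context window from the example above
SYSTEM_TOKENS = 2000  # reserved for the owner's instructions
CONVERSATION_TOKENS = MAX_TOKENS - SYSTEM_TOKENS  # 7096, the "7k or so"


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_window(system_prompt: str, messages: list[str], max_tokens: int) -> list[str]:
    """Keep the system prompt, then as many of the most recent messages as still fit."""
    budget = max_tokens - count_tokens(system_prompt)
    if budget < 0:
        raise ValueError("the system prompt alone exceeds the context window")
    kept: list[str] = []
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg)
        if cost > budget:
            break  # this message and everything older is "forgotten"
        kept.append(msg)
        budget -= cost
    kept.reverse()  # restore chronological order
    return [system_prompt] + kept


print(CONVERSATION_TOKENS)  # 7096
print(fit_to_window("a b", ["one two three", "four five", "six"], max_tokens=6))
# ['a b', 'four five', 'six']
```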

      Now, if someone were really savvy, they’d have the model generate summaries of the conversation and stick them into another chunk of memory, maybe another 2000 tokens’ worth, so that it will seem to remember more than just the current thread. That would leave you with about 5000 tokens for the running conversation.
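
      A rough Python sketch of the summary trick, assuming the same budget split (9096 total, 2000 for instructions, 2000 for summaries). `summarize` is a hypothetical callback; in a real system it would presumably be another model call. Token counting is again faked by word count, and `build_context` and `first_words` are illustrative helpers, not a documented product feature.

```python
from typing import Callable

MAX_TOKENS = 9096
SYSTEM_TOKENS = 2000
SUMMARY_TOKENS = 2000
CONVERSATION_TOKENS = MAX_TOKENS - SYSTEM_TOKENS - SUMMARY_TOKENS  # 5096, "about 5000"

# Hypothetical: (previous summary, messages being evicted) -> new summary text.
Summarizer = Callable[[str, list[str]], str]


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def truncate(text: str, limit: int) -> str:
    """Cut text down to at most `limit` (word) tokens."""
    return " ".join(text.split()[:limit])


def build_context(
    system_prompt: str,
    summary: str,
    messages: list[str],
    summarize: Summarizer,
    conv_budget: int = CONVERSATION_TOKENS,
    summary_budget: int = SUMMARY_TOKENS,
) -> tuple[list[str], str]:
    """Return the model input and the updated running summary."""
    # Walk back from the newest message to find how many recent ones fit.
    budget, cut = conv_budget, len(messages)
    while cut > 0 and count_tokens(messages[cut - 1]) <= budget:
        budget -= count_tokens(messages[cut - 1])
        cut -= 1
    evicted, recent = messages[:cut], messages[cut:]
    # Older messages are folded into the (capped) summary instead of being dropped outright.
    if evicted:
        summary = truncate(summarize(summary, evicted), summary_budget)
    context = [system_prompt]
    if summary:
        context.append(f"summary: {summary}")
    return context + recent, summary


def first_words(old: str, msgs: list[str]) -> str:
    """Toy summarizer for illustration: keeps the first word of each evicted message."""
    return " ".join([old] + [m.split()[0] for m in msgs if m.split()]).strip()


ctx, summ = build_context("sys", "", ["alpha one", "beta two", "gamma three"],
                          first_words, conv_budget=4, summary_budget=10)
print(ctx)  # ['sys', 'summary: alpha', 'beta two', 'gamma three']
```

      Feeding the returned summary into the next call lets older turns keep influencing replies after their full text has left the window, which is why such a setup can appear to remember more than the current thread.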

      • dgdft@lemmy.world
        8 hours ago

        Your general understanding is entirely correct, but:

        Microsoft is almost certainly recording these summarization requests for QA and future training runs; that’s where the leakage would happen.

        • TRBoom@lemmy.zip
          7 hours ago

          100% agree. At this point I am assuming everything sent through their servers is actively being collected for LLM training.

        • Sir. Haxalot@nord.pub
          6 hours ago

          That is kind of assuming the worst-case scenario, though. You wouldn’t assume that QA can read every email you send through their mail servers “just because”.

          This article sounds a bit like engagement bait based on the idea that any use of LLMs is inherently a privacy violation. I don’t see how pushing the text through a specific class of software is worse than storing confidential data in the mailbox though.

          That is assuming that they don’t leak data for training, but the article doesn’t mention that.

          • edm@thelemmy.club
            4 hours ago

            Always assume the worst; I guarantee it is usually that bad in reality. Companies absolutely hate spending money on IT, and security is always an afterthought. API logs for the production systems that contain your full legal name, DOB, SSN, and home address? Yeah, wide open and accessible by anyone. Production databases with employee SSNs, addresses, and salary information? Same thing: look up how much the worthless management is making and cry.

            Booz Allen just got shit on because of the dude they hired who specifically sought out consulting work for the IRS so he could steal Trump’s IRS records.

            https://home.treasury.gov/news/press-releases/sb0371

            https://en.wikipedia.org/wiki/Charles_E._Littlejohn

          • dgdft@lemmy.world
            5 hours ago

            This is some pathetic chuddery you’re spewing…

            You wouldn’t assume that QA can read every email you send through their mail servers “just because”

            I absolutely would, and Microsoft explicitly maintains the right to do that in their standard T&C, both for emails and for any data passed through their AI products.

            https://www.microsoft.com/en-us/servicesagreement#14s_AIServices

            v. Use of Your Content. As part of providing the AI services, Microsoft will process and store your inputs to the service as well as output from the service, for purposes of monitoring for and preventing abusive or harmful uses or outputs of the service.

            We don’t own Your Content, but we may use Your Content to operate Copilot and improve it. By using Copilot, you grant us permission to use Your Content, which means we can copy, distribute, transmit, publicly display, publicly perform, edit, translate, and reformat it, and we can give those same rights to others who work on our behalf.

            We get to decide whether to use Your Content, and we don’t have to pay you, ask your permission, or tell you when we do.

            • Sir. Haxalot@nord.pub
              4 hours ago

              Those seem to be the terms for the personal edition of Microsoft 365, though? I’m pretty sure the enterprise edition, which has features like DLP and tagging content as confidential, would have a separate agreement under which they don’t pass on the data.

              That is like the main selling point of paying extra for enterprise AI services over the free publicly available ones.

              Unless this boundary has actually been crossed, in which case, yes, it’s very serious.

              • dgdft@lemmy.world
                2 hours ago

                This part applies to all customers:

                v. Use of Your Content. As part of providing the AI services, Microsoft will process and store your inputs to the service as well as output from the service, for purposes of monitoring for and preventing abusive or harmful uses or outputs of the service.

                And while Microsoft has many variations of licensing terms for different jurisdictions and market segments, what they generally promise to opted-out enterprise customers is that they won’t use their inputs to train “public foundation models”. They’re still retaining those inputs, and they reserve the right to use them for training proprietary or specialized models, like safety-filters or summarizers meant to act as part of their broader AI platform, which could leak down the line.

                That’s also assuming Microsoft are competent, good-faith actors — which they definitely aren’t.

    • VeganCheesecake@lemmy.blahaj.zone
      5 hours ago

      LLMs are stateless. The model itself stays the same. Doesn’t mean they’re not saving the data elsewhere, but the LLM does not retain interactions.

      • Reygle@lemmy.world
        7 hours ago

        I’ve noticed growing opposition to critical thoughts about the sick and twisted nature of AI and the people who are in the cult.