Tom's Lemmy
KarlHeinzSchwuke@feddit.org to Technology@lemmy.world · English · 3 months ago

I was wrong about robots.txt

evgeniipendragon.com

Recently, I wrote an article about my journey in learning about robots.txt and its implications for my rights over the data I publish on my blog. I was confident that I wanted to ban all crawlers from my website. It turned out there was an unintended consequence that I had not accounted for.

My LinkedIn posts broke

Ever since I changed my robots.txt file, my LinkedIn posts no longer showed a preview of the linked article. At first I was not sure what the issue was, since it had worked just fine before. I also noticed that LinkedIn’s algorithm had started showing my posts to fewer and fewer connections. I was a bit confused and assumed it might be a temporary problem, but over the next two weeks the missing previews did not come back.
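To make the failure mode concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. Both robots.txt files are hypothetical (the author's actual file is not shown), and the sketch assumes that LinkedIn's preview fetcher identifies itself with the `LinkedInBot` user-agent token and honors robots.txt, which the broken previews suggest but the post does not confirm. A blanket `User-agent: *` / `Disallow: /` rule blocks that fetcher along with every other crawler; a dedicated group for it restores access for that one crawler only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "ban every crawler" robots.txt, similar in spirit to the
# change described in the post (the real file is not shown).
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical variant: keep the blanket ban, but give the assumed
# LinkedIn preview fetcher token ("LinkedInBot") its own group.
ALLOW_LINKEDIN = """\
User-agent: LinkedInBot
Allow: /

User-agent: *
Disallow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded,
    # so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    post = "https://example.com/blog/some-post"  # placeholder URL
    for name, rules in [("block all", BLOCK_ALL), ("allow LinkedInBot", ALLOW_LINKEDIN)]:
        linkedin = can_fetch(rules, "LinkedInBot", post)
        other = can_fetch(rules, "SomeOtherBot", post)
        print(f"{name}: LinkedInBot={linkedin}, other={other}")
```

Python's parser applies the first group whose token matches the user agent and falls back to the `*` group only when nothing else matches, so the `LinkedInBot` group takes precedence over the blanket ban while every other crawler stays blocked. Any other link-preview fetcher would need a group of its own.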
  • General_Effort@lemmy.world
    3 months ago

    > Googlebot, if enabled, won’t just list you for search, but will also scrape your contents for Google’s AI.

    False.

    • cecilkorik@lemmy.ca
      3 months ago

      Absolutely true. They’ll buy the data they want from some shitty crawler run by some data broker in some far-flung and lawless part of the world, hallucinate the actual source, and pretend they had no idea their “data partner” wasn’t respecting robots.txt if they have to, which they won’t ever have to do, because it’s literally impossible to detect and prove, and realistically unenforceable.

      This is a company that removed its company motto of “Don’t be evil” because it found it too “limiting”. Don’t be naive.

    • ell1e@leminal.space
      3 months ago

      See here: https://arstechnica.com/tech-policy/2025/07/cloudflare-wants-google-to-change-its-ai-search-crawling-google-likely-wont/

      If you have a source that says it’s false, I’d be curious.
