Context: my father is a lawyer and therefore has a bajillion digitised PDF files stored on a server. I have an idea of how to run OCR on all of them.

But after that, how can I make them easily searchable? Keep in mind that, unfortunately, the directory structure carries important information for classifying the files, i.e. you may have a path like clientABC/caseAV1/d.pdf.

  • georift@piefed.social
    18 days ago

    Might be a little heavy-handed for your needs, but I’ve found paperless-ngx to be amazing.

    • First_Thunder@lemmy.zipOP
      18 days ago

      My problem with paperless is that it doesn’t preserve the directory structure, so essential info is lost.
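
For what it's worth, one way to keep the directory structure searchable is to store the path components next to the OCR text in a full-text index. The following is only a minimal sketch, not something anyone in this thread has tested on real case files. It assumes the OCR step has already produced plain text for each PDF, that paths follow the client/case/file layout from the example above, and that your Python's SQLite build includes the FTS5 extension (most do, but not all). The function and column names (`open_index`, `add_document`, `search`, `client`, `case_id`) are made up for illustration.

```python
import sqlite3
from pathlib import PurePosixPath


def open_index(db_file=":memory:"):
    """Open (or create) a full-text index. Requires SQLite built with FTS5."""
    conn = sqlite3.connect(db_file)
    # path is stored but not tokenised; client, case_id and body are searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS docs "
        "USING fts5(path UNINDEXED, client, case_id, body)"
    )
    return conn


def add_document(conn, rel_path, text):
    """Index one document. rel_path is relative to the archive root,
    e.g. 'clientABC/caseAV1/d.pdf' (assumed layout: client/case/file)."""
    parts = PurePosixPath(rel_path).parts
    client = parts[0] if len(parts) > 1 else ""
    case_id = parts[1] if len(parts) > 2 else ""
    conn.execute(
        "INSERT INTO docs (path, client, case_id, body) VALUES (?, ?, ?, ?)",
        (rel_path, client, case_id, text),
    )
    conn.commit()


def search(conn, query, client=None):
    """Return matching paths, best match first, optionally limited to one client."""
    sql = "SELECT path FROM docs WHERE docs MATCH ?"
    params = [query]
    if client is not None:
        sql += " AND client = ?"
        params.append(client)
    sql += " ORDER BY rank"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = open_index()
    add_document(conn, "clientABC/caseAV1/d.pdf", "lease agreement signed in March")
    add_document(conn, "clientXYZ/case002/e.pdf", "lease dispute over parking")
    print(search(conn, "lease", client="clientABC"))
```

Because the original path is stored with every row, each hit points straight back to the file in its original folder, and nothing has to be moved or renamed. Note that `query` is passed through as an FTS5 query string, so punctuation in search terms would need quoting.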