Hister: self-hosted search engine for webpages and files with offline result preview

asciimoo@lemmy.ml · 2 days

I’m working on a self-hosted search service called Hister, with the goal of reducing my dependence on online search engines.

Hister is a full-text indexer for websites that saves every visited page as rendered by your browser. It provides a flexible web (and terminal) search interface & query language to explore previously visited content with ease, or to quickly fall back to traditional search engines.
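
To make the full-text indexing idea concrete, here is a minimal sketch of an inverted index over visited pages. It is an illustration only, not Hister's actual implementation: the `PageIndex` class, the tokenizer and the AND-only query semantics are all assumptions made for this example.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class PageIndex:
    """Toy inverted index (hypothetical): token -> set of URLs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.pages = {}  # url -> stored page text, kept for offline use

    def add(self, url, text):
        # Simplification: re-adding a URL does not remove its old postings.
        self.pages[url] = text
        for token in tokenize(text):
            self.postings[token].add(url)

    def search(self, query):
        """Return the sorted URLs that contain every token of the query."""
        tokens = tokenize(query)
        if not tokens:
            return []
        result = None
        for token in tokens:
            urls = self.postings.get(token, set())
            result = set(urls) if result is None else result & urls
        return sorted(result)


index = PageIndex()
index.add("https://example.org/a", "Notes on kayak repair and paddles")
index.add("https://example.org/b", "Woodworking: how to repair a chair")
print(index.search("repair"))
print(index.search("kayak repair"))
```

Because the index is keyed on page content rather than on URL and title, a query for a word that only appears in the body of a page still finds it.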

I’ve been using it for a few months, and as my local index grows I can avoid opening Google/DuckDuckGo/Kagi more and more often.

The project is still under heavy development with a growing community, but the current version is in a fairly usable state in my opinion, so I wanted to share it here. Perhaps some of you will find it useful as well. (Or at least have some constructive criticism =])

The code is AGPLv3 licensed and available at https://github.com/asciimoo/hister
Website: https://hister.org/
Read-only demo: https://demo.hister.org/

About me: I have been developing privacy-protecting and data-liberating free software since 2008. I’m the author of Searx, Colly (https://github.com/gocolly/colly) and many smaller free software/self-hosted projects (https://github.com/asciimoo).

yelling_at_cloud@programming.dev · 2 hours

I have actually been looking for something like this. Looking forward to giving it a go!

bestbry@lemmy.world · 1 day

This is a great idea, I’m gonna try

Samsy@lemmy.ml · 2 days

Uhh, neat. Will definitely try it out.

Free_Appalachia@lemmy.ml · 2 days

This looks really rad. I have been trying to build a leftist search engine using SearXNG, and it has a lot of issues because things get blocked over Tor due to the number of background requests you have to make to those sites. I have thought about building something like this to deal with that issue, but I just had my hands full with other projects.

Quite obviously (for those who like to come online to shit on really cool projects people are building), the use case is people looking to build their own privately indexed search results without them getting polluted by the shit that comes up in other search engines. That was immediately apparent to me, because I have that use case and I immediately saw the utility here. For those who don’t get it: just because it doesn’t fit the use case you have for internet-related projects doesn’t mean it’s not extremely useful to others. You literally don’t have to come on here just to shit on what other people are doing simply because “you don’t get it.”

This project looks really cool, and I am going to look into replacing my SearXNG instance with this soon. I am super stoked that somebody has done this work. It has needed to be done for a long time. So thank you!

racoon@lemmy.ml · 2 days

I just want to be able to download and look up offline dictionaries

HawtP0tat0@sh.itjust.works · 23 hours

Not technically dictionaries, but Kiwix does wikis offline. Unfortunately it’s only the wikis they provide, or ones you have to scrape yourself.

INeedMana@piefed.zip · 2 days

For that I use https://f-droid.org/packages/com.akylas.aard2
The slob dumps require a bit of hunting, but other than that it works well for me.

utopiah@lemmy.ml · 2 days

Interesting. I didn’t see this in the documentation, so in case it isn’t documented yet: you can add your local instance as a search suggestion for Firefox on mobile and desktop. I use it for my own wiki, e.g. https://mastodon.pirateparty.be/@utopiah/116351732150481942

Also, the way I would imagine using it: as the default search there, and if there is no hit, fall back to a default search engine, e.g. DDG.

asciimoo@lemmy.ml · 2 days

> Also, the way I would imagine using it: as the default search there, and if there is no hit, fall back to a default search engine, e.g. DDG.

This is exactly how I use it. Hister even has a hotkey to quickly jump to your preferred online search engine with the current search query if you cannot find what you are looking for.
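
The local-first, fall-back-when-empty workflow described here can be sketched in a few lines. This is a rough illustration, not Hister's code: `search_with_fallback` and `fallback_link` are hypothetical names, and the fallback is assumed to be DuckDuckGo's public `?q=` query URL.

```python
from urllib.parse import urlencode

# Assumption: DuckDuckGo as the preferred online engine; swap in any other.
FALLBACK_URL = "https://duckduckgo.com/"


def fallback_link(query):
    """Build a link that runs the same query on the online search engine."""
    return FALLBACK_URL + "?" + urlencode({"q": query})


def search_with_fallback(local_search, query):
    """Try the local index first; hand the query to the online engine if empty.

    local_search is any callable that takes a query and returns a list of hits.
    """
    hits = local_search(query)
    if hits:
        return {"source": "local", "results": hits}
    return {"source": "fallback", "results": [fallback_link(query)]}


# Stand-in for a real local index: it only knows one page.
def tiny_local_search(query):
    return ["https://example.org/kayak"] if "kayak" in query.lower() else []


print(search_with_fallback(tiny_local_search, "kayak repair"))
print(search_with_fallback(tiny_local_search, "dog grooming"))
```

The point of the design is that a network request to an external engine is only made when the local index has nothing to offer.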

steel_for_humans@piefed.social · 2 days

I don’t get it. It indexes pages which were already visited, right? So in order to find some website I need to first use another search engine. Afterwards, that website is in my browsing history and if I need it again, I don’t need to search for it. So what’s the use case for this project?

paris@lemmy.blahaj.zone · 2 days

Your browsing history does not have full text search, so if you only remember the content of the page and not the title of it, you’re SOL. Or if you browse across multiple devices, you have to check multiple places to hope to find it.

asciimoo@lemmy.ml · 2 days

> It indexes pages which were already visited, right?

Yes, if you use the browser extension only, but Hister has an API and a crawler as well if you’d like to add content you have not visited yet. Also, Hister supports indexing local text files, not just websites.
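
As a rough idea of what adding not-yet-visited content involves, a crawler has to turn fetched HTML into plain text before it can be indexed. The sketch below shows only that step, using Python's standard `html.parser`. It says nothing about Hister's real API or crawler: `TextExtractor` and `html_to_text` are hypothetical names, and fetching the page is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Kayaks</h1><p>Repair <b>guide</b></p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(page))
```

The resulting string is what a full-text indexer would tokenize and store, whether the HTML came from the browser extension, a crawler, or a local file.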

> Afterwards, that website is in my browsing history and if I need it again, I don’t need to search for it

Unfortunately, browser history does not include the page’s content, only the URL + title combo at best.
Browsers can’t show an offline preview. (Having offline previews is a huge privacy and productivity win in my opinion; it completely eliminates the need to make external network requests.)

These are the biggest weaknesses of the browser history compared to Hister, but there are many more nuances where Hister can provide extra features and QoL improvements. I recommend checking the documentation & posts on the website if you are interested in the details.
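
To illustrate the offline preview point: once the page text is stored locally, a result preview can be cut straight from the stored copy without any network request. The sketch below is an assumption about how such a preview could work, not a description of Hister's preview feature; `preview` and its `width` parameter are hypothetical.

```python
def preview(text, query, width=40):
    """Return the stored text around the first case-insensitive match of query.

    Returns None when the query does not occur in the text. Simplification:
    offsets are computed on the lowercased text, which is fine for ASCII.
    """
    pos = text.lower().find(query.lower())
    if pos == -1:
        return None
    start = max(0, pos - width)
    end = min(len(text), pos + len(query) + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


# Stand-in for locally stored page content, keyed by URL.
stored_pages = {
    "https://example.org/a": "Patch small holes in a kayak hull with marine epoxy.",
}

snippet = preview(stored_pages["https://example.org/a"], "KAYAK", width=10)
print(snippet)
```

A browser history entry, holding only a URL and a title, has nothing to cut such a snippet from.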

ScoffingLizard@lemmy.dbzer0.com · 1 day

So does that mean that the index starts off empty? If so, is there a way to create a centralized (I know that’s a bad word) starting repo so that the engine already knows some cool results? I have tabbed bookmarks for news that is not shitty, archives, video that isn’t YouTube, privacy resources, etc. It would be cool if people could post indices focused on certain topics that others could then add. Like indices for random stuff, like dog grooming, kayaking, or woodworking. It could be a hub like Docker Hub, but for cool results.

Sorry. Ha ha. You know you have a good idea when people start asking for features. I haven’t even started it yet. Maybe I can try self-hosting on my desktop.

This is exciting! I normally use Searxist on Android.

asciimoo@lemmy.ml · 1 day

I’d absolutely love to do this! It’s already on my list of future plans (https://hister.org/support):

> Create infrastructure for importable, pre-indexed databases organized by topic, letting users quickly expand their local index with curated, relevant content.
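
Since this feature is only planned, the following is purely a thought experiment about what importing a topic index could look like. Everything here is hypothetical: the JSON dump format (a URL-to-text mapping), the `merge_topic_index` name, and the rule that local copies win on conflict are assumptions for illustration, not anything Hister defines.

```python
import json


def merge_topic_index(local, topic_json):
    """Merge a topic dump into the local index; return the number of new pages.

    Both sides are assumed to map URL -> page text. Pages already present
    locally are left untouched, so an import never overwrites your own copy.
    """
    imported = json.loads(topic_json)
    added = 0
    for url, text in imported.items():
        if url not in local:
            local[url] = text
            added += 1
    return added


local_index = {"https://example.org/kayak": "My notes on kayak repair"}

# Hypothetical curated dump that someone shared for the woodworking topic.
woodworking_dump = json.dumps({
    "https://example.org/kayak": "A different copy of the kayak page",
    "https://example.org/dovetail": "Cutting dovetail joints by hand",
})

added = merge_topic_index(local_index, woodworking_dump)
print(added, sorted(local_index))
```

Keeping the local copy on conflict is just one possible policy; a real implementation might compare timestamps or let the user choose.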

> It could be a hub like Docker Hub, but for cool results.

Exactly!

> Sorry. Ha ha. You know you have a good idea when people start asking for features. I haven’t even started it yet.

<3 No need to apologize. I appreciate suggestions a lot (especially if those are well aligned with my ideas =] ).
