How to archive web pages?

meejle@lemmy.world · 7 months

Archive.ph is good, especially because I’ve never met a paywall it couldn’t bypass. 😏

davel@lemmy.ml · 7 months

archive.ph is just another alternate hostname for archive.today.

meejle@lemmy.world · 7 months

Oh. In which case it still works fine for me, what can I say. 😅

Silki@sh.itjust.works · 7 months

You can save as HTML but animations and videos won’t work. Try the SingleFile extension.
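If you just want the raw HTML on disk, here is a rough Python sketch, not a polished tool. The function names (`snapshot_filename`, `save_html`, `fetch_and_save`) are made up for illustration, and it only stores the HTML the server returns: no images, CSS, scripts, or video, which is exactly the gap SingleFile fills.

```python
import re
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlparse


def snapshot_filename(url, when):
    """Build a filesystem-safe, timestamped file name from a URL.

    `when` is expected to be a UTC datetime (the name is suffixed with Z).
    """
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    # Collapse anything that is not a letter, digit, dot, or hyphen into "_".
    slug = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_") or "page"
    return f"{slug}-{when.strftime('%Y%m%dT%H%M%SZ')}.html"


def save_html(url, html_bytes, directory, when=None):
    """Write already-downloaded HTML bytes to `directory` and return the path."""
    when = when or datetime.now(timezone.utc)
    target = Path(directory) / snapshot_filename(url, when)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(html_bytes)
    return target


def fetch_and_save(url, directory):
    """Download the page's raw HTML and save it (linked assets are NOT fetched)."""
    # Some sites reject requests without a User-Agent; this value is arbitrary.
    request = urllib.request.Request(url, headers={"User-Agent": "page-saver-sketch"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return save_html(url, response.read(), directory)
```

Calling `fetch_and_save("https://example.com/", "archive")` would drop one timestamped `.html` file into an `archive` folder. Pages that build their content with JavaScript will likely come out mostly empty this way.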

davel@lemmy.ml · 7 months

I used to use archive.today to archive news stories.

Why specifically are you using archive.today? To post links that bypass paywalls, or for something else? Because if it’s for something else then there may be other solutions, like using archive.org or saving the page locally.

starlight@lemmy.ca · 7 months

I mainly use it to read articles that have paywalls.

davel@lemmy.ml · 7 months

AFAIK, archive.today is the best around for that, outside of installing the Bypass Paywalls Clean browser extension for Firefox or Chrome.

starlight@lemmy.ca · 7 months

I’ll take a look at the browser extensions. Thank you!

call_me_xale@lemmy.zip · 7 months

Just learned about Readeck the other day. Self-hosted for now, but it sounds like they’re planning to launch a centrally-hosted instance at some point, maybe keep an eye on that.

starlight@lemmy.ca · 7 months

I’ll definitely keep an eye on Readeck. Thank you!

golden_zealot@lemmy.ml · 7 months

If you have a machine and/or the storage for it, you could deploy a docker container of linkwarden and do it yourself for a lot of things.

It says it’s for “bookmarking” but in addition to storing the outbound link, it takes backups of pages as text, HTML, and PDF and can do so recursively with the page’s links. Nice interface, makes stuff searchable and taggable etc.

starlight@lemmy.ca · 7 months

That’s really cool. I didn’t know Linkwarden could do that. I’ll take a further look at this, thank you!

Enternasyonal@lemmygrad.ml · 7 months

Tbh the Internet Archive’s Wayback Machine is the best option I can think of. It’s easy to use, and the only problem I’ve had with it is that old archives from the late 90s and early 2000s sometimes didn’t load.
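If you’d rather script it than click around, the Wayback Machine has a small availability API and a “Save Page Now” URL prefix. Below is a minimal Python sketch of both. The helper names are my own invention, and the response shape (`archived_snapshots` → `closest`) is what I understand the API to return, so check it against the official docs before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Endpoints as I understand them; verify against the Internet Archive docs.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_PREFIX = "https://web.archive.org/save/"


def availability_url(page_url, timestamp=None):
    """Build the availability-API query URL for a page."""
    params = {"url": page_url}
    if timestamp:
        # Optional target date, e.g. "20050101"; the API returns the closest capture.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(api_response):
    """Return the closest snapshot URL from a parsed response, or None."""
    closest = api_response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def save_url(page_url):
    """URL that asks the Wayback Machine to capture the page when visited."""
    return SAVE_PREFIX + page_url


def lookup(page_url, timestamp=None):
    """Query the live API (needs network access) and return a snapshot URL or None."""
    with urllib.request.urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Something like `lookup("https://example.com/", "20050101")` should give back the capture closest to that date, or `None` if nothing has been archived.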

RedStrawberry@lemmy.blahaj.zone · 7 months

As others have said, the SingleFile extension works well. I’ve also found Zotero with the web extension quite good. It’s useful for added organisation/categorisation, especially since I’m already using it for academic work.

There is also zimit for use with Kiwix, both a command-line version (see GitHub) and a website if you want something simpler.

That said, I’ve found the website often has long queues, and it may not get a clean backup if the site uses Cloudflare or the like. But it’s useful if I need an offline copy of a website with many pages.

I recommend having a look at the Archive Team wiki page on software to see if anything fits your needs.

starlight@lemmy.ca · 7 months

They all look like they can work. Zimit especially looks interesting. I’ll take a look at all of them. Thank you!

hexagonwin@lemmy.sdf.org · 7 months

singlefile or webrecorder in chromium based browsers maybe?

self hosting is actually pretty easy :) we’re here to help too.

for large scale crawling i usually use archiveteam’s grab-site.

blueworld@piefed.world · 7 months

What’s your use case?

starlight@lemmy.ca · 7 months

Mainly to bypass paywalls.
