Any way to improve archiving success rate? #1618
Replies: 5 comments 4 replies
-
It would be great if there were an easy way to filter for failed crawls, including those stopped at captchas, and recrawl them through the Wayback Machine, 12ft.io, etc. I'm going to play around with the karakeep CLI to see if there's a way to do it through a script. I hope there is, and then I may open an issue for a feature request or code a PR if time allows. ETA: Looks like I wasn't the only person with this suggestion.
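For anyone who wants to try the same thing, here is a rough sketch of the filter-and-recrawl idea in Python. It does not call karakeep at all: the bookmark dicts and their `url` / `text` keys are a made-up shape for illustration, so you would have to map them onto whatever the CLI or API actually returns. The only real-world piece is the `https://web.archive.org/web/` prefix, which the Wayback Machine resolves to a capture of the page when one exists.

```python
from typing import Iterable, List

# Prefixing a URL like this asks the Wayback Machine for a capture of it.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def failed_crawls(bookmarks: Iterable[dict]) -> List[dict]:
    """Return bookmarks whose crawled text is missing or blank.

    The "text" key is a hypothetical field name, not karakeep's schema.
    """
    return [b for b in bookmarks if not (b.get("text") or "").strip()]


def wayback_url(url: str) -> str:
    """Build a Wayback Machine URL for the original link."""
    return WAYBACK_PREFIX + url


if __name__ == "__main__":
    sample = [
        {"url": "https://example.com/ok", "text": "Full article body"},
        {"url": "https://example.com/blocked", "text": None},
        {"url": "https://example.com/empty", "text": "   "},
    ]
    for bookmark in failed_crawls(sample):
        print(wayback_url(bookmark["url"]))
```

The printed URLs could then be added back as new bookmarks by hand or through a script.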
-
I was thinking the other day about using archive.is manually as an intermediary like in #1306, but somehow that failed for me as well. Here's one I attempted earlier today before making this discussion post: After more testing it looks like I need to refresh the link an arbitrary number of times inside of karakeep before it goes through. I'll do more manual tests and see what the success rate looks like over the next few days. If there are other such services that you think are better suited, I could do some tests on those too.
-
I've been testing with archive.is for a bit now by manually archiving there, then adding the archive URL to karakeep. I think the ideal scenario would be a system that lets the user pick their archive service of choice: the karakeep server requests archival of the link, waits a period of time, then attempts to retrieve the archived link at different intervals. There's probably a better way, but I'm not a developer so it's beyond me at that point.
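To make the request, wait, and retry idea a bit more concrete, here is a minimal Python sketch. It is not how karakeep works today. `archive_and_wait`, `request_archive`, `lookup_snapshot`, and the delay values are all names and numbers invented for this example, and the two callables are left pluggable so any archive service could be wired in. `closest_snapshot_url` assumes a JSON payload shaped like the Wayback Machine availability API response (`archived_snapshots` → `closest` → `available` / `url`); other services would need their own parser.

```python
import time
from typing import Callable, Iterable, Optional


def closest_snapshot_url(payload: dict) -> Optional[str]:
    """Extract the snapshot URL from a Wayback-availability-style payload."""
    closest = payload.get("archived_snapshots", {}).get("closest") or {}
    if closest.get("available"):
        return closest.get("url")
    return None


def archive_and_wait(
    url: str,
    request_archive: Callable[[str], None],
    lookup_snapshot: Callable[[str], Optional[str]],
    delays: Iterable[float] = (5, 15, 60, 300),  # seconds; arbitrary values
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Ask the service to archive `url`, then poll until a snapshot appears.

    Returns the snapshot URL, or None if every attempt came back empty.
    """
    request_archive(url)
    for delay in delays:
        sleep(delay)  # wait before each lookup, longer each time
        snapshot = lookup_snapshot(url)
        if snapshot is not None:
            return snapshot
    return None
```

Keeping the service-specific calls behind two callables is one way to let the user choose their archive service; the retry schedule stays the same whichever one is plugged in.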
-
Hey! It would be best if you could give some more information: which URL you are trying, some log file excerpts, the Docker Compose file used for the stack, and so on. Just put some more meat on the bone ;-) This will increase the chances of reproducing and catching the problem. Best, Michael
-
archive.is wasn't a good choice after testing, but web.archive.org works really well. I put up a feature request here: #1652, and in the meantime I'm using a shoddily patched-together Discord bot to make the process as easy as possible for myself.
-
I've noticed that a lot of articles I try to save fail to archive. I'll save a bunch of articles while browsing Google News on my phone in the morning, and the app says "hoarded!" when I do so. When I go to check them out later in the day, almost all of them show "null" as the text in the mobile app and nothing else.
In the web browser version it says the following:

Are there any settings or configuration I can be using to avoid this?
To be clear, it isn't that it doesn't work at all; it's that a large share of the links I save do this, in the region of 50-60% if I had to guess.
Thanks,