Content types and custom scraping recipes #1715

rocketpear · 2025-07-08T22:53:41Z

When I was testing some other bookmarking / read-it-later tools before settling on Karakeep, I noticed that some of them detect a "content type" and then handle the bookmark differently depending on that type. For example, the content types could be:

Tweet
Recipe
Article
Store item
etc.

When users save a tweet and later open the bookmark, they might be presented with a UI tailored for a tweet, and perhaps some data relevant specifically in the context of a tweet (the author's handle, the posting date, etc.).
On the other hand, when users save news articles, the subsequent processing might focus mainly on producing the best possible "read-it-later" data. When they then open the bookmark, the UI might be tailored to reading and highlighting.
For a recipe, the app might attempt to extract the important stuff (the steps and the ingredient list) and display it conveniently, so that users don't have to read through the obligatory fluff surrounding the recipe.
You get the idea.
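
For the recipe case, one common shortcut (not something Karakeep is known to do, just an illustration) is that many recipe sites already embed schema.org `Recipe` metadata as JSON-LD, with `recipeIngredient` and `recipeInstructions` fields. Below is a minimal Python sketch, standard library only, that pulls the ingredient list and steps out of that markup and ignores the surrounding prose. `extract_recipe` and its output shape are hypothetical, and pages without JSON-LD (or with nested `HowToSection` groups) are not handled.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so append rather than assign.
        if self._in_jsonld:
            self.blocks[-1] += data


def _as_list(value):
    """JSON-LD allows a single value wherever a list is expected."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _iter_nodes(data):
    """Yield every JSON-LD object, flattening top-level lists and "@graph" arrays."""
    if isinstance(data, list):
        for item in data:
            yield from _iter_nodes(item)
    elif isinstance(data, dict):
        yield data
        yield from _iter_nodes(data.get("@graph", []))


def _step_text(step):
    """Steps may be plain strings or HowToStep objects; other shapes yield ""."""
    if isinstance(step, str):
        return step.strip()
    if isinstance(step, dict):
        return str(step.get("text", "")).strip()
    return ""


def extract_recipe(html: str) -> Optional[dict]:
    """Return name, ingredients and steps of the first schema.org Recipe, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is common in the wild; skip it
        for node in _iter_nodes(data):
            if "Recipe" not in _as_list(node.get("@type")):
                continue
            steps = _as_list(node.get("recipeInstructions"))
            return {
                "name": node.get("name"),
                "ingredients": [str(i).strip() for i in _as_list(node.get("recipeIngredient"))],
                # Empty results (e.g. nested HowToSection objects) are dropped.
                "steps": [text for text in map(_step_text, steps) if text],
            }
    return None


if __name__ == "__main__":
    page = """<html><head><script type="application/ld+json">
    {"@context": "https://schema.org", "@type": "Recipe", "name": "Pancakes",
     "recipeIngredient": ["200 g flour", "2 eggs"],
     "recipeInstructions": [{"@type": "HowToStep", "text": "Mix."}, "Fry."]}
    </script></head><body><p>A long story about pancakes...</p></body></html>"""
    print(extract_recipe(page))
```

Because the structured data is already on the page, this avoids guessing which paragraphs are "fluff"; the trade-off is that it only works on sites that actually publish the markup.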

To expand on this approach: it might be mostly fine to try to autodetect the content type and then extract the relevant data, but in practice I think the expected content type can often be determined already (or at least greatly narrowed down) from the URL. Perhaps there could be a system of configs that, based on the URL, define the content type (or some basic rules for identifying it) and where on the site the relevant data can be found. Users could create configs for their favourite sites and eventually contribute them back, helping others reliably bookmark and archive whatever they're interested in.
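
To make the idea a bit more concrete, here is a minimal Python sketch of what such URL-based configs could look like once loaded. Everything here is hypothetical: the `SiteRule` fields, the `example-recipes.com` domain and its selectors are placeholders, and only the `/<user>/status/<id>` tweet URL shape reflects a real site. The selectors are stored but not evaluated; a real extractor would apply them to the fetched page.

```python
import re
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlparse


@dataclass
class SiteRule:
    """A hypothetical user-contributed rule: which URLs it covers and what they contain."""
    host: str          # matches this hostname and any of its subdomains
    path_pattern: str  # regular expression searched against the URL path
    content_type: str  # e.g. "tweet", "recipe", "article", "store_item"
    # field name -> CSS selector; stored for a later extraction step, not evaluated here
    selectors: dict = field(default_factory=dict)


RULES = [
    SiteRule("x.com", r"^/[^/]+/status/\d+", "tweet"),
    SiteRule("twitter.com", r"^/[^/]+/status/\d+", "tweet"),
    SiteRule("example-recipes.com", r"^/recipes/", "recipe",
             {"ingredients": ".ingredients li", "steps": ".steps li"}),
]


def match_rule(url: str, rules=RULES) -> Optional[SiteRule]:
    """Return the first rule whose host and path pattern both match the URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    for rule in rules:
        on_host = host == rule.host or host.endswith("." + rule.host)
        if on_host and re.search(rule.path_pattern, parsed.path):
            return rule
    return None


def content_type_for(url: str, default: str = "article") -> str:
    """Content type from the matching rule, else a fallback where auto-detection could take over."""
    rule = match_rule(url)
    return rule.content_type if rule else default


if __name__ == "__main__":
    print(content_type_for("https://x.com/someone/status/12345"))
    print(content_type_for("https://www.example-recipes.com/recipes/soup"))
    print(content_type_for("https://news.example.org/2025/07/story"))
```

First match wins, so rule order matters: more specific rules for a host can be listed before broader ones, and anything unmatched falls back to generic handling.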

I'm not sure how viable this approach is tbh, but it seems to me that it could improve the accuracy of content archiving (like saving articles to be read later) and the overall utility.

MohamedBassem · 2025-07-09T07:04:25Z · Maintainer

This is actually something planned: #1344

I'm not sure yet whether it's going to be "automatically detected", but I'm planning to start introducing per-website rules to do this kind of marking.
