Content types and custom scraping recipes #1715
rocketpear
started this conversation in
Ideas
Replies: 2 comments
-
This is actually something planned: #1344. I'm not sure yet if it's going to be "automatically detected", but I'm planning to start introducing per-website rules to do this kind of marking.
-
When I was testing some other bookmarking / read-it-later tools before settling on Karakeep, I noticed that some of them detect "content types" and then handle bookmarks differently depending on the type. For example, the content types could be tweets, news articles, or recipes:
When users save a tweet and later open the bookmark, they might be presented with a UI tailored for a tweet and maybe some data relevant specifically in the context of a tweet (author's handle, posting date, etc.).
On the other hand, when users save news articles, the subsequent processing might focus mainly on producing the best possible "read-it-later" data. When they then open the bookmark, the UI might be tailored to reading and highlighting.
For a recipe, the app might attempt to extract the important stuff (steps and ingredient list) and display it in a convenient manner so that users don't have to read through the obligatory fluff surrounding the recipe.
You get the idea.
To expand on this approach, it might be mostly fine to try to autodetect the content type and then extract the relevant data, but in practice I think the expected content type can already be determined (or at least greatly narrowed down) from the URL. Perhaps there could be a system of configs, matched by URL, that would define the content type (or some basic rules for identifying it) and where on the page the relevant data can be found. Users could create configs for their favourite sites and eventually contribute them back to help others reliably bookmark and archive whatever they're interested in.
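To make the config idea a bit more concrete, here is a rough sketch of what a URL-matched config and its lookup might look like. Everything in it is an assumption made up for illustration: the `SiteRecipe` shape, the `findRecipe` helper, the example domains, and the CSS selectors are all hypothetical and are not taken from Karakeep or from #1344.

```typescript
// Hypothetical shape for a per-site config; none of these names come from Karakeep.
type ContentType = "tweet" | "article" | "recipe";

interface SiteRecipe {
  // Pattern tested against the full bookmark URL.
  urlPattern: RegExp;
  // Content type to assume when the pattern matches.
  contentType: ContentType;
  // Field name -> CSS selector describing where that field lives on the page.
  selectors: Record<string, string>;
}

// Made-up example configs for made-up sites.
const recipes: SiteRecipe[] = [
  {
    urlPattern: /^https?:\/\/(www\.)?example-recipes\.test\/recipe\//,
    contentType: "recipe",
    selectors: { ingredients: "ul.ingredients li", steps: "ol.steps li" },
  },
  {
    urlPattern: /^https?:\/\/(www\.)?example-news\.test\//,
    contentType: "article",
    selectors: { title: "h1", body: "article" },
  },
];

// Return the first config whose pattern matches the URL, or undefined so the
// caller can fall back to generic autodetection.
function findRecipe(url: string, all: SiteRecipe[]): SiteRecipe | undefined {
  for (const candidate of all) {
    if (candidate.urlPattern.test(url)) {
      return candidate;
    }
  }
  return undefined;
}
```

With these example configs, a URL under `example-recipes.test/recipe/` would resolve to the `recipe` config, and an unknown site would return `undefined`, leaving room for autodetection as the fallback. Keeping each config as plain data (a pattern plus selectors) is what would make them easy to share between users.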
I am not sure how viable this approach is tbh, but it seems to me that it could improve the accuracy of content archiving (like saving articles to be read later) and the overall utility.