Description
Is your feature request related to a problem? Please describe.
For the Resource Type "Articles, Blogs and Updates", some information and metadata are often already available in the published article and can be extracted with existing libraries.
Describe the solution you'd like
If the metadata is present in the article, I would like some of the fields to be autofilled, or to be given the option to fill them, either directly in Plone or indirectly via the REST API.
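As a minimal sketch of reading metadata that is already embedded in an article, the standard-library parser below collects `<meta>` tags. It assumes the publisher exposes Open Graph-style tags such as `og:title`, `og:description`, `og:site_name` and `article:published_time`; many news sites do, but not all. The returned keys (`site_name`, `title`, `summary`, `image_url`, `author`, `published`) are hypothetical names chosen for illustration. Because the Open Graph `article:author` property is meant to hold profile URLs, the plain `author` meta tag is preferred when both exist.

```python
from __future__ import annotations

from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta> tags keyed by their `property` or `name` attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "meta":
            return
        attr_map = dict(attrs)
        key = attr_map.get("property") or attr_map.get("name")
        content = attr_map.get("content")
        if key and content is not None:
            # Keep the first occurrence if a page repeats the same tag.
            self.meta.setdefault(key.lower(), content.strip())


def extract_article_metadata(html: str) -> dict[str, str | None]:
    """Map common Open Graph / HTML meta tags onto hypothetical field keys."""
    parser = MetaTagParser()
    parser.feed(html)
    meta = parser.meta
    return {
        "site_name": meta.get("og:site_name"),
        "title": meta.get("og:title"),
        "summary": meta.get("og:description") or meta.get("description"),
        "image_url": meta.get("og:image"),
        "author": meta.get("author") or meta.get("article:author"),
        "published": meta.get("article:published_time"),
    }
```

The HTML itself would still need to be fetched first; the libraries listed under Additional context handle fetching and many more edge cases.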
-
Default Tab:
- Title: <Site/Publisher name> - Article Title
- Summary: Article Summary
- Lead Image: Screenshot of the article, generated with Python
- Lead Image caption: Caption from the article, or a LLaVA-based description of the image
- Text: OR <rich HTML text via manual copy-paste?>
- Resource Type: "Articles, Blogs and Updates" (may also apply to "News Article", and possibly to "Newsletter, Journal" and "Press Statement or News Release" if they are website-based)
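The Default Tab mapping could be expressed as a small pure function. This sketch assumes the extracted metadata arrives as a plain dict with hypothetical keys `site_name`, `title` and `summary`. `title` and `description` are Plone's standard Dublin Core field names (the edit form labels `description` as "Summary"), while `resource_type` is a placeholder for however the site actually stores the Resource Type.

```python
from __future__ import annotations

# Default Resource Type from the issue; "News Article" etc. may also apply.
DEFAULT_RESOURCE_TYPE = "Articles, Blogs and Updates"


def build_default_tab(metadata: dict, resource_type: str = DEFAULT_RESOURCE_TYPE) -> dict:
    """Turn extracted article metadata into Default Tab field values."""
    site = (metadata.get("site_name") or "").strip()
    title = (metadata.get("title") or "").strip()
    if site and title and not title.startswith(site):
        # Format from the issue: "<Site/Publisher name> - Article Title".
        full_title = f"{site} - {title}"
    else:
        # Fall back to whichever part is available, or keep the title
        # as-is when the publisher already prefixed its own name.
        full_title = title or site
    return {
        "title": full_title,
        "description": (metadata.get("summary") or "").strip(),
        "resource_type": resource_type,  # hypothetical field name
    }
```

Skipping the prefix when the title already starts with the site name avoids results like "The Star - The Star - ...", since some publishers include their name in `og:title`.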
-
Ownership Tab:
- Contributors: Article Author
- Rights: Copyright information
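Article bylines often pack several authors into one string. A rough sketch for splitting a byline into separate Contributors entries, assuming bylines of the form "By A, B and C"; real sites vary, so the result should be reviewed before saving. Plone's Dublin Core `contributors` field holds a list of names, so the output could be passed through as-is.

```python
from __future__ import annotations

import re


def parse_contributors(byline: str | None) -> list[str]:
    """Split a byline like "By A, B and C" into individual names."""
    if not byline:
        return []
    names: list[str] = []
    # Split on commas, ampersands and the standalone lowercase word "and".
    for part in re.split(r",|&|\band\b", byline):
        # Drop a leading "By " and surrounding whitespace.
        name = re.sub(r"^\s*by\s+", "", part, flags=re.IGNORECASE).strip()
        if name and name not in names:
            names.append(name)
    return names
```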
-
Dates Tab:
- Publishing Date: Taken from the site's or article's publishing date
- Rights: Copyright information
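Publishing dates appear in many formats. Assuming the page exposes an ISO 8601 timestamp (as `article:published_time` usually does), the standard library can normalize it before it is written to Plone's publishing date field (`effective` in the Dublin Core behavior). Treating timezone-less values as UTC is an assumption; a local zone could be passed instead.

```python
from __future__ import annotations

from datetime import datetime, timezone, tzinfo


def normalize_publish_date(raw: str | None, default_tz: tzinfo = timezone.utc) -> str | None:
    """Return an ISO 8601 timestamp with an explicit offset, or None if unparseable."""
    if not raw:
        return None
    value = raw.strip()
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are in the caller-supplied zone.
        parsed = parsed.replace(tzinfo=default_tz)
    return parsed.isoformat()
```

Returning `None` for free-text dates ("last Tuesday") leaves the field empty for manual entry rather than guessing.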
-
Categorization Tab:
- Countries: <default to Malaysia?> or <detect based on URL domain or article content?>, or leave blank as already mentioned in #3
- SDG Goals:
- Development Themes: or leave blank
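One of the options listed for Countries is detecting the country from the URL domain. A rough sketch using the country-code top-level domain, with a hypothetical and deliberately small lookup table; generic domains such as `.com` return `None`, so the field would be left blank for manual entry.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical lookup table from ccTLD to country name; extend as needed.
CCTLD_TO_COUNTRY = {
    "my": "Malaysia",
    "sg": "Singapore",
    "id": "Indonesia",
    "th": "Thailand",
}


def guess_country_from_url(url: str) -> str | None:
    """Guess a country from the last label of the URL's hostname."""
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    if "." not in host:
        return None
    tld = host.rsplit(".", 1)[-1]
    return CCTLD_TO_COUNTRY.get(tld)
```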
-
Partners Tab:
- Accountable: already mentioned in #3, or leave blank if unsure
- Implementing Partners: already mentioned in #3, or leave blank if unsure
Describe alternatives you've considered
As the current web-based flow for adding a Resource may not support entering this metadata without a significant redesign of the workflow, another option is a Jupyter Notebook that extracts the necessary metadata and submits it as a new Resource.
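A minimal sketch of the notebook's final step, assuming the site runs plone.restapi, where content is created by POSTing JSON with an `@type` key to the container's URL. The container URL, the `Resource` portal type name, the credentials, and any field names beyond `title`/`description` are hypothetical and should be checked against the actual Resource content type.

```python
from __future__ import annotations

import base64
import json
import urllib.request


def build_create_request(
    container_url: str,
    portal_type: str,
    fields: dict,
    username: str,
    password: str,
) -> urllib.request.Request:
    """Prepare a plone.restapi "create content" request (not sent yet)."""
    payload = {"@type": portal_type, **fields}
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        container_url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # HTTP Basic auth
        },
    )


# Usage in a notebook (hypothetical URL and credentials):
# req = build_create_request("https://example.org/resources", "Resource",
#                            {"title": "The Star - Floods hit Kelantan"}, "editor", "secret")
# with urllib.request.urlopen(req) as resp:
#     created = json.load(resp)  # the new item's JSON, including its "@id"
```

Building the request separately from sending it makes it easy to inspect or edit the payload in the notebook before anything is written to the site.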
Additional context
- Article metadata extraction can use the newspaper3k Python library
- Clean article text might be better extracted with the trafilatura Python library
- Image screenshots can use Playwright or a tool such as witnessme
- Image captions can use Tesseract or LLaVA
- SDG detection can use the seesu Python library
- Partner detection relies on the part described in #3
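The libraries listed above could be wired together roughly as follows. This is an unverified sketch: imports are deferred into each function so the module loads without the optional dependencies, and the attribute and function names follow the public documentation of newspaper3k, trafilatura, Playwright and pytesseract, so they should be checked against the installed versions. Tesseract performs OCR (it reads text inside the image) rather than describing it, so a LLaVA-style model would still be needed for a true caption. `merge_sources` is a hypothetical helper for filling fields one library misses from another.

```python
from __future__ import annotations


def extract_with_newspaper(url: str) -> dict:
    """Metadata via newspaper3k (title, authors, date, top image)."""
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return {
        "title": article.title,
        "summary": article.meta_description,
        "authors": article.authors,
        "published": article.publish_date.isoformat() if article.publish_date else None,
        "image_url": article.top_image,
    }


def extract_clean_text(url: str) -> str | None:
    """Main article text via trafilatura; None if fetching or extraction fails."""
    import trafilatura

    downloaded = trafilatura.fetch_url(url)
    return trafilatura.extract(downloaded) if downloaded else None


def screenshot_article(url: str, path: str) -> str:
    """Save a viewport screenshot with Playwright (candidate Lead Image)."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        page.screenshot(path=path)
        browser.close()
    return path


def ocr_image_text(image_path: str) -> str:
    """Text found inside an image via Tesseract (OCR, not a description)."""
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path)).strip()


def merge_sources(*sources: dict) -> dict:
    """Earlier sources win; later ones only fill keys that are still empty."""
    merged: dict = {}
    for source in sources:
        for key, value in source.items():
            if value and not merged.get(key):
                merged[key] = value
    return merged
```

Ordering the sources passed to `merge_sources` by trust (for example, newspaper3k first, then raw meta tags) keeps the more reliable value whenever both libraries return one.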