Feature request: Population of common fields for Resources Type: Articles, Blogs and Updates & News Article #4

@samqi

Description

Is your feature request related to a problem? Please describe.
For the Resource Type "Articles, Blogs and Updates", some information and metadata are often already available in the published article and can be extracted with existing libraries.

Describe the solution you'd like
If the metadata is present in the article, I would like some of the fields to be autofilled, or to be given the option to fill them, either directly in Plone or indirectly via the REST API:

  • Default Tab:

    • Title: <Site/Publisher name> - Article Title
    • Summary: Article Summary
    • Lead Image: Screenshot of the article, generated with Python
    • Lead Image caption: Caption from the article, or a LLaVA-generated description of the image
    • Text: OR <rich html text via manual copy paste?>
    • Resource Type: "Articles, Blogs and Updates" (may apply to "News Article" too, and possibly to "Newsletter, Journal" and "Press Statement or News Release" if the source is website-based)
  • Ownership Tab:

    • Contributors: Article Author
    • Rights: Copyright information
  • Dates Tab:

    • Publishing Date: Get from site/article Publishing Date
    • Rights: Copyright information
  • Categorization Tab:

    • Countries: <default to Malaysia?> or <detect based on URL domain or article content?> or leave blank; already mentioned in #3
    • SDG Goals:
    • Development Themes: or leave blank
  • Partners Tab:

    • Accountable: already mentioned in #3 or leave blank if unsure
    • Implementing Partners: already mentioned in #3 or leave blank if unsure
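
To make the field mapping above concrete, here is a minimal sketch of how extracted article metadata could be turned into proposed form values. Everything in it is an assumption for illustration: the input keys (`title`, `site_name`, `summary`, `author`, `published`, `url`), the output field names, and the small ccTLD lookup table are hypothetical and would need to match the actual Resource schema.

```python
from urllib.parse import urlparse

# Hypothetical lookup of country-code TLDs; extend as needed.
CCTLD_TO_COUNTRY = {"my": "Malaysia", "sg": "Singapore", "id": "Indonesia"}


def guess_country(url):
    """Return a country name from the URL's ccTLD, or None if unknown."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return CCTLD_TO_COUNTRY.get(tld)


def build_resource_fields(meta):
    """Map extracted article metadata to proposed form values.

    Missing metadata is simply omitted so the editor can fill it in by hand.
    Output field names are placeholders, not the real schema.
    """
    fields = {"resource_type": "Articles, Blogs and Updates"}
    title, site = meta.get("title"), meta.get("site_name")
    if title:
        # Title format from the request: "<Site/Publisher name> - Article Title"
        fields["title"] = f"{site} - {title}" if site else title
    if meta.get("summary"):
        fields["description"] = meta["summary"]
    if meta.get("author"):
        fields["contributors"] = [meta["author"]]
    if meta.get("published"):
        fields["effective"] = meta["published"]
    country = guess_country(meta.get("url", ""))
    if country:
        fields["countries"] = [country]
    return fields
```

Fields that cannot be derived (SDG Goals, Development Themes, Partners) are left out on purpose, which corresponds to the "leave blank" option in the list.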

Describe alternatives you've considered
The current web-based flow for adding a Resource may not accommodate metadata entry without a significant redesign of the workflow. A possible alternative is a Jupyter Notebook that extracts the necessary metadata and submits it as a new Resource.
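
As a rough sketch of the notebook alternative, the cell below builds the HTTP request that would create a content item through the Plone REST API, which accepts a JSON `POST` to the container URL with an `@type` key. The container URL, the credentials, and the `portal_type` value `"resource"` are placeholders; the real content type name depends on how the site is configured. The request is only constructed here, not sent.

```python
import base64
import json
import urllib.request


def build_create_request(container_url, fields, username, password,
                         portal_type="resource"):
    """Build (but do not send) a POST request that would create a content item.

    portal_type is a placeholder; the real type name depends on the site.
    """
    payload = {"@type": portal_type, **fields}
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        container_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Example (not executed here): review the fields in a notebook cell, then send.
# req = build_create_request("https://example.org/resources",
#                            {"title": "Example News - Hello"},
#                            "editor", "secret")
# with urllib.request.urlopen(req) as resp:
#     created = json.load(resp)
```

Keeping the request construction separate from sending lets the person running the notebook inspect the payload before anything is written to the site.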

Additional context

  • Article metadata can be extracted with the newspaper3k Python library
  • Clean article text might be better extracted with the trafilatura Python library
  • Lead image: a screenshot can be taken with Playwright or a tool such as WitnessMe
  • Image caption: Tesseract (OCR) or LLaVA
  • SDG detection can use the seesu Python library
  • Partners detection relies on the part described in #3
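
For illustration only, here is a dependency-free sketch of the kind of metadata that is often already present in an article's HTML. It is not newspaper3k or trafilatura; it swaps in Python's standard `html.parser` to read a few common `<meta>` tags (Open Graph and similar conventions). The `META_KEYS` mapping and the output field names are assumptions, and the dedicated libraries listed above would handle many more cases.

```python
from html.parser import HTMLParser

# Meta keys (Open Graph / common conventions) mapped to our own field names.
META_KEYS = {
    "og:title": "title",
    "og:site_name": "site_name",
    "og:description": "summary",
    "description": "summary",
    "author": "author",
    "article:published_time": "published",
}


class MetaTagParser(HTMLParser):
    """Collect selected <meta> tags; the first value seen for a field wins."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = (attrs.get("property") or attrs.get("name") or "").lower()
        value = attrs.get("content")
        field = META_KEYS.get(key)
        if field and value:
            self.found.setdefault(field, value.strip())


def extract_metadata(html):
    """Return a dict of the recognised metadata fields found in the HTML."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.found
```

Pages that lack these tags simply yield an empty dict, so the affected fields would be left for manual entry.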

Labels: enhancement (New feature or request)
