@Rabenherz112
Contributor

@Rabenherz112 Rabenherz112 commented May 4, 2024

This is my best attempt at implementing awesome-selfhosted/awesome-selfhosted-data#84, along with some other nice-to-have metadata that I would personally like to see, such as the current release and its release date. This means fetching the metadata over the GraphQL API instead of through the Python package used so far.

New metadata preview:

name: Paperless-ngx
website_url: https://docs.paperless-ngx.com/
description: Scan, index, and archive all of your paper documents with an improved interface (fork of Paperless).
licenses:
  - GPL-3.0
platforms:
  - Python
  - Docker
tags:
  - Document Management
source_code_url: https://github.com/paperless-ngx/paperless-ngx
demo_url: https://demo.paperless-ngx.com/
stargazers_count: 17058
updated_at: '2024-05-07'
archived: false
current_release:
  tag: v2.8.1
  published_at: '2024-05-07'
commit_history:
  2024-05: 46
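
For readers less familiar with the GraphQL side, here is a minimal sketch of how the keys in the preview could be derived from a single `repository` node. The query uses field names from GitHub's public GraphQL schema, but whether the PR requests exactly these fields is an assumption, as is the mapping of `updated_at` to `pushedAt`. `REPO_QUERY` and `to_metadata` are hypothetical names, not the PR's code.

```python
# Hypothetical query: the fields exist in GitHub's GraphQL schema, but the
# PR's real query may differ.
REPO_QUERY = """
query($owner: String!, $name: String!, $since: GitTimestamp!, $until: GitTimestamp!) {
  repository(owner: $owner, name: $name) {
    stargazerCount
    isArchived
    pushedAt
    latestRelease { tagName publishedAt }
    defaultBranchRef {
      target {
        ... on Commit {
          history(since: $since, until: $until) { totalCount }
        }
      }
    }
  }
}
"""


def to_metadata(repo, month):
    """Map one `repository` node onto the YAML keys shown in the preview.

    `month` is the 'YYYY-MM' label for the window the commit count covers.
    """
    def day(timestamp):
        # GraphQL returns ISO 8601 timestamps, e.g. 2024-05-07T10:00:00Z
        return timestamp[:10]

    metadata = {
        "stargazers_count": repo["stargazerCount"],
        "updated_at": day(repo["pushedAt"]),  # assumed mapping
        "archived": repo["isArchived"],
    }
    release = repo.get("latestRelease")
    if release:  # null for repositories without a published release
        metadata["current_release"] = {
            "tag": release["tagName"],
            "published_at": day(release["publishedAt"]),
        }
    branch = repo.get("defaultBranchRef")
    if branch:  # null for empty repositories
        count = branch["target"]["history"]["totalCount"]
        metadata["commit_history"] = {month: count}
    return metadata
```

Keeping the mapping separate from the HTTP call makes it easy to exercise with a canned response, without a token or network access.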

The changes have been tested by running the full metadata processing on the awesome-selfhosted-data repo (last tested May 5). However, I still have some open questions about how we should handle certain things; these are marked with TODO: in the code itself.

Additionally, no tests have been written or modified (I don't normally code or use Python), so this is still open and needs to be done.

@Rabenherz112 Rabenherz112 marked this pull request as draft May 4, 2024 15:09
@Rabenherz112 Rabenherz112 changed the title Use GitHub GraphQL for Metadat fetching (with new metadata fields) Use GitHub GraphQL for Metadata fetching (with new metadata fields) May 4, 2024
@nodiscc nodiscc added enhancement New feature or request help wanted Extra attention is needed labels May 4, 2024
@nodiscc nodiscc added this to the 1.3.0 milestone May 4, 2024
@nodiscc
Owner

nodiscc commented May 5, 2024

Thank you.
I will review this and #133 when I get some time. It might take a while; I will do it eventually, but I don't know when.

@nodiscc nodiscc removed the help wanted Extra attention is needed label May 5, 2024
@Rabenherz112
Contributor Author

Rabenherz112 commented May 5, 2024

The pending issues have been fixed and tested by running a full metadata processing on the awesome-selfhosted-data repository.

Some things in the code still raise open questions; see the comments marked with TODO:.

@Rabenherz112
Contributor Author

Just a quick ping to remind you about this. 😊
No rush at all, though!

@nodiscc nodiscc self-requested a review November 18, 2024 19:20
@nodiscc nodiscc marked this pull request as ready for review March 29, 2025 10:51
@nodiscc nodiscc self-assigned this Mar 29, 2025
@nodiscc nodiscc removed their assignment Apr 13, 2025
@Rabenherz112 Rabenherz112 requested a review from nodiscc April 20, 2025 09:15
@Rabenherz112 Rabenherz112 requested a review from nodiscc May 17, 2025 22:47
@Rabenherz112
Contributor Author

Rabenherz112 commented Jul 31, 2025

Hey there,
It’s been two months since the last update, so if you have any availability, I would greatly appreciate another review of the PR.

Separately, I would like to challenge the current decision on whether we should include the commit history. I am currently putting the finishing touches on a new alternative interface for awesome-selfhosted, and plan to ask for feedback in the data repo once I've ironed out a few bugs in the coming days.

As a bit of a spoiler for why I would appreciate it if we could include it, here are the interfaces built from the datasets:

@nodiscc
Owner

nodiscc commented Jul 31, 2025

These web interfaces look absolutely awesome
I will reconsider adding the commit history in the yaml data files
Will try to review and merge this as soon as possible (possibly this week)
Is there any chance you could share the source code and deployment process for your web interface? We could host it alongside the current website for some time, and why not, make it the default

@Rabenherz112
Contributor Author

Rabenherz112 commented Jul 31, 2025

The current version is already open source (under the AGPL-3.0 license). However, it is not yet production-ready: I only push to the repo when I want to check the code against the live data from awesome-selfhosted.

For deployment, see this repo (the GitHub Action there already deploys it to my web server): https://github.com/Rabenherz112/awesome-selfhosted-web

It is also quite important to mention (I will do so as well when I open the discussion in awesome-selfhosted-data) that quite a lot of this code is AI-generated / fixed and patched, since I am a sysadmin, not a web dev.

- Rewrite of the add_github_metadata() function
    - Uses GraphQL API instead of REST API
    - Adds new metadata keys commit_history and current_release
    - Adds new module options batch_size, commit_history_clean_months and commit_history_fetch_months, with default values
- Updates setup.py to include new dependencies
- Updates tests to include new module options
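
The commit message names the new options but not their exact semantics, so the following is only a sketch under assumptions: `batch_size` is taken to mean how many repositories share one GraphQL request (via aliases), and `commit_history_clean_months` is taken to mean how many months of `commit_history` entries are kept. The helper names `chunked`, `build_batch_query` and `prune_history` are hypothetical and do not come from the PR.

```python
import json
from datetime import date


def chunked(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def build_batch_query(repos):
    """Build one GraphQL query that fetches several repositories at once.

    Each repository gets its own alias (repo0, repo1, ...) so the results
    can be matched back to the input list. json.dumps() is used to quote
    the owner and name as string literals.
    """
    parts = []
    for index, (owner, name) in enumerate(repos):
        parts.append(
            f"  repo{index}: repository(owner: {json.dumps(owner)}, "
            f"name: {json.dumps(name)}) {{ stargazerCount isArchived }}"
        )
    return "query {\n" + "\n".join(parts) + "\n}"


def prune_history(history, today, keep_months):
    """Drop 'YYYY-MM' entries that are keep_months or more months old."""
    current = today.year * 12 + (today.month - 1)
    kept = {}
    for key, count in history.items():
        year, month = map(int, key.split("-"))
        age = current - (year * 12 + (month - 1))
        if age < keep_months:
            kept[key] = count
    return kept


# Example: split a list of repositories into requests of two each.
repos = [("paperless-ngx", "paperless-ngx"), ("nodiscc", "hecat"),
         ("Rabenherz112", "awesome-selfhosted-web")]
queries = [build_batch_query(batch) for batch in chunked(repos, 2)]
recent = prune_history({"2024-05": 46, "2023-01": 10}, date(2024, 5, 7), 12)
```

Aliasing is what makes batching possible in GraphQL: without distinct aliases, two `repository(...)` selections in the same query would conflict.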
@Rabenherz112
Contributor Author

Rabenherz112 commented Oct 19, 2025

Little update on this PR.
Since I have been working a lot more with Python lately, I wanted to give this PR a slight rework and apply what I have learned (when I first opened this PR, it was one of my first real interactions with Python).

The core functionality remains unchanged and has been tested successfully against the awesome-selfhosted-data repo.
Since I expect the review will probably still take a while (this could slowly become a meme 😆), I went ahead and re-added the commit_history key (maybe by the time this PR is actually merged, we'll have a first-party use case for it).

Some improvements I made:

  • Enhanced the commit_history key with better error handling
  • Added support for initial runs to backfill complete history
  • Refactored for improved readability
  • Added error handling, especially for when GraphQL randomly throws a 500 error or just returns empty data for no reason whatsoever (it is crazy how simply retrying the same query is sometimes enough)
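
The retry behaviour described in the last bullet can be sketched roughly as follows. This is not the PR's implementation: `fetch_with_retry`, `TransientError` and the injected `fetch` callable are hypothetical, and the sketch assumes the caller raises `TransientError` on an HTTP 5xx response and otherwise returns the decoded JSON body as a dict.

```python
import time


class TransientError(Exception):
    """Raised by the (hypothetical) fetch callable on HTTP 5xx responses."""


def fetch_with_retry(fetch, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a response with non-empty 'data'.

    Both failure modes from the comment above are treated as retryable:
    a server error (TransientError) and a response whose 'data' is empty.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            response = fetch()
            if response.get("data"):
                return response["data"]
            last_error = RuntimeError("response contained no data")
        except TransientError as error:
            last_error = error
        if attempt < retries:
            # Linear back-off: wait a little longer after each failure.
            sleep(delay * attempt)
    raise RuntimeError(f"giving up after {retries} attempts") from last_error
```

Passing `sleep` as a parameter is a design choice that keeps the helper testable without real waiting; in production the default `time.sleep` applies.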

PS: I don't understand why the test is failing. I didn't even touch utils.py or archive_webpages.py (so I assume I can ignore those errors).

Edit: I use this new version in the pipeline for my own frontend for awesome-selfhosted.

Owner

@nodiscc nodiscc left a comment

Let's do this, reviewed, tested and works well.
I fixed the remaining pylint errors; I don't know why they didn't show up earlier.
I will make a release, and then a PR in awesome-selfhosted to bump the hecat version there.
Thanks for your patience

Edit: other tests started failing in the archive_webpage. This is not related to this PR; such failures are expected when the websites the tests run against change things. I will fix these here as well.

… a comma? (hecat.processors.github_metadata, line 225)' (syntax-error)
- jinja.palletsprojects.com now issues 403 when tested from GitHub Actions, run the test against docs.ansible.com instead
@nodiscc nodiscc merged commit 41f347d into nodiscc:master Oct 23, 2025
1 check passed
@Rabenherz112
Copy link
Contributor Author

Massive thanks for reviewing and merging it, even though you are quite busy (really did expect that) <3

@Rabenherz112 Rabenherz112 deleted the feat-commit-history branch October 23, 2025 19:28