Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check links relevant to the changes in a PR only #238

Closed
f-hollow opened this issue Sep 4, 2024 · 9 comments
Closed

Check links relevant to the changes in a PR only #238

f-hollow opened this issue Sep 4, 2024 · 9 comments

Comments

@f-hollow
Copy link

f-hollow commented Sep 4, 2024

Naturally, it is desirable to minimize bothering your contributor with issues introduced by somebody else in your repo. So I need a way to check and report invalid links (1) added / updated in that contributor's PR only, or (2) existing links affected by the introduced changes.

It appeared to be harder than I expected. You cannot run lychee on the changes only, because the links internal to your repo (checked with my beloved --include-fragments option) can and will produce false positives.

Browsing around this repo, I came across mre's comment about separately checking existing links and new links. This idea nudged me in the right direction! The related issues #17 and #134 offer no immediately usable solution yet.

Below is the workflow that I created and tested so far. It works and does what I need.

Possible improvements:

  • I have to use git stash, otherwise GitHub CI complains about the untracked file links-main.txt (not sure why it works like this):
    $ git checkout my_feature_branch
    error: The following untracked working tree files would be overwritten by checkout:
      links-main.txt
    Please move or remove them before you switch branches.
    Aborting
    Action: I wonder if there is a way to further simplify the steps or reduce their number (like not using git stash)?
  • If a contributor removes a file, the existing internal links referring to this file wouldn't be reported as invalid, because .lycheeignore still contains them. My repo has a cron job checking all links in main once a week, which should take care of such issues for the time being.
    Action: In this workflow, it might be good to recheck existing internal links affected by removed files.
  • I am sure there are other ways to improve this workflow.
    Action: Any other ways to further optimize this workflow or make it more robust?

Your feedback or further improvements to this workflow would be very much appreciated!

The workflow that checks links relevant to the changes in a PR only

name: Check links in diffs

on:
  pull_request:
    branches: [main]

jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
      - name: Clone repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          ref: ${{github.event.pull_request.head.ref}}
          repository: ${{github.event.pull_request.head.repo.full_name}}

      - name: Check out main branch
        run: git checkout main

      - name: Dump all links from main
        id: dump_links_from_main
        uses: lycheeverse/lychee-action@v1
        with:
          args: |
            --dump
            --include-fragments
            .
          output: ./links-main.txt

      - name: Stash untracked files
        run: git stash push --include-untracked

      - name: Check out feature branch
        run: git checkout ${{ github.head_ref }}

      - name: Apply stashed changes
        # Apply stashed changes, ignore errors if stash is empty
        run: git stash pop || true

      - name: Append links-main.txt to .lycheeignore
        run: cat links-main.txt >> .lycheeignore

      - name: Check links
        uses: lycheeverse/lychee-action@v1
        with:
          args: |
            --no-progress
            --include-fragments
            .
          # Fail action on broken links
          fail: true

      - name: Suggestions
        if: failure()
        run: |
          echo -e "\nPlease review the links reported in the Check links step above."
          echo -e "If a link is valid but fails due to a CAPTCHA challenge, IP blocking, login requirements, etc.,
          consider adding such links to .lycheeignore file to bypass future checks.\n"
          exit 1
@mre
Copy link
Member

mre commented Sep 4, 2024

Interesting.

I think we could add that to our documentation at https://github.com/lycheeverse/lycheeverse.github.io

It could be next to this recipe in the documentation hierarchy:
https://lychee.cli.rs/github_action_recipes/add-pr-comment/

Would you like to create a pull request? Otherwise I can also take care of it.

@mre
Copy link
Member

mre commented Sep 4, 2024

I have to use git stash, otherwise GitHub CI complains about the untracked file links-main.txt (not sure why it works like this):
$ git checkout my_feature_branch
error: The following untracked working tree files would be overwritten by checkout:
links-main.txt
Please move or remove them before you switch branches.
Aborting

That's weird. When I remove the Stash untracked files and Apply stashed changes, it still works for me.

@SRv6d
Copy link

SRv6d commented Sep 10, 2024

Could this be implemented in the action or lychee itself?

@mre
Copy link
Member

mre commented Sep 12, 2024

I guess it could be. Question is if we should. It would be a maintenance burden. Is copy-pasting the GitHub workflow an issue?

@f-hollow
Copy link
Author

f-hollow commented Sep 20, 2024

@mre I still want to implement checking internal links affected by removed files as mentioned above. Once I finish, I will create a pull request to add this information to the docs as you suggested.

It might take a couple of weeks though, since I am a bit busy recently.

@mre
Copy link
Member

mre commented Sep 20, 2024

Cool. Thanks a lot!

@mre
Copy link
Member

mre commented Oct 7, 2024

@f-hollow, any updates? Would love to have this.

@f-hollow
Copy link
Author

f-hollow commented Oct 24, 2024

@f-hollow, any updates? Would love to have this.

Thank you very much for waiting.

This workflow has been running in my repo all this time and there are two instances of strange behavior which I would like to inspect. Then I should be able to create a PR, most likely no later than next week.

Just noticed that a PR has already been created. Thank you @sekyondaMeta!

In this case, I will see if any improvements are needed :)

@mre
Copy link
Member

mre commented Oct 25, 2024

I guess that's a misunderstanding. The referenced pull request targets pytorch/tutorials, not the lychee docs. I went ahead and created a recipe in our documentation here. The original commit is here.

Please double-check if I made a mistake and feel free to update the docs over there.
Apart from that, it looks like we're done here. Thanks for the great idea to use --dump to get a diff to the base branch.

@mre mre closed this as completed Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants