-
-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checking only links from modified files in a Pull Request? #17
Comments
Hey that's a good feature to have. I checked the code (https://github.com/gaurav-nelson/github-action-markdown-link-check/blob/master/entrypoint.sh#L155) and it doesn't seem too hard to port. |
This comment was marked as resolved.
This comment was marked as resolved.
Just to clarify, the diff checking would exactly work like in github-action-markdown-link-check because we'd use the same code. So if that works for you, we should be all set. 😄 |
This comment was marked as outdated.
This comment was marked as outdated.
I like this idea a lot! Is this still something that you plan on adding? |
Yes, I still plan to add that feature by porting over the code from github-action-markdown-link-check. |
This would be amazing! I was doing some testing to add the action to the fastify repo, but it reaches the api rate limit easily for github links |
This issue and the markdown-link-check action mention "check modified files" in particular. Would that also be possible and do you consider this a useful addition? |
Internal links workaround via local webserverIf the links are internal, you could run a simple webserver (Caddy is quite easy). At least I assume Github has no restrictions for links that aren't external traffic? (I tried looking up docs on rate limits, but only found mention of these applying to APIs or to products/services Github hosts) If your internal links to test aren't able to be replaced/mapped to Minimizing requestsPrioritize reachable hosts, then individual linksFor external links, I suppose the best options available to minimize requests would be caching the FQDN availability and only checking the first request to each unique FQDN, then afterwards you could check all other links are actually valid, not just the host for it is reachable. Leverage the response cacheI still haven't got to try lychee out yet, but the devs did add response caching support a while back. If you're hitting the limit due to frequency of runs, leveraging that caching feature across runs should make a big difference :) Workaround for rate-limitFor restricted network requests, I did see this issue, where they are apparently hitting limits because of some links to Github hosted services, which suggested excluding these with a lychee option. You could always have a scheduled workflow that checks your links daily (or hourly) with the response caching in place for working around that hourly limit (with the cache being a larger TTL like 24 hours). Assuming responses that get You would probably want to have such workflows (or this action in general) be conditionally run only if querying the current available requests for the hour is still sufficient to support any other workflows you have that may need to perform Github API requests. |
Do you really need filtering to individual lines for a single file? If you don't have any problems you're trying to avoid, you'll have better reliability with processing the entire file through Here is an example that retrieves the diff from this commit (docs deployment from Github Actions). I haven't tried integrating with the Line diff scope (individual lines per file):Click to view details# Get lychee:
$ docker run --rm -it rust bash
$ cargo install lychee
# Example repo with a branch of deployed docs, clone it to get a diff from desired commit
$ git clone https://github.com/docker-mailserver/docker-mailserver -b gh-pages
$ cd docker-mailserver
# Finally show a specific commit diff (default format is -p for patch), and only show diff for files with `.html` suffix,
# Then filter to addition/modification lines (green lines on github, starts with +),
# And feed these lines into lychee to parse for any links:
$ git show d4ba2b7f5027a2868e19c68fcba41b34b2bad1fd '*.html' | grep '^+' | lychee -
🔍 3 Total ✅ 3 OK 🚫 0 Errors It successfully parsed the 3 links in files that were changed. # For context, these are the 3 filtered lines of that commit:
$ git show d4ba2b7f5027a2868e19c68fcba41b34b2bad1fd '*.html' | grep '^+' | grep 'href'
+<p><a href="https://github.com/docker-mailserver/docker-mailserver/blob/master/target/postfix/main.cf">Our default Postfix configuration</a> can easily be extended to add parameters or modify existing ones by providing a <code>docker-data/dms/config/postfix-main.cf</code>. This file uses the same format as Postfix <code>main.cf</code> does (<a href="http://www.postfix.org/postconf.5.html">See official docs</a> for all parameters and syntax rules).</p>
+<p>One can easily increase the <a href="http://www.postfix.org/postconf.5.html#compatibility_level">backwards-compatibility level</a> and set new Postscreen options:</p> It doesn't have to be an existing commit just a diff, or any other approach you have of collecting only the changed lines and feeding those into The merged commit for the PR associated to that docs deployment commit has different files/content but works just as well with the same technique, so it could be used on the source files (in my case PR content differs for external and internal links, with the latter being relative links that the build tool handles): $ git show 0010786d181a3a6c70856d4c34e46b60ad015b7e '*.md' | grep '^+' | lychee -
🔍 3 Total ✅ 3 OK 🚫 0 Errors
# Just to show the same links that are found:
$ git show 0010786d181a3a6c70856d4c34e46b60ad015b7e '*.md' | grep '^+' | grep 'http'
+[Our default Postfix configuration](https://github.com/docker-mailserver/docker-mailserver/blob/master/target/postfix/main.cf) can easily be extended to add parameters or modify existing ones by providing a `docker-data/dms/config/postfix-main.cf`. This file uses the same format as Postfix `main.cf` does ([See official docs](http://www.postfix.org/postconf.5.html) for all parameters and syntax rules).
+ One can easily increase the [backwards-compatibility level](http://www.postfix.org/postconf.5.html#compatibility_level) and set new Postscreen options: I'm assuming a bit more work would be needed to verify link references are valid. The link referenced may not have changed, but there might be a typo in the reference name or the referenced link itself was accidentally not added by the contributor. The approach above won't work, an example is from this markdown docs commit: - SASLauthd with LDAP auth (please see the note [down below](#ldap-setup))
...
Add these additions to the `mailserver` service in your [`docker-compose.yml`][github-file-compose]: The first references a heading ( These kind of changes, would be local file issues, which could be handled via a separate pass (or alternatively preferring to check links against build output, not source files). Thus Actually... perhaps echo -e '[test](./relative/path/to/file.md)\n\n[test-ref][ref-file]\n\n[ref-file]: ./relative/path/to/another/file.md' \
| lychee --dump - File-level diff scope (file paths of changed files)Click to view detailsIf you don't need granular line changes, you can detect which files were changed instead and provide just these files to This was a bit confusing for me to figure out how to get the list of file paths as args to # --no-commit-id --name-only will list only file paths involved in the commit.
# --diff-filter 'AM' is filtering the files in the diff to only be ones added (A) or modified (M).
# -r (recursive) is to include files in sub-directories, so that all files belonging to the commit are listed.
# HEAD references the latest commit, but could be a commit hash like previous examples.
# '*.md' like above is to filter files to only those with '.md' extension/suffix, uses gitignore syntax,
# ':!README.md' shows how to add more filter patterns, this one excludes with ':!' the top-level 'README.md' file.
#
# xargs splits the multi-line output by '\n' (defaults don't seem to work well with this git diff-tree output), and
# provides each line to lychee as a separate arg, spaces in filepath will be fine, as if quote wrapped.
git diff-tree --no-commit-id --name-only --diff-filter 'AM' -r HEAD '*.md' | xargs -d '\n' lychee --dump -- This should be more reliable, for other types of link support since actual files will be parsed. Leverage the response cache feature and file-level scope for changes should work very close to changed lines scope. At least assuming your File or line scope for PR branch diffClick to view detailsFor PRs, if you're wanting to compare changes from the PR branch to a target branch (eg: # Example with --merge-base, can reference by commit hashes, branch names, or any "tree-ish" value:
# https://stackoverflow.com/questions/4044368/what-does-tree-ish-mean-in-git/18605496#18605496
# The PR would be the 2nd value, to get it's additions/modifications when added to the 1st (eg: master branch).
# Make sure your checked out branch is the 2nd one being compared, so that the correct content is parsed.
git diff-tree --no-commit-id --name-only --diff-filter 'AM' -r --merge-base master remote/branch-name | xargs -d '\n' lychee --dump --
# Same syntax, but swaps '--name-only' for '-p' and adds a filter with grep to output changed lines:
git diff-tree --no-commit-id -p --diff-filter 'AM' -r --merge-base master some-local-branch | grep '^+ ' | lychee --dump - Note that if you have lots of filepaths, you may exceed the character limit for args I think.
I probably should have read through the earlier comments as I see I ended up sharing pretty much the same thing that was linked early on 😅 I wanted to share easier commands, so avoided using a shell script with a loop and array (which would have been easier for me, than trying to resolve the issues I encountered with |
This comment was marked as outdated.
This comment was marked as outdated.
Thanks for the heads-up @polarathene. jobs:
build:
runs-on: ubuntu-latest
name: Test changed-files
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0 # OR "2" -> To retrieve the preceding commit.
- name: Get changed files
id: changed-files
uses: tj-actions/changed-files@v23
- name: Link Checker
id: lychee
uses: lycheeverse/[email protected]
with:
args: --verbose --no-progress ${{ steps.changed-files.outputs.all_changed_files }}
env:
GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} |
I guess the problem with this in PRs would be to set the fetch depth dynamically to ensure all files modified in the PR are checked |
This comment was marked as outdated.
This comment was marked as outdated.
I got it working, but as suspected if the file path has a space, some extra work is needed to be compatible with Failed attempt with xargsjobs:
build:
runs-on: ubuntu-latest
name: Test changed-files
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 2
- name: 'A - Get a list of changed files to process'
id: changed-files-a
run: |
# Gets a diff of files from git with Added (A) or Modified (M) status,
# Compares the default generated "test merge commit" by Github (HEAD) to the base branch (base.sha == HEAD^1),
# `sed` then converts from multi-line (`\n`) to single-line delimited by `,` to support file paths containing spaces:
CHANGED_FILES=$(git diff-tree --name-only --diff-filter 'AM' -r HEAD^1 HEAD | sed -z 's/\n/,/g;s/,$//')
echo "::set-output name=all_changed_files::${CHANGED_FILES}"
# Same output as prior step:
- name: 'B - Get a list of changed files to process'
id: changed-files-b
uses: tj-actions/changed-files@v23
with:
separator: ','
- name: Link Checker
id: lychee
uses: lycheeverse/[email protected]
with:
args: --verbose --no-progress $( xargs -d ',' <<< "${{ steps.changed-files-a.outputs.changed_files }}" ) The usage of I thought this would work due to Line 17 in 76ab977
But... as it turns out this did not work. I realized to make it I had trouble trying to make that approach work. I gave up and added another intermediate step which works: jobs:
build:
runs-on: ubuntu-latest
name: Test changed-files
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 2
- name: 'A - Get a list of changed files to process'
id: changed-files-a
run: |
# Gets a diff of files from git with Added (A) or Modified (M) status,
# Compares the default generated "test merge commit" by Github (HEAD) to the base branch (base.sha == HEAD^1),
# `sed` then converts from multi-line (`\n`) to single-line delimited by `,` to support file paths containing spaces:
CHANGED_FILES=$(git diff-tree --name-only --diff-filter 'AM' -r HEAD^1 HEAD | sed -z 's/\n/,/g;s/,$//')
echo "::set-output name=all_changed_files::${CHANGED_FILES}"
# Same output as prior step:
- name: 'B - Get a list of changed files to process'
id: changed-files-b
uses: tj-actions/changed-files@v23
with:
separator: ','
# lychee-action requires pre-processing for file paths with spaces
# NOTE: Github outputs and ENV do not support `\n`, so rely on single-line string or read/write a file
- name: 'Compatibility with eval: Quote wrap paths to support spaces'
id: changed-files
env:
# Change between a or b step above:
changed_files: ${{ steps.changed-files-a.outputs.all_changed_files }}
run: |
QUOTE_WRAPPED="'$( sed "s/,/' '/g" <<< '${{ env.changed_files }}' )'"
echo "::set-output name=all_changed_files::${QUOTE_WRAPPED}"
- name: Link Checker
id: lychee
uses: lycheeverse/[email protected]
with:
args: --verbose --no-progress ${{ steps.changed-files.outputs.all_changed_files }} So not being able to use I did try messing around with It'd probably be better for I did not have much luck looking up GH actions limits for input lengths which might be a problem on larger projects? There is this limit for Alternatively not supporting file paths with spaces for |
Thanks for the thorough writeup. To avoid issues with spaces, have you tried terminating the inputs with a NUL terminator? |
Yes, but I don't see how that would help given the input constraints? (not just with spaces, but passing data to an input through from a prior step output/env, and the usage of Like with Perhaps I'm doing it wrong, but here's what I tried. I also think it highlights why this is frustrating to try workaround/resolve? (I know I've spent far more time looking into this than desirable, hoping that others would not have to go through these hurdles to accomplish the same goal) Attempts with xargs that failed to support spaces in paths for lychee-actionFirst example, swapping $ xargs -t -0 printf "'%s' " <<< $(git diff-tree --name-only --diff-filter 'AM' -r --merge-base dms/master docs/ci-build-catch-failures | tr '\n' '\0')
bash: warning: command substitution: ignored null byte in input
printf "'%s' " '.github/workflows/scripts/docs/build-docs.shdocs/content/config/environment.mddocs/content/config/user-management/accounts.mddocs/content/examples/tutorials/basic-installation.mddocs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md'$'\n'
# Note `$'\n'` at end results in newline for the last quote, and null (\0) ignored, no separation of lines:
'.github/workflows/scripts/docs/build-docs.shdocs/content/config/environment.mddocs/content/config/user-management/accounts.mddocs/content/examples/tutorials/basic-installation.mddocs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md
' xargs without $ xargs -t printf "'%s' " <<< $(git diff-tree --name-only --diff-filter 'AM' -r --merge-base dms/master docs/ci-build-catch-failures | tr '\n' '\0')')
bash: warning: command substitution: ignored null byte in input
printf "'%s' " .github/workflows/scripts/docs/build-docs.shdocs/content/config/environment.mddocs/content/config/user-management/accounts.mddocs/content/examples/tutorials/basic-installation.mddocs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md
# No `$'\n'` at end to printf now, but null bytes as delimiter still ignored resulting in one big string:
'.github/workflows/scripts/docs/build-docs.shdocs/content/config/environment.mddocs/content/config/user-management/accounts.mddocs/content/examples/tutorials/basic-installation.mddocs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md' Original diff output with $ xargs -t printf "'%s' " <<< $(git diff-tree --name-only --diff-filter 'AM' -r --merge-base dms/master docs/ci-build-catch-failures)
printf "'%s' " .github/workflows/scripts/docs/build-docs.sh docs/content/config/environment.md docs/content/config/user-management/accounts.md docs/content/examples/tutorials/basic-installation.md docs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md
# Each filepath is properly quoted now (treated as separate args):
'.github/workflows/scripts/docs/build-docs.sh' 'docs/content/config/environment.md' 'docs/content/config/user-management/accounts.md' 'docs/content/examples/tutorials/basic-installation.md' 'docs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md' But actually fails for paths with spaces, as white-space is split on too: $ xargs -t printf "'%s' " <<< $(echo -e 'file.txt\nanother/file.txt\npath/that has/spaces in it.txt\npath/to/another-file.txt')
printf "'%s' " file.txt another/file.txt path/that has/spaces in it.txt path/to/another-file.txt
'file.txt' 'another/file.txt' 'path/that' 'has/spaces' 'in' 'it.txt' 'path/to/another-file.txt' Fixed with $ xargs -t -d '\n' printf "'%s' " <<< $(echo -e 'file.txt\nanother/file.txt\npath/that has/spaces in it.txt\npath/to/another-file.txt')
printf "'%s' " file.txt another/file.txt 'path/that has/spaces in it.txt' path/to/another-file.txt
'file.txt' 'another/file.txt' 'path/that has/spaces in it.txt' 'path/to/another-file.txt' Now change to $ xargs -t -d ',' printf "'%s' " <<< $(echo -e 'file.txt,another/file.txt,path/that has/spaces in it.txt,path/to/another-file.txt')
# Notice the `$'\n'` added by xargs:
printf "'%s' " file.txt another/file.txt 'path/that has/spaces in it.txt' 'path/to/another-file.txt'$'\n'
# Broken with newline at last input again:
'file.txt' 'another/file.txt' 'path/that has/spaces in it.txt' 'path/to/another-file.txt
' Try fix failure by making the trailing '\n' a null byte: $ xargs -t -d ',' printf "'%s' " <<< $(echo -e 'file.txt,another/file.txt,path/that has/spaces in it.txt,path/to/another-file.txt' | tr '\n' '\0')
bash: warning: command substitution: ignored null byte in input
printf "'%s' " file.txt another/file.txt 'path/that has/spaces in it.txt' 'path/to/another-file.txt'$'\n'
'file.txt' 'another/file.txt' 'path/that has/spaces in it.txt' 'path/to/another-file.txt
'
# With xargs -0:
$ xargs -t -d ',' -0 printf "'%s' " <<< $(echo -e 'file.txt,another/file.txt,path/that has/spaces in it.txt,path/to/another-file.txt' | tr '\n' '\0')
bash: warning: command substitution: ignored null byte in input
printf "'%s' " 'file.txt,another/file.txt,path/that has/spaces in it.txt,path/to/another-file.txt'$'\n'
'file.txt,another/file.txt,path/that has/spaces in it.txt,path/to/another-file.txt
' Additional reference, this command with $ xargs -t -0 printf "'%s' " <<< $(git diff-tree --name-only --diff-filter 'AM' -r --merge-base dms/master docs/ci-build-catch-failures)
printf "'%s' " '.github/workflows/scripts/docs/build-docs.sh'$'\n''docs/content/config/environment.md'$'\n''docs/content/config/user-management/accounts.md'$'\n''docs/content/examples/tutorials/basic-installation.md'$'\n''docs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md'$'\n'
'.github/workflows/scripts/docs/build-docs.sh
docs/content/config/environment.md
docs/content/config/user-management/accounts.md
docs/content/examples/tutorials/basic-installation.md
docs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md
' To be clear...
|
Closing this in favor of #238, which is a clever and superior solution in my opinion. It uses lychee's |
This is a feature that
github-action-markdown-link-check
offers.Could this action support that as well? (or document advice on how to provide only relevant file paths to check)
Almost 18 months later, I revisited this request, comments of interest are:
lychee
via STDIN input is ill-advised depending on content/context.args
inlychee-action
.xargs
as viable forlychee
, but notlychee-action
as a user. Requested a separatelychee-action
input, or forlychee
an--inputs-from
option.Examples - Workflows that can be used currently
The first two support file paths containing spaces.
Without considering file paths containing spaces; if a mistake is made or a third-party contributor provides a file path containing a space,
lychee
will treat that path as separate files/inputs which would be invalid (or the wrong files to check).Manually generating the file paths to check
I looked into this extensively, and I'm rather confident of this approach.
Click to view YAML
--diff-filter
option as appropriate to your needs. View thegit diff-tree
docs linked for other options that may be of interest such as ignoring white-space only changes.sed
command is specifically forlychee-action
needs. It folds the multi-line output ofgit diff-tree
(\n
/ line delimited list of file paths_) into a single line, separating paths with single quotes and later in thelychee-action.args
input adding single quotes to the start/end, so that file paths can support spaces when provided as a single line toeval
(whichlychee-action
uses internally). This handling is not required forlychee
, which can instead replacesed
withxargs -d '\n' lychee --
.Delegate to
tj-actions/changed-files
action to get the file pathsAfter this PR is resolved, the pre-processing step for
lychee-action
can likely be dropped and the output oftj-actions/changed-files
provided to a new input forlychee-action
that handles the delimited list of paths internally.Click to view YAML
Simpler version when file paths of inputs would never contain spaces
If you are confident that this won't be a concern to support, the extra pre-processing step can be avoided.
NOTE: If adapting to the "manual" example, you must still convert
\n
to space, unlesslychee
gets a--inputs-from
option, then you can provide a file withgit diff-tree
output directly (or without writing an actual file).Click to view YAML
The text was updated successfully, but these errors were encountered: