Bug fix: #79 api update and duplicate remove option #80
Closed
PawsFunctions wants to merge 2 commits into rtuszik:main from …
Conversation
Bug fix: #79 - Failed to get all the existing links when the entire response is filled with duplicates.

Edit: updated get_existing_links to use the new search API and changed the cursor logic to use nextCursor from the response (sketched below).
New API: https://docs.linkwarden.app/api/search-links
The old links API was deprecated 6 months ago: linkwarden/docs@4846ad7

ADD: delete_links function to delete all duplicate links (OPT_DELETE_DUPLICATE).
ADD: DEBUG environment variable to replicate -d.
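A minimal sketch of the nextCursor paging described above, assuming the Search Links endpoint lives at `/api/v1/search` and returns `{"data": {"links": [...], "nextCursor": ...}}`. The path, query parameter names, and response fields are assumptions taken from the linked docs, and `fetch_all_links` is a hypothetical helper, not the PR's actual code:

```python
import requests


def fetch_all_links(linkwarden_url, linkwarden_token, collection_id):
    # Assumed endpoint and parameter names; see the Search Links docs linked above.
    url = f"{linkwarden_url.rstrip('/')}/api/v1/search"
    headers = {"Authorization": f"Bearer {linkwarden_token}"}
    cursor = None
    links = []
    while True:
        params = {"collectionId": collection_id}
        if cursor is not None:
            params["cursor"] = cursor
        resp = requests.get(url, headers=headers, params=params, timeout=30)
        resp.raise_for_status()
        data = resp.json().get("data", {})  # assumed response shape
        links.extend(data.get("links", []))
        cursor = data.get("nextCursor")  # a missing/None cursor marks the last page
        if not cursor:
            return links
```

Keying the loop on `nextCursor` rather than on whether a page contains new URLs is what avoids the #79 failure mode, where a page consisting entirely of duplicates looked like the end of the collection.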
Hey - I've found 2 issues, and left some high level feedback:
- In `delete_links`, you catch `Timeout` but it's not defined in this file unless imported elsewhere; consider either importing `requests.exceptions.Timeout` explicitly or catching `requests.Timeout`/`requests.RequestException` consistently to avoid a runtime `NameError`.
- `get_existing_links` performs duplicate deletion only after the generator has been fully consumed, which makes the side effect dependent on the caller exhausting the iterator; consider exposing duplicate cleanup as an explicit function or making the behavior non-lazy to avoid surprising partial-iteration behavior.
- When duplicate deletion is enabled, logging the full list of duplicate IDs at info level (`duplicate_link_ids`) can become very noisy for large collections; consider logging only counts or truncating the list, and reserve detailed IDs for debug logging.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `delete_links`, you catch `Timeout` but it’s not defined in this file unless imported elsewhere; consider either importing `requests.exceptions.Timeout` explicitly or catching `requests.Timeout`/`requests.RequestException` consistently to avoid a runtime `NameError` (see the sketch after this list).
- `get_existing_links` performs duplicate deletion only after the generator has been fully consumed, which makes the side effect dependent on the caller exhausting the iterator; consider exposing duplicate cleanup as an explicit function or making the behavior non-lazy to avoid surprising partial-iteration behavior.
- When duplicate deletion is enabled, logging the full list of duplicate IDs at info level (`duplicate_link_ids`) can become very noisy for large collections; consider logging only counts or truncating the list, and reserve detailed IDs for debug logging.
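A minimal sketch of the consistent exception handling suggested in the first point. Only the `requests` exception classes and call signatures are real; `delete_link` is a hypothetical helper, and the endpoint path mirrors the `/api/v1/links` usage elsewhere in this file:

```python
import logging

import requests

logger = logging.getLogger(__name__)


def delete_link(linkwarden_url, linkwarden_token, link_id):
    # Assumed DELETE endpoint for a single link.
    try:
        resp = requests.delete(
            f"{linkwarden_url.rstrip('/')}/api/v1/links/{link_id}",
            headers={"Authorization": f"Bearer {linkwarden_token}"},
            timeout=30,
        )
        resp.raise_for_status()
        return True
    except requests.exceptions.Timeout:
        # Explicitly imported via requests.exceptions, so no NameError is possible.
        logger.warning("Timed out deleting link %s", link_id)
    except requests.exceptions.RequestException as e:
        # Catches connection errors and non-2xx responses raised above.
        logger.warning("Failed to delete link %s: %s", link_id, e)
    return False
```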
## Individual Comments
### Comment 1
<location> `starwarden/linkwarden_api.py:13-22` </location>
<code_context>
-def get_existing_links(linkwarden_url, linkwarden_token, collection_id):
- url = f"{linkwarden_url.rstrip('/')}/api/v1/links"
+def get_existing_links(linkwarden_url, linkwarden_token, collection_id, delete_duplicate=False):
+ """
+ Get existing links from a collection with optional duplicate detection and deletion.
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Duplicate deletion behavior depends on the caller fully exhausting the generator, which could be surprising.
Because duplicate detection and `delete_links` run only after the `while` loop finishes, they only occur if the caller fully consumes the generator. Any caller that stops early (e.g., via `break`) will never trigger deletion, even with `delete_duplicate=True`. To avoid this implicit coupling, consider either making `get_existing_links` eager (non-generator) or moving duplicate deletion into a separate, explicit API.
Suggested implementation:
```python
def get_existing_links(linkwarden_url, linkwarden_token, collection_id, delete_duplicate=False):
    """
    Get all existing links from a collection, with optional duplicate detection and deletion.

    This function eagerly fetches all pages of links into memory before performing
    any duplicate detection or deletion. This guarantees that duplicate handling
    does not depend on the caller exhausting a generator.

    Args:
        linkwarden_url: Base URL of Linkwarden instance.
        linkwarden_token: Authorization token for the API.
        collection_id: Collection ID to fetch links from.
        delete_duplicate: If True, automatically delete duplicate links found.

    Returns:
        A list of all link URLs (both unique and duplicates). When
        delete_duplicate=True, the list still contains the URLs as originally
        returned by the API; duplicates are deleted after collection, not filtered
        out of the returned list.
    """
```
To fully implement the eager behavior and remove the dependency on exhausting a generator, you should also:
1. Inside `get_existing_links`:
- Initialize an in-memory list before the paging loop, e.g. `all_links = []`.
- Replace any `yield` statements in the `while`/paging loop with code that appends to this list (e.g. `all_links.append(url)` or `all_links.extend(urls)`).
- Ensure the paging loop runs to completion unconditionally (i.e., do not return early from inside the loop).
2. After the paging loop finishes:
- Run your duplicate detection on `all_links` to determine which links are duplicates.
- If `delete_duplicate` is True, call your existing deletion helper (e.g. `delete_links(...)`) with the set/list of duplicate link IDs.
3. At the end of the function:
- `return all_links` instead of using a generator return pattern.
4. If there are callers that rely on streaming behavior for very large collections, consider:
- Introducing a separate streaming helper (e.g. `iter_existing_links(...)`) that exposes the generator behavior but does **not** perform any deletion, while keeping `get_existing_links` as the eager, side-effecting API (sketched below).
</issue_to_address>
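A hedged sketch of the split suggested in step 4, assuming a hypothetical `_fetch_page` helper that returns one page of link objects plus the next cursor. `delete_links` is the function this PR adds; everything else is illustrative:

```python
def iter_existing_links(linkwarden_url, linkwarden_token, collection_id):
    """Yield link objects page by page; pure streaming, never deletes anything."""
    cursor = None
    while True:
        # _fetch_page is an assumed paging helper returning (links, next_cursor).
        page, cursor = _fetch_page(linkwarden_url, linkwarden_token, collection_id, cursor)
        yield from page
        if not cursor:
            return


def get_existing_links(linkwarden_url, linkwarden_token, collection_id,
                       delete_duplicate=False):
    # Eager: consume the whole stream up front, so duplicate handling never
    # depends on how far a caller happens to iterate.
    all_links = list(iter_existing_links(linkwarden_url, linkwarden_token, collection_id))
    if delete_duplicate:
        seen, duplicate_ids = set(), []
        for link in all_links:
            url = link.get("url")
            if url in seen:
                duplicate_ids.append(link.get("id"))
            else:
                seen.add(url)
        if duplicate_ids:
            # delete_links is the helper added in this PR.
            delete_links(linkwarden_url, linkwarden_token, duplicate_ids)
    return all_links
```

Callers that must stream very large collections can use `iter_existing_links` directly and opt out of deletion, while `get_existing_links` keeps the side effect explicit and unconditional.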
### Comment 2
<location> `README.md:12` </location>
<code_context>
- Export GitHub stars to Linkwarden
- Export to existing collection or create new colletion
- Keep Linkwarden collection up-to-date with your GitHub stars using the docker image.
-
</code_context>
<issue_to_address>
**issue (typo):** Fix the spelling of "colletion" to "collection".
```suggestion
- Export to existing collection or create new collection
```
</issue_to_address>
rtuszik (Owner) requested changes · Jan 11, 2026
Thanks for this PR!
Please address my comments and current failing pytest/lint errors so I can merge this.
Comment on lines 107 to +195

```diff
@@ -69,6 +134,18 @@ def get_collections(linkwarden_url, linkwarden_token):

 def create_collection(linkwarden_url, linkwarden_token, name, description=""):
+    """
+    Create a new collection in Linkwarden.
+
+    Args:
+        linkwarden_url: Base URL of the Linkwarden instance
+        linkwarden_token: Authorization token for the API
+        name: Name of the collection to create
+        description: Optional description for the collection (default: empty string)
+
+    Returns:
+        dict: The created collection object if successful, None otherwise
+    """
     url = f"{linkwarden_url.rstrip('/')}/api/v1/collections"
     headers = {
         "Authorization": f"Bearer {linkwarden_token}",
@@ -101,6 +178,21 @@ def create_collection(linkwarden_url, linkwarden_token, name, description=""):

 def upload_link(linkwarden_url, linkwarden_token, collection_id, repo, tags):
+    """
+    Upload a repository as a link to Linkwarden.
+
+    Args:
+        linkwarden_url: Base URL of the Linkwarden instance
+        linkwarden_token: Authorization token for the API
+        collection_id: ID of the collection to add the link to
+        repo: Repository object containing repository information
+        tags: List of tags to associate with the link
+
+    Returns:
+        str or None: The ID of the created link if successful,
+                     LINK_EXISTS_GLOBALLY if link already exists,
+                     None if there was an error
+    """
```
rtuszik (Owner)
Please get rid of all AI docstring comments added. They don't add any value and are inconsistent with the rest of the codebase.
PawsFunctions (Contributor, Author)
will send new PR tomorrow 👍