
Conversation

@PawsFunctions
Contributor

Bug fix: #79 - Failed to get all the existing links when the entire response is filled with duplicates

Edit: updated get_existing_links to use the new search API and changed the cursor logic to use nextCursor from the response.
New API: https://docs.linkwarden.app/api/search-links
The old links API was deprecated 6 months ago: linkwarden/docs@4846ad7

ADD: delete_links function to delete all duplicate links (OPT_DELETE_DUPLICATE).
ADD: DEBUG environment variable to replicate -d.
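
For reference, the nextCursor-based paging could look roughly like the sketch below. This is only an illustration: the function name, endpoint path, query parameters, and response fields other than nextCursor are assumptions based on the linked docs page, not verified against the actual StarWarden code or Linkwarden API.

```python
import requests


def iter_search_links(linkwarden_url, linkwarden_token, collection_id):
    """Hypothetical sketch of cursor-based paging over the search endpoint."""
    headers = {"Authorization": f"Bearer {linkwarden_token}"}
    cursor = None
    while True:
        params = {"collectionId": collection_id}
        if cursor is not None:
            params["cursor"] = cursor
        resp = requests.get(
            f"{linkwarden_url.rstrip('/')}/api/v1/search",  # placeholder path, see docs link above
            headers=headers,
            params=params,
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        for link in body.get("data", {}).get("links", []):  # field names assumed
            yield link
        cursor = body.get("data", {}).get("nextCursor")  # nextCursor drives pagination
        if not cursor:
            break
```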


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues and left some high-level feedback:

  • In delete_links, you catch Timeout but it’s not defined in this file unless imported elsewhere; consider either importing requests.exceptions.Timeout explicitly or catching requests.Timeout/requests.RequestException consistently to avoid a runtime NameError.
  • get_existing_links performs duplicate deletion only after the generator has been fully consumed, which makes the side effect dependent on the caller exhausting the iterator; consider exposing duplicate cleanup as an explicit function or making the behavior non-lazy to avoid surprising partial-iteration behavior.
  • When duplicate deletion is enabled, logging the full list of duplicate IDs at info level (duplicate_link_ids) can become very noisy for large collections; consider logging only counts or truncating the list, and reserve detailed IDs for debug logging (see the sketch after this list).
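
A minimal illustration of the first and third points above, assuming helper and variable names (delete_links, duplicate_link_ids) taken from the diff; the delete endpoint path is a placeholder rather than the verified Linkwarden API:

```python
import logging

import requests
from requests.exceptions import RequestException, Timeout  # explicit import avoids a NameError

logger = logging.getLogger(__name__)


def delete_links(linkwarden_url, linkwarden_token, duplicate_link_ids):
    """Hypothetical sketch: delete the given link IDs, logging counts at info and IDs at debug."""
    headers = {"Authorization": f"Bearer {linkwarden_token}"}
    # Keep info-level logging terse; the full ID list only shows up with -d / DEBUG.
    logger.info("Deleting %d duplicate links", len(duplicate_link_ids))
    logger.debug("Duplicate link IDs: %s", duplicate_link_ids)
    deleted = 0
    for link_id in duplicate_link_ids:
        url = f"{linkwarden_url.rstrip('/')}/api/v1/links/{link_id}"  # placeholder path
        try:
            response = requests.delete(url, headers=headers, timeout=30)
            response.raise_for_status()
            deleted += 1
        except Timeout:
            logger.error("Timed out deleting link %s", link_id)
        except RequestException as e:
            logger.error("Error deleting link %s: %s", link_id, e)
    return deleted
```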

## Individual Comments

### Comment 1
<location> `starwarden/linkwarden_api.py:13-22` </location>
<code_context>

-def get_existing_links(linkwarden_url, linkwarden_token, collection_id):
-    url = f"{linkwarden_url.rstrip('/')}/api/v1/links"
+def get_existing_links(linkwarden_url, linkwarden_token, collection_id, delete_duplicate=False):
+    """
+    Get existing links from a collection with optional duplicate detection and deletion.
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Duplicate deletion behavior depends on the caller fully exhausting the generator, which could be surprising.

Because duplicate detection and `delete_links` run only after the `while` loop finishes, they only occur if the caller fully consumes the generator. Any caller that stops early (e.g., via `break`) will never trigger deletion, even with `delete_duplicate=True`. To avoid this implicit coupling, consider either making `get_existing_links` eager (non-generator) or moving duplicate deletion into a separate, explicit API.

Suggested implementation:

```python
def get_existing_links(linkwarden_url, linkwarden_token, collection_id, delete_duplicate=False):
    """
    Get all existing links from a collection, with optional duplicate detection and deletion.

    This function eagerly fetches all pages of links into memory before performing
    any duplicate detection or deletion. This guarantees that duplicate handling
    does not depend on the caller exhausting a generator.

    Args:
        linkwarden_url: Base URL of Linkwarden instance.
        linkwarden_token: Authorization token for the API.
        collection_id: Collection ID to fetch links from.
        delete_duplicate: If True, automatically delete duplicate links found.

    Returns:
        A list of all link URLs (both unique and duplicates). When
        delete_duplicate=True, the list still contains the URLs as originally
        returned by the API; duplicates are deleted after collection, not filtered
        out of the returned list.
    """
```

To fully implement the eager behavior and remove the dependency on exhausting a generator, you should also (a sketch follows this list):

1. Inside `get_existing_links`:
   - Initialize an in-memory list before the paging loop, e.g. `all_links = []`.
   - Replace any `yield` statements in the `while`/paging loop with code that appends to this list (e.g. `all_links.append(url)` or `all_links.extend(urls)`).
   - Ensure the paging loop runs to completion unconditionally (i.e., do not return early from inside the loop).
2. After the paging loop finishes:
   - Run your duplicate detection on `all_links` to determine which links are duplicates.
   - If `delete_duplicate` is True, call your existing deletion helper (e.g. `delete_links(...)`) with the set/list of duplicate link IDs.
3. At the end of the function:
   - `return all_links` instead of using a generator return pattern.
4. If there are callers that rely on streaming behavior for very large collections, consider:
   - Introducing a separate streaming helper (e.g. `iter_existing_links(...)`) that exposes the generator behavior but does **not** perform any deletion, while keeping `get_existing_links` as the eager, side-effecting API.
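
Putting those steps together, a non-lazy version could look roughly like the sketch below. The search endpoint path and the response field names other than nextCursor are assumptions carried over from the PR description, and delete_links refers to the helper added in this PR (its exact signature is assumed):

```python
import requests


def get_existing_links(linkwarden_url, linkwarden_token, collection_id, delete_duplicate=False):
    """Eager sketch: collect every page first, then handle duplicates."""
    headers = {"Authorization": f"Bearer {linkwarden_token}"}
    all_links = []           # every URL returned by the API, in order
    seen_urls = set()        # URLs seen so far, used to spot duplicates
    duplicate_link_ids = []  # IDs of links whose URL was already seen
    cursor = None
    while True:
        params = {"collectionId": collection_id}
        if cursor is not None:
            params["cursor"] = cursor
        response = requests.get(
            f"{linkwarden_url.rstrip('/')}/api/v1/search",  # placeholder path
            headers=headers,
            params=params,
            timeout=30,
        )
        response.raise_for_status()
        body = response.json()
        for link in body.get("data", {}).get("links", []):  # field names assumed
            all_links.append(link["url"])
            if link["url"] in seen_urls:
                duplicate_link_ids.append(link["id"])
            else:
                seen_urls.add(link["url"])
        cursor = body.get("data", {}).get("nextCursor")
        if not cursor:
            break
    # Deletion now runs unconditionally once paging finishes,
    # instead of depending on a caller exhausting a generator.
    if delete_duplicate and duplicate_link_ids:
        delete_links(linkwarden_url, linkwarden_token, duplicate_link_ids)
    return all_links
```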
</issue_to_address>

### Comment 2
<location> `README.md:12` </location>
<code_context>
 - Export GitHub stars to Linkwarden
 - Export to existing collection or create new colletion
 - Keep Linkwarden collection up-to-date with your GitHub stars using the docker image.
-
</code_context>

<issue_to_address>
**issue (typo):** Fix the spelling of "colletion" to "collection".

```suggestion
- Export to existing collection or create new collection
```
</issue_to_address>


Owner

@rtuszik rtuszik left a comment


Thanks for this PR!

Please address my comments and the currently failing pytest/lint errors so I can merge this.

Comment on lines 107 to 195
"""
Fetch all collections from the Linkwarden instance.
Args:
linkwarden_url: Base URL of the Linkwarden instance
linkwarden_token: Authorization token for the API
Returns:
list: List of collection objects
Raises:
Exception: If there's an error fetching collections from the API
"""
url = f"{linkwarden_url.rstrip('/')}/api/v1/collections"
headers = {
"Authorization": f"Bearer {linkwarden_token}",
"Content-Type": "application/json",
}
try:
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()
data = response.json()
collections = data.get("response", [])
logger.debug(f"Fetched collections: {json.dumps(collections, indent=2)}")
return collections
except requests.RequestException as e:
raise Exception(f"Error fetching collections from Linkwarden: {str(e)}") from e


def create_collection(linkwarden_url, linkwarden_token, name, description=""):
"""
Create a new collection in Linkwarden.
Args:
linkwarden_url: Base URL of the Linkwarden instance
linkwarden_token: Authorization token for the API
name: Name of the collection to create
description: Optional description for the collection (default: empty string)
Returns:
dict: The created collection object if successful, None otherwise
"""
url = f"{linkwarden_url.rstrip('/')}/api/v1/collections"
headers = {
"Authorization": f"Bearer {linkwarden_token}",
"Content-Type": "application/json",
}
data = {"name": name, "description": description}
try:
logger.debug(f"Attempting to create collection: {name}")
response = requests.post(
url,
headers=headers,
json=data,
timeout=30,
)
logger.debug(f"Response status code: {response.status_code}")
logger.debug(f"Response content: {response.text}")
response.raise_for_status()
created_collection = response.json().get("response", {})
logger.info(f"Created collection: {created_collection}")
return created_collection
except Timeout:
logger.error("Request timed out while creating new collection in Linkwarden")
return None
except requests.RequestException as e:
logger.error(f"Error creating new collection in Linkwarden: {str(e)}")
if hasattr(e, "response") and e.response is not None:
logger.error(f"Response status code: {e.response.status_code}")
logger.error(f"Response content: {e.response.text}")
return None


def upload_link(linkwarden_url, linkwarden_token, collection_id, repo, tags):
"""
Upload a repository as a link to Linkwarden.
Args:
linkwarden_url: Base URL of the Linkwarden instance
linkwarden_token: Authorization token for the API
collection_id: ID of the collection to add the link to
repo: Repository object containing repository information
tags: List of tags to associate with the link
Returns:
str or None: The ID of the created link if successful,
LINK_EXISTS_GLOBALLY if link already exists,
None if there was an error
"""
Owner


Please get rid of all the AI-generated docstring comments that were added. They don't add any value and are inconsistent with the rest of the codebase.

Contributor Author


will send new PR tomorrow 👍



Successfully merging this pull request may close these issues.

Bug: Failed to get all the existing links when there are too many duplicate links
