Bug fix: #79 api update and duplicate remove option #80
base: main
Conversation
Bug fix: #79 - Failed to get all the existing links when the entire response is filled with duplicates
Edit: updated get_existing_links to use the new search API and changed the cursor logic to use nextCursor from the response.
ADD: delete_links function to delete all duplicate links (OPT_DELETE_DUPLICATE).
ADD: DEBUG environment variable to replicate -d.
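For context on the cursor change, here is a minimal sketch of nextCursor-based paging against the new search API. The endpoint path, query parameter names, and response field names are assumptions for illustration (see the API docs linked below), not the exact code in this PR.

```python
import requests


def iter_collection_links(linkwarden_url, linkwarden_token, collection_id):
    # Hypothetical helper: page through a collection until the API stops
    # returning a nextCursor. Endpoint path and field names are assumed.
    url = f"{linkwarden_url.rstrip('/')}/api/v1/search"  # assumed search endpoint
    headers = {"Authorization": f"Bearer {linkwarden_token}"}
    cursor = None
    while True:
        params = {"collectionId": collection_id}
        if cursor is not None:
            params["cursor"] = cursor
        response = requests.get(url, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        body = response.json()
        for link in body.get("data", []):  # assumed payload key
            yield link
        cursor = body.get("nextCursor")  # assumed to be None/absent on the last page
        if cursor is None:
            break
```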
Hey - I've found 2 issues, and left some high level feedback:
- In `delete_links`, you catch `Timeout` but it’s not defined in this file unless imported elsewhere; consider either importing `requests.exceptions.Timeout` explicitly or catching `requests.Timeout`/`requests.RequestException` consistently to avoid a runtime `NameError` (see the sketch after this list).
- `get_existing_links` performs duplicate deletion only after the generator has been fully consumed, which makes the side effect dependent on the caller exhausting the iterator; consider exposing duplicate cleanup as an explicit function or making the behavior non-lazy to avoid surprising partial-iteration behavior.
- When duplicate deletion is enabled, logging the full list of duplicate IDs at info level (`duplicate_link_ids`) can become very noisy for large collections; consider logging only counts or truncating the list, and reserve detailed IDs for debug logging.
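A minimal sketch of the exception-handling fix suggested in the first point; the delete endpoint path is an assumption used only for illustration, and the point is simply that `Timeout` is imported before it is caught.

```python
import logging

import requests
from requests.exceptions import RequestException, Timeout

logger = logging.getLogger(__name__)


def delete_link(linkwarden_url, linkwarden_token, link_id):
    # Illustrative only: the endpoint path below is an assumption. The point is
    # that Timeout is imported before it is caught, and RequestException (which
    # Timeout subclasses) covers the remaining request failures.
    url = f"{linkwarden_url.rstrip('/')}/api/v1/links/{link_id}"
    headers = {"Authorization": f"Bearer {linkwarden_token}"}
    try:
        response = requests.delete(url, headers=headers, timeout=30)
        response.raise_for_status()
        return True
    except Timeout:
        logger.error("Request timed out while deleting link %s", link_id)
        return False
    except RequestException as e:
        logger.error("Error deleting link %s: %s", link_id, e)
        return False
```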
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `delete_links`, you catch `Timeout` but it’s not defined in this file unless imported elsewhere; consider either importing `requests.exceptions.Timeout` explicitly or catching `requests.Timeout`/`requests.RequestException` consistently to avoid a runtime `NameError`.
- `get_existing_links` performs duplicate deletion only after the generator has been fully consumed, which makes the side effect dependent on the caller exhausting the iterator; consider exposing duplicate cleanup as an explicit function or making the behavior non-lazy to avoid surprising partial-iteration behavior.
- When duplicate deletion is enabled, logging the full list of duplicate IDs at info level (`duplicate_link_ids`) can become very noisy for large collections; consider logging only counts or truncating the list, and reserve detailed IDs for debug logging.
## Individual Comments
### Comment 1
<location> `starwarden/linkwarden_api.py:13-22` </location>
<code_context>
-def get_existing_links(linkwarden_url, linkwarden_token, collection_id):
- url = f"{linkwarden_url.rstrip('/')}/api/v1/links"
+def get_existing_links(linkwarden_url, linkwarden_token, collection_id, delete_duplicate=False):
+ """
+ Get existing links from a collection with optional duplicate detection and deletion.
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Duplicate deletion behavior depends on the caller fully exhausting the generator, which could be surprising.
Because duplicate detection and `delete_links` run only after the `while` loop finishes, they only occur if the caller fully consumes the generator. Any caller that stops early (e.g., via `break`) will never trigger deletion, even with `delete_duplicate=True`. To avoid this implicit coupling, consider either making `get_existing_links` eager (non-generator) or moving duplicate deletion into a separate, explicit API.
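As a toy illustration (not the PR's code): any code placed after a generator's loop only executes if the caller iterates to the end.

```python
def pages():
    for page in range(3):
        yield page
    print("post-loop cleanup")  # only runs if the caller exhausts the generator


for p in pages():
    if p == 1:
        break  # generator abandoned early, so "post-loop cleanup" never prints
```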
Suggested implementation:
```python
def get_existing_links(linkwarden_url, linkwarden_token, collection_id, delete_duplicate=False):
    """
    Get all existing links from a collection, with optional duplicate detection and deletion.

    This function eagerly fetches all pages of links into memory before performing
    any duplicate detection or deletion. This guarantees that duplicate handling
    does not depend on the caller exhausting a generator.

    Args:
        linkwarden_url: Base URL of Linkwarden instance.
        linkwarden_token: Authorization token for the API.
        collection_id: Collection ID to fetch links from.
        delete_duplicate: If True, automatically delete duplicate links found.

    Returns:
        A list of all link URLs (both unique and duplicates). When
        delete_duplicate=True, the list still contains the URLs as originally
        returned by the API; duplicates are deleted after collection, not filtered
        out of the returned list.
    """
```
To fully implement the eager behavior and remove the dependency on exhausting a generator (a sketch follows this list), you should also:
1. Inside `get_existing_links`:
- Initialize an in-memory list before the paging loop, e.g. `all_links = []`.
- Replace any `yield` statements in the `while`/paging loop with code that appends to this list (e.g. `all_links.append(url)` or `all_links.extend(urls)`).
- Ensure the paging loop runs to completion unconditionally (i.e., do not return early from inside the loop).
2. After the paging loop finishes:
- Run your duplicate detection on `all_links` to determine which links are duplicates.
- If `delete_duplicate` is True, call your existing deletion helper (e.g. `delete_links(...)`) with the set/list of duplicate link IDs.
3. At the end of the function:
- `return all_links` instead of using a generator return pattern.
4. If there are callers that rely on streaming behavior for very large collections, consider:
- Introducing a separate streaming helper (e.g. `iter_existing_links(...)`) that exposes the generator behavior but does **not** perform any deletion, while keeping `get_existing_links` as the eager, side-effecting API.
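Taken together, a minimal sketch of that eager shape might look like the following; `fetch_page` is a hypothetical paging helper and the `delete_links` call assumes the signature added in this PR, so treat both as placeholders rather than the actual implementation.

```python
def get_existing_links(linkwarden_url, linkwarden_token, collection_id, delete_duplicate=False):
    # Eager variant: collect every page before doing any duplicate handling,
    # so deletion no longer depends on the caller exhausting a generator.
    all_links = []
    cursor = None
    while True:
        # fetch_page is a hypothetical helper returning (links, next_cursor).
        links, cursor = fetch_page(linkwarden_url, linkwarden_token, collection_id, cursor)
        all_links.extend(links)
        if cursor is None:
            break

    # Duplicate detection over the fully collected list (first occurrence wins).
    seen_urls = set()
    duplicate_link_ids = []
    for link in all_links:
        if link["url"] in seen_urls:
            duplicate_link_ids.append(link["id"])
        else:
            seen_urls.add(link["url"])

    if delete_duplicate and duplicate_link_ids:
        # delete_links is the helper added in this PR; its signature is assumed here.
        delete_links(linkwarden_url, linkwarden_token, duplicate_link_ids)

    return all_links
```

A separate streaming `iter_existing_links` (step 4) could reuse the same paging helper without any deletion side effect.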
</issue_to_address>
### Comment 2
<location> `README.md:12` </location>
<code_context>
- Export GitHub stars to Linkwarden
- Export to existing collection or create new colletion
- Keep Linkwarden collection up-to-date with your GitHub stars using the docker image.
-
</code_context>
<issue_to_address>
**issue (typo):** Fix the spelling of "colletion" to "collection".
```suggestion
- Export to existing collection or create new collection
```
</issue_to_address>
rtuszik left a comment
Thanks for this PR!
Please address my comments and the currently failing pytest/lint errors so I can merge this.
| """ | ||
| Fetch all collections from the Linkwarden instance. | ||
| Args: | ||
| linkwarden_url: Base URL of the Linkwarden instance | ||
| linkwarden_token: Authorization token for the API | ||
| Returns: | ||
| list: List of collection objects | ||
| Raises: | ||
| Exception: If there's an error fetching collections from the API | ||
| """ | ||
| url = f"{linkwarden_url.rstrip('/')}/api/v1/collections" | ||
| headers = { | ||
| "Authorization": f"Bearer {linkwarden_token}", | ||
| "Content-Type": "application/json", | ||
| } | ||
| try: | ||
| response = requests.get(url, headers=headers, timeout=30) | ||
| response.raise_for_status() | ||
| data = response.json() | ||
| collections = data.get("response", []) | ||
| logger.debug(f"Fetched collections: {json.dumps(collections, indent=2)}") | ||
| return collections | ||
| except requests.RequestException as e: | ||
| raise Exception(f"Error fetching collections from Linkwarden: {str(e)}") from e | ||
|
|
||
|
|
||
| def create_collection(linkwarden_url, linkwarden_token, name, description=""): | ||
| """ | ||
| Create a new collection in Linkwarden. | ||
| Args: | ||
| linkwarden_url: Base URL of the Linkwarden instance | ||
| linkwarden_token: Authorization token for the API | ||
| name: Name of the collection to create | ||
| description: Optional description for the collection (default: empty string) | ||
| Returns: | ||
| dict: The created collection object if successful, None otherwise | ||
| """ | ||
| url = f"{linkwarden_url.rstrip('/')}/api/v1/collections" | ||
| headers = { | ||
| "Authorization": f"Bearer {linkwarden_token}", | ||
| "Content-Type": "application/json", | ||
| } | ||
| data = {"name": name, "description": description} | ||
| try: | ||
| logger.debug(f"Attempting to create collection: {name}") | ||
| response = requests.post( | ||
| url, | ||
| headers=headers, | ||
| json=data, | ||
| timeout=30, | ||
| ) | ||
| logger.debug(f"Response status code: {response.status_code}") | ||
| logger.debug(f"Response content: {response.text}") | ||
| response.raise_for_status() | ||
| created_collection = response.json().get("response", {}) | ||
| logger.info(f"Created collection: {created_collection}") | ||
| return created_collection | ||
| except Timeout: | ||
| logger.error("Request timed out while creating new collection in Linkwarden") | ||
| return None | ||
| except requests.RequestException as e: | ||
| logger.error(f"Error creating new collection in Linkwarden: {str(e)}") | ||
| if hasattr(e, "response") and e.response is not None: | ||
| logger.error(f"Response status code: {e.response.status_code}") | ||
| logger.error(f"Response content: {e.response.text}") | ||
| return None | ||
|
|
||
|
|
||
| def upload_link(linkwarden_url, linkwarden_token, collection_id, repo, tags): | ||
| """ | ||
| Upload a repository as a link to Linkwarden. | ||
| Args: | ||
| linkwarden_url: Base URL of the Linkwarden instance | ||
| linkwarden_token: Authorization token for the API | ||
| collection_id: ID of the collection to add the link to | ||
| repo: Repository object containing repository information | ||
| tags: List of tags to associate with the link | ||
| Returns: | ||
| str or None: The ID of the created link if successful, | ||
| LINK_EXISTS_GLOBALLY if link already exists, | ||
| None if there was an error | ||
| """ |
Please get rid of all AI docstring comments added. They don't add any value and are inconsistent with the rest of the codebase.
will send new PR tomorrow 👍
Bug fix: #79 - Failed to get all the existing links when the entire response is filled with duplicates
Edit: updated get_existing_links to use the new search API and changed the cursor logic to use nextCursor from the response.
New API: https://docs.linkwarden.app/api/search-links
The old links API was deprecated 6 months ago: linkwarden/docs@4846ad7
ADD: delete_links function to delete all duplicate links (OPT_DELETE_DUPLICATE).
ADD: DEBUG environment variable to replicate -d.
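A minimal sketch of how that DEBUG variable might behave: map a truthy DEBUG value to the same verbose logging the -d flag enables. The accepted values and logger setup are assumptions, not the PR's exact code.

```python
import logging
import os


def configure_logging():
    # Treat DEBUG=1/true/yes (case-insensitive) as equivalent to passing -d.
    debug = os.environ.get("DEBUG", "").strip().lower() in {"1", "true", "yes"}
    logging.basicConfig(level=logging.DEBUG if debug else logging.INFO)
```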