Walkthrough
Refactors
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant CLI as argparse CLI
participant RD as RepositoryDownloader
participant GH as GitHub (HTTP)
participant GIT as git subprocess
rect rgb(240,248,255)
User->>CLI: run script (--nrepos, --output-dir)
CLI->>RD: instantiate(output, url, timeout)
RD->>RD: setup_directories()
end
rect rgb(245,255,240)
RD->>GH: GET trending page
GH-->>RD: HTML response
RD->>RD: parse & normalize repo URLs (ensure full GitHub .git)
end
loop each repo (up to max_repositories)
RD->>RD: derive owner and repo_name
alt repo not present locally
RD->>GIT: git clone <repo_url> into output/owner/repo
GIT-->>RD: success / error
RD->>RD: record result
else repo exists
RD->>RD: skip clone (log)
end
end
RD-->>CLI: finish (summary)
CLI-->>User: exit
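The "parse & normalize repo URLs" step in the diagram turns trending-page links into full clone URLs. A minimal sketch of that normalization, assuming the page yields hrefs of the form /owner/repo; normalize_repo_url is a hypothetical name, not the script's actual method, and str.removesuffix needs Python 3.9+:

```python
from typing import Optional
from urllib.parse import urlparse


def normalize_repo_url(href: str) -> Optional[str]:
    """Turn '/owner/repo' (or a full GitHub URL) into 'https://github.com/owner/repo.git'."""
    # urlparse keeps only the path, so relative hrefs and absolute URLs are treated alike.
    path = urlparse(href.strip()).path.strip('/')
    parts = path.split('/')
    # A repository link has exactly two non-empty segments: owner and repo.
    if len(parts) != 2 or not all(parts):
        return None
    owner, repo = parts
    repo = repo.removesuffix('.git')  # avoid producing 'repo.git.git'
    return f'https://github.com/{owner}/{repo}.git'
```

Two-segment links that are not repositories would still pass this check; filtering those is left to the HTML selectors.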
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Actionable comments posted: 7
🧹 Nitpick comments (1)
scripts/01-fetch-github.py (1)
74-77: URL parsing logic is fragile.
The string manipulation approach to extract owner and repo name is brittle. It assumes a specific URL format and doesn't handle edge cases like trailing slashes, different URL schemes, or malformed URLs.
Consider using urllib.parse or a regex pattern for more robust URL parsing:
+from urllib.parse import urlparse
+
-            path_parts = repo_url.replace('https://github.com/', '').replace('.git', '').split('/')
-            if len(path_parts) < 2:
-                continue
-            owner, repo_name = path_parts[0], path_parts[1]
+            parsed = urlparse(repo_url)
+            path_parts = parsed.path.strip('/').removesuffix('.git').split('/')
+            if len(path_parts) != 2:
+                print(f'Skipping malformed URL: {repo_url}')
+                continue
+            owner, repo_name = path_parts[0], path_parts[1]
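A self-contained sketch of the urlparse approach as a helper, assuming only https://github.com/owner/repo[.git] URLs count as valid input; parse_owner_repo is a hypothetical name. It strips .git only as a suffix (Python 3.9+ str.removesuffix), since str.replace('.git', '') would also alter names that merely contain .git, such as my.github-tool:

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_owner_repo(repo_url: str) -> Optional[Tuple[str, str]]:
    """Return (owner, repo_name) for a GitHub HTTP(S) URL, or None if malformed."""
    parsed = urlparse(repo_url)
    # Assumption: only plain github.com URLs are accepted (no www., no SSH form).
    if parsed.scheme not in ('http', 'https') or parsed.netloc.lower() != 'github.com':
        return None
    # Dropping empty segments tolerates leading, trailing, and doubled slashes.
    parts = [p for p in parsed.path.split('/') if p]
    if len(parts) != 2:
        return None
    owner, repo_name = parts
    # Only a trailing '.git' is removed; '.git' elsewhere in the name is kept.
    repo_name = repo_name.removesuffix('.git')
    return (owner, repo_name) if repo_name else None
```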
📜 Review details
📒 Files selected for processing (1)
scripts/01-fetch-github.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/01-fetch-github.py (1)
stubs/bs4/__init__.pyi (1)
BeautifulSoup(7-37)
🪛 GitHub Actions: flake8
scripts/01-fetch-github.py
[error] 5-5: flake8: F401 'os' imported but unused
[error] 15-15: flake8: W291 trailing whitespace
[error] 20-20: flake8: W293 blank line contains whitespace
[error] 24-24: flake8: W293 blank line contains whitespace
[error] 33-33: flake8: W291 trailing whitespace
[error] 36-36: flake8: W291 trailing whitespace
[error] 38-38: flake8: W293 blank line contains whitespace
[error] 41-41: flake8: W291 trailing whitespace
[error] 44-44: flake8: W291 trailing whitespace
[error] 58-58: flake8: W291 trailing whitespace
[error] 62-62: flake8: W293 blank line contains whitespace
[error] 64-64: flake8: W291 trailing whitespace
[error] 69-69: flake8: W291 trailing whitespace
[error] 70-70: flake8: W291 trailing whitespace
[error] 73-73: flake8: W291 trailing whitespace
[error] 76-76: flake8: W291 trailing whitespace
[error] 77-77: flake8: W291 trailing whitespace
[error] 78-78: flake8: W291 trailing whitespace
[error] 80-80: flake8: W291 trailing whitespace
[error] 107-107: flake8: W291 trailing whitespace
[error] 1-1: flake8: General lint failure causing pipeline exit
🪛 GitHub Actions: pylint
scripts/01-fetch-github.py
[error] 15-15: Trailing whitespace (trailing-whitespace) in line 15
[error] 20-20: Trailing whitespace (trailing-whitespace) in line 20
[error] 24-24: Trailing whitespace (trailing-whitespace) in line 24
[error] 33-33: Trailing whitespace (trailing-whitespace) in line 33
[error] 36-36: Trailing whitespace (trailing-whitespace) in line 36
[error] 38-38: Trailing whitespace (trailing-whitespace) in line 38
[error] 41-41: Trailing whitespace (trailing-whitespace) in line 41
[error] 44-44: Trailing whitespace (trailing-whitespace) in line 44
[error] 58-58: Trailing whitespace (trailing-whitespace) in line 58
[error] 62-62: Trailing whitespace (trailing-whitespace) in line 62
[error] 64-64: Trailing whitespace (trailing-whitespace) in line 64
[error] 69-69: Trailing whitespace (trailing-whitespace) in line 69
[error] 70-70: Trailing whitespace (trailing-whitespace) in line 70
[error] 73-73: Trailing whitespace (trailing-whitespace) in line 73
[error] 76-76: Trailing whitespace (trailing-whitespace) in line 76
[error] 77-77: Trailing whitespace (trailing-whitespace) in line 77
[error] 78-78: Trailing whitespace (trailing-whitespace) in line 78
[error] 80-80: Trailing whitespace (trailing-whitespace) in line 80
[error] 107-107: Trailing whitespace (trailing-whitespace) in line 107
[error] 5-5: Unused import 'os' (unused-import) detected by pylint
🪛 GitHub Actions: ruff
scripts/01-fetch-github.py
[error] 5-5: F401: 'os' imported but unused
[error] 16-16: Q000: Double quotes found but single quotes preferred
[error] 18-18: Q000: Double quotes found but single quotes preferred
[error] 30-30: Q000: Double quotes found but single quotes preferred
[error] 36-36: Q000: Double quotes found but single quotes preferred
[error] 43-43: Q000: Double quotes found but single quotes preferred
[error] 54-54: Q000: Double quotes found but single quotes preferred
[error] 57-57: Q000: Double quotes found but single quotes preferred
[error] 60-60: Q000: Double quotes found but single quotes preferred
[error] 64-64: Q000: Double quotes found but single quotes preferred
[error] 68-68: Q000: Double quotes found but single quotes preferred
[error] 78-78: Q000: Double quotes found but single quotes preferred
[error] 81-81: Q000: Double quotes found but single quotes preferred
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: make
- GitHub Check: latexmk
🔇 Additional comments (5)
scripts/01-fetch-github.py (5)
16-23: Good use of Path and sensible defaults.
The initialization and setup logic is clean. Using pathlib.Path for directory operations and providing sensible defaults for output directory and timeout is a solid choice.
39-61: Solid clone implementation with good error handling.
The repository cloning logic is well-structured:
- Creates owner directories as needed
- Checks for existing repositories to avoid redundant clones
- Captures and reports both subprocess failures and exceptions
- Returns boolean status for caller to track success
84-103: Clean argument parsing with sensible defaults.
The parse_arguments function is well-structured with clear help text and appropriate defaults for both the number of repositories and output directory.
106-110: Clean main function orchestration.
The main function effectively wires together argument parsing, downloader instantiation, and execution flow in a clear, linear manner.
113-114: Proper main entry point.
The if __name__ == '__main__' block correctly calls the main() function, following Python best practices.
scripts/01-fetch-github.py
Outdated
            print(f"Error fetching trending repositories: {e}", file=sys.stderr)
            raise
        soup = BeautifulSoup(response.text, 'html.parser')
        repositories = []
Remove trailing whitespace.
Lines 33, 36, and 38 have trailing whitespace causing linter failures.
Also applies to: 36-36, 38-38
🧰 Tools
🪛 GitHub Actions: flake8
[error] 33-33: flake8: W291 trailing whitespace
🪛 GitHub Actions: pylint
[error] 33-33: Trailing whitespace (trailing-whitespace) in line 33
🤖 Prompt for AI Agents
In scripts/01-fetch-github.py around lines 33, 36 and 38, there are trailing
spaces causing linter failures; remove the trailing whitespace at the end of
those lines (and any other accidental trailing spaces in the file), save the
file, and re-run the linter to confirm the warnings are resolved.
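One way to apply this fix across a whole file, sketched in Python; strip_trailing_whitespace and fix_file are hypothetical helpers, not part of the repository. Stripping spaces and tabs at line ends addresses both W291 (trailing whitespace) and W293 (whitespace on a blank line):

```python
from pathlib import Path


def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    return '\n'.join(line.rstrip(' \t') for line in text.split('\n'))


def fix_file(path: Path) -> bool:
    """Rewrite path in place; return True if anything changed."""
    original = path.read_text(encoding='utf-8')
    cleaned = strip_trailing_whitespace(original)
    if cleaned != original:
        path.write_text(cleaned, encoding='utf-8')
        return True
    return False
```

Running fix_file(Path('scripts/01-fetch-github.py')) and then the linters again should confirm the W291/W293 reports are gone.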
Actionable comments posted: 4
🧹 Nitpick comments (2)
scripts/01-fetch-github.py (2)
15-19: Use a Requests Session and set a User-Agent.
Improves connection reuse and avoids UA-related blocks. Add a Session in __init__ and use it.
     def __init__(self, output_dir: str = "target/01"):
         self.output_dir = Path(output_dir)
-        self.trending_url = "https://github.com/trending/java?since=daily"
+        self.trending_url = 'https://github.com/trending/java?since=daily'
         self.request_timeout = 30
+        self.session = requests.Session()
+        self.session.headers.update(
+            {'User-Agent': 'aibolit/01-fetch-github (+https://github.com/cqfn/aibolit)'}
+        )
@@
-            response = requests.get(self.trending_url, timeout=self.request_timeout)
+            response = self.session.get(self.trending_url, timeout=self.request_timeout)
Also applies to: 26-26
83-102: Validate CLI: ensure --nrepos is non-negative.
Prevents a surprising early exit when a negative value is passed.
     parser.add_argument(
         '--nrepos',
-        type=int,
+        type=non_negative_int,
         required=False,
         default=100,
         help='Number of repositories to download (default: 100)'
     )
Add helper (outside this range):
def non_negative_int(value: str) -> int:
    ivalue = int(value)
    if ivalue < 0:
        raise argparse.ArgumentTypeError('nrepos must be >= 0')
    return ivalue
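Putting the helper and the option together, a minimal parse_arguments sketch. The name non_negative_int is used because zero is accepted; the --output-dir default of target/01 is taken from the __init__ default in the diff above, and the description string is an assumption:

```python
import argparse


def non_negative_int(value: str) -> int:
    """argparse type: accept integers >= 0, reject anything else."""
    try:
        ivalue = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f'{value!r} is not an integer') from None
    if ivalue < 0:
        raise argparse.ArgumentTypeError('nrepos must be >= 0')
    return ivalue


def parse_arguments(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description='Download trending GitHub repositories')
    parser.add_argument('--nrepos', type=non_negative_int, default=100,
                        help='Number of repositories to download (default: 100)')
    parser.add_argument('--output-dir', default='target/01',
                        help='Directory to clone into (default: target/01)')
    return parser.parse_args(argv)
```

With this type function, --nrepos -1 makes argparse print a usage error and exit with status 2 instead of silently downloading nothing.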
📜 Review details
📒 Files selected for processing (1)
scripts/01-fetch-github.py(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: flake8
scripts/01-fetch-github.py
[error] 14-14: flake8: W291 trailing whitespace.
[error] 19-19: flake8: W293 blank line contains whitespace.
[error] 23-23: flake8: W293 blank line contains whitespace.
[error] 32-32: flake8: W291 trailing whitespace.
[error] 35-35: flake8: W291 trailing whitespace.
[error] 37-37: flake8: W293 blank line contains whitespace.
[error] 40-40: flake8: W291 trailing whitespace.
[error] 43-43: flake8: W291 trailing whitespace.
[error] 57-57: flake8: W291 trailing whitespace.
[error] 61-61: flake8: W293 blank line contains whitespace.
[error] 63-63: flake8: W291 trailing whitespace.
[error] 68-68: flake8: W291 trailing whitespace.
[error] 69-69: flake8: W291 trailing whitespace.
[error] 72-72: flake8: W291 trailing whitespace.
[error] 75-75: flake8: W291 trailing whitespace.
[error] 76-76: flake8: W291 trailing whitespace.
[error] 79-79: flake8: W291 trailing whitespace.
[error] 106-106: flake8: W291 trailing whitespace.
🪛 GitHub Actions: pylint
scripts/01-fetch-github.py
[error] 14-14: Trailing whitespace (trailing-whitespace)
[error] 19-19: Trailing whitespace (trailing-whitespace)
[error] 23-23: Trailing whitespace (trailing-whitespace)
[error] 32-32: Trailing whitespace (trailing-whitespace)
[error] 35-35: Trailing whitespace (trailing-whitespace)
[error] 37-37: Trailing whitespace (trailing-whitespace)
[error] 40-40: Trailing whitespace (trailing-whitespace)
[error] 43-43: Trailing whitespace (trailing-whitespace)
[error] 57-57: Trailing whitespace (trailing-whitespace)
[error] 61-61: Trailing whitespace (trailing-whitespace)
[error] 63-63: Trailing whitespace (trailing-whitespace)
[error] 68-68: Trailing whitespace (trailing-whitespace)
[error] 69-69: Trailing whitespace (trailing-whitespace)
[error] 72-72: Trailing whitespace (trailing-whitespace)
[error] 75-75: Trailing whitespace (trailing-whitespace)
[error] 76-76: Trailing whitespace (trailing-whitespace)
[error] 77-77: Trailing whitespace (trailing-whitespace)
[error] 79-79: Trailing whitespace (trailing-whitespace)
[error] 106-106: Trailing whitespace (trailing-whitespace)
🪛 GitHub Actions: ruff
scripts/01-fetch-github.py
[error] 15-15: Q000: Double quotes found but single quotes preferred
[error] 17-17: Q000: Double quotes found but single quotes preferred
[error] 29-29: Q000: Double quotes found but single quotes preferred
[error] 35-35: Q000: Double quotes found but single quotes preferred
[error] 42-42: Q000: Double quotes found but single quotes preferred
[error] 53-53: Q000: Double quotes found but single quotes preferred
[error] 56-56: Q000: Double quotes found but single quotes preferred
[error] 59-59: Q000: Double quotes found but single quotes preferred
[error] 63-63: Q000: Double quotes found but single quotes preferred
[error] 67-67: Q000: Double quotes found but single quotes preferred
[error] 77-77: Q000: Double quotes found but single quotes preferred
[error] 80-80: Q000: Double quotes found but single quotes preferred
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: latexmk
- GitHub Check: make
🔇 Additional comments (2)
scripts/01-fetch-github.py (2)
14-23: Nice refactor to a class with Path-based dirs.
Cleaner structure and directory setup look good.
10-11: Dependencies are declared and pinned
beautifulsoup4==4.14.2 and requests==2.32.5 are pinned in pyproject.toml.
scripts/01-fetch-github.py
Outdated
    def clone_repository(self, repo_url: str, owner: str, repo_name: str) -> bool:
        owner_dir = self.output_dir / owner
        owner_dir.mkdir(exist_ok=True)
        if (owner_dir / repo_name).exists():
            print(f"Repository {owner}/{repo_name} already exists, skipping...")
            return True
        try:
            result = subprocess.run(
                ['git', 'clone', repo_url],
                cwd=owner_dir,
                capture_output=True,
                text=True,
                check=False
            )
            if result.returncode == 0:
                print(f"Successfully cloned {owner}/{repo_name}")
                return True
            else:
                print(f"Failed to clone {owner}/{repo_name}: {result.stderr}")
                return False
        except subprocess.SubprocessError as e:
            print(f"Error cloning {owner}/{repo_name}: {e}", file=sys.stderr)
            return False
Harden cloning: shallow clone, timeout, parents=True, and handle missing git.
Prevents hangs, reduces bandwidth, and avoids crashes when git is absent.
     def clone_repository(self, repo_url: str, owner: str, repo_name: str) -> bool:
         owner_dir = self.output_dir / owner
-        owner_dir.mkdir(exist_ok=True)
+        owner_dir.mkdir(parents=True, exist_ok=True)
         if (owner_dir / repo_name).exists():
-            print(f"Repository {owner}/{repo_name} already exists, skipping...")
+            print(f'Repository {owner}/{repo_name} already exists, skipping...')
             return True
         try:
             result = subprocess.run(
-                ['git', 'clone', repo_url],
+                ['git', 'clone', '--depth', '1', repo_url],
                 cwd=owner_dir,
                 capture_output=True,
                 text=True,
-                check=False
+                check=False,
+                timeout=300,
             )
             if result.returncode == 0:
-                print(f"Successfully cloned {owner}/{repo_name}")
+                print(f'Successfully cloned {owner}/{repo_name}')
                 return True
             else:
-                print(f"Failed to clone {owner}/{repo_name}: {result.stderr}")
+                print(f'Failed to clone {owner}/{repo_name}: {result.stderr}')
                 return False
+        except FileNotFoundError:
+            print('git not found on PATH; cannot clone repositories.', file=sys.stderr)
+            return False
         except subprocess.SubprocessError as e:
-            print(f"Error cloning {owner}/{repo_name}: {e}", file=sys.stderr)
+            print(f'Error cloning {owner}/{repo_name}: {e}', file=sys.stderr)
             return False
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    def clone_repository(self, repo_url: str, owner: str, repo_name: str) -> bool:
        owner_dir = self.output_dir / owner
        owner_dir.mkdir(parents=True, exist_ok=True)
        if (owner_dir / repo_name).exists():
            print(f'Repository {owner}/{repo_name} already exists, skipping...')
            return True
        try:
            result = subprocess.run(
                ['git', 'clone', '--depth', '1', repo_url],
                cwd=owner_dir,
                capture_output=True,
                text=True,
                check=False,
                timeout=300,
            )
            if result.returncode == 0:
                print(f'Successfully cloned {owner}/{repo_name}')
                return True
            else:
                print(f'Failed to clone {owner}/{repo_name}: {result.stderr}')
                return False
        except FileNotFoundError:
            print('git not found on PATH; cannot clone repositories.', file=sys.stderr)
            return False
        except subprocess.SubprocessError as e:
            print(f'Error cloning {owner}/{repo_name}: {e}', file=sys.stderr)
            return False
🧰 Tools
🪛 GitHub Actions: flake8
[error] 40-40: flake8: W291 trailing whitespace.
[error] 43-43: flake8: W291 trailing whitespace.
[error] 57-57: flake8: W291 trailing whitespace.
[error] 61-61: flake8: W293 blank line contains whitespace.
🪛 GitHub Actions: pylint
[error] 40-40: Trailing whitespace (trailing-whitespace)
[error] 43-43: Trailing whitespace (trailing-whitespace)
[error] 57-57: Trailing whitespace (trailing-whitespace)
[error] 61-61: Trailing whitespace (trailing-whitespace)
🪛 GitHub Actions: ruff
[error] 42-42: Q000: Double quotes found but single quotes preferred
[error] 53-53: Q000: Double quotes found but single quotes preferred
[error] 56-56: Q000: Double quotes found but single quotes preferred
[error] 59-59: Q000: Double quotes found but single quotes preferred
🤖 Prompt for AI Agents
In scripts/01-fetch-github.py around lines 38 to 61, the clone_repository method
should be hardened: create owner_dir with owner_dir.mkdir(parents=True,
exist_ok=True); perform a shallow clone by adding '--depth', '1' to the git args
to reduce bandwidth and time; pass a timeout (e.g. timeout=60) to subprocess.run
and catch subprocess.TimeoutExpired to handle hangs; also catch
FileNotFoundError to produce a clear error when git is not installed; keep
checking result.returncode and surface result.stderr on failure. Ensure you add
those exception handlers and use the parents=True mkdir flag.
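A standalone sketch of the hardened clone, written as a free function so it can be exercised without the class; the output_dir, git and timeout parameters are assumptions added for testability. Unlike the suggestion, it passes the destination to git clone explicitly, so the directory checked for existence and the directory git creates always match:

```python
import subprocess
import sys
from pathlib import Path


def clone_repository(output_dir: Path, repo_url: str, owner: str, repo_name: str,
                     git: str = 'git', timeout: float = 300) -> bool:
    owner_dir = output_dir / owner
    owner_dir.mkdir(parents=True, exist_ok=True)  # also creates output_dir if missing
    target = owner_dir / repo_name
    if target.exists():
        print(f'Repository {owner}/{repo_name} already exists, skipping...')
        return True
    try:
        result = subprocess.run(
            # Shallow clone into an explicit destination directory.
            [git, 'clone', '--depth', '1', repo_url, str(target)],
            capture_output=True, text=True, check=False, timeout=timeout,
        )
    except FileNotFoundError:
        print(f'{git!r} not found on PATH; cannot clone repositories.', file=sys.stderr)
        return False
    except subprocess.TimeoutExpired:
        print(f'Timeout cloning {owner}/{repo_name}', file=sys.stderr)
        return False
    if result.returncode != 0:
        print(f'Failed to clone {owner}/{repo_name}: {result.stderr}', file=sys.stderr)
        return False
    print(f'Successfully cloned {owner}/{repo_name}')
    return True
```

If a clone is killed by the timeout, a partially written directory may remain and would be skipped as "already exists" on the next run; this sketch does not clean it up.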
scripts/01-fetch-github.py
Outdated
        for repo_url in repositories:
            if downloaded_count >= max_repositories:
                break
            path_parts = repo_url.replace('https://github.com/', '').replace('.git', '').split('/')
            if len(path_parts) < 2:
                continue
            owner, repo_name = path_parts[0], path_parts[1]
            print(f"Processing {owner}/{repo_name}...")
            if self.clone_repository(repo_url, owner, repo_name):
                downloaded_count += 1
Parse owner/repo via urlparse for correctness across URL shapes.
String replace is brittle; use URL parsing.
-            path_parts = repo_url.replace('https://github.com/', '').replace('.git', '').split('/')
-            if len(path_parts) < 2:
-                continue
-            owner, repo_name = path_parts[0], path_parts[1]
-            print(f"Processing {owner}/{repo_name}...")
+            parsed = urlparse(repo_url)
+            parts = [p for p in parsed.path.strip('/').split('/') if p]
+            if len(parts) < 2:
+                continue
+            owner, repo_name = parts[0], parts[1].removesuffix('.git')
+            print(f'Processing {owner}/{repo_name}...')
             if self.clone_repository(repo_url, owner, repo_name):
                 downloaded_count += 1
Add import (outside this range):
from urllib.parse import urlparse
🧰 Tools
🪛 GitHub Actions: flake8
[error] 72-72: flake8: W291 trailing whitespace.
[error] 75-75: flake8: W291 trailing whitespace.
[error] 76-76: flake8: W291 trailing whitespace.
[error] 79-79: flake8: W291 trailing whitespace.
🪛 GitHub Actions: pylint
[error] 72-72: Trailing whitespace (trailing-whitespace)
[error] 75-75: Trailing whitespace (trailing-whitespace)
[error] 76-76: Trailing whitespace (trailing-whitespace)
[error] 77-77: Trailing whitespace (trailing-whitespace)
[error] 79-79: Trailing whitespace (trailing-whitespace)
🪛 GitHub Actions: ruff
[error] 77-77: Q000: Double quotes found but single quotes preferred
🤖 Prompt for AI Agents
In scripts/01-fetch-github.py around lines 70 to 79, parsing owner/repo by
string replace is brittle; use urllib.parse.urlparse to handle different GitHub
URL shapes. Update the loop to parse repo_url with urlparse, extract the path,
strip leading/trailing slashes, split on '/' and validate there are at least two
segments, then assign owner and repo_name from those segments (remove a trailing
.git from repo_name if present). Also add the import "from urllib.parse import
urlparse" at the top of the file.
Actionable comments posted: 3
📜 Review details
📒 Files selected for processing (1)
scripts/01-fetch-github.py(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: pylint
scripts/01-fetch-github.py
[error] 32-32: Pylint: Line too long (104/100) (C0301). Command 'make pylint' failed with exit code 2.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: latexmk
- GitHub Check: make
🔇 Additional comments (7)
scripts/01-fetch-github.py (7)
14-18: LGTM!
The class initialization is well-structured with sensible defaults and proper use of Path for filesystem operations.
20-21: LGTM!
The directory setup correctly uses parents=True to ensure intermediate directories are created.
23-42: Good error handling and URL normalization.
The method correctly handles HTTP errors and normalizes repository URLs. The CSS selectors properly target GitHub Trending entries.
68-86: Approve the overall coordination logic.
The method correctly orchestrates fetching and cloning with appropriate error handling and progress reporting. Once the URL parsing is fixed, this will be robust.
89-108: LGTM!
The argument parsing is well-structured with clear descriptions and sensible defaults.
111-115: LGTM!
The main function provides a clean entry point with proper initialization sequence.
118-119: LGTM!
The entry point correctly uses the standard Python idiom.
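The coordination logic noted at 68-86 (clone until max_repositories successes, skip malformed URLs) can be sketched with parsing and cloning injected, so it runs without network access; download_all and its parameters are hypothetical names, not the script's API:

```python
from typing import Callable, Iterable, Optional, Tuple


def download_all(repo_urls: Iterable[str], max_repositories: int,
                 parse: Callable[[str], Optional[Tuple[str, str]]],
                 clone: Callable[[str, str, str], bool]) -> int:
    """Clone until max_repositories succeed; return how many succeeded."""
    downloaded = 0
    for url in repo_urls:
        if downloaded >= max_repositories:
            break  # stop as soon as the quota of successful clones is reached
        parsed = parse(url)
        if parsed is None:
            print(f'Skipping malformed URL: {url}')
            continue
        owner, repo_name = parsed
        print(f'Processing {owner}/{repo_name}...')
        if clone(url, owner, repo_name):  # failed clones do not count
            downloaded += 1
    print(f'Downloaded {downloaded} of {max_repositories} requested repositories')
    return downloaded
```

Injecting parse and clone keeps the loop testable with fakes; the real script would pass its URL parser and clone_repository method.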
    def clone_repository(self, repo_url: str, owner: str, repo_name: str) -> bool:
        owner_dir = self.output_dir / owner
        owner_dir.mkdir(exist_ok=True)
        if (owner_dir / repo_name).exists():
            print(f'Repository {owner}/{repo_name} already exists, skipping...')
            return True
        try:
            result = subprocess.run(
                ['git', 'clone', repo_url],
                cwd=owner_dir,
                capture_output=True,
                text=True,
                check=False
            )
            if result.returncode == 0:
                print(f'Successfully cloned {owner}/{repo_name}')
                return True
            else:
                print(f'Failed to clone {owner}/{repo_name}: {result.stderr}')
                return False
        except subprocess.SubprocessError as e:
            print(f'Error cloning {owner}/{repo_name}: {e}', file=sys.stderr)
            return False
Address unresolved hardening concerns from previous review.
The clone operation still lacks several critical improvements requested in the previous review:
- Line 46: owner_dir.mkdir(exist_ok=True) should use parents=True to handle nested paths.
- Line 52: Missing --depth 1 flag for shallow cloning (reduces bandwidth and time).
- Line 51: Missing timeout parameter on subprocess.run (prevents hangs on unresponsive clones).
- Line 64: Missing FileNotFoundError handler to gracefully handle a missing git executable.
Apply this diff to address all concerns:
     def clone_repository(self, repo_url: str, owner: str, repo_name: str) -> bool:
         owner_dir = self.output_dir / owner
-        owner_dir.mkdir(exist_ok=True)
+        owner_dir.mkdir(parents=True, exist_ok=True)
         if (owner_dir / repo_name).exists():
             print(f'Repository {owner}/{repo_name} already exists, skipping...')
             return True
         try:
             result = subprocess.run(
-                ['git', 'clone', repo_url],
+                ['git', 'clone', '--depth', '1', repo_url],
                 cwd=owner_dir,
                 capture_output=True,
                 text=True,
-                check=False
+                check=False,
+                timeout=300,
             )
             if result.returncode == 0:
                 print(f'Successfully cloned {owner}/{repo_name}')
                 return True
             else:
                 print(f'Failed to clone {owner}/{repo_name}: {result.stderr}')
                 return False
+        except FileNotFoundError:
+            print('git not found on PATH; cannot clone repositories.', file=sys.stderr)
+            return False
+        except subprocess.TimeoutExpired:
+            print(f'Timeout cloning {owner}/{repo_name}', file=sys.stderr)
+            return False
         except subprocess.SubprocessError as e:
             print(f'Error cloning {owner}/{repo_name}: {e}', file=sys.stderr)
             return False
📝 Committable suggestion
    def clone_repository(self, repo_url: str, owner: str, repo_name: str) -> bool:
        owner_dir = self.output_dir / owner
        owner_dir.mkdir(parents=True, exist_ok=True)
        if (owner_dir / repo_name).exists():
            print(f'Repository {owner}/{repo_name} already exists, skipping...')
            return True
        try:
            result = subprocess.run(
                ['git', 'clone', '--depth', '1', repo_url],
                cwd=owner_dir,
                capture_output=True,
                text=True,
                check=False,
                timeout=300,
            )
            if result.returncode == 0:
                print(f'Successfully cloned {owner}/{repo_name}')
                return True
            else:
                print(f'Failed to clone {owner}/{repo_name}: {result.stderr}')
                return False
        except FileNotFoundError:
            print('git not found on PATH; cannot clone repositories.', file=sys.stderr)
            return False
        except subprocess.TimeoutExpired:
            print(f'Timeout cloning {owner}/{repo_name}', file=sys.stderr)
            return False
        except subprocess.SubprocessError as e:
            print(f'Error cloning {owner}/{repo_name}: {e}', file=sys.stderr)
            return False
🤖 Prompt for AI Agents
In scripts/01-fetch-github.py around lines 44 to 66, the clone_repository
implementation needs hardening: create owner_dir with
owner_dir.mkdir(parents=True, exist_ok=True) to handle nested paths; run git
with a shallow clone flag by adding '--depth', '1' to the git clone args; pass a
sensible timeout to subprocess.run (e.g., timeout=60) to avoid hanging; and add
a separate except FileNotFoundError handler to log a clear error when the git
executable is missing (return False), keeping the existing SubprocessError
handler for other subprocess errors.
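On the TimeoutExpired point: subprocess.TimeoutExpired is a subclass of subprocess.SubprocessError, so the existing generic handler already catches a timeout; a dedicated clause only improves the message, and it must come before the SubprocessError clause to ever run. A small sketch of that handler order; run_with_timeout is a hypothetical helper:

```python
import subprocess
import sys
from typing import List


def run_with_timeout(cmd: List[str], timeout: float) -> str:
    """Classify how a subprocess ended: 'ok', 'failed', 'timeout' or 'missing'."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                check=False, timeout=timeout)
    except FileNotFoundError:          # executable not found on PATH
        return 'missing'
    except subprocess.TimeoutExpired:  # must precede the broader SubprocessError clause
        return 'timeout'
    except subprocess.SubprocessError:
        return 'failed'
    return 'ok' if result.returncode == 0 else 'failed'
```

On timeout, subprocess.run kills the child before re-raising, so no stray git process is left running.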
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/01-fetch-github.py (1)
48-70: Critical hardening issues remain unaddressed from previous review.
The clone_repository method still lacks several critical improvements that were requested in the previous review but have not been implemented:
- Line 50: owner_dir.mkdir(exist_ok=True) is missing parents=True, which will cause failures if the owner path has nested directories.
- Line 56: The git clone command is missing --depth 1 for shallow cloning, wasting bandwidth and time by cloning the full repository history.
- Line 55: subprocess.run is missing the timeout parameter, allowing the process to hang indefinitely on unresponsive remotes.
- Line 68: Only SubprocessError is caught; a FileNotFoundError handler is missing, so there is no clear error message when the git executable is not installed.
Apply this diff to address all unresolved hardening concerns:
     def clone_repository(self, repo_url: str, owner: str, repo_name: str) -> bool:
         owner_dir = self.output_dir / owner
-        owner_dir.mkdir(exist_ok=True)
+        owner_dir.mkdir(parents=True, exist_ok=True)
         if (owner_dir / repo_name).exists():
             print(f'Repository {owner}/{repo_name} already exists, skipping...')
             return True
         try:
             result = subprocess.run(
-                ['git', 'clone', repo_url],
+                ['git', 'clone', '--depth', '1', repo_url],
                 cwd=owner_dir,
                 capture_output=True,
                 text=True,
-                check=False
+                check=False,
+                timeout=300,
             )
             if result.returncode == 0:
                 print(f'Successfully cloned {owner}/{repo_name}')
                 return True
             else:
                 print(f'Failed to clone {owner}/{repo_name}: {result.stderr}')
                 return False
+        except FileNotFoundError:
+            print('git not found on PATH; cannot clone repositories.', file=sys.stderr)
+            return False
+        except subprocess.TimeoutExpired:
+            print(f'Timeout cloning {owner}/{repo_name}', file=sys.stderr)
+            return False
         except subprocess.SubprocessError as e:
             print(f'Error cloning {owner}/{repo_name}: {e}', file=sys.stderr)
             return False
📜 Review details
📒 Files selected for processing (1)
scripts/01-fetch-github.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/01-fetch-github.py (1)
stubs/bs4/__init__.pyi (1)
BeautifulSoup(7-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: make
- GitHub Check: latexmk
Actionable comments posted: 2
♻️ Duplicate comments (1)
scripts/01-fetch-github.py (1)
48-70: Address unresolved hardening concerns from previous review.
The clone operation still lacks critical improvements flagged in previous reviews:
- Line 50: owner_dir.mkdir(exist_ok=True) is missing parents=True for nested paths
- Line 56: Missing --depth 1 flag for shallow cloning (reduces bandwidth/time)
- Line 55: Missing timeout parameter on subprocess.run (prevents hangs)
- Line 68: Missing FileNotFoundError handler for a missing git executable
Apply this diff to address all concerns:
     def clone_repository(self, repo_url: str, owner: str, repo_name: str) -> bool:
         owner_dir = self.output_dir / owner
-        owner_dir.mkdir(exist_ok=True)
+        owner_dir.mkdir(parents=True, exist_ok=True)
         if (owner_dir / repo_name).exists():
             print(f'Repository {owner}/{repo_name} already exists, skipping...')
             return True
         try:
             result = subprocess.run(
-                ['git', 'clone', repo_url],
+                ['git', 'clone', '--depth', '1', repo_url],
                 cwd=owner_dir,
                 capture_output=True,
                 text=True,
-                check=False
+                check=False,
+                timeout=300,
             )
             if result.returncode == 0:
                 print(f'Successfully cloned {owner}/{repo_name}')
                 return True
             else:
                 print(f'Failed to clone {owner}/{repo_name}: {result.stderr}')
                 return False
+        except FileNotFoundError:
+            print('git not found on PATH; cannot clone repositories.', file=sys.stderr)
+            return False
+        except subprocess.TimeoutExpired:
+            print(f'Timeout cloning {owner}/{repo_name}', file=sys.stderr)
+            return False
         except subprocess.SubprocessError as e:
             print(f'Error cloning {owner}/{repo_name}: {e}', file=sys.stderr)
             return False
🧹 Nitpick comments (1)
scripts/01-fetch-github.py (1)
115-119: LGTM! Consider making the URL and timeout configurable.
The main function flow is correct. Optionally, you could add CLI arguments for --url and --timeout to make these configurable instead of hardcoded on line 117.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/01-fetch-github.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/01-fetch-github.py (1)
stubs/bs4/__init__.pyi (1)
BeautifulSoup(7-37)
🪛 GitHub Actions: flake8
scripts/01-fetch-github.py
[error] 15-15: SyntaxError: non-default argument follows default argument (E999) during linting. This caused 'flake8 --max-line-length=120' to fail.
🪛 GitHub Actions: pylint
scripts/01-fetch-github.py
[error] 15-15: Pylint parsing failed: 'non-default argument follows default argument' (syntax-error) in 01-fetch-github.py:15
🪛 GitHub Actions: ruff
scripts/01-fetch-github.py
[error] 15-15: Syntax error: Parameter without a default cannot follow a parameter with a default in __init__ signature.
🪛 GitHub Actions: ty
scripts/01-fetch-github.py
[error] 15-15: Step: uvx ty==0.0.1-alpha.8 check. Command failed with exit code 1. invalid-syntax: Parameter without a default cannot follow a parameter with a default in __init__(self, output_dir: str = 'target/01', trend_url, timeout).
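All four linters fail on the same language rule: a parameter without a default may not follow one that has a default. The rule can be checked directly with Python's built-in compile(). compiles is a hypothetical helper, and the signatures mirror the one in the ty error above plus two valid alternatives.

```python
def compiles(source: str) -> bool:
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, '<signature-check>', 'exec')
    except SyntaxError:
        return False
    return True


# The signature reported by CI: trend_url (no default) follows output_dir (default).
BROKEN = "def __init__(self, output_dir: str = 'target/01', trend_url, timeout): pass"
# Valid alternative 1: parameters with defaults come last.
DEFAULTS_LAST = "def __init__(self, trend_url, timeout, output_dir: str = 'target/01'): pass"
# Valid alternative 2: no defaults at all, every argument passed explicitly.
ALL_REQUIRED = 'def __init__(self, output_dir: str, trend_url: str, timeout: int): pass'

print(compiles(BROKEN))  # False
print(compiles(DEFAULTS_LAST))  # True
print(compiles(ALL_REQUIRED))  # True
```

Because this is a parse-time error, no test or linter can run on the file until the signature is fixed, which is why every pipeline fails at line 15.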
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: latexmk
- GitHub Check: make
🔇 Additional comments (4)
scripts/01-fetch-github.py (4)
20-21: LGTM! The directory setup correctly uses parents=True to handle nested paths.
23-46: LGTM! The implementation is robust with proper error handling, multiple CSS selector fallbacks, and defensive URL normalization. The timeout and error propagation ensure the caller can handle failures appropriately.
93-112: LGTM! The argument parser is well-structured with sensible defaults and clear help text.
122-123: LGTM! Standard and correct __main__ guard implementation.
Actionable comments posted: 2
♻️ Duplicate comments (1)
scripts/01-fetch-github.py (1)
48-70: Critical: Address unresolved hardening concerns from previous reviews.
The clone operation still lacks several critical improvements requested in previous reviews:
- Line 50: owner_dir.mkdir(exist_ok=True) should use parents=True to handle nested directory paths safely.
- Line 56: missing --depth 1 flag for shallow cloning (significantly reduces bandwidth and time).
- Line 55: missing timeout parameter on subprocess.run (prevents hangs on unresponsive clones).
- Line 68: missing FileNotFoundError handler to gracefully handle a missing git executable.
Apply this diff to address all concerns:
     def clone_repository(self, repo_url: str, owner: str, repo_name: str) -> bool:
         owner_dir = self.output_dir / owner
-        owner_dir.mkdir(exist_ok=True)
+        owner_dir.mkdir(parents=True, exist_ok=True)
         if (owner_dir / repo_name).exists():
             print(f'Repository {owner}/{repo_name} already exists, skipping...')
             return True
         try:
             result = subprocess.run(
-                ['git', 'clone', repo_url],
+                ['git', 'clone', '--depth', '1', repo_url],
                 cwd=owner_dir,
                 capture_output=True,
                 text=True,
-                check=False
+                check=False,
+                timeout=300,
             )
             if result.returncode == 0:
                 print(f'Successfully cloned {owner}/{repo_name}')
                 return True
             else:
                 print(f'Failed to clone {owner}/{repo_name}: {result.stderr}')
                 return False
+        except FileNotFoundError:
+            print('git not found on PATH; cannot clone repositories.', file=sys.stderr)
+            return False
+        except subprocess.TimeoutExpired:
+            print(f'Timeout cloning {owner}/{repo_name}', file=sys.stderr)
+            return False
         except subprocess.SubprocessError as e:
             print(f'Error cloning {owner}/{repo_name}: {e}', file=sys.stderr)
             return False
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/01-fetch-github.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/01-fetch-github.py (1)
stubs/bs4/__init__.pyi (1)
BeautifulSoup(7-37)
🪛 GitHub Actions: pylint
scripts/01-fetch-github.py
[error] 117-117: Pylint: Line too long (C0301) 106/100 in scripts/01-fetch-github.py:117. Command that failed: 'make pylint' (uv run pylint aibolit test scripts --ignore=scripts/target).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: make
- GitHub Check: latexmk
🔇 Additional comments (1)
scripts/01-fetch-github.py (1)
15-18: Critical: Fix parameter ordering in the __init__ signature.
Parameters with default values must come after parameters without defaults. This syntax error is blocking all CI pipelines (flake8, pylint, ruff, ty).
Apply this diff to fix the parameter ordering:
-    def __init__(self, output, url, timeout):
+    def __init__(self, url, timeout, output='target/01'):
Alternatively, if all parameters should be explicit:
-    def __init__(self, output, url, timeout):
+    def __init__(self, output: str, url: str, timeout: int):
Then update the call site at line 117 to pass all three arguments explicitly.
Likely an incorrect or invalid review comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/01-fetch-github.py (1)
48-70: Critical: Implement all requested clone hardening improvements.
This method still lacks the critical improvements requested in multiple previous reviews:
- Line 50: mkdir is missing parents=True and will fail on nested owner paths.
- Line 56: missing --depth 1 flag; full clones waste bandwidth and time.
- Line 55: missing timeout parameter; the process can hang indefinitely on unresponsive remotes.
- Line 68: missing FileNotFoundError handler; the script crashes ungracefully when git is not installed.
Apply this diff to implement all hardening improvements:
     def clone_repository(self, repo_url: str, owner: str, repo_name: str) -> bool:
         owner_dir = self.output_dir / owner
-        owner_dir.mkdir(exist_ok=True)
+        owner_dir.mkdir(parents=True, exist_ok=True)
         if (owner_dir / repo_name).exists():
             print(f'Repository {owner}/{repo_name} already exists, skipping...')
             return True
         try:
             result = subprocess.run(
-                ['git', 'clone', repo_url],
+                ['git', 'clone', '--depth', '1', repo_url],
                 cwd=owner_dir,
                 capture_output=True,
                 text=True,
-                check=False
+                check=False,
+                timeout=300,
             )
             if result.returncode == 0:
                 print(f'Successfully cloned {owner}/{repo_name}')
                 return True
             else:
                 print(f'Failed to clone {owner}/{repo_name}: {result.stderr}')
                 return False
+        except FileNotFoundError:
+            print('git not found on PATH; cannot clone repositories.', file=sys.stderr)
+            return False
+        except subprocess.TimeoutExpired:
+            print(f'Timeout cloning {owner}/{repo_name}', file=sys.stderr)
+            return False
         except subprocess.SubprocessError as e:
             print(f'Error cloning {owner}/{repo_name}: {e}', file=sys.stderr)
             return False
🧹 Nitpick comments (1)
scripts/01-fetch-github.py (1)
15-18: Add type hints to __init__ parameters.
The parameters lack type annotations, reducing code clarity and preventing static type checkers from catching errors.
Apply this diff:
-    def __init__(self, output, url, timeout):
+    def __init__(self, output: str, url: str, timeout: int):
         self.output_dir = Path(output)
         self.trending_url = url
         self.request_timeout = timeout
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/01-fetch-github.py (1 hunks)
🔇 Additional comments (2)
scripts/01-fetch-github.py (2)
20-46: LGTM! The directory setup and repository fetching logic are well implemented:
- setup_directories correctly uses parents=True to handle nested paths.
- fetch_trending_repositories has robust error handling and uses a fallback chain of DOM selectors.
- Defensive parsing of href attributes prevents crashes on malformed HTML.
93-127: LGTM! The CLI argument parsing, main function, and module guard are well structured:
- parse_arguments properly configures defaults and help text.
- main orchestrates the download workflow cleanly.
- The __main__ guard correctly invokes main().
Actionable comments posted: 2
♻️ Duplicate comments (1)
scripts/01-fetch-github.py (1)
48-70: Critical: Address unresolved hardening concerns from previous reviews.
The clone operation still lacks several critical improvements requested in multiple previous reviews:
- Line 50: owner_dir.mkdir(exist_ok=True) should use parents=True to handle nested paths.
- Line 56: missing --depth 1 flag for shallow cloning (reduces bandwidth and time).
- Line 55: missing timeout parameter on subprocess.run (prevents hangs on unresponsive clones).
- Line 68: missing FileNotFoundError handler to gracefully handle a missing git executable.
- Missing TimeoutExpired handler for the timeout parameter.
Apply this diff to address all concerns:
     def clone_repository(self, repo_url: str, owner: str, repo_name: str) -> bool:
         owner_dir = self.output_dir / owner
-        owner_dir.mkdir(exist_ok=True)
+        owner_dir.mkdir(parents=True, exist_ok=True)
         if (owner_dir / repo_name).exists():
             print(f'Repository {owner}/{repo_name} already exists, skipping...')
             return True
         try:
             result = subprocess.run(
-                ['git', 'clone', repo_url],
+                ['git', 'clone', '--depth', '1', repo_url],
                 cwd=owner_dir,
                 capture_output=True,
                 text=True,
-                check=False
+                check=False,
+                timeout=300,
             )
             if result.returncode == 0:
                 print(f'Successfully cloned {owner}/{repo_name}')
                 return True
             else:
                 print(f'Failed to clone {owner}/{repo_name}: {result.stderr}')
                 return False
+        except FileNotFoundError:
+            print('git not found on PATH; cannot clone repositories.', file=sys.stderr)
+            return False
+        except subprocess.TimeoutExpired:
+            print(f'Timeout cloning {owner}/{repo_name}', file=sys.stderr)
+            return False
         except subprocess.SubprocessError as e:
             print(f'Error cloning {owner}/{repo_name}: {e}', file=sys.stderr)
             return False
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/01-fetch-github.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/01-fetch-github.py (1)
stubs/bs4/__init__.pyi (1)
BeautifulSoup(7-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: make
- GitHub Check: latexmk
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/01-fetch-github.py (1)
49-71: Harden git clone: shallow clone, timeout, parents=True, and handle missing git/timeouts.
Prevents hangs, reduces bandwidth, and improves error handling. This was flagged previously and remains unresolved.
     def clone_repository(self, repo_url: str, owner: str, repo_name: str) -> bool:
         owner_dir = self.output_dir / owner
-        owner_dir.mkdir(exist_ok=True)
+        owner_dir.mkdir(parents=True, exist_ok=True)
         if (owner_dir / repo_name).exists():
             print(f'Repository {owner}/{repo_name} already exists, skipping...')
             return True
         try:
             result = subprocess.run(
-                ['git', 'clone', repo_url],
+                ['git', 'clone', '--depth', '1', repo_url],
                 cwd=owner_dir,
                 capture_output=True,
                 text=True,
-                check=False
+                check=False,
+                timeout=300,
             )
             if result.returncode == 0:
                 print(f'Successfully cloned {owner}/{repo_name}')
                 return True
             else:
-                print(f'Failed to clone {owner}/{repo_name}: {result.stderr}')
+                print(f'Failed to clone {owner}/{repo_name}: {result.stderr}', file=sys.stderr)
                 return False
+        except FileNotFoundError:
+            print('git not found on PATH; cannot clone repositories.', file=sys.stderr)
+            return False
+        except subprocess.TimeoutExpired:
+            print(f'Timeout cloning {owner}/{repo_name}', file=sys.stderr)
+            return False
         except subprocess.SubprocessError as e:
             print(f'Error cloning {owner}/{repo_name}: {e}', file=sys.stderr)
             return False
🧹 Nitpick comments (4)
scripts/01-fetch-github.py (4)
24-47: Trending parsing: solid selectors and normalization. Minor hardening suggestion.
Consider adding a User-Agent to reduce the chance of 403s from GitHub.
-            response = requests.get(self.trending_url, timeout=self.request_timeout)
+            response = requests.get(
+                self.trending_url,
+                timeout=self.request_timeout,
+                headers={'User-Agent': 'aibolit/1.0 (+https://github.com/cqfn/aibolit)'}
+            )
73-79: Minor: print the fetch failure to stderr.
-            print('Failed to fetch repository list. Exiting.')
+            print('Failed to fetch repository list. Exiting.', file=sys.stderr)
8-8: Use built-in generics (list[str]) to drop typing.List.
-from typing import List
+# from typing import List  # no longer needed with built-in generics
-    def fetch_trending_repositories(self) -> List[str]:
+    def fetch_trending_repositories(self) -> list[str]:
-        repositories: List[str] = []
+        repositories: list[str] = []
Also applies to: 24-25, 32-33
96-107: Validate --nrepos as a positive integer.
-def parse_arguments() -> argparse.Namespace:
+def parse_arguments() -> argparse.Namespace:
+    def positive_int(value: str) -> int:
+        iv = int(value)
+        if iv < 1:
+            raise argparse.ArgumentTypeError('nrepos must be a positive integer')
+        return iv
@@
-        '--nrepos',
-        type=int,
+        '--nrepos',
+        type=positive_int,
         required=False,
         default=100,
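A runnable sketch of how the positive_int type plugs into argparse. build_parser is a hypothetical, trimmed-down stand-in for the script's parse_arguments(), and its help text is assumed. argparse itself converts a ValueError from int() into a usage error, so non-numeric input is rejected too, and a rejected value ends in SystemExit with exit code 2.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type: accept only integers >= 1."""
    number = int(value)  # argparse reports a ValueError here as a usage error
    if number < 1:
        raise argparse.ArgumentTypeError('nrepos must be a positive integer')
    return number


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, trimmed-down version of the script's parse_arguments().
    parser = argparse.ArgumentParser(description='Fetch trending GitHub repositories')
    parser.add_argument(
        '--nrepos',
        type=positive_int,
        required=False,
        default=100,  # non-string defaults are not passed through type
        help='Number of repositories to download (default: 100)',
    )
    return parser


print(build_parser().parse_args([]).nrepos)  # 100
print(build_parser().parse_args(['--nrepos', '5']).nrepos)  # 5
```

Passing an explicit argument list instead of relying on sys.argv also makes the parser easy to exercise from unit tests.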
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/01-fetch-github.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/01-fetch-github.py (1)
stubs/bs4/__init__.pyi (1)
BeautifulSoup(7-37)
🪛 GitHub Actions: pylint
scripts/01-fetch-github.py
[warning] 12-12: C0411: standard import 'urllib.parse.urlparse' should be placed before third party imports 'requests', 'bs4.BeautifulSoup' (wrong-import-order)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: make
- GitHub Check: latexmk
🔇 Additional comments (2)
scripts/01-fetch-github.py (2)
16-19: Constructor typing and order look good.
84-90: Good switch to urlparse; confirm the Python baseline for removesuffix.
str.removesuffix needs Python 3.9+. If 3.8 must be supported, use rsplit('.git', 1)[0] instead.
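A version-independent alternative is an explicit endswith check, which strips only a trailing .git and behaves like removesuffix on 3.9+. The sketch below uses a hypothetical parse_owner_repo helper (not the script's code) and sticks to typing.Optional and typing.Tuple so it also runs on Python 3.8.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_owner_repo(repo_url: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: split a GitHub clone URL into (owner, repo)."""
    parts = [part for part in urlparse(repo_url).path.split('/') if part]
    if len(parts) != 2:
        return None  # not an owner/repo path, e.g. a bare profile URL
    owner, name = parts
    # Strip only a trailing '.git'; str.replace('.git', '') would also mangle
    # names that merely contain '.git' somewhere in the middle.
    if name.endswith('.git'):
        name = name[:-len('.git')]
    return owner, name


print(parse_owner_repo('https://github.com/owner/repo.git'))  # ('owner', 'repo')
print(parse_owner_repo('https://github.com/owner/repo/'))  # ('owner', 'repo')
print(parse_owner_repo('https://github.com/owner'))  # None
```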
scripts/01-fetch-github.py
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse
Fix import order to unblock pylint (standard libs before third‑party).
Move urlparse above requests/bs4 and separate groups with a blank line.
-import requests
-from bs4 import BeautifulSoup
-from urllib.parse import urlparse
+from urllib.parse import urlparse
+
+import requests
+from bs4 import BeautifulSoup
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-import requests
-from bs4 import BeautifulSoup
-from urllib.parse import urlparse
+from urllib.parse import urlparse
+import requests
+from bs4 import BeautifulSoup
🧰 Tools
🪛 GitHub Actions: pylint
[warning] 12-12: C0411: standard import 'urllib.parse.urlparse' should be placed before third party imports 'requests', 'bs4.BeautifulSoup' (wrong-import-order)
🤖 Prompt for AI Agents
In scripts/01-fetch-github.py around lines 10 to 12, the standard-library import
from urllib.parse (urlparse) should be moved above third-party imports and
groups separated by a blank line to satisfy pylint; reorder so standard-library
imports come first (urllib.parse import), add a single blank line, then keep
requests and bs4 imports below.
Actionable comments posted: 0
♻️ Duplicate comments (2)
scripts/01-fetch-github.py (2)
56-71: Major: Add shallow cloning, timeout, and missing error handlers.
These hardening improvements were requested in previous reviews but remain unaddressed:
- Line 57: missing --depth 1 for shallow cloning (wastes bandwidth and time downloading full history).
- Line 56: missing timeout parameter (could hang indefinitely).
- Line 69: missing FileNotFoundError handler (crashes with an unclear error if git is not installed).
Apply this diff:
         try:
             result = subprocess.run(
-                ['git', 'clone', repo_url],
+                ['git', 'clone', '--depth', '1', repo_url],
                 cwd=owner_dir,
                 capture_output=True,
                 text=True,
-                check=False
+                check=False,
+                timeout=300,
             )
             if result.returncode == 0:
                 print(f'Successfully cloned {owner}/{repo_name}')
                 return True
             else:
                 print(f'Failed to clone {owner}/{repo_name}: {result.stderr}')
                 return False
+        except FileNotFoundError:
+            print('git not found on PATH; cannot clone repositories.', file=sys.stderr)
+            return False
+        except subprocess.TimeoutExpired:
+            print(f'Timeout cloning {owner}/{repo_name}', file=sys.stderr)
+            return False
         except subprocess.SubprocessError as e:
             print(f'Error cloning {owner}/{repo_name}: {e}', file=sys.stderr)
             return False
51-51: Critical: Add parents=True to prevent failures on nested paths.
This issue was flagged in previous reviews but remains unresolved. Line 51 uses mkdir(exist_ok=True) without parents=True, which will fail if the owner name requires intermediate directory creation.
Apply this diff:
-        owner_dir.mkdir(exist_ok=True)
+        owner_dir.mkdir(parents=True, exist_ok=True)
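What parents=True changes can be checked in a temporary directory. The sketch below assumes a hypothetical layout in which target/01 does not exist yet. Note that when setup_directories has already created output_dir (it uses parents=True, per the lines 21-22 comment), a single-level mkdir of owner_dir succeeds, so here the flag acts as a defensive guard rather than a fix for an observed failure.

```python
import tempfile
from pathlib import Path


def mkdir_demo() -> tuple:
    """Show what parents=True changes when intermediate directories are missing."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: output_dir (target/01) has not been created yet.
        owner_dir = Path(tmp) / 'target' / '01' / 'owner'
        try:
            owner_dir.mkdir(exist_ok=True)
            without_parents = 'created'
        except FileNotFoundError:
            without_parents = 'FileNotFoundError'
        owner_dir.mkdir(parents=True, exist_ok=True)  # also creates target/01
        return without_parents, owner_dir.is_dir()


print(mkdir_demo())  # ('FileNotFoundError', True)
```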
🧹 Nitpick comments (1)
scripts/01-fetch-github.py (1)
118-126: LGTM! Clean entrypoint with proper initialization.
The main function correctly:
- Parses CLI arguments
- Converts output path to Path object
- Initializes downloader with appropriate parameters
- Calls methods in correct order
Optional: Consider making the trending URL and timeout configurable via CLI arguments for flexibility:
     parser.add_argument(
         '--output-dir',
         type=str,
         required=False,
         default='target/01',
         help='Output directory for downloaded repositories (default: target/01)'
     )
+    parser.add_argument(
+        '--url',
+        type=str,
+        required=False,
+        default='https://github.com/trending/java?since=daily',
+        help='Trending URL to fetch repositories from'
+    )
+    parser.add_argument(
+        '--timeout',
+        type=int,
+        required=False,
+        default=30,
+        help='HTTP request timeout in seconds (default: 30)'
+    )
     return parser.parse_args()
Then update main:
     downloader = RepositoryDownloader(
         Path(args.output_dir),
-        'https://github.com/trending/java?since=daily',
-        30
+        args.url,
+        args.timeout
     )
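One small variation worth considering alongside the optional --url and --timeout flags: letting parse_arguments accept an argument list makes the CLI testable without patching sys.argv. This is a sketch under that assumption; the argv parameter and the --nrepos help text are not in the PR, while the defaults match the diff.

```python
import argparse
from typing import Optional, Sequence


def parse_arguments(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Sketch of the CLI with the optional --url and --timeout flags added."""
    parser = argparse.ArgumentParser(description='Download trending GitHub repositories')
    parser.add_argument('--nrepos', type=int, default=100,
                        help='Number of repositories to download (default: 100)')
    parser.add_argument('--output-dir', type=str, default='target/01',
                        help='Output directory for downloaded repositories (default: target/01)')
    parser.add_argument('--url', type=str,
                        default='https://github.com/trending/java?since=daily',
                        help='Trending URL to fetch repositories from')
    parser.add_argument('--timeout', type=int, default=30,
                        help='HTTP request timeout in seconds (default: 30)')
    # argv=None makes argparse read sys.argv[1:]; tests can pass a list instead.
    return parser.parse_args(argv)


args = parse_arguments(['--timeout', '10'])
print(args.url, args.timeout, args.output_dir)
# https://github.com/trending/java?since=daily 10 target/01
```

main() can keep calling parse_arguments() with no arguments, so the production entry point is unchanged.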
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/01-fetch-github.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/01-fetch-github.py (1)
stubs/bs4/__init__.pyi (1)
BeautifulSoup(7-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: latexmk
- GitHub Check: make
🔇 Additional comments (7)
scripts/01-fetch-github.py (7)
4-12: LGTM! Imports are clean and necessary.
All imports are used and properly organized (standard library first, then third-party).
15-19: LGTM! Constructor is well-typed and clear.
Type hints are present and correct, parameter order is valid, and attribute names are descriptive.
21-22: LGTM! Directory creation is robust.
Using parents=True and exist_ok=True together ensures the directory structure is created safely without failing on existing paths.
24-47: LGTM! Robust repository fetching with good error handling.
The implementation includes:
- Timeout on HTTP request
- Proper exception handling with re-raise
- Multiple CSS selector fallbacks for different GitHub layouts
- Defensive href extraction preventing KeyError/AttributeError
- URL normalization ensuring consistent format
73-93: LGTM! Clean orchestration with proper URL parsing.
The method correctly:
- Handles fetch errors gracefully
- Uses urlparse for robust URL parsing (addressing previous brittleness concerns)
- Validates path structure before extraction
- Uses removesuffix('.git') for clean repo name extraction
- Respects max_repositories limit
96-115: LGTM! Standard argparse usage with clear defaults.
The CLI configuration is straightforward with sensible defaults (100 repos, 'target/01' directory) and descriptive help text.
129-130: LGTM! Standard Python entrypoint.
Correct usage of the __main__ guard calling the main function.
@yegor256 @literally-bug-creator Hello, please check the PR.
@kodsurfer First things first, we should cover the code with tests, and then we can patch it.
@literally-bug-creator @yegor256 I can add tests for the new code.
Actionable comments posted: 6
🧹 Nitpick comments (2)
test/scripts/test_fetch_github.py (2)
178-199: Consider testing production code directly instead of duplicating logic.
The _normalize_url helper method duplicates the URL normalization logic from fetch_trending_repositories. This creates a maintenance burden: if the production logic changes, this test helper must also be updated, and the test might pass even when production behavior is incorrect. Consider one of these approaches:
Option 1: Test the normalization through fetch_trending_repositories itself (which you're already doing in lines 35-62).
Option 2: If URL normalization is complex enough to warrant separate testing, extract it as a standalone method in the production code.
In scripts/01-fetch-github.py:
    @staticmethod
    def normalize_github_url(href: str) -> str:
        """Normalize a GitHub repository URL."""
        if not href:
            return href
        if href.startswith('/'):
            href = f'https://github.com{href}'
        if not href.endswith('.git'):
            href = f'{href}.git'
        return href
Then test it directly:
    def test_url_parsing_and_normalization(self):
        """Test URL parsing and .git suffix normalization"""
        test_cases = [...]
        for input_href, expected_url in test_cases:
            with self.subTest(href=input_href):
                normalized = RepositoryDownloader.normalize_github_url(input_href)
                self.assertEqual(normalized, expected_url)
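Option 2 can be tried out in isolation. This self-contained sketch uses normalize_github_url as a module-level function rather than a static method, and the test cases are illustrative assumptions rather than the PR's actual test data. subTest reports each failing href separately instead of stopping at the first mismatch.

```python
import unittest


def normalize_github_url(href: str) -> str:
    """Normalize a GitHub repository href to a full https clone URL."""
    if not href:
        return href
    if href.startswith('/'):
        href = f'https://github.com{href}'
    if not href.endswith('.git'):
        href = f'{href}.git'
    return href


class NormalizeGithubUrlTest(unittest.TestCase):
    def test_normalization(self):
        cases = [
            ('/owner/repo', 'https://github.com/owner/repo.git'),
            ('https://github.com/owner/repo', 'https://github.com/owner/repo.git'),
            ('https://github.com/owner/repo.git', 'https://github.com/owner/repo.git'),
            ('', ''),
        ]
        for href, expected in cases:
            with self.subTest(href=href):
                self.assertEqual(normalize_github_url(href), expected)
```

In the real test module the class would be picked up by the existing unittest.main() guard or by pytest.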
1-237: Consider standardizing quote style for consistency.
The linter (ruff) flags multiple instances where double quotes are used instead of single quotes. While this doesn't affect functionality, consistent quote usage improves code readability.
If your project prefers single quotes (as indicated by the ruff configuration), you can let ruff fix all instances at once:
#!/bin/bash
# Use ruff to automatically fix quote style
ruff check --select Q --fix test/scripts/test_fetch_github.py
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
test/scripts/test_fetch_github.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test/scripts/test_fetch_github.py (1)
scripts/01-fetch-github.py (6)
RepositoryDownloader(15-93)
setup_directories(21-22)
fetch_trending_repositories(24-47)
clone_repository(49-71)
download_repositories(73-93)
parse_arguments(96-115)
🪛 GitHub Actions: flake8
test/scripts/test_fetch_github.py
[error] 4-4: F401 'os' imported but unused.
[error] 8-8: F401 'unittest.mock.MagicMock' imported but unused.
[error] 220-220: W291 trailing whitespace.
[error] 224-224: E302 expected 2 blank lines, found 1.
[error] 228-228: W291 trailing whitespace.
[error] 230-230: F841 local variable 'mock_main' is assigned to but never used.
🪛 GitHub Actions: make
test/scripts/test_fetch_github.py
[error] 12-12: ModuleNotFoundError: No module named 'RepositoryDownloader' during test collection. Ensure the dependency is installed or adjust import path.
🪛 GitHub Actions: ruff
test/scripts/test_fetch_github.py
[error] 4-4: F401: 'os' imported but unused
[error] 8-8: F401: 'unittest.mock.MagicMock' imported but unused
[warning] 18-18: Q000: Double quotes found but single quotes preferred
[warning] 21-21: Q000: Double quotes found but single quotes preferred
[warning] 40-40: Q001: Double quotes found but single quotes preferred
[warning] 55-55: Q000: Double quotes found but single quotes preferred
[warning] 60-60: Q000: Double quotes found but single quotes preferred
[warning] 66-66: Q001: Double quotes found but single quotes preferred
[warning] 75-75: Q000: Double quotes found but single quotes preferred
[warning] 85-85: Q000: Double quotes found but single quotes preferred
[warning] 88-88: Q000: Double quotes found but single quotes preferred
[warning] 89-89: Q000: Double quotes found but single quotes preferred
[warning] 90-90: Q000: Double quotes found but single quotes preferred
[warning] 95-95: Q000: Double quotes found but single quotes preferred
[warning] 105-105: Q000: Double quotes found but single quotes preferred
[warning] 116-116: Q000: Double quotes found but single quotes preferred
[warning] 126-126: Q000: Double quotes found but single quotes preferred
[warning] 131-131: Q000: Double quotes found but single quotes preferred
[warning] 132-132: Q000: Double quotes found but single quotes preferred
[warning] 133-133: Q000: Double quotes found but single quotes preferred
[warning] 140-140: Q000: Double quotes found but single quotes preferred
[warning] 141-141: Q000: Double quotes found but single quotes preferred
[warning] 152-152: Q000: Double quotes found but single quotes preferred
[warning] 162-162: Q000: Double quotes found but single quotes preferred
[warning] 169-169: Q000: Double quotes found but single quotes preferred
[warning] 181-181: Q000: Double quotes found but single quotes preferred
[warning] 181-181: Q000: Double quotes found but single quotes preferred
[warning] 230-230: F841: Local variable mock_main is assigned to but never used
🪛 GitHub Actions: ty
test/scripts/test_fetch_github.py
[error] 12-12: Cannot resolve imported module RepositoryDownloader.
[error] 206-206: Cannot resolve imported module your_module.
[error] 219-219: Cannot resolve imported module your_module.
[error] 231-231: Cannot resolve imported module your_module.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: latexmk
🔇 Additional comments (4)
test/scripts/test_fetch_github.py (4)
15-33: LGTM! The test class setup, teardown, and directory creation test are well-structured. Proper use of temporary directories ensures test isolation.
124-135: LGTM! The test correctly validates that existing repositories are skipped without requiring mocks.
236-237: LGTM! Standard unittest main guard is correctly implemented.
35-62: Add missing @patch decorator.
The test method expects a mock_get parameter but lacks the required @patch decorator.
Apply this diff:
+    @patch('scripts.01-fetch-github.requests.get')
     def test_fetch_trending_repositories_success(self, mock_get):
Note: Adjust the patch path to match your actual module structure (e.g., scripts.fetch_github.requests.get if using Python-friendly naming).
Likely an incorrect or invalid review comment.
test/scripts/test_fetch_github.py
# SPDX-FileCopyrightText: Copyright (c) 2019-2025 Aibolit
# SPDX-License-Identifier: MIT

import os
Fix critical import errors and remove unused imports.
Multiple import issues detected:
- Line 4: os is imported but never used.
- Line 8: MagicMock is imported but never used.
- Line 12: the import statement is incorrect and will cause ModuleNotFoundError.
Apply this diff to fix the imports:
-import os
+import importlib
 import tempfile
 import unittest
 from pathlib import Path
-from unittest.mock import Mock, patch, MagicMock
+from unittest.mock import Mock, patch
 import subprocess
 import requests
-import RepositoryDownloader
+RepositoryDownloader = importlib.import_module('scripts.01-fetch-github').RepositoryDownloader
Note: The actual import path may need adjustment based on your project structure. A file name such as 01-fetch-github.py cannot appear in an import statement (a form like from scripts['01-fetch-github'] import ... is a syntax error), so it has to be loaded through importlib. Common alternatives:
- from scripts.fetch_github import RepositoryDownloader (if the file is renamed)
- Add the scripts directory to sys.path before importing, then load the module with importlib.import_module('01-fetch-github')
Also applies to: 8-8, 12-12
🧰 Tools
🪛 GitHub Actions: flake8
[error] 4-4: F401 'os' imported but unused.
🪛 GitHub Actions: ruff
[error] 4-4: F401: 'os' imported but unused
🤖 Prompt for AI Agents
In test/scripts/test_fetch_github.py around lines 4, 8 and 12 remove unused
imports and fix the broken import: delete the unused "import os" (line 4) and
remove the unused "MagicMock" import (line 8), then replace the incorrect module
import on line 12 with a correct path to the tested class (for example use "from
scripts.fetch_github import RepositoryDownloader" or the proper relative import
for your package); if the test runner cannot find the scripts package, add a
short sys.path append before importing to include the project root (e.g. insert
sys.path manipulation above the import) so the import resolves.
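Another option that does not depend on sys.path at all is to load the script straight from its file path with importlib.util. load_script and the module name fetch_github are hypothetical choices, and the demo writes a stand-in file so the sketch runs on its own. Registering the module in sys.modules before executing it follows the pattern in the importlib documentation and lets string patch targets such as 'fetch_github.requests.get' resolve.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def load_script(path: Path, module_name: str) -> ModuleType:
    """Import a file whose name is not a valid identifier (e.g. 01-fetch-github.py)."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f'cannot load {path}')
    module = importlib.util.module_from_spec(spec)
    # Register before executing so patch('<module_name>.…') targets can find it.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


def demo() -> str:
    # Stand-in for scripts/01-fetch-github.py so the sketch is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / '01-fetch-github.py'
        script.write_text('class RepositoryDownloader:\n    pass\n')
        module = load_script(script, 'fetch_github_demo')
        return module.RepositoryDownloader.__name__


print(demo())  # RepositoryDownloader
```

Assuming the test lives at test/scripts/test_fetch_github.py, the real call would pass Path(__file__).resolve().parents[2] / 'scripts' / '01-fetch-github.py' and a name such as 'fetch_github'.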
    def test_fetch_trending_repositories_request_exception(self, mock_get):
        """Test handling of request exceptions"""
        mock_get.side_effect = requests.RequestException("Network error")

        with self.assertRaises(requests.RequestException):
            self.downloader.fetch_trending_repositories()

    def test_fetch_trending_repositories_empty_anchors(self, mock_get):
        """Test handling of empty repository list"""
        mock_response = Mock()
        mock_response.raise_for_status.return_value = None
        mock_response.text = "<html><body></body></html>"
        mock_get.return_value = mock_response

        repositories = self.downloader.fetch_trending_repositories()
        self.assertEqual(repositories, [])
Add missing @patch decorators.
Both test methods expect a mock_get parameter but lack the required @patch decorator.
Apply this diff:
+    @patch('scripts.01-fetch-github.requests.get')
     def test_fetch_trending_repositories_request_exception(self, mock_get):
         """Test handling of request exceptions"""
         mock_get.side_effect = requests.RequestException("Network error")
         with self.assertRaises(requests.RequestException):
             self.downloader.fetch_trending_repositories()
+    @patch('scripts.01-fetch-github.requests.get')
     def test_fetch_trending_repositories_empty_anchors(self, mock_get):
🧰 Tools
🪛 GitHub Actions: ruff
[warning] 66-66: Q001: Double quotes found but single quotes preferred
[warning] 75-75: Q000: Double quotes found but single quotes preferred
🤖 Prompt for AI Agents
In test/scripts/test_fetch_github.py around lines 64 to 79, the two test methods
declare a mock_get parameter but are missing the @patch decorator that supplies
it; add @patch('requests.get') above both
test_fetch_trending_repositories_request_exception and
test_fetch_trending_repositories_empty_anchors so the mock_get argument is
injected, and import patch from unittest.mock if not already imported.
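How the decorator supplies the extra argument can be sketched with subprocess.run, which the clone tests also patch; the same shape applies to requests.get. clone is a hypothetical stand-in for RepositoryDownloader.clone_repository, and its argument list assumes the hardened --depth 1 / timeout=300 variant suggested earlier in this thread, so a real assertion must match whatever call the script actually makes. Keyword arguments given to patch (here return_value) configure the injected mock.

```python
import subprocess
import unittest
from unittest.mock import Mock, patch


def clone(repo_url: str, workdir: str) -> bool:
    """Hypothetical stand-in for RepositoryDownloader.clone_repository."""
    result = subprocess.run(
        ['git', 'clone', '--depth', '1', repo_url],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,
        timeout=300,
    )
    return result.returncode == 0


class CloneTest(unittest.TestCase):
    @patch('subprocess.run')  # the mock is injected as the mock_run argument
    def test_clone_success(self, mock_run):
        mock_run.return_value = Mock(returncode=0, stderr='')
        self.assertTrue(clone('https://github.com/owner/repo.git', '/tmp/owner'))
        mock_run.assert_called_once_with(
            ['git', 'clone', '--depth', '1', 'https://github.com/owner/repo.git'],
            cwd='/tmp/owner',
            capture_output=True,
            text=True,
            check=False,
            timeout=300,
        )

    @patch('subprocess.run', return_value=Mock(returncode=128, stderr='fatal'))
    def test_clone_failure(self, mock_run):
        self.assertFalse(clone('https://github.com/owner/repo.git', '/tmp/owner'))
        mock_run.assert_called_once()
```

Because clone looks up subprocess.run at call time, patching the attribute on the subprocess module is enough; no git process is ever started.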
    def test_clone_repository_success(self, mock_subprocess):
        """Test successful repository cloning"""
        mock_result = Mock()
        mock_result.returncode = 0
        mock_result.stderr = ""
        mock_subprocess.return_value = mock_result
        result = self.downloader.clone_repository(
            "https://github.com/owner/repo.git",
            "owner",
            "repo"
        )
        self.assertTrue(result)
        mock_subprocess.assert_called_once_with(
            ['git', 'clone', 'https://github.com/owner/repo.git'],
            cwd=self.test_output_dir / "owner",
            capture_output=True,
            text=True,
            check=False
        )

    def test_clone_repository_failure(self, mock_subprocess):
        """Test repository cloning failure"""
        mock_result = Mock()
        mock_result.returncode = 1
        mock_result.stderr = "Authentication failed"
        mock_subprocess.return_value = mock_result
        result = self.downloader.clone_repository(
            "https://github.com/owner/repo.git",
            "owner",
            "repo"
        )
        self.assertFalse(result)

    def test_clone_repository_subprocess_error(self, mock_subprocess):
        """Test handling of subprocess errors"""
        mock_subprocess.side_effect = subprocess.SubprocessError("Git not found")
        result = self.downloader.clone_repository(
            "https://github.com/owner/repo.git",
            "owner",
            "repo"
        )
        self.assertFalse(result)
Add missing @patch decorators.
Three test methods expect a mock_subprocess parameter but lack the required @patch decorator for subprocess.run.
Apply this diff:
```diff
+    @patch('subprocess.run')
     def test_clone_repository_success(self, mock_subprocess):
         """Test successful repository cloning"""
         mock_result = Mock()
         mock_result.returncode = 0
         mock_result.stderr = ""
         mock_subprocess.return_value = mock_result
         ...
+    @patch('subprocess.run')
     def test_clone_repository_failure(self, mock_subprocess):
         """Test repository cloning failure"""
         ...
+    @patch('subprocess.run')
     def test_clone_repository_subprocess_error(self, mock_subprocess):
```
🧰 Tools
🪛 GitHub Actions: ruff
[warning] 85-85: Q000: Double quotes found but single quotes preferred
[warning] 88-88: Q000: Double quotes found but single quotes preferred
[warning] 89-89: Q000: Double quotes found but single quotes preferred
[warning] 90-90: Q000: Double quotes found but single quotes preferred
[warning] 95-95: Q000: Double quotes found but single quotes preferred
[warning] 105-105: Q000: Double quotes found but single quotes preferred
[warning] 116-116: Q000: Double quotes found but single quotes preferred
🤖 Prompt for AI Agents
In test/scripts/test_fetch_github.py around lines 81 to 122, the three tests
take a mock_subprocess argument but do not patch subprocess.run; add
@patch('subprocess.run') decorators above test_clone_repository_success,
test_clone_repository_failure, and test_clone_repository_subprocess_error so the
mock_subprocess parameter is provided by the test framework; ensure the patch
import (from unittest.mock import patch) is present at the top of the file if
missing.
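A self-contained sketch of the same three patched scenarios follows. It makes some assumptions. The clone_repository function here is hypothetical and only mirrors the subprocess.run call that test_clone_repository_success asserts. Its signature is not the script's actual method. Each test gets its own mock from @patch('subprocess.run'), so nothing leaks between tests and git is never invoked.

```python
import io
import subprocess
import unittest
from pathlib import Path
from unittest.mock import Mock, patch


def clone_repository(output_dir, repo_url, owner):
    """Hypothetical clone helper mirroring the subprocess.run call the tests assert."""
    try:
        result = subprocess.run(
            ['git', 'clone', repo_url],
            cwd=output_dir / owner,
            capture_output=True,
            text=True,
            check=False,
        )
    except subprocess.SubprocessError:
        return False
    return result.returncode == 0


URL = 'https://github.com/owner/repo.git'


class CloneTestCase(unittest.TestCase):
    output_dir = Path('target/01')

    @patch('subprocess.run')
    def test_clone_success(self, mock_run):
        mock_run.return_value = Mock(returncode=0, stderr='')
        self.assertTrue(clone_repository(self.output_dir, URL, 'owner'))
        mock_run.assert_called_once_with(
            ['git', 'clone', URL],
            cwd=self.output_dir / 'owner',
            capture_output=True,
            text=True,
            check=False,
        )

    @patch('subprocess.run')
    def test_clone_failure(self, mock_run):
        mock_run.return_value = Mock(returncode=1, stderr='Authentication failed')
        self.assertFalse(clone_repository(self.output_dir, URL, 'owner'))

    @patch('subprocess.run')
    def test_clone_subprocess_error(self, mock_run):
        # side_effect makes the mock raise instead of returning a result
        mock_run.side_effect = subprocess.SubprocessError('Git not found')
        self.assertFalse(clone_repository(self.output_dir, URL, 'owner'))


suite = unittest.TestLoader().loadTestsFromTestCase(CloneTestCase)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())  # 3 True
```

One caveat applies to a real implementation. A missing git executable makes subprocess.run raise FileNotFoundError, which is an OSError, not a SubprocessError. The "Git not found" scenario may therefore also need an OSError handler.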
```python
    def test_download_repositories_success(self, mock_clone, mock_fetch):
        """Test successful download of multiple repositories"""
        mock_fetch.return_value = [
            "https://github.com/owner1/repo1.git",
            "https://github.com/owner2/repo2.git",
            "https://github.com/owner3/repo3.git"
        ]
        mock_clone.return_value = True
        self.downloader.download_repositories(max_repositories=2)
        self.assertEqual(mock_clone.call_count, 2)
        mock_fetch.assert_called_once()

    def test_download_repositories_partial_success(self, mock_clone, mock_fetch):
        """Test download with some repositories failing to clone"""
        mock_fetch.return_value = [
            "https://github.com/owner1/repo1.git",
            "https://github.com/owner2/repo2.git",
            "https://github.com/owner3/repo3.git"
        ]
        mock_clone.side_effect = [True, False, True]
        self.downloader.download_repositories(max_repositories=3)
        self.assertEqual(mock_clone.call_count, 3)

    def test_download_repositories_fetch_fails(self, mock_fetch):
        """Test download when fetching repository list fails"""
        mock_fetch.side_effect = requests.RequestException("Fetch failed")
        self.downloader.download_repositories(max_repositories=10)
        mock_fetch.assert_called_once()

    def test_download_repositories_invalid_urls(self, mock_clone, mock_fetch):
        """Test handling of invalid repository URLs"""
        mock_fetch.return_value = [
            "https://github.com/owner1/repo1.git",
            "invalid_url",
            "https://github.com/singlepart",
            "https://github.com/owner2/repo2.git"
        ]
        mock_clone.return_value = True
        self.downloader.download_repositories(max_repositories=10)
        self.assertEqual(mock_clone.call_count, 2)
```
Add missing @patch decorators.
All four test methods expect mock parameters but lack the required @patch decorators.
Apply this diff:
```diff
+    @patch.object(RepositoryDownloader, 'fetch_trending_repositories')
+    @patch.object(RepositoryDownloader, 'clone_repository')
     def test_download_repositories_success(self, mock_clone, mock_fetch):
         """Test successful download of multiple repositories"""
         ...
+    @patch.object(RepositoryDownloader, 'fetch_trending_repositories')
+    @patch.object(RepositoryDownloader, 'clone_repository')
     def test_download_repositories_partial_success(self, mock_clone, mock_fetch):
         """Test download with some repositories failing to clone"""
         ...
+    @patch.object(RepositoryDownloader, 'fetch_trending_repositories')
     def test_download_repositories_fetch_fails(self, mock_fetch):
         """Test download when fetching repository list fails"""
         ...
+    @patch.object(RepositoryDownloader, 'fetch_trending_repositories')
+    @patch.object(RepositoryDownloader, 'clone_repository')
     def test_download_repositories_invalid_urls(self, mock_clone, mock_fetch):
```
Note: Stacked decorators are applied bottom-up. The mock created by the bottom decorator is passed as the first argument after self (mock_clone here). The top decorator's mock comes last (mock_fetch).
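The bottom-up ordering can be checked in isolation. In this sketch, RepositoryDownloader is a hypothetical minimal stand-in; only the method names come from the tests. check_decorator_order is an illustrative helper, not a test. It confirms that the first mock parameter is the clone_repository mock created by the bottom decorator.

```python
from unittest.mock import call, patch


class RepositoryDownloader:
    """Hypothetical minimal stand-in exposing only the methods the tests patch."""

    def fetch_trending_repositories(self):
        raise RuntimeError('network access is not expected here')

    def clone_repository(self, repo_url, owner, repo_name):
        raise RuntimeError('git is not expected here')

    def download_repositories(self, max_repositories):
        for repo_url in self.fetch_trending_repositories()[:max_repositories]:
            owner, repo_name = repo_url.removesuffix('.git').split('/')[-2:]
            self.clone_repository(repo_url, owner, repo_name)


@patch.object(RepositoryDownloader, 'fetch_trending_repositories')  # outermost: last mock argument
@patch.object(RepositoryDownloader, 'clone_repository')  # innermost: first mock argument
def check_decorator_order(mock_clone, mock_fetch):
    mock_fetch.return_value = [
        'https://github.com/owner1/repo1.git',
        'https://github.com/owner2/repo2.git',
        'https://github.com/owner3/repo3.git',
    ]
    mock_clone.return_value = True
    RepositoryDownloader().download_repositories(max_repositories=2)
    return mock_fetch.call_count, mock_clone.call_args_list


fetch_calls, clone_calls = check_decorator_order()
print(fetch_calls, len(clone_calls))  # 1 2
print(clone_calls[0] == call('https://github.com/owner1/repo1.git', 'owner1', 'repo1'))  # True
```

Swapping the two decorators breaks this. The fetch mock would then receive return_value = True, and slicing True raises TypeError.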
🧰 Tools
🪛 GitHub Actions: ruff
[warning] 140-140: Q000: Double quotes found but single quotes preferred
[warning] 141-141: Q000: Double quotes found but single quotes preferred
[warning] 152-152: Q000: Double quotes found but single quotes preferred
[warning] 162-162: Q000: Double quotes found but single quotes preferred
[warning] 169-169: Q000: Double quotes found but single quotes preferred
🤖 Prompt for AI Agents
In test/scripts/test_fetch_github.py around lines 137 to 176, the four test
methods accept mock parameters but are missing the @patch decorators. For the
three tests that take (mock_clone, mock_fetch), add two decorators so the bottom
one patches the clone method and the top one patches the fetch method (top:
@patch.object(RepositoryDownloader, 'fetch_trending_repositories'), bottom:
@patch.object(RepositoryDownloader, 'clone_repository')), which makes the
parameter order match the decorator order bottom-to-top.
test_download_repositories_fetch_fails takes only mock_fetch, so give it just
the fetch decorator.
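test_download_repositories_invalid_urls expects exactly two clone calls for its four URLs, so 'invalid_url' and 'https://github.com/singlepart' must be skipped. Below is a minimal sketch of a parser consistent with that expectation, built on urllib.parse.urlparse. parse_owner_repo is a hypothetical helper, not the script's actual code.

```python
from urllib.parse import urlparse


def parse_owner_repo(repo_url):
    """Return (owner, repo_name) for a github.com repository URL, or None if malformed.

    Hypothetical helper; the real script's parsing may differ.
    """
    parsed = urlparse(repo_url)
    if parsed.scheme not in ('http', 'https') or parsed.netloc != 'github.com':
        return None
    parts = parsed.path.strip('/').split('/')
    if len(parts) != 2 or not all(parts):
        return None
    owner, repo_name = parts
    # Strip only a trailing '.git'; str.replace('.git', '') would also mangle
    # names such as 'octocat.github.io' into 'octocathub.io'
    if repo_name.endswith('.git'):
        repo_name = repo_name[:-len('.git')]
    return owner, repo_name


urls = [
    'https://github.com/owner1/repo1.git',
    'invalid_url',
    'https://github.com/singlepart',
    'https://github.com/owner2/repo2.git',
]
valid = [pair for pair in map(parse_owner_repo, urls) if pair is not None]
print(valid)  # [('owner1', 'repo1'), ('owner2', 'repo2')]
```

Requiring the github.com host and exactly two path segments is a deliberately strict choice. The real script may need to accept more URL shapes.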
```python
class ArgumentParserTestCase(unittest.TestCase):
    def test_parse_arguments_default(self):
        """Test argument parser with default values"""
        with patch('sys.argv', ['script_name']):
            from your_module import parse_arguments
            args = parse_arguments()
            self.assertEqual(args.nrepos, 100)
            self.assertEqual(args.output_dir, 'target/01')

    def test_parse_arguments_custom(self):
        """Test argument parser with custom values"""
        test_args = [
            'script_name',
            '--nrepos', '50',
            '--output-dir', 'custom/directory'
        ]
        with patch('sys.argv', test_args):
            from your_module import parse_arguments
            args = parse_arguments()
            self.assertEqual(args.nrepos, 50)
            self.assertEqual(args.output_dir, 'custom/directory')
```
Replace placeholder module name and fix formatting.
The tests import from 'your_module' which is a placeholder and will cause import errors. Additionally, Line 220 has trailing whitespace.
Apply this diff:
```diff
     def test_parse_arguments_default(self):
         """Test argument parser with default values"""
         with patch('sys.argv', ['script_name']):
-            from your_module import parse_arguments
+            parse_arguments = importlib.import_module('scripts.01-fetch-github').parse_arguments
             args = parse_arguments()
             self.assertEqual(args.nrepos, 100)
             self.assertEqual(args.output_dir, 'target/01')

     def test_parse_arguments_custom(self):
         """Test argument parser with custom values"""
         test_args = [
             'script_name',
             '--nrepos', '50',
             '--output-dir', 'custom/directory'
         ]
         with patch('sys.argv', test_args):
-            from your_module import parse_arguments
-            args = parse_arguments()
+            parse_arguments = importlib.import_module('scripts.01-fetch-github').parse_arguments
+            args = parse_arguments()
             self.assertEqual(args.nrepos, 50)
             self.assertEqual(args.output_dir, 'custom/directory')
```
Note: A from-import statement cannot name 01-fetch-github because the file name is not a valid Python identifier. The diff therefore loads the script with importlib.import_module; add import importlib at the top of the file. This assumes scripts is importable from the directory the tests run in. Alternatively, rename the script to a Python-friendly name such as scripts/fetch_github.py and use from scripts.fetch_github import parse_arguments.
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 GitHub Actions: flake8
[error] 220-220: W291 trailing whitespace.
🪛 GitHub Actions: ty
[error] 206-206: Cannot resolve imported module your_module.
[error] 219-219: Cannot resolve imported module your_module.
🤖 Prompt for AI Agents
In test/scripts/test_fetch_github.py around lines 202 to 222, the tests import
from the placeholder 'your_module' which causes import errors and line 220
contains trailing whitespace; update the import to the real module path used by
the project (for example replace "from your_module import parse_arguments" with
"from scripts.fetch_github import parse_arguments" or the correct package/module
path), ensure both test cases use that real import, and remove the trailing
whitespace at line 220 so the file has no extraneous spaces.
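Another way to import a hyphenated script is importlib.util.spec_from_file_location. It loads straight from a file path, so it works without any package layout assumptions. In this sketch, the script body is a hypothetical stand-in that models only parse_arguments, with the defaults the tests expect. It is written to a temporary 01-fetch-github.py so the example is self-contained.

```python
import importlib.util
import tempfile
import textwrap
from pathlib import Path
from unittest.mock import patch

# Hypothetical stand-in for scripts/01-fetch-github.py: only parse_arguments is modeled,
# using the defaults the tests above expect (nrepos=100, output_dir='target/01').
SCRIPT_SOURCE = textwrap.dedent("""
    import argparse


    def parse_arguments():
        parser = argparse.ArgumentParser()
        parser.add_argument('--nrepos', type=int, default=100)
        parser.add_argument('--output-dir', default='target/01')
        return parser.parse_args()
""")


def load_module_from_path(module_name, path):
    """Import a source file whose name is not a valid Python identifier."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / '01-fetch-github.py'
    script_path.write_text(SCRIPT_SOURCE)
    fetch_github = load_module_from_path('fetch_github', script_path)

with patch('sys.argv', ['script_name']):
    default_args = fetch_github.parse_arguments()

with patch('sys.argv', ['script_name', '--nrepos', '50', '--output-dir', 'custom/directory']):
    custom_args = fetch_github.parse_arguments()

print(default_args.nrepos, default_args.output_dir)  # 100 target/01
print(custom_args.nrepos, custom_args.output_dir)  # 50 custom/directory
```

In a real test file, the path would point at the checked-in script instead of a temporary copy. Loading it once at module level keeps each test method free of import logic.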
test/scripts/test_fetch_github.py
Outdated
```python
def test_main_execution(mock_path, mock_downloader):
    """Test main function execution"""
    mock_args = Mock()
    mock_args.nrepos = 50
    mock_args.output_dir = 'test/output'
    with patch('your_module.parse_arguments', return_value=mock_args):
        with patch('your_module.main') as mock_main:
            from your_module import main
            main()
    mock_downloader.assert_called_once()
```
Fix multiple critical issues with test structure and logic.
This test has several problems that prevent it from working:
- Not a proper test method: it's a standalone function, not a method of a test class, so unittest won't discover or run it.
- Placeholder module names: references to 'your_module' will cause import errors.
- Confused mocking logic: it patches main, then imports and calls main, which at that point is the mock itself. The real main never runs and mock_main is never checked.
- Unsupplied mock: it asserts on mock_downloader, which is only a bare function parameter that nothing ever passes in.
- Missing decorators: the mock_path and mock_downloader parameters have no corresponding @patch decorators.
- Formatting: only one blank line before the function definition (flake8 expects two) and trailing whitespace on line 228.
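The first point is easy to confirm. unittest's loader collects only TestCase subclasses from a module, so a bare test_ function is silently ignored. pytest, by contrast, would collect it. Here is a small sketch using a hypothetical in-memory module:

```python
import types
import unittest

# Hypothetical test module holding one bare function and one TestCase subclass.
demo_module = types.ModuleType('demo_tests')


def test_main_execution():
    """Module-level function: pytest would collect it, unittest's loader does not."""


class WrappedTestCase(unittest.TestCase):
    def test_main_execution(self):
        """Same test as a TestCase method: unittest discovers this one."""


demo_module.test_main_execution = test_main_execution
demo_module.WrappedTestCase = WrappedTestCase

suite = unittest.TestLoader().loadTestsFromModule(demo_module)
print(suite.countTestCases())  # 1
```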
Recommended approach: create a proper test method in the ArgumentParserTestCase class. Because the script's file name is not a valid identifier, load it with importlib and patch attributes on the module object:
```python
import importlib
import unittest
from unittest.mock import Mock, patch

# Assumes scripts/ is importable from the test run directory; import_module accepts
# the hyphenated file name that a from-import statement cannot express.
fetch_github = importlib.import_module('scripts.01-fetch-github')


class ArgumentParserTestCase(unittest.TestCase):
    # ... existing tests ...

    @patch.object(fetch_github, 'RepositoryDownloader')
    @patch.object(fetch_github, 'Path')
    def test_main_execution(self, mock_path, mock_downloader_class):
        """Test main function execution"""
        mock_args = Mock()
        mock_args.nrepos = 50
        mock_args.output_dir = 'test/output'
        mock_downloader_instance = Mock()
        mock_downloader_class.return_value = mock_downloader_instance
        with patch.object(fetch_github, 'parse_arguments', return_value=mock_args):
            fetch_github.main()
        # Verify RepositoryDownloader was instantiated exactly once
        mock_downloader_class.assert_called_once()
        # Verify setup_directories and download_repositories were called
        mock_downloader_instance.setup_directories.assert_called_once()
        mock_downloader_instance.download_repositories.assert_called_once_with(50)
```
🧰 Tools
🪛 GitHub Actions: flake8
[error] 224-224: E302 expected 2 blank lines, found 1.
[error] 228-228: W291 trailing whitespace.
[error] 230-230: F841 local variable 'mock_main' is assigned to but never used.
🪛 GitHub Actions: ruff
[warning] 230-230: F841: Local variable mock_main is assigned to but never used
🪛 GitHub Actions: ty
[error] 231-231: Cannot resolve imported module your_module.
🤖 Prompt for AI Agents
In test/scripts/test_fetch_github.py around lines 224-233, convert the standalone
test function into a proper unittest method inside the existing
ArgumentParserTestCase class, remove the trailing whitespace, and separate the
method from its neighbors with a blank line. Load the script with
importlib.import_module('scripts.01-fetch-github'), since string patch targets
containing the hyphenated module name may fail to resolve. Add @patch.object
decorators for the module's Path and RepositoryDownloader (ordered so the mock
arguments match the signature), create a mock_args object and make
parse_arguments return it via patch.object(module, 'parse_arguments',
return_value=mock_args), have the RepositoryDownloader mock return a mock
instance, and call the module's main(). Then assert that the downloader class
was instantiated once and that the instance methods (setup_directories and
download_repositories) were called with the expected arguments
(download_repositories called with mock_args.nrepos).
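A runnable sketch of the recommended test pattern follows. To stay self-contained, it builds a hypothetical module with types.ModuleType and exec instead of importing scripts/01-fetch-github.py. Its main() body is an assumption, not the script's actual code. patch.object works on any module object, so the hyphenated file name never has to be spelled as a string target. If the real RepositoryDownloader calls setup_directories from its own constructor, the setup_directories assertion would not hold against a mocked class and should be dropped.

```python
import io
import types
import unittest
from unittest.mock import Mock, patch

# Hypothetical module body standing in for scripts/01-fetch-github.py; the real main() may differ.
SOURCE = """
from pathlib import Path


class RepositoryDownloader:
    def __init__(self, output_dir):
        self.output_dir = output_dir

    def setup_directories(self):
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def download_repositories(self, max_repositories):
        raise RuntimeError('network access is not expected in tests')


def parse_arguments():
    raise RuntimeError('replaced by the test')


def main():
    args = parse_arguments()
    downloader = RepositoryDownloader(Path(args.output_dir))
    downloader.setup_directories()
    downloader.download_repositories(args.nrepos)
"""

fetch_github = types.ModuleType('fetch_github')
exec(SOURCE, fetch_github.__dict__)  # main() resolves its globals in this dict at call time


class MainTestCase(unittest.TestCase):
    @patch.object(fetch_github, 'RepositoryDownloader')
    @patch.object(fetch_github, 'Path')
    def test_main_execution(self, mock_path, mock_downloader_class):
        mock_args = Mock(nrepos=50, output_dir='test/output')
        mock_instance = mock_downloader_class.return_value
        with patch.object(fetch_github, 'parse_arguments', return_value=mock_args):
            fetch_github.main()
        mock_path.assert_called_once_with('test/output')
        mock_downloader_class.assert_called_once_with(mock_path.return_value)
        mock_instance.setup_directories.assert_called_once()
        mock_instance.download_repositories.assert_called_once_with(50)


suite = unittest.TestLoader().loadTestsFromTestCase(MainTestCase)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())  # 1 True
```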