Description
📋 Summary
Add configurable path exclusion to the CodeGraph pipeline so that it ignores common directories such as .venv, node_modules, __pycache__, and other build/cache folders when processing code repositories.
🔍 Background
Currently, the CodeGraph pipeline has basic exclusion logic in get_source_code_files(), but it only excludes test files and checks for .venv in the filename rather than the full path. Large repositories contain many irrelevant directories (virtual environments, dependency caches, build output) that should be ignored.
🎯 Acceptance Criteria
- Default Exclusion Patterns
Common Python exclusions: .venv/, venv/, __pycache__/, .pytest_cache/, build/, dist/
Node.js exclusions: node_modules/, .npm/
General exclusions: .git/, .svn/, .idea/, .vscode/, tmp/, temp/
File patterns: *.pyc, *.pyo, *.log, *.tmp
- Implementation
Update get_source_code_files() in get_repo_file_dependencies.py
Replace current filename check with full path checking
Add configurable exclusion list parameter
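
The difference between a filename check and a full-path check can be sketched as follows. This is only an illustration under stated assumptions, not the existing code in get_repo_file_dependencies.py: the names `DEFAULT_EXCLUDED_DIRS`, `DEFAULT_EXCLUDED_FILE_PATTERNS`, `is_excluded`, and `iter_source_files` are hypothetical, and the defaults are copied from the lists in this issue.

```python
import fnmatch
import os
from pathlib import PurePath

# Hypothetical defaults, copied from the lists in this issue.
DEFAULT_EXCLUDED_DIRS = frozenset({
    ".venv", "venv", "__pycache__", ".pytest_cache", "build", "dist",
    "node_modules", ".npm", ".git", ".svn", ".idea", ".vscode", "tmp", "temp",
})
DEFAULT_EXCLUDED_FILE_PATTERNS = ("*.pyc", "*.pyo", "*.log", "*.tmp")


def is_excluded(relative_path,
                excluded_dirs=DEFAULT_EXCLUDED_DIRS,
                excluded_file_patterns=DEFAULT_EXCLUDED_FILE_PATTERNS):
    """Return True if a directory component or the file name is excluded."""
    parts = PurePath(relative_path).parts
    if not parts:
        return False
    # Check every directory component, not just the file name.
    if any(part in excluded_dirs for part in parts[:-1]):
        return True
    # Check the file name against the glob-style file patterns.
    return any(fnmatch.fnmatch(parts[-1], p) for p in excluded_file_patterns)


def iter_source_files(repo_path,
                      excluded_dirs=DEFAULT_EXCLUDED_DIRS,
                      excluded_file_patterns=DEFAULT_EXCLUDED_FILE_PATTERNS):
    """Yield .py files under repo_path, never descending into excluded dirs."""
    for root, dirs, files in os.walk(repo_path):
        # Pruning dirs in place stops os.walk from entering excluded folders.
        dirs[:] = [d for d in dirs if d not in excluded_dirs]
        for name in files:
            if not name.endswith(".py"):
                continue
            if any(fnmatch.fnmatch(name, p) for p in excluded_file_patterns):
                continue
            yield os.path.join(root, name)
```

Pruning `dirs` in place during `os.walk` avoids descending into large folders such as node_modules at all, which is usually cheaper than collecting every path and filtering afterwards. Note that a file like src/venv_utils.py is kept, because only directory components are compared against the directory list.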
- Configuration
Add excluded_paths parameter to run_code_graph_pipeline()
Support both glob patterns and exact path matches
Combine the default exclusion list with user-supplied patterns
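
One possible way to combine defaults with user patterns, and to tell glob patterns from exact paths, is sketched below. The helper names `resolve_excluded_paths` and `path_matches`, the `include_defaults` flag, and the matching rules (a trailing slash means a directory, wildcard characters mean a glob, anything else is an exact repository-relative path) are all assumptions made for illustration; the issue does not specify them.

```python
import fnmatch
from pathlib import PurePosixPath

# Hypothetical default list, copied from the patterns named in this issue.
DEFAULT_EXCLUDED_PATHS = (
    ".venv/", "venv/", "__pycache__/", ".pytest_cache/", "build/", "dist/",
    "node_modules/", ".npm/",
    ".git/", ".svn/", ".idea/", ".vscode/", "tmp/", "temp/",
    "*.pyc", "*.pyo", "*.log", "*.tmp",
)


def resolve_excluded_paths(excluded_paths=None, include_defaults=True):
    """Merge user patterns with the defaults, keeping order and dropping duplicates."""
    patterns = list(DEFAULT_EXCLUDED_PATHS) if include_defaults else []
    for pattern in excluded_paths or ():
        if pattern not in patterns:
            patterns.append(pattern)
    return patterns


def path_matches(relative_path, patterns):
    """Return True if a repository-relative path matches any exclusion pattern."""
    path = PurePosixPath(relative_path)
    posix = path.as_posix()
    for pattern in patterns:
        if pattern.endswith("/"):
            name = pattern.rstrip("/")
            if "/" in name:
                # Multi-part directory: treat as a prefix from the repo root.
                if posix == name or posix.startswith(name + "/"):
                    return True
            elif name in path.parts[:-1]:
                # Single directory name: match it at any depth.
                return True
        elif any(ch in pattern for ch in "*?["):
            # Glob: try the full path, then the file name alone.
            # fnmatch's "*" also matches "/" characters.
            if fnmatch.fnmatchcase(posix, pattern) or fnmatch.fnmatchcase(path.name, pattern):
                return True
        elif posix == pattern or posix.startswith(pattern + "/"):
            # Exact path, or a directory given without a trailing slash.
            return True
    return False
```

Under these assumptions, a caller could pass something like `excluded_paths=["docs/generated/", "scripts/legacy_*.py", "setup.py"]` and the resolved list would be applied on top of the defaults. Whether defaults should be overridable at all is a design question the issue leaves open.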