Exclude certain parts from CodeGraph pipeline #1163

@Vasilije1990

📋 Summary

Add configurable path exclusion to the CodeGraph pipeline so that it ignores common directories such as .venv, node_modules, __pycache__, and other build/cache folders when processing code repositories.

🔍 Background

Currently, the CodeGraph pipeline has basic exclusion logic in get_source_code_files(), but it only excludes test files and checks for .venv in the file name rather than in the full path. Large repositories contain many irrelevant directories that should be ignored as well.
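To illustrate the gap, here is a minimal sketch that assumes the current check behaves like a substring test on the file name. The two function names are hypothetical and exist only for this comparison; they are not taken from the codebase.

```python
from pathlib import PurePosixPath


def filename_check(path: str) -> bool:
    """Assumed current behaviour: look for '.venv' in the file name only."""
    return ".venv" in PurePosixPath(path).name


def full_path_check(path: str) -> bool:
    """Proposed behaviour: look for '.venv' among all path components."""
    return ".venv" in PurePosixPath(path).parts


path = "repo/.venv/lib/python3.11/site-packages/requests/api.py"
missed = filename_check(path)   # False: the file name is just 'api.py'
caught = full_path_check(path)  # True: '.venv' is one of the directories
```

Every file below .venv has an ordinary file name, so a name-only check lets the whole virtual environment through.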

🎯 Acceptance Criteria

  1. Default Exclusion Patterns

Common Python exclusions: .venv/, venv/, __pycache__/, .pytest_cache/, build/, dist/

Node.js exclusions: node_modules/, .npm/

General exclusions: .git/, .svn/, .idea/, .vscode/, tmp/, temp/

File patterns: *.pyc, *.pyo, *.log, *.tmp
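One possible way to encode these defaults is as two constants, one for directory names and one for file globs. The constant and function names below are assumptions made for illustration; only the values come from the lists above.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical constant names; the values mirror the lists in this issue.
DEFAULT_EXCLUDED_DIRS = frozenset({
    # Python
    ".venv", "venv", "__pycache__", ".pytest_cache", "build", "dist",
    # Node.js
    "node_modules", ".npm",
    # General
    ".git", ".svn", ".idea", ".vscode", "tmp", "temp",
})
DEFAULT_EXCLUDED_FILE_PATTERNS = ("*.pyc", "*.pyo", "*.log", "*.tmp")


def is_excluded_by_default(path: str) -> bool:
    """Return True if any parent directory or the file name hits a default."""
    p = PurePosixPath(path)
    # Check directory components only, so a file called 'build' is not caught.
    if any(part in DEFAULT_EXCLUDED_DIRS for part in p.parts[:-1]):
        return True
    # fnmatchcase keeps matching case-sensitive on every platform.
    return any(fnmatchcase(p.name, pat) for pat in DEFAULT_EXCLUDED_FILE_PATTERNS)
```

Keeping directory names and file globs apart makes the directory test a cheap set lookup, with glob matching reserved for file names.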

  2. Implementation

Update get_source_code_files() in get_repo_file_dependencies.py

Replace current filename check with full path checking

Add configurable exclusion list parameter
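As a rough sketch of the traversal side, excluded directories can be pruned during os.walk so their contents are never visited. walk_source_files is a hypothetical stand-in written for this issue, not the actual get_source_code_files(), whose signature is not shown here; the extensions parameter is likewise an assumption.

```python
import os


def walk_source_files(repo_path, excluded_dirs, extensions=(".py",)):
    """Hypothetical stand-in: collect source files, skipping excluded dirs."""
    found = []
    for root, dirs, files in os.walk(repo_path):
        # Prune in place so os.walk never descends into excluded directories.
        dirs[:] = [d for d in dirs if d not in excluded_dirs]
        for name in files:
            # str.endswith accepts a tuple of suffixes.
            if name.endswith(extensions):
                found.append(os.path.join(root, name))
    return sorted(found)
```

Pruning dirs in place avoids listing every file under node_modules or .venv just to discard it afterwards, which matters for large repositories.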

  3. Configuration

Add excluded_paths parameter to run_code_graph_pipeline()

Support both glob patterns and exact path matches

Combine the default exclusion list with user-supplied entries
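The following sketch shows one way the excluded_paths value could be merged with defaults and matched against relative paths. The helper names, the use_defaults flag, and the abbreviated default tuple are all assumptions. It also assumes that an entry without glob characters matches as a run of path components anywhere in the path; the issue does not specify whether exact paths should be anchored at the repository root.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Abbreviated, hypothetical default list for the sake of the example.
DEFAULT_EXCLUDED_PATHS = (".venv/", "node_modules/", "__pycache__/", "*.pyc")


def resolve_exclusions(excluded_paths=None, use_defaults=True):
    """Merge defaults with user entries, keeping order and dropping duplicates."""
    merged = list(DEFAULT_EXCLUDED_PATHS) if use_defaults else []
    for entry in excluded_paths or ():
        if entry not in merged:
            merged.append(entry)
    return merged


def _matches(rel_path: str, pattern: str) -> bool:
    path = PurePosixPath(rel_path)
    if any(ch in pattern for ch in "*?["):
        # Glob entry: try the whole relative path, then the file name alone.
        return fnmatchcase(rel_path, pattern) or fnmatchcase(path.name, pattern)
    # Exact entry: its components must appear as a contiguous run in the path.
    run = PurePosixPath(pattern).parts
    if not run:
        return False
    parts = path.parts
    return any(parts[i:i + len(run)] == run for i in range(len(parts) - len(run) + 1))


def is_excluded(rel_path: str, patterns) -> bool:
    return any(_matches(rel_path, p) for p in patterns)


patterns = resolve_exclusions(["docs/generated/", "*.min.js"])
```

Note that fnmatch-style `*` also matches `/`, so a glob such as `docs/*.md` would match files in nested subdirectories too; stricter semantics would need a different matcher.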
