Extend codegraph pipeline for more languages #1160

@Vasilije1990

📋 Summary

Extend the CodeGraph pipeline beyond Python to support multiple programming languages, including JavaScript/TypeScript, Java, C#, Go, Rust, and others, so that mixed-language code repositories can be analyzed as a whole.

🔍 Background

Currently, the CodeGraph pipeline processes only Python files (.py) and relies on Python-specific parsing logic. Modern repositories often contain several languages, and these should be analyzed together to give a complete picture of the codebase.

🎯 Acceptance Criteria

  1. Language Support

JavaScript/TypeScript: .js, .ts, .jsx, .tsx files

Java: .java files with package imports

C#: .cs files with using statements

Go: .go files with import statements

Rust: .rs files with use statements

C/C++: .c, .cpp, .h, .hpp files

  2. File Discovery Enhancement

Update get_source_code_files() to accept language_config parameter

Add language-specific file extension mapping

Support multi-language repository scanning
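
The file discovery items above could be sketched roughly as follows. This is only an illustration: the shape of `language_config` (a dict from language name to extensions), the `DEFAULT_LANGUAGE_CONFIG` name, and the grouped return value are assumptions, not the current signature of `get_source_code_files()`. Note that `.h` is ambiguous between C and C++; the sketch assigns it to C.

```python
import os
from typing import Dict, List, Optional, Tuple

# Hypothetical default mapping; the extensions come from the acceptance criteria.
DEFAULT_LANGUAGE_CONFIG: Dict[str, Tuple[str, ...]] = {
    "python": (".py",),
    "javascript": (".js", ".jsx"),
    "typescript": (".ts", ".tsx"),
    "java": (".java",),
    "csharp": (".cs",),
    "go": (".go",),
    "rust": (".rs",),
    "c": (".c", ".h"),
    "cpp": (".cpp", ".hpp"),
}


def get_source_code_files(
    repo_path: str,
    language_config: Optional[Dict[str, Tuple[str, ...]]] = None,
) -> Dict[str, List[str]]:
    """Walk repo_path and group source files by language (assumed return shape)."""
    config = language_config or DEFAULT_LANGUAGE_CONFIG
    # Invert the mapping once so each file needs a single dict lookup.
    extension_to_language = {
        ext: language for language, exts in config.items() for ext in exts
    }
    files_by_language: Dict[str, List[str]] = {}
    for root, _dirs, file_names in os.walk(repo_path):
        for file_name in sorted(file_names):
            extension = os.path.splitext(file_name)[1].lower()
            language = extension_to_language.get(extension)
            if language is not None:
                files_by_language.setdefault(language, []).append(
                    os.path.join(root, file_name)
                )
    return files_by_language
```

Passing a reduced mapping such as `{"python": (".py",)}` would reproduce today's Python-only behaviour, which keeps the change backward compatible.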

  3. Language-Specific Parsing

Extend tree-sitter integration for each language

Create language-specific dependency extractors

Handle different import/module systems:

Python: import, from...import

JavaScript: import, require, export

Java: import, package

Go: import, package
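
To make the "language-specific dependency extractors" idea concrete, here is a minimal sketch of a dispatch table keyed by language. It deliberately uses regular expressions as a stand-in for tree-sitter queries, so it will miss edge cases (imports inside comments or strings, dynamic imports); all function names are hypothetical.

```python
import re
from typing import Callable, Dict, List


def _extract_python(source: str) -> List[str]:
    # Matches "import a.b" and "from a.b import c"; keeps the module path.
    pattern = re.compile(
        r"^\s*(?:from\s+([\w.]+)\s+import\b|import\s+([\w.]+))", re.MULTILINE
    )
    return [from_mod or plain_mod for from_mod, plain_mod in pattern.findall(source)]


def _extract_javascript(source: str) -> List[str]:
    # Covers `import x from "m"`, `import "m"`, `export ... from "m"`, `require("m")`.
    pattern = re.compile(
        r"""(?:\bfrom\s+|\bimport\s+|\brequire\(\s*)["']([^"']+)["']"""
    )
    return pattern.findall(source)


def _extract_java(source: str) -> List[str]:
    # Covers plain, static, and wildcard imports.
    pattern = re.compile(
        r"^\s*import\s+(?:static\s+)?([\w.]+(?:\.\*)?)\s*;", re.MULTILINE
    )
    return pattern.findall(source)


def _extract_go(source: str) -> List[str]:
    imports: List[str] = []
    # Grouped form: import ( "fmt" ... )
    for block in re.findall(
        r"^\s*import\s*\((.*?)\)", source, re.MULTILINE | re.DOTALL
    ):
        imports.extend(re.findall(r'"([^"]+)"', block))
    # Single-line form, optionally with an alias: import alias "fmt"
    imports.extend(
        re.findall(r'^\s*import\s+(?:[\w.]+\s+)?"([^"]+)"', source, re.MULTILINE)
    )
    return imports


# Hypothetical registry; adding a language means adding one entry here.
DEPENDENCY_EXTRACTORS: Dict[str, Callable[[str], List[str]]] = {
    "python": _extract_python,
    "javascript": _extract_javascript,
    "typescript": _extract_javascript,
    "java": _extract_java,
    "go": _extract_go,
}


def extract_dependencies(language: str, source: str) -> List[str]:
    extractor = DEPENDENCY_EXTRACTORS.get(language)
    if extractor is None:
        raise ValueError(f"No dependency extractor registered for {language!r}")
    return extractor(source)
```

A registry like this keeps the pipeline itself language-agnostic; a tree-sitter based extractor could later replace any entry without changing callers.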

  4. Configuration

Add supported_languages parameter to run_code_graph_pipeline()

Detect the language of each file from its extension

Per-language exclusion patterns
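
One possible shape for the configuration items above, shown as a sketch only: `LanguageSettings`, `LANGUAGE_SETTINGS`, and `detect_language` are hypothetical names, and the exclusion patterns are illustrative defaults rather than an agreed list. The `supported_languages` argument mirrors the parameter proposed for `run_code_graph_pipeline()`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class LanguageSettings:
    extensions: Tuple[str, ...]
    # Glob patterns matched against the whole (POSIX-style) path.
    excluded_patterns: Tuple[str, ...] = ()


# Hypothetical per-language settings, including exclusion patterns.
LANGUAGE_SETTINGS: Dict[str, LanguageSettings] = {
    "python": LanguageSettings((".py",), ("*/venv/*", "*/__pycache__/*")),
    "javascript": LanguageSettings((".js", ".jsx"), ("*/node_modules/*", "*.min.js")),
    "typescript": LanguageSettings((".ts", ".tsx"), ("*/node_modules/*", "*.d.ts")),
    "go": LanguageSettings((".go",), ("*/vendor/*",)),
    "rust": LanguageSettings((".rs",), ("*/target/*",)),
}


def detect_language(
    path: str, supported_languages: Optional[Sequence[str]] = None
) -> Optional[str]:
    """Return the language for path, or None if unsupported or excluded."""
    suffix = PurePosixPath(path).suffix.lower()
    for language, settings in LANGUAGE_SETTINGS.items():
        if supported_languages is not None and language not in supported_languages:
            continue
        if suffix in settings.extensions:
            if any(fnmatchcase(path, p) for p in settings.excluded_patterns):
                return None
            return language
    return None
```

For example, `detect_language("src/app.ts")` yields `"typescript"`, while the same path with `supported_languages=["python"]` is skipped.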

  5. CodeGraph Entities Update

Extend CodeFile entity with language type

Add language-specific dependency relationships

Detect cross-language dependencies
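
As a rough illustration of the entity changes, the sketch below uses a plain dataclass as a stand-in for the project's real `CodeFile` entity; the field names `file_path`, `language`, and `depends_on`, and the helper `cross_language_dependencies`, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CodeFile:
    # Stand-in for the real entity; only the fields needed here.
    file_path: str
    language: str
    depends_on: List["CodeFile"] = field(default_factory=list)


def cross_language_dependencies(files: List[CodeFile]) -> List[Tuple[str, str]]:
    """Return (source, target) path pairs whose languages differ."""
    return [
        (source.file_path, target.file_path)
        for source in files
        for target in source.depends_on
        if target.language != source.language
    ]


# Usage: a C++ file that includes a C header and another C++ file.
util_header = CodeFile("include/util.h", "c")
helper = CodeFile("src/helper.cpp", "cpp")
main = CodeFile("src/main.cpp", "cpp", depends_on=[util_header, helper])
pairs = cross_language_dependencies([util_header, helper, main])
```

Storing the language on the entity makes cross-language edges a simple comparison, rather than something to re-derive from file extensions at query time.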
