Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
bc0b2f6
feat: add SSH proxy server support
dcoric Sep 12, 2025
2bcb475
refactor: convert SSH files from JavaScript to TypeScript
dcoric Sep 12, 2025
0b38aee
feat: update SSH server to enhance client handling and logging
dcoric Sep 15, 2025
af69d45
Merge branch 'main' into denis-coric/ssh
dcoric Sep 15, 2025
8df000a
fix: enhance SSH server tests and client handling
dcoric Sep 15, 2025
719103a
feat: add findUserBySSHKey function to user database operations
dcoric Sep 15, 2025
2fd1703
refactor: enhance SSH server keepalive functionality and error handling
dcoric Sep 17, 2025
18b52ab
feat: implement SSH key retention feature for Git Proxy
dcoric Sep 17, 2025
91b58eb
feat: add SSH configuration and enhance server command handling
dcoric Sep 19, 2025
b2e7557
chore: update .gitignore to exclude Claude directory
dcoric Sep 19, 2025
7e3553c
fix: ensure SSH enabled configuration is a boolean and improve error …
dcoric Sep 19, 2025
2d56a76
Merge remote-tracking branch 'finos/main' into denis-coric/ssh-flow
dcoric Sep 25, 2025
61e6a0b
fix: fixes lint and refreshed package-lock.json
dcoric Sep 25, 2025
27b190b
Merge remote-tracking branch 'finos/main' into denis-coric/ssh-flow
dcoric Oct 3, 2025
d39e32e
fix: implement SSH pack data capture for security scanning
dcoric Oct 3, 2025
6192ee9
fix: adds test SSH keys to .gitignore
dcoric Oct 6, 2025
1f94f95
test: enhance SSHServer tests for git-receive-pack handling
dcoric Oct 6, 2025
3150f5d
feat: enhance configuration for SSH and git operations
dcoric Oct 7, 2025
2cc7553
feat: add comprehensive performance tests for HTTP/HTTPS and SSH prot…
dcoric Oct 9, 2025
107bac1
Merge branch 'main' into denis-coric/ssh-flow
dcoric Oct 9, 2025
8698ad1
Merge remote-tracking branch 'finos/main' into denis-coric/ssh-flow
dcoric Oct 13, 2025
cd47fb8
refactor: rename variables in performance tests for clarity
dcoric Oct 13, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -269,3 +269,8 @@ website/.docusaurus

# Jetbrains IDE
.idea

.claude/

# Test SSH keys (generated during tests)
test/keys/
1 change: 1 addition & 0 deletions .nvmrc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
v20
382 changes: 382 additions & 0 deletions ARCHITECTURE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,382 @@
# GitProxy Architecture

**Version**: 2.0.0-rc.3
**Last Updated**: 2025-01-10

## Overview

GitProxy is a security-focused Git proxy that intercepts push operations between developers and Git remote endpoints (GitHub, GitLab, etc.) to enforce security policies, compliance rules, and workflows. It supports both **HTTP/HTTPS** and **SSH** protocols with identical security scanning through a shared processor chain.

## High-Level Architecture

```mermaid
graph TB
subgraph "Client Side"
DEV[Developer]
GIT[Git Client]
end

subgraph "GitProxy"
subgraph "Protocol Handlers"
HTTP[HTTP/HTTPS Handler]
SSH[SSH Handler]
end

subgraph "Core Processing"
PACK[Pack Data Capture]
CHAIN[Security Processor Chain]
AUTH[Authorization Engine]
end

subgraph "Storage"
DB[(Database)]
CACHE[(Cache)]
end
end

subgraph "Remote Side"
GITHUB[GitHub/GitLab/etc]
end

DEV --> GIT
GIT --> HTTP
GIT --> SSH
HTTP --> PACK
SSH --> PACK
PACK --> CHAIN
CHAIN --> AUTH
AUTH --> GITHUB
CHAIN --> DB
AUTH --> CACHE
```

## Core Components

### 1. Protocol Handlers

#### HTTP/HTTPS Handler (`src/proxy/routes/index.ts`)

- **Purpose**: Handles HTTP/HTTPS Git operations
- **Entry Point**: Express middleware
- **Key Features**:
- Pack data extraction via `getRawBody` middleware
- Request validation and routing
- Error response formatting (Git protocol)
- Streaming support up to 1GB

#### SSH Handler (`src/proxy/ssh/server.ts`)

- **Purpose**: Handles SSH Git operations
- **Entry Point**: SSH2 server
- **Key Features**:
- SSH key-based authentication
- Stream-based pack data capture
- SSH user context preservation
- Error response formatting (stderr)

### 2. Security Processor Chain (`src/proxy/chain.ts`)

The heart of GitProxy's security model - a shared 17-processor chain used by both protocols:

```typescript
const pushActionChain = [
proc.push.parsePush, // Extract commit data from pack
proc.push.checkEmptyBranch, // Validate branch is not empty
proc.push.checkRepoInAuthorisedList, // Repository authorization
proc.push.checkCommitMessages, // Commit message validation
proc.push.checkAuthorEmails, // Author email validation
proc.push.checkUserPushPermission, // User push permissions
proc.push.pullRemote, // Clone remote repository
proc.push.writePack, // Write pack data locally
proc.push.checkHiddenCommits, // Hidden commit detection
proc.push.checkIfWaitingAuth, // Check authorization status
proc.push.preReceive, // Pre-receive hooks
proc.push.getDiff, // Generate diff
proc.push.gitleaks, // Secret scanning
proc.push.clearBareClone, // Cleanup
proc.push.scanDiff, // Diff analysis
proc.push.captureSSHKey, // SSH key capture
proc.push.blockForAuth, // Authorization workflow
];
```

### 3. Database Abstraction (`src/db/index.ts`)

Two implementations for different deployment scenarios:

#### NeDB (Development)

- **File-based**: Local JSON files
- **Use Case**: Development and testing
- **Performance**: Good for small to medium datasets

#### MongoDB (Production)

- **Document-based**: Full-featured database
- **Use Case**: Production deployments
- **Performance**: Scalable for large datasets

### 4. Configuration Management (`src/config/`)

Hierarchical configuration system:

1. **Schema Definition**: `config.schema.json`
2. **Generated Types**: `src/config/generated/config.ts`
3. **User Config**: `proxy.config.json`
4. **Configuration Loader**: `src/config/index.ts`

## Request Flow

### HTTP/HTTPS Flow

```mermaid
sequenceDiagram
participant Client
participant Express
participant Middleware
participant Chain
participant Remote

Client->>Express: POST /repo.git/git-receive-pack
Express->>Middleware: extractRawBody()
Middleware->>Middleware: Capture pack data (1GB limit)
Middleware->>Chain: Execute security chain
Chain->>Chain: Run 17 processors
Chain->>Remote: Forward if approved
Remote->>Client: Response
```

### SSH Flow

```mermaid
sequenceDiagram
participant Client
participant SSH Server
participant Stream Handler
participant Chain
participant Remote

Client->>SSH Server: git-receive-pack 'repo'
SSH Server->>Stream Handler: Capture pack data
Stream Handler->>Stream Handler: Buffer chunks (500MB limit)
Stream Handler->>Chain: Execute security chain
Chain->>Chain: Run 17 processors
Chain->>Remote: Forward if approved
Remote->>Client: Response
```

## Security Model

### Pack Data Processing

Both protocols follow the same pattern:

1. **Capture**: Extract pack data from request/stream
2. **Parse**: Extract commit information and ref updates
3. **Clone**: Create local repository copy
4. **Analyze**: Run security scans and validations
5. **Authorize**: Apply approval workflow
6. **Forward**: Send to remote if approved

### Security Scans

#### Gitleaks Integration

- **Purpose**: Detect secrets, API keys, passwords
- **Implementation**: External gitleaks binary
- **Scope**: Full pack data scanning
- **Performance**: Optimized for large repositories

#### Diff Analysis

- **Purpose**: Analyze code changes for security issues
- **Implementation**: Custom pattern matching
- **Scope**: Only changed files
- **Performance**: Fast incremental analysis

#### Hidden Commit Detection

- **Purpose**: Detect manipulated or hidden commits
- **Implementation**: Pack data integrity checks
- **Scope**: Full commit history validation
- **Performance**: Minimal overhead

### Authorization Workflow

#### Auto-Approval

- **Trigger**: All security checks pass
- **Process**: Automatic approval and forwarding
- **Logging**: Full audit trail maintained

#### Manual Approval

- **Trigger**: Security check failure or policy requirement
- **Process**: Human review via web interface
- **Logging**: Detailed approval/rejection reasons

## Plugin System

### Architecture (`src/plugin.ts`)

Extensible processor system for custom validation:

```typescript
class MyPlugin {
async exec(req: any, action: Action): Promise<Action> {
// Custom validation logic
return action;
}
}
```

### Plugin Types

- **Push Plugins**: Inserted after `parsePush` (position 1)
- **Pull Plugins**: Inserted at start (position 0)

### Plugin Lifecycle

1. **Loading**: Discovered from configuration
2. **Initialization**: Constructor called with config
3. **Execution**: `exec()` called for each request
4. **Cleanup**: Resources cleaned up on shutdown

## Error Handling

### Protocol-Specific Error Responses

#### HTTP/HTTPS

```typescript
res.set('content-type', 'application/x-git-receive-pack-result');
res.status(200).send(handleMessage(errorMessage));
```

#### SSH

```typescript
stream.stderr.write(`Error: ${errorMessage}\n`);
stream.exit(1);
stream.end();
```

### Error Categories

- **Validation Errors**: Invalid requests or data
- **Authorization Errors**: Access denied or insufficient permissions
- **Security Errors**: Policy violations or security issues
- **System Errors**: Internal errors or resource exhaustion

## Performance Characteristics

### Memory Management

#### HTTP/HTTPS

- **Streaming**: Native Express streaming
- **Memory**: PassThrough streams minimize buffering
- **Size Limit**: 1GB (configurable)

#### SSH

- **Streaming**: Custom buffer management
- **Memory**: In-memory buffering up to 500MB
- **Size Limit**: 500MB (configurable)

### Performance Optimizations

#### Caching

- **Repository Clones**: Temporary local clones
- **Configuration**: Cached configuration values
- **Authentication**: Cached user sessions

#### Concurrency

- **HTTP/HTTPS**: Express handles multiple requests
- **SSH**: One command per SSH session
- **Processing**: Async processor chain execution

## Monitoring and Observability

### Logging

- **Structured Logging**: JSON-formatted logs
- **Log Levels**: Debug, Info, Warn, Error
- **Context**: Request ID, user, repository tracking

### Metrics

- **Request Counts**: Total requests by protocol
- **Processing Time**: Chain execution duration
- **Error Rates**: Failed requests by category
- **Resource Usage**: Memory and CPU utilization

### Audit Trail

- **User Actions**: All user operations logged
- **Security Events**: Policy violations and approvals
- **System Events**: Configuration changes and errors

## Deployment Architecture

### Development

```
Developer → GitProxy (NeDB) → GitHub
```

### Production

```
Developer → Load Balancer → GitProxy (MongoDB) → GitHub
```

### High Availability

```
Developer → Load Balancer → Multiple GitProxy Instances → GitHub
```

## Security Considerations

### Data Protection

- **Encryption**: SSH keys encrypted at rest
- **Transit**: HTTPS/TLS for all communications
- **Secrets**: No secrets in logs or configuration

### Access Control

- **Authentication**: Multiple provider support
- **Authorization**: Granular permission system
- **Audit**: Complete operation logging

### Compliance

- **Regulatory**: Financial services compliance
- **Standards**: Industry security standards
- **Reporting**: Detailed compliance reports

## Future Enhancements

### Planned Features

- **Rate Limiting**: Per-user and per-repository limits
- **Streaming to Disk**: For very large pack files
- **Performance Monitoring**: Real-time metrics
- **Advanced Caching**: Repository and diff caching

### Scalability

- **Horizontal Scaling**: Multiple instance support
- **Database Sharding**: Large-scale data distribution
- **CDN Integration**: Global content distribution

---

**Architecture Status**: ✅ **Production Ready**
**Scalability**: ✅ **Horizontal Scaling Supported**
**Security**: ✅ **Enterprise Grade**
**Maintainability**: ✅ **Well Documented**
Loading
Loading