Transform your DevOps operations with AI-powered expertise

Documentation • Quick Start • Roadmap • Community • Contributing
InfraGenius is a comprehensive AI-powered platform designed specifically for DevOps, SRE, Cloud, and Platform Engineering professionals. It provides expert-level guidance through advanced AI models optimized for infrastructure operations, reliability engineering, and cloud architecture.
To democratize intelligent infrastructure management by providing developers worldwide with AI-driven insights, automation, and best practices - making reliable, scalable infrastructure accessible to everyone.
See our detailed 6-Month Roadmap for upcoming features and community goals!
- AI-Powered Analysis: Advanced DevOps/SRE expertise using open source models (gpt-oss:latest)
- Local Development: Optimized for local development with Ollama - no cloud dependencies
- Cursor Integration: Works as an MCP server with Cursor for seamless AI assistance
- High Performance: Sub-second response times with intelligent caching
- Open Source: MIT licensed, community-driven development
- Multiple Domains: DevOps, SRE, Cloud Architecture, and Platform Engineering expertise
- Developer Friendly: Comprehensive docs, examples, and development tools
```mermaid
graph TB
subgraph "Client Layer"
UI[Web UI]
API[REST API]
CLI[CLI Tool]
end
subgraph "API Gateway"
LB[Load Balancer]
AUTH[Authentication]
RATE[Rate Limiting]
end
subgraph "Application Layer"
MCP1[MCP Server 1]
MCP2[MCP Server 2]
MCP3[MCP Server N]
end
subgraph "AI/ML Layer"
OLLAMA[Ollama Service]
MODELS[Fine-tuned Models]
CACHE[Model Cache]
end
subgraph "Data Layer"
POSTGRES[(PostgreSQL)]
REDIS[(Redis Cache)]
S3[(Object Storage)]
end
subgraph "Infrastructure"
K8S[Kubernetes]
DOCKER[Docker]
CLOUD[Multi-Cloud]
end
subgraph "Monitoring"
PROM[Prometheus]
GRAF[Grafana]
JAEGER[Jaeger]
end
UI --> LB
API --> LB
CLI --> LB
LB --> AUTH
AUTH --> RATE
RATE --> MCP1
RATE --> MCP2
RATE --> MCP3
MCP1 --> OLLAMA
MCP2 --> OLLAMA
MCP3 --> OLLAMA
OLLAMA --> MODELS
MODELS --> CACHE
MCP1 --> POSTGRES
MCP1 --> REDIS
MCP2 --> POSTGRES
MCP2 --> REDIS
MCP3 --> POSTGRES
MCP3 --> REDIS
POSTGRES --> S3
K8S --> CLOUD
DOCKER --> K8S
MCP1 --> PROM
MCP2 --> PROM
MCP3 --> PROM
PROM --> GRAF
MCP1 --> JAEGER
%% Client Layer Styling
style UI fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
style API fill:#e8f5e8,stroke:#4caf50,stroke-width:2px
style CLI fill:#fff3e0,stroke:#ff9800,stroke-width:2px
%% API Gateway Styling
style LB fill:#ffcdd2,stroke:#d32f2f,stroke-width:3px
style AUTH fill:#ffab91,stroke:#ff5722,stroke-width:2px
style RATE fill:#80cbc4,stroke:#00695c,stroke-width:2px
%% Application Layer Styling
style MCP1 fill:#90caf9,stroke:#1976d2,stroke-width:3px
style MCP2 fill:#a5d6a7,stroke:#388e3c,stroke-width:3px
style MCP3 fill:#ffcc80,stroke:#f57c00,stroke-width:3px
%% AI/ML Layer Styling
style OLLAMA fill:#ce93d8,stroke:#7b1fa2,stroke-width:3px
style MODELS fill:#f8bbd9,stroke:#c2185b,stroke-width:2px
style CACHE fill:#b39ddb,stroke:#512da8,stroke-width:2px
%% Data Layer Styling
style POSTGRES fill:#81c784,stroke:#2e7d32,stroke-width:3px
style REDIS fill:#ef5350,stroke:#c62828,stroke-width:3px
style S3 fill:#ffb74d,stroke:#ef6c00,stroke-width:3px
%% Infrastructure Styling
style K8S fill:#42a5f5,stroke:#1565c0,stroke-width:3px
style DOCKER fill:#29b6f6,stroke:#0277bd,stroke-width:2px
style CLOUD fill:#66bb6a,stroke:#2e7d32,stroke-width:2px
%% Monitoring Styling
style PROM fill:#ff7043,stroke:#d84315,stroke-width:2px
style GRAF fill:#ffa726,stroke:#ef6c00,stroke-width:2px
style JAEGER fill:#ab47bc,stroke:#6a1b9a,stroke-width:2px
```
```
InfraGenius/
├── environments/          # Environment-specific configurations
│   ├── test/              # Test environment configs
│   ├── staging/           # Staging environment configs
│   └── production/        # Production environment configs
├── src/                   # Source code
│   ├── core/              # Core application logic
│   ├── plugins/           # Extensible plugins
│   └── ui/                # Web interface
├── docker/                # Docker configurations
│   ├── development/       # Development containers
│   └── production/        # Production containers
├── kubernetes/            # K8s manifests
│   ├── test/              # Test cluster configs
│   ├── staging/           # Staging cluster configs
│   └── production/        # Production cluster configs
├── docs/                  # Documentation
│   ├── architecture/      # Architecture diagrams
│   ├── api/               # API documentation
│   └── deployment/        # Deployment guides
├── tests/                 # Test suites
│   ├── unit/              # Unit tests
│   ├── integration/       # Integration tests
│   └── e2e/               # End-to-end tests
├── scripts/               # Automation scripts
│   ├── setup/             # Setup and installation
│   ├── deploy/            # Deployment automation
│   └── utils/             # Utility scripts
├── monitoring/            # Monitoring configurations
│   ├── grafana/           # Grafana dashboards
│   └── prometheus/        # Prometheus configs
├── security/              # Security configurations
├── backup/                # Backup and recovery
├── migrations/            # Database migrations
├── examples/              # Usage examples
├── tools/                 # Development tools
└── README.md              # This file
```
Focus: InfraGenius is currently optimized for local development with Ollama and open source models. This is perfect for learning, contributing, and building amazing DevOps/SRE solutions locally!
```bash
# Clone the repository
git clone https://github.com/your-username/infragenius.git
cd infragenius

# One-click setup (installs everything automatically)
./scripts/quick-local-setup.sh

# That's it! Server will start automatically
# Health check: http://localhost:8000/health
# API docs: http://localhost:8000/docs
```

```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows
winget install ollama
```

```bash
# Start Ollama service
ollama serve

# Download AI model (in new terminal)
ollama pull gpt-oss:latest

# Verify model is ready
ollama list
```

```bash
# Create Python environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Create configuration
cp mcp_server/config.json.example mcp_server/config.json

# Start InfraGenius
python mcp_server/server.py
```

```bash
# Test API health
curl http://localhost:8000/health

# Test AI analysis
curl -X POST http://localhost:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "My Kubernetes pods are crashing with OOMKilled errors",
    "domain": "devops",
    "context": "Production cluster on AWS EKS"
  }'
```

InfraGenius works as an MCP (Model Context Protocol) server with Cursor, giving you a specialized DevOps/SRE AI assistant directly in your IDE!
```bash
# 1. Setup Cursor integration
make cursor-setup

# 2. Install MCP dependency
source venv/bin/activate
pip install mcp

# 3. Test the integration
python -m mcp_server.cursor_integration
```

Add InfraGenius to your Cursor MCP configuration file at ~/.cursor/mcp.json:
```json
{
  "mcpServers": {
    "infragenius": {
      "command": "<file_path>/InfraGenius/venv/bin/python",
      "args": [
        "-m", "mcp_server.cursor_integration"
      ],
      "cwd": "<file_path>/InfraGenius",
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "gpt-oss:latest",
        "PYTHONPATH": "<file_path>/InfraGenius"
      }
    }
  }
}
```

Replace each `<file_path>` placeholder with the actual path to your InfraGenius checkout!

Quick Copy: Use the template at examples/cursor-mcp-template.json and update the paths.
If you already have other MCP servers configured, just add the infragenius entry to your existing mcpServers object:
```json
{
  "mcpServers": {
    "existing-server": {
      "command": "some-other-mcp-server",
      "args": ["..."]
    },
    "infragenius": {
      "command": "<file_path>/InfraGenius/venv/bin/python",
      "args": ["-m", "mcp_server.cursor_integration"],
      "cwd": "<file_path>/InfraGenius",
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "gpt-oss:latest"
      }
    }
  }
}
```

Once configured, use InfraGenius tools directly in Cursor:
```
// DevOps Issue Analysis
@infragenius analyze_devops_issue {
  "prompt": "My Kubernetes pods are crashing with OOMKilled",
  "context": "Production EKS cluster with 50+ microservices",
  "urgency": "high"
}

// SRE Incident Response
@infragenius analyze_sre_incident {
  "incident": "Database connection pool exhausted",
  "severity": "critical",
  "affected_services": "user-service, payment-service"
}

// Cloud Architecture Review
@infragenius review_cloud_architecture {
  "architecture": "3-tier web app on AWS with RDS and ElastiCache",
  "cloud_provider": "aws",
  "focus_area": "cost"
}

// Generate Configurations
@infragenius generate_config {
  "tool": "kubernetes",
  "requirements": "Redis cluster with persistence and monitoring",
  "environment": "production"
}

// Log Analysis
@infragenius explain_logs {
  "logs": "ERROR: Connection timeout after 30s in database pool",
  "log_type": "application"
}

// Platform Engineering Advice
@infragenius platform_engineering_advice {
  "challenge": "Improve developer onboarding and reduce time-to-first-commit",
  "team_size": "30 developers",
  "tech_stack": "Node.js, React, Kubernetes, PostgreSQL"
}
```

| Tool | Purpose | Best For |
|---|---|---|
| `analyze_devops_issue` | DevOps problem solving | CI/CD issues, deployment problems |
| `analyze_sre_incident` | Incident response guidance | Outages, performance issues, alerts |
| `review_cloud_architecture` | Architecture analysis | Cost optimization, security, scaling |
| `generate_config` | Configuration generation | K8s manifests, Docker, Terraform |
| `explain_logs` | Log analysis & debugging | Error investigation, troubleshooting |
| `platform_engineering_advice` | Platform guidance | Developer experience, internal tools |
- Restart Cursor completely after updating mcp.json
- Check MCP Status in Cursor's settings/extensions
- Test Integration: Type `@infragenius` in any chat
- Verify Tools: You should see tool suggestions appear
If `@infragenius` doesn't appear:

```bash
# Check if integration works
cd /path/to/InfraGenius
source venv/bin/activate
python -c "import mcp_server.cursor_integration; print('✅ Integration OK')"

# Verify Ollama is running
curl http://localhost:11434/api/tags

# Check your paths in mcp.json are correct
```

Common Issues:

- ❌ Wrong paths in mcp.json → Update to your actual paths
- ❌ Virtual env not activated → Use full path to venv/bin/python
- ❌ Ollama not running → Start with `ollama serve`
- ❌ Model not available → Download with `ollama pull gpt-oss:latest`
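Because a malformed ~/.cursor/mcp.json is easy to miss, a quick sanity check is to parse it with Python. This is just an illustrative sketch using the standard library; it validates the JSON and lists which servers are configured:

```python
# Sanity-check ~/.cursor/mcp.json: is it valid JSON, and which servers does it define?
import json
import os

path = os.path.expanduser("~/.cursor/mcp.json")
with open(path) as f:
    config = json.load(f)  # raises JSONDecodeError with line/column on syntax errors

print("Configured MCP servers:", ", ".join(config.get("mcpServers", {})))
```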
Pro tips:

- Combine with other MCP servers - InfraGenius works alongside other AI models
- Use specific tools - Each tool is optimized for different scenarios
- Provide context - More context = better, more actionable responses
- Save configurations - Generated configs can be saved directly to files
```bash
# Development environment
docker-compose -f docker/development/docker-compose.yml up

# Production environment
docker-compose -f docker/production/docker-compose.yml up
```

```bash
# Test environment
kubectl apply -f kubernetes/test/

# Staging environment
kubectl apply -f kubernetes/staging/

# Production environment
kubectl apply -f kubernetes/production/
```

Test environment:

- Purpose: Development and testing
- Resources: Minimal (1 CPU, 2GB RAM)
- Data: Synthetic test data
- Monitoring: Basic metrics

```bash
./scripts/deploy/deploy-test.sh
```

Staging environment:

- Purpose: Pre-production validation
- Resources: Production-like (2 CPU, 4GB RAM)
- Data: Anonymized production data
- Monitoring: Full observability stack

```bash
./scripts/deploy/deploy-staging.sh
```

Production environment:

- Purpose: Live customer traffic
- Resources: Auto-scaling (2-20 instances)
- Data: Live production data
- Monitoring: Enterprise monitoring + alerting

```bash
./scripts/deploy/deploy-production.sh
```
```python
# Example API usage
import requests

response = requests.post(
    "http://localhost:8080/analyze",
    json={
        "prompt": "My Kubernetes pods are crashing with OOMKilled errors",
        "domain": "devops",
        "context": "Production cluster with 100+ microservices",
    },
)
print(response.json())

# Returns detailed analysis with:
# - Root cause identification
# - Step-by-step resolution
# - Prevention strategies
# - Best practices
```

| Domain | Features | Use Cases |
|---|---|---|
| DevOps | CI/CD, IaC, Automation | Pipeline optimization, deployment strategies |
| SRE | Reliability, Incidents, SLOs | Incident response, reliability improvements |
| Cloud | Architecture, Security, Cost | Cloud migration, cost optimization |
| Platform | Developer Experience, APIs | Platform design, developer productivity |
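All four domains are served by the same /analyze endpoint, switched via the `domain` field. In the sketch below, only `devops` is confirmed by the examples above; the lowercase identifiers `sre`, `cloud`, and `platform` are assumptions inferred from the table:

```python
# Query each expertise domain through the same /analyze endpoint.
# Domain identifiers other than "devops" are assumptions based on the table above.
import requests

for domain in ("devops", "sre", "cloud", "platform"):
    resp = requests.post(
        "http://localhost:8080/analyze",
        json={"prompt": "What are the top three risks in my current setup?",
              "domain": domain},
        timeout=60,
    )
    print(f"{domain}: HTTP {resp.status_code}")
```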
- Sub-second response times with intelligent caching
- Auto-scaling based on demand (2-100 instances)
- Multi-level caching (Redis + in-memory)
- Connection pooling for optimal resource usage
- Async processing for high throughput
- Response streaming for large analyses (see the sketch below)
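To make the streaming bullet concrete, here is a hypothetical client sketch. The `"stream": True` request flag is an assumption for illustration, not a documented parameter; check your server configuration for the actual option:

```python
# Hypothetical streaming consumer for long-running analyses.
# The "stream": True request flag is an assumed parameter, not a documented one.
import requests

with requests.post(
    "http://localhost:8080/analyze",
    json={"prompt": "Audit this 500-line Terraform plan", "domain": "devops",
          "stream": True},
    stream=True,  # tell requests not to buffer the whole body
    timeout=120,
) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)  # render the analysis as it arrives
```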
- JWT authentication with refresh tokens (example after this list)
- Role-based access control (RBAC)
- Rate limiting by user tier
- API key management
- Audit logging for compliance
- Data encryption at rest and in transit
- Network security with VPC and firewalls
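As a sketch of what the JWT flow might look like from a client's side: the /auth/login endpoint name and the `access_token` field below are assumptions for illustration only, so substitute whatever your deployment actually exposes:

```python
# Hypothetical JWT client flow; endpoint and field names are illustrative only.
import requests

BASE = "http://localhost:8080"

# 1. Exchange credentials for a token pair (assumed /auth/login endpoint).
tokens = requests.post(
    f"{BASE}/auth/login",
    json={"username": "alice", "password": "s3cret"},
    timeout=10,
).json()

# 2. Call the API with the bearer token; rate limits then apply per user tier.
resp = requests.post(
    f"{BASE}/analyze",
    headers={"Authorization": f"Bearer {tokens['access_token']}"},
    json={"prompt": "Review my IAM policy for least privilege", "domain": "cloud"},
    timeout=60,
)
print(resp.status_code)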
- Prometheus metrics collection (see the snippet below)
- Grafana dashboards and visualization
- Application performance monitoring
- Infrastructure monitoring
- Business metrics tracking
- Custom alerts and notifications
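Besides the curl checks shown below, you can skim the Prometheus exposition text from a script. The `"request"` substring filter here is an illustrative assumption; your deployment's metric names may differ:

```python
# Print request-related series from the Prometheus /metrics endpoint.
# The "request" substring filter is illustrative; actual metric names may differ.
import requests

body = requests.get("http://localhost:8080/metrics", timeout=5).text
for line in body.splitlines():
    if not line.startswith("#") and "request" in line:
        print(line)
```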
```bash
# System health
curl http://localhost:8080/health

# Detailed health check
curl http://localhost:8080/health/detailed

# Metrics endpoint
curl http://localhost:8080/metrics
```

```json
{
  "runtime": {
    "python": ">=3.11",
    "docker": ">=20.10",
    "kubernetes": ">=1.24"
  },
  "databases": {
    "postgresql": ">=15",
    "redis": ">=7.0"
  },
  "ai_models": {
    "ollama": ">=0.1.0",
    "gpt-oss": "latest"
  },
  "monitoring": {
    "prometheus": ">=2.40",
    "grafana": ">=9.0"
  }
}
```

| Component | Minimum | Recommended | Production |
|---|---|---|---|
| CPU | 2 cores | 4 cores | 8+ cores |
| Memory | 4GB | 8GB | 16+ GB |
| Storage | 20GB | 50GB | 100+ GB |
| Network | 1Mbps | 10Mbps | 100+ Mbps |
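A small pre-flight script can verify the core prerequisites above. This sketch checks the Python version and asks Ollama, via its documented GET /api/tags endpoint, whether the model has been pulled:

```python
# Pre-flight check: Python version and Ollama model availability.
import sys

import requests

assert sys.version_info >= (3, 11), "InfraGenius requires Python >= 3.11"

tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
models = {m["name"] for m in tags.get("models", [])}
if not any(name.startswith("gpt-oss") for name in models):
    sys.exit("Model missing - run: ollama pull gpt-oss:latest")
print("Environment looks good. Models:", ", ".join(sorted(models)))
```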
```bash
# Quick start for development
./scripts/setup/local-dev.sh

# With monitoring stack
./scripts/setup/local-dev.sh --monitoring

# With sample data
./scripts/setup/local-dev.sh --sample-data
```

```bash
# AWS
./scripts/deploy/aws-deploy.sh \
  --region us-east-1 \
  --environment production \
  --instance-type t3.large

# GCP
./scripts/deploy/gcp-deploy.sh \
  --region us-central1 \
  --environment production \
  --machine-type e2-standard-4

# Azure
./scripts/deploy/azure-deploy.sh \
  --region eastus \
  --environment production \
  --vm-size Standard_D4s_v3

# On-premises
./scripts/deploy/on-premises.sh \
  --kubernetes-config ~/.kube/config \
  --storage-class fast-ssd
```
```bash
# Run all unit tests
pytest tests/unit/

# Run with coverage
pytest tests/unit/ --cov=src --cov-report=html
```

```bash
# Run integration tests
pytest tests/integration/

# Test specific service
pytest tests/integration/test_ollama_integration.py
```

```bash
# Run E2E tests
pytest tests/e2e/

# Run against specific environment
pytest tests/e2e/ --env=staging
```

```bash
# Load testing
./scripts/utils/load-test.sh --concurrent=50 --requests=1000

# Stress testing
./scripts/utils/stress-test.sh --duration=300s
```
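If you want a quick ad-hoc load probe without the bundled scripts, a minimal Python sketch (reusing the endpoint and payload from the API example above) might look like this:

```python
# Minimal ad-hoc load probe: N concurrent /analyze calls, report latencies.
# Illustrative only; use scripts/utils/load-test.sh for real benchmarks.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8080/analyze"
PAYLOAD = {"prompt": "Why is my pod OOMKilled?", "domain": "devops"}

def one_request(_):
    start = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=60).raise_for_status()
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=10) as pool:
    latencies = sorted(pool.map(one_request, range(50)))

print(f"p50={latencies[len(latencies) // 2]:.2f}s  max={latencies[-1]:.2f}s")
```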
We welcome contributions! Please see our Contributing Guide for details.

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
- Python: Follow PEP 8
- Testing: Minimum 80% coverage
- Documentation: Document all public APIs
- Commits: Use conventional commits
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation
- GitHub Discussions
- GitHub Issues
- Contributing Guide
| Metric | Performance |
|---|---|
| Response Time | <2s |
| Throughput | 100+ req/s |
| Uptime | 99.5%+ |
| Support Response | Community-driven |
Choose your deployment option and get started in minutes:
```bash
# Quick local setup
curl -sSL https://get.# | bash

# Or manual setup
git clone https://github.com/your-org/infragenius.git
cd infragenius
./scripts/setup/quick-start.sh
```

Transform your DevOps operations with AI-powered expertise!