🚨 NHN Cloud Resource Cleanup Issue
Executive Summary
Persistent resource cleanup failures in the NHN Cloud kr1 region after a successful MCI deletion. The shared Security Group and VNet remain stuck after three release attempts (the initial attempt plus retries after 10 and 20 minutes).
Deployment Context
Original Infrastructure Deployment
- Date: August 28, 2025 (18:09:46 ~ 18:12:41 KST)
- Target: 30 VMs across 6 different specifications in NHN Cloud kr1 region (Pangyo, South Korea)
- Strategy: Cost-optimized multi-spec cluster deployment
- Success Rate: 93.3% (28/30 VMs successfully deployed)
- Failed VMs: 2 VMs failed during initial deployment due to Block Storage creation issues
Infrastructure Configuration
Cluster Name: nhn-seoul-30vm-cluster
Namespace: default
Region: NHN Cloud kr1 (Pangyo, South Korea)
Network: default-shared-nhn-kr1
Security Group: default-shared-nhn-kr1
SSH Key: default-shared-nhn-kr1
Image: Ubuntu Server 22.04.5 LTS (a43455df-17a4-4faf-8429-fe15f74d0e38)
VM Specifications Deployed
Spec Type | vCPU | RAM | Count | Cost/Hour | Status |
---|---|---|---|---|---|
t2.c1m1 | 1 | 1GB | 5 | $0.01776 | ✅ Deployed |
t2.c1m2 | 1 | 2GB | 5 | $0.02664 | ✅ Deployed |
c2.c2m2 | 2 | 2GB | 5 | $0.03552 | ✅ Deployed |
m2.c2m4 | 2 | 4GB | 5 | $0.0474 | ✅ Deployed |
c2.c4m4 | 4 | 4GB | 4 | $0.07104 | ✅ Deployed |
r2.c2m8 | 2 | 8GB | 4 | $0.0947 | ✅ Deployed |
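As a cross-check of the table, the following sketch totals the deployed VM count, the success rate against the 30 requested VMs, and the combined hourly cost. The figures are copied from the table; the names are illustrative, and it assumes that Cost/Hour is a per-VM price.

```python
# (spec, count, cost_per_hour) rows copied from the table above.
# Assumption: Cost/Hour is the price of a single VM of that spec.
DEPLOYED_SPECS = [
    ("t2.c1m1", 5, 0.01776),
    ("t2.c1m2", 5, 0.02664),
    ("c2.c2m2", 5, 0.03552),
    ("m2.c2m4", 5, 0.0474),
    ("c2.c4m4", 4, 0.07104),
    ("r2.c2m8", 4, 0.0947),
]


def summarize(specs, requested=30):
    """Return (deployed VM count, success rate in percent, total hourly cost)."""
    deployed = sum(count for _, count, _ in specs)
    hourly_cost = sum(count * cost for _, count, cost in specs)
    success_rate = round(100 * deployed / requested, 1)
    return deployed, success_rate, hourly_cost


if __name__ == "__main__":
    deployed, rate, cost = summarize(DEPLOYED_SPECS)
    print(f"{deployed} VMs deployed ({rate}% of 30), about ${cost:.5f}/hour in total")
```

Under that per-VM assumption, the 28 deployed VMs add up to roughly $1.30 per hour.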
🗑️ Cleanup Timeline
Phase 1: MCI Deletion (Successful)
- Date: August 29, 2025
- Operation: delete_mci("nhn-seoul-30vm-cluster", "default")
- Result: ✅ SUCCESSFUL
- Resources Deleted:
- 28 VMs completely terminated
- 28 SubGroups removed
- MCI metadata purged
- Verification: get_mci_list("default") returns null
Phase 2: Shared Resource Release (Failed)
Multiple attempts with persistent failures:
Attempt 1 (Initial)
release_resources("default", force_release=True)
Results:
- ✅ AWS resources: Successfully deleted
- ✅ NHN SSH Key: Successfully deleted
- ❌ NHN Security Group: FAILED
- ❌ NHN VNet: FAILED
Attempt 2 (Retry after 10 minutes)
- ❌ NHN Security Group: FAILED (same error)
- ❌ NHN VNet: FAILED (same error)
Attempt 3 (Retry after 20 minutes)
- ❌ NHN Security Group: FAILED (same error)
- ❌ NHN VNet: FAILED (same error)
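The manual retry pattern above could be automated along these lines. This is only a sketch: `retry_release` and its `release_fn` argument are hypothetical names, and `release_fn` is assumed to return the list of resources that still failed to delete. It is not the signature of the `release_resources` call used in this report.

```python
import time


def retry_release(release_fn, delays_seconds, sleep=time.sleep):
    """Call release_fn once, then again after each delay until nothing is left.

    release_fn is a hypothetical callable that returns the names of resources
    that still failed to delete (an empty list means everything was released).
    Returns the list of per-attempt results for later inspection.
    """
    remaining = release_fn()
    attempts = [remaining]
    for delay in delays_seconds:
        if not remaining:
            break  # everything was released; no further retries needed
        sleep(delay)
        remaining = release_fn()
        attempts.append(remaining)
    return attempts
```

Passing the delays as a parameter makes it easy to stretch the schedule well beyond the 10 and 20 minute waits tried here, which matters if the provider-side cleanup takes hours rather than minutes.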
Error Analysis
Security Group Deletion Failure
{
"resource": "default-shared-nhn-kr1",
"csp_resource_id": "5db5ceb2-6e00-404e-b338-ade7dee5777d",
"error_type": "Bad Request",
"http_status": 400,
"api_endpoint": "https://kr1-api-instance-infrastructure.nhncloudservice.com/v2/a77e25da7cc04a388716a7dc10dc9340/os-security-groups/5db5ceb2-6e00-404e-b338-ade7dee5777d",
"error_message": "Security Group 5db5ceb2-6e00-404e-b338-ade7dee5777d in use",
"neutron_request_id": "req-b5fc2868-c3e2-42ed-8a74-bd0eabcfe004",
"spider_error": "500 Internal Server Error"
}
VNet Deletion Failure
{
"resource": "default-shared-nhn-kr1",
"csp_resource_id": "e7e75e42-3d1b-4497-8d54-4e0ae7abac75",
"error_type": "Internet Gateway Removal Failure",
"error_message": "Failed to Get the Internet Gateway ID!!",
"spider_error": "500 Internal Server Error",
"affected_subnets": [
"91ea5475-48df-4139-b27b-523b0234b536",
"2adc8ffa-931d-4231-bc18-08307705c11c"
]
}
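The two failures have different causes, so it can help to triage them separately. The sketch below classifies an error record by its message text. The category labels and the `classify_failure` helper are illustrative assumptions; only the two messages are taken from the payloads above.

```python
def classify_failure(error):
    """Map an error record (shaped like the payloads above) to a rough category."""
    message = error.get("error_message", "").lower()
    if "in use" in message:
        # The provider believes something still references the resource.
        return "dependency-still-attached"
    if "internet gateway" in message:
        # The failure happened while looking up the gateway, before deletion.
        return "gateway-lookup-failed"
    return "unknown"


# Messages copied from the two error payloads in this report.
SECURITY_GROUP_ERROR = {
    "error_message": "Security Group 5db5ceb2-6e00-404e-b338-ade7dee5777d in use",
}
VNET_ERROR = {
    "error_message": "Failed to Get the Internet Gateway ID!!",
}

if __name__ == "__main__":
    print(classify_failure(SECURITY_GROUP_ERROR))
    print(classify_failure(VNET_ERROR))
```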
Current Resource State
CB-Tumblebug Metadata Analysis
{
"security_group": {
"id": "default-shared-nhn-kr1",
"csp_resource_id": "5db5ceb2-6e00-404e-b338-ade7dee5777d",
"associated_objects": [],
"status": "Active in CB-Tumblebug metadata"
},
"vnet": {
"id": "default-shared-nhn-kr1",
"csp_resource_id": "e7e75e42-3d1b-4497-8d54-4e0ae7abac75",
"status": "InUse",
"subnets": [
{
"id": "91ea5475-48df-4139-b27b-523b0234b536",
"cidr": "10.162.0.0/18",
"zone": "kr-pub-a",
"status": "Available"
},
{
"id": "2adc8ffa-931d-4231-bc18-08307705c11c",
"cidr": "10.162.64.0/18",
"zone": "kr-pub-b",
"status": "Available"
}
],
"associated_objects": []
}
}
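One inconsistency in this metadata is that the VNet is marked "InUse" while its associated_objects list is empty. A minimal sketch of a check for that condition follows. The `find_inconsistent` helper is hypothetical, and the input is a trimmed copy of the metadata above, not a CB-Tumblebug API response.

```python
def find_inconsistent(metadata):
    """Return names of resources marked InUse that list no associated objects."""
    flagged = []
    for name, resource in metadata.items():
        in_use = resource.get("status") == "InUse"
        has_objects = bool(resource.get("associated_objects"))
        if in_use and not has_objects:
            flagged.append(name)
    return flagged


# Trimmed copy of the metadata shown above (subnets omitted for brevity).
METADATA = {
    "security_group": {
        "id": "default-shared-nhn-kr1",
        "associated_objects": [],
        "status": "Active in CB-Tumblebug metadata",
    },
    "vnet": {
        "id": "default-shared-nhn-kr1",
        "status": "InUse",
        "associated_objects": [],
    },
}

if __name__ == "__main__":
    print(find_inconsistent(METADATA))
```

With the data above, only the VNet is flagged. The Security Group carries a different status string, so it is not reported by this particular check.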
Root Cause Analysis
1. NHN Cloud Backend Cleanup Lag
- Observation: VM deletion completed, but network cleanup is lagging
- Theory: NHN Cloud's backend systems require extended time for network interface cleanup
- Evidence: No associated objects in CB-Tumblebug metadata yet resources remain "in use"
2. Phantom Resource Dependencies
- Issue: Security Group reports "in use" despite no visible dependencies
- Potential Causes:
- Orphaned network interfaces in NHN Cloud backend
- Neutron database consistency issues
- Asynchronous cleanup process delays
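To test the orphaned-interface theory, one could list the project's network ports and look for any that still reference the Security Group. The sketch below assumes the ports have already been fetched as OpenStack Neutron-style dictionaries (with `security_groups` and `device_id` fields). How to fetch them from NHN Cloud is out of scope here, and the sample ports are invented for illustration.

```python
SECURITY_GROUP_ID = "5db5ceb2-6e00-404e-b338-ade7dee5777d"


def ports_holding_group(ports, security_group_id):
    """Return the ports whose security_groups list references the given group."""
    return [p for p in ports if security_group_id in p.get("security_groups", [])]


def looks_orphaned(port):
    """A port with an empty device_id is not attached to any instance."""
    return not port.get("device_id")


# Hypothetical sample data; these IDs do not come from the incident.
SAMPLE_PORTS = [
    {"id": "port-a", "device_id": "", "security_groups": [SECURITY_GROUP_ID]},
    {"id": "port-b", "device_id": "vm-123", "security_groups": ["some-other-group"]},
]

if __name__ == "__main__":
    for port in ports_holding_group(SAMPLE_PORTS, SECURITY_GROUP_ID):
        state = "orphaned" if looks_orphaned(port) else "attached"
        print(port["id"], state)
```

If such a check finds leftover ports, deleting them before retrying the Security Group deletion would be the natural next step. If it finds none, the backend consistency explanations become more likely.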
3. Internet Gateway Reference Issues
- Problem: VNet deletion fails on Internet Gateway ID resolution
- Impact: Prevents complete VNet teardown
- Technical Detail: CB-Spider cannot locate Internet Gateway ID for cleanup
📊 Comparison with Other Providers
Provider | VM Deletion | Network Cleanup | Overall Success |
---|---|---|---|
AWS | ✅ Immediate | ✅ Immediate | ✅ 100% |
NHN Cloud | ✅ Immediate | ❌ Stuck |
🎯 Conclusion
The NHN Cloud resource cleanup issue appears to be a provider-specific backend limitation rather than a CB-Tumblebug system failure. The primary objective, deleting the infrastructure, was achieved: all 28 VMs were removed, which eliminates their compute costs.
Recommended Action: Monitor for 24-48 hours and escalate to NHN Cloud support if resources remain stuck, while proceeding with confidence that the main cost optimization goals have been met.
Report Generated: August 29, 2025
Incident Tracking: NHN-CLEANUP-20250829
Severity: Low (Cost impact minimal, core objectives achieved)