Add GPU/Device Support and Fix Symlink Deduplication Issues #832
Fixes #102, Fixes #306, Fixes #401, Fixes #579
What

This MR introduces four patches that add GPU support to docker-slim and fix critical symlink processing bugs:

1. **Implement docker device request in CLI** - Adds `--cro-device-request` and `--cro-runtime` CLI flags to enable GPU access during container profiling. Allows passing device requests as JSON (e.g., `--cro-device-request '{"Count":-1, "Capabilities":[["gpu"]]}' --cro-runtime nvidia`).
2. **Fix issues with duplicate symlink processing** - Resolves a bug where files accessed through multiple symlinked paths (e.g., `/usr/local/cuda/` vs `/usr/local/cuda-12.9/`) would be copied multiple times, with later copies randomly overwriting earlier ones with 0-byte content. Implements inode-based deduplication to ensure only one canonical copy is kept.
3. **Example use case with nvidia runtime** - Adds documentation and example scripts demonstrating GPU workloads:
   - `test_nvidia_smi.sh` - Slims ubuntu to run nvidia-smi
   - `test_nvidia_pytorch.sh` - Slims nvidia-pytorch to run CUDA tests
4. **Non-trivial example of slimming vllm** - Adds a comprehensive example for slimming VLLM (LLM inference) containers with full API test suite validation.
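The device-request JSON accepted by the first patch maps onto Docker's `DeviceRequest` API type. A minimal, dependency-free sketch of how such a flag value could be decoded (the struct below mirrors the shape of Docker's `container.DeviceRequest` locally for self-containment; it is illustrative, not the PR's actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeviceRequest mirrors the shape of Docker's container.DeviceRequest
// so this sketch stays dependency-free; a real flag handler would
// decode into the Docker API type directly.
type DeviceRequest struct {
	Driver       string            `json:"Driver,omitempty"`
	Count        int               `json:"Count,omitempty"`
	DeviceIDs    []string          `json:"DeviceIDs,omitempty"`
	Capabilities [][]string        `json:"Capabilities,omitempty"`
	Options      map[string]string `json:"Options,omitempty"`
}

// parseDeviceRequest decodes the JSON value passed via
// --cro-device-request into a DeviceRequest.
func parseDeviceRequest(raw string) (DeviceRequest, error) {
	var req DeviceRequest
	if err := json.Unmarshal([]byte(raw), &req); err != nil {
		return DeviceRequest{}, fmt.Errorf("invalid device request %q: %w", raw, err)
	}
	return req, nil
}

func main() {
	req, err := parseDeviceRequest(`{"Count":-1, "Capabilities":[["gpu"]]}`)
	if err != nil {
		panic(err)
	}
	// In Docker's API, Count == -1 requests all available GPUs.
	fmt.Println(req.Count, req.Capabilities[0][0]) // -1 gpu
}
```

In Docker's device-request semantics, `Count: -1` requests all available devices, which is why the example flag value above exposes every GPU to the profiled container.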
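The inode-based deduplication from the second patch can be sketched as follows (a simplified illustration under stated assumptions, not the PR's implementation; `dedupByInode` and `inodeKey` are hypothetical names). Two paths that resolve to the same device/inode pair refer to the same underlying file, so only the first occurrence is kept and later duplicates can never clobber an already-copied file:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// inodeKey uniquely identifies a file on one machine: two paths
// resolving to the same device+inode are the same underlying file
// (e.g. reached through different symlinked directories).
type inodeKey struct {
	dev uint64
	ino uint64
}

// dedupByInode keeps only the first path seen for each underlying
// inode, so a file reachable through several symlinked paths is
// processed exactly once.
func dedupByInode(paths []string) []string {
	seen := make(map[inodeKey]bool)
	var canonical []string
	for _, p := range paths {
		fi, err := os.Stat(p) // os.Stat follows symlinks
		if err != nil {
			continue
		}
		st, ok := fi.Sys().(*syscall.Stat_t)
		if !ok {
			canonical = append(canonical, p) // non-Unix fallback
			continue
		}
		key := inodeKey{dev: uint64(st.Dev), ino: uint64(st.Ino)}
		if !seen[key] {
			seen[key] = true
			canonical = append(canonical, p)
		}
	}
	return canonical
}

func main() {
	dir, _ := os.MkdirTemp("", "dedup")
	defer os.RemoveAll(dir)
	real := filepath.Join(dir, "cuda-12.9")
	link := filepath.Join(dir, "cuda")
	os.WriteFile(real, []byte("lib"), 0o644)
	os.Symlink(real, link) // one file, two paths
	fmt.Println(len(dedupByInode([]string{link, real}))) // 1
}
```

Keying on `(dev, ino)` rather than on the path string is what distinguishes this from naive path-based deduplication: `/usr/local/cuda/lib64/libfoo.so` and `/usr/local/cuda-12.9/lib64/libfoo.so` are different strings but the same inode.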
Why

- **GPU Support** (docker-slim removing cuda which is required for GPU computation #102, please help me how to slim gpu docker #401): Many modern workloads require GPU access, especially in ML/AI contexts. Without `--cro-device-request` and `--cro-runtime`, it was impossible to profile containers that required GPU access during execution, making docker-slim unusable for CUDA-based images.
- **Symlink Bug** (Incomplete Image Nvidia/PyTorch #306, Non usable python-gunicorn docker slimed image #579): The symlink deduplication bug caused slimmed images to have corrupted (0-byte) library files, particularly in NVIDIA CUDA containers where `/usr/local/cuda` symlinks to versioned directories like `/usr/local/cuda-12.9`. This made slimmed images non-functional, as critical `.so` files would be empty.

How Tested
Unit Tests: Added comprehensive test coverage for:
- CLI (`fvgetter_test.go`, `config_test.go`, `device_request_test.go`)
- Deduplication (`dedup_test.go`)
- ptrace monitoring (`ptrace_test.go`)
- Filesystem utilities (`fsutil_test.go`)

Integration Tests:

- `test_nvidia_smi.sh` - Verifies the slimmed ubuntu image can run nvidia-smi with output identical to the original
- `test_nvidia_pytorch.sh` - Verifies the slimmed pytorch image passes CUDA tests
- `test_nvidia_vllm.sh` - Comprehensive VLLM API test suite comparing original vs slimmed image behavior (15 endpoint tests)

Manual Testing: Successfully slimmed NVIDIA NIM LLM containers with GPU workloads and verified functional parity.