| 1 | +# User Story: Add HuggingFace Token Support for Gated/Private Models |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Add HuggingFace token support to the SGLang worker to enable access to gated and private models from the HuggingFace Hub. This includes adding a secure password input field in the RunPod Hub configuration and ensuring the token is properly passed to the SGLang engine. |
| 6 | + |
| 7 | +## Current State |
| 8 | + |
| 9 | +- **hub.json**: Contains comprehensive model configuration options but lacks HuggingFace token support |
| 10 | +- **engine.py**: SGLang engine initialization without HF_TOKEN environment variable |
| 11 | +- **Dockerfile**: Container setup without HuggingFace token environment variable handling |
| 12 | + |
| 13 | +## Goal |
| 14 | + |
| 15 | +Enable the SGLang worker to access gated and private models from HuggingFace Hub by implementing secure token handling throughout the worker infrastructure. |
| 16 | + |
| 17 | +## Acceptance Criteria |
| 18 | + |
| 19 | +### RunPod Hub Configuration |
| 20 | + |
| 21 | +- [ ] Add `HF_TOKEN` environment variable configuration to `hub.json`: |
| 22 | + - Use `password` type for secure input handling |
| 23 | + - Set as non-required (optional) field |
| 24 | + - Include descriptive text about gated/private model access |
| 25 | + - Position prominently alongside MODEL_PATH (not in advanced section) |
| 26 | + |
| 27 | +### Environment Variable Integration |
| 28 | + |
| 29 | +- [ ] Ensure the `HF_TOKEN` environment variable is available inside the running Docker container
| 30 | +- [ ] Verify the SGLang engine can read `HF_TOKEN` from the environment
| 31 | +- [ ] Confirm the token reaches the HuggingFace libraries (`huggingface_hub`, which `transformers` uses for downloads) when models are fetched
| 32 | + |
| 33 | +### Security & Best Practices |
| 34 | + |
| 35 | +- [ ] Use `password` input type to mask token in RunPod Hub UI |
| 36 | +- [ ] Ensure token is not logged or exposed in error messages |
| 37 | +- [ ] Follow HuggingFace token security best practices (for example, prefer a read-only token)
| 38 | +- [ ] Maintain compatibility with existing non-gated model workflows |
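
One way to approach the "token is not logged" criterion is a logging filter that rewrites records before they are emitted. The sketch below is only an illustration: `TokenRedactingFilter` and `install_redaction` are hypothetical names, not part of the worker or of SGLang, and it assumes the worker logs through Python's standard `logging` module.

```python
import logging
import os


class TokenRedactingFilter(logging.Filter):
    """Replace the token value with a placeholder in every log record."""

    def __init__(self, token, placeholder="***"):
        super().__init__()
        self._token = token
        self._placeholder = placeholder

    def filter(self, record):
        if self._token:
            # Render the message first so tokens passed as %-style args are caught too.
            message = record.getMessage()
            if self._token in message:
                record.msg = message.replace(self._token, self._placeholder)
                record.args = None
        return True  # never drop the record, only rewrite it


def install_redaction(logger, env=os.environ):
    """Attach the filter to `logger` when HF_TOKEN is set to a non-empty value."""
    token = env.get("HF_TOKEN", "").strip()
    if token:
        logger.addFilter(TokenRedactingFilter(token))
    return logger
```

A filter attached to a logger only sees records logged directly on that logger; attaching the same filter to the handlers instead would also cover records propagated from child loggers. Tracebacks rendered from `exc_info` are not rewritten by this sketch, so error paths still need a separate check.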
| 39 | + |
| 40 | +### Validation |
| 41 | + |
| 42 | +- [ ] Test worker with gated model (e.g., Llama models requiring approval) |
| 43 | +- [ ] Test worker with private model (requires HF token) |
| 44 | +- [ ] Verify worker continues to work with public models (without token) |
| 45 | +- [ ] Confirm token is properly masked in RunPod Hub interface |
| 46 | +- [ ] Validate Docker build succeeds with new configuration |
| 47 | + |
| 48 | +### Documentation |
| 49 | + |
| 50 | +- [ ] Update any relevant documentation about using gated/private models |
| 51 | +- [ ] Ensure token field has clear description for users |
| 52 | + |
| 53 | +## Implementation Details |
| 54 | + |
| 55 | +### hub.json Changes |
| 56 | + |
| 57 | +Add new environment variable configuration: |
| 58 | + |
| 59 | +```json |
| 60 | +{ |
| 61 | + "key": "HF_TOKEN", |
| 62 | + "input": { |
| 63 | + "name": "HuggingFace Token", |
| 64 | + "type": "password", |
| 65 | + "description": "HuggingFace access token for gated and private models", |
| 66 | + "default": "", |
| 67 | + "required": false |
| 68 | + } |
| 69 | +} |
| 70 | +``` |
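
The entry above can be checked mechanically against the acceptance criteria (password type, optional, described). The following is a minimal sketch: `check_hf_token_entry` is a hypothetical helper, and it inspects a single entry only, because where the entry sits inside `hub.json` is not specified here.

```python
import json


def check_hf_token_entry(entry):
    """Return a list of problems found in one hub.json env entry (empty list = OK)."""
    problems = []
    if entry.get("key") != "HF_TOKEN":
        problems.append("key must be HF_TOKEN")
    field = entry.get("input", {})
    if field.get("type") != "password":
        problems.append("input.type must be 'password' so the UI masks the value")
    if field.get("required") is not False:
        problems.append("input.required must be false so public models work without a token")
    if not field.get("description"):
        problems.append("input.description must not be empty")
    return problems


# The entry exactly as proposed in this user story.
ENTRY = json.loads("""
{
  "key": "HF_TOKEN",
  "input": {
    "name": "HuggingFace Token",
    "type": "password",
    "description": "HuggingFace access token for gated and private models",
    "default": "",
    "required": false
  }
}
""")

print(check_hf_token_entry(ENTRY))  # prints []
```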
| 71 | + |
| 72 | +### Engine Integration |
| 73 | + |
| 74 | +- Verify `HF_TOKEN` environment variable is available to SGLang engine |
| 75 | +- Ensure the HuggingFace libraries (`huggingface_hub`, which `transformers` uses for downloads) pick up the token automatically when `HF_TOKEN` is set
| 76 | +- No explicit token passing required if environment variable is properly set |
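
Because the proposed `hub.json` default is an empty string, the worker may see `HF_TOKEN` set but blank when the user leaves the field empty (an assumption about how the platform passes empty defaults). A small startup check, sketched below with the hypothetical helpers `resolve_hf_token` and `prepare_environment`, treats a blank value as "no token" before the engine starts, so the public-model path behaves as it did before.

```python
import os


def resolve_hf_token(env=os.environ):
    """Return the HF token from the environment, or None when unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def prepare_environment(env=os.environ):
    """Drop a blank HF_TOKEN so downstream libraries see it as unset."""
    if "HF_TOKEN" in env and resolve_hf_token(env) is None:
        del env["HF_TOKEN"]
    return env


if __name__ == "__main__":
    prepare_environment()
    # Report only whether a token is present, never its value.
    print("HF token configured:", resolve_hf_token() is not None)
```

This keeps to the "no explicit token passing" approach above: the sketch does not hand the token to any library and only normalizes the environment that those libraries read.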
| 77 | + |
| 78 | +### Docker Environment |
| 79 | + |
| 80 | +- Confirm Docker container inherits `HF_TOKEN` environment variable from RunPod |
| 81 | +- Test that SGLang can access HuggingFace Hub with the provided token |
| 82 | + |
| 83 | +## Files to Modify |
| 84 | + |
| 85 | +- `.runpod/hub.json` - Add `HF_TOKEN` configuration
| 86 | +- `engine.py` - Verify token usage (may not require code changes)
| 87 | +- `Dockerfile` - Verify environment variable handling (may not require changes)
| 88 | + |
| 89 | +## Success Metrics |
| 90 | + |
| 91 | +- User can input HuggingFace token through RunPod Hub interface |
| 92 | +- Token input is masked (password type) for security |
| 93 | +- Worker successfully downloads and runs gated models when token is provided |
| 94 | +- Worker continues to work with public models when no token is provided |
| 95 | +- No token information is exposed in logs or error messages |
| 96 | + |
| 97 | +## Testing Scenarios |
| 98 | + |
| 99 | +1. **Public Model**: Test without HF_TOKEN (existing functionality) |
| 100 | +2. **Gated Model**: Test with valid HF_TOKEN for gated model |
| 101 | +3. **Private Model**: Test with valid HF_TOKEN for private model |
| 102 | +4. **Invalid Token**: Test with invalid HF_TOKEN (should fail gracefully) |
| 103 | +5. **UI Security**: Verify token is masked in RunPod Hub interface |