feat: Add parallel user processing with intelligent offset batching #2664
🚀 Parallel User Processing with Intelligent Offset Batching
📋 Summary
This PR adds automatic parallel processing when multiple usernames are given. An offset batching strategy makes concurrent users check different sites at any given moment, which lowers the risk of rate limiting while shortening total run time.
Performance Improvement: ⚡ ~40% faster when searching multiple users!
❌ Problem Statement
Currently, when checking multiple usernames, Sherlock processes them sequentially (one after another):
This is inefficient because:
✅ Solution
Implement automatic parallel processing with intelligent offset batching to avoid rate limiting:
🎯 Key Innovation: Intelligent Offset Batching
The Challenge
If we naively run multiple users in parallel, they would all hit the same websites at the same time, risking rate limits:
The Solution: Offset Batching
Our implementation uses offset batching to ensure users check different sites at the same time:
How It Works Mathematically
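The exact formula used in the PR is not reproduced in this description. As a minimal sketch of one way the offset could be computed, assume each user in a batch starts at an evenly spaced position in the shared site list and wraps around at the end. The names `site_offset` and `rotated_sites`, and the four site names, are hypothetical and not taken from `sherlock.py`.

```python
from typing import List


def site_offset(user_index: int, batch_size: int, site_count: int) -> int:
    """Starting position in the site list for one user (assumed formula)."""
    if batch_size <= 0 or site_count <= 0:
        return 0
    # Spread the users of a batch evenly across the site list.
    return (user_index * site_count // batch_size) % site_count


def rotated_sites(sites: List[str], offset: int) -> List[str]:
    """Return the same sites, starting at `offset` and wrapping around."""
    return sites[offset:] + sites[:offset]


if __name__ == "__main__":
    sites = ["GitHub", "Reddit", "Twitter", "Instagram"]  # illustrative only
    for index, user in enumerate(["user1", "user2"]):
        order = rotated_sites(sites, site_offset(index, 2, len(sites)))
        print(user, order)
```

With two users and four sites, the second user starts halfway through the list, so the two users never request the same site at the same step as long as they advance at a similar pace.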
Visual Representation
🎨 Features
1. Automatic Mode Detection
The feature works automatically without any configuration:
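A minimal sketch of what such a detection rule might look like, assuming the only input is the number of usernames on the command line (one username keeps the sequential path, two or more switch to parallel). `choose_mode` is a hypothetical helper, not a function from the PR.

```python
from typing import List


def choose_mode(usernames: List[str]) -> str:
    """Pick a processing mode from the number of usernames (assumed rule)."""
    return "parallel" if len(usernames) > 1 else "sequential"


if __name__ == "__main__":
    print(choose_mode(["faizan842"]))
    print(choose_mode(["faizan842", "faizan841"]))
```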
2. Clean, Separated Output
Results are buffered per user and displayed sequentially (no mixing):
✅ Complete output for User1, then complete output for User2
❌ No mixing or interleaving!
3. Configurable Batch Size
Advanced users can customize the parallel batch size:
📊 Performance Benchmarks
Test Environment
Usernames: faizan842, faizan841
Results
Detailed Timing
Scalability
🔧 Technical Implementation
Architecture
Key Functions
1. process_username()
Offset Logic:
2. process_users_in_parallel()
Batch Processing:
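The body of `process_users_in_parallel()` is not shown here. The following is a rough sketch of the general batching pattern, assuming users are split into fixed-size batches and each batch runs on its own thread pool before the next one starts. `batches`, `run_in_batches`, and the `worker` callback are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(
    usernames: List[str],
    batch_size: int,
    worker: Callable[[str, int], str],
) -> Dict[str, str]:
    """Run `worker(username, index_in_batch)` one batch at a time."""
    results: Dict[str, str] = {}
    for batch in batches(usernames, batch_size):
        # One thread per user in the batch; the next batch starts only
        # after every user in this one has finished.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            futures = {
                name: pool.submit(worker, name, index)
                for index, name in enumerate(batch)
            }
            for name, future in futures.items():
                results[name] = future.result()
    return results
```

The index passed to `worker` is the user's position inside its batch, which is the value an offset calculation would need.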
Output Buffering
To ensure clean, non-interleaved output:
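One common way to get this behavior is to have each worker write to a private in-memory buffer and print the buffers afterwards in input order. The sketch below assumes that approach. `check_user` is a stand-in for the real search, and the log lines it writes are invented for the example.

```python
import io
from concurrent.futures import ThreadPoolExecutor
from typing import List


def check_user(username: str) -> str:
    """Stand-in for a real search; writes its report to a private buffer."""
    buffer = io.StringIO()
    buffer.write(f"[*] Checking username {username}\n")
    buffer.write(f"[+] Done: {username}\n")
    return buffer.getvalue()


def run_buffered(usernames: List[str]) -> List[str]:
    """Run all users concurrently and return reports in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(usernames))) as pool:
        # Executor.map yields results in input order, regardless of
        # which thread finishes first.
        return list(pool.map(check_user, usernames))


if __name__ == "__main__":
    for report in run_buffered(["user1", "user2"]):
        print(report, end="")
```

Because nothing is printed until a user's report is complete, lines from different users cannot interleave on the terminal.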
Thread Safety
📝 New CLI Argument
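The `--parallel` flag appears in the test commands of this PR. Its exact definition is not shown in this description. A minimal `argparse` sketch of how such a flag might be declared follows. The default value, help text, and `build_parser` helper are assumptions.

```python
import argparse

# Placeholder value; the PR's actual default batch size is not stated here.
ASSUMED_DEFAULT_BATCH = 2


def build_parser() -> argparse.ArgumentParser:
    """Build a cut-down parser with only the arguments relevant here."""
    parser = argparse.ArgumentParser(prog="sherlock")
    parser.add_argument(
        "username",
        nargs="+",
        help="One or more usernames to check.",
    )
    parser.add_argument(
        "--parallel",
        type=int,
        default=ASSUMED_DEFAULT_BATCH,
        metavar="N",
        help="Number of users to process at the same time.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["user1", "user2", "--parallel", "4"])
    print(args.username, args.parallel)
```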
🛡️ Rate Limiting Prevention
Multi-Layer Protection
Offset Batching (Primary Protection)
Conservative Batch Size
Per-User Worker Limit
Existing Timeout Mechanisms
Comparison
✅ Benefits
Performance
User Experience
Safety
Code Quality
🧪 Testing
Test Cases
✅ Test 1: Single User (Backward Compatibility)
$ python -m sherlock_project faizan842 --txt --timeout 30
Expected: Sequential processing (unchanged behavior)
Result: ✅ PASS
- Output format identical to original
- File created: faizan842.txt
- Time: ~89 seconds
✅ Test 2: Two Users (Automatic Parallel)
✅ Test 3: Custom Batch Size
$ python -m sherlock_project user1 user2 user3 user4 --parallel 4
Expected: All 4 users processed simultaneously
Result: ✅ PASS
- Batch size respected
- Output clean for all 4 users
✅ Test 4: Output File Integrity
🔄 Backward Compatibility
✅ Single User: Zero Changes
✅ All Existing Flags Work
📦 Changes Summary
Files Modified
sherlock_project/sherlock.py (+300 lines, -111 lines)
New Imports
New Functions
Modified Functions
New CLI Arguments
🚀 Future Enhancements
Potential improvements for future PRs:
Progress Bar
Dynamic Batch Sizing
Result Caching
Advanced Rate Limiting
Statistics Dashboard
📋 Checklist
🎯 Conclusion
This PR delivers a significant performance improvement (~40% faster) while maintaining complete backward compatibility and adding intelligent rate limiting prevention through offset batching.
The implementation is production-ready, well-tested, and provides immediate value to users who need to check multiple usernames efficiently.
Ready for review and merge! 🎉
🙏 Acknowledgments
Thanks to the Sherlock project maintainers for creating such a robust foundation that made this enhancement possible!
📞 Questions?
Feel free to ask questions or request changes. Happy to iterate!