Simple, fast, and intelligent news aggregation platform built with Python/Flask. Designed as a modern drudgereport.com clone that automatically aggregates and curates news from multiple categories, updated 24/7 with AI-powered headline generation.
This project is free and open source software released under the GNU Lesser General Public License v3.0 (LGPL v3).
DeepWiki provides excellent analysis of the codebase, including visual dependency graphs.
Category | URL | Focus |
---|---|---|
Linux | linuxreport.net | Linux news, open source, tech |
COVID | covidreport.org | Health, pandemic updates |
AI | aireport.keithcu.com | Artificial intelligence, ML |
Solar/PV | pvreport.org | Solar energy, renewable tech |
Techno | news.thedetroitilove.com | Detroit techno music |
Space | news.spaceelevatorwiki.com | Space exploration |
- High Performance: Thread pools and Apache process pools for scalability
- AI-Powered Headlines: Automatic headline curation using 30+ LLM models via OpenRouter.ai
- Multi-Platform: Support for multiple news categories in one codebase
- Dark Mode: User-customizable themes and font sizes
- Mobile Responsive: Optimized for all devices
- Advanced Caching: Multi-layer caching system for optimal performance
- CDN Support: s3cmd integration with long cache expiration headers for optimal image delivery
- Secure: Rate limiting, admin authentication, input validation
- Configurable: Easy RSS feed management and customization
The system generates headlines through OpenRouter.ai, randomly selecting one of more than 30 free models for each run.
If the selected model fails, the system automatically falls back to Mistral Small for reliability. See the model selection logic for implementation details.
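The random-selection-with-fallback behavior described above can be sketched roughly as follows. This is an illustration only, not the project's actual model selection logic: the model identifiers, the `call_model` callable, and the function name are all hypothetical placeholders.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical identifiers for illustration; the project's real model list is not shown here.
FREE_MODELS = ["example/model-a", "example/model-b", "example/model-c"]
FALLBACK_MODEL = "example/mistral-small"  # stands in for the Mistral Small fallback


def generate_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    models: Sequence[str] = FREE_MODELS,
    fallback: str = FALLBACK_MODEL,
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Try one randomly chosen model; on any error, retry once with the fallback.

    Returns a (model_used, response_text) pair.
    """
    rng = rng or random.Random()
    primary = rng.choice(list(models))
    try:
        return primary, call_model(primary, prompt)
    except Exception:
        # The fallback call may still raise, so the caller sees a total failure.
        return fallback, call_model(fallback, prompt)
```

Passing the model call in as a parameter keeps the selection logic testable without network access; a real caller would supply a function that talks to OpenRouter.ai.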
# Clone the repository
git clone https://github.com/KeithCu/LinuxReport
cd LinuxReport
# Install dependencies
pip install -r requirements.txt
# Configure (see Configuration section below)
cp config.yaml.example config.yaml
# Edit config.yaml with your settings
# Run development server
python -m flask run
LinuxReport uses a sophisticated multi-layered architecture designed for performance and scalability:
- Backend: Python 3.x + Flask with extensions (Login, Limiter, Assets, Mobility)
- Database: SQLite via Diskcache for high-performance persistent storage
- Caching: Multi-layer system (disk, memory, file-based)
- Frontend: Responsive HTML/CSS/JS with automatic bundling and minification
- Scraping: BeautifulSoup4 + Selenium with Tor support for complex sites
- Images: Automatic WebP conversion and optimization
The system achieves high performance through:
- Thread Pools: Concurrent RSS feed processing
- Multi-layer Caching: Disk, memory, and file-based caching strategies
- CDN Integration: s3cmd synchronization with long cache expiration headers for static assets
- Asset Optimization: Automatic JavaScript bundling and CSS minification
- Smart Deduplication: Article deduplication across feeds and time periods
- Rate Limiting: Intelligent request throttling and IP blocking
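As a rough illustration of two of the techniques listed above (thread-pool feed processing and article deduplication), here is a minimal standard-library sketch. The function names and the title-normalization rule are assumptions made for this example; the project's own workers may be structured quite differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], List[str]],
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """Fetch every feed concurrently; a feed that raises yields an empty list."""

    def safe_fetch(url: str) -> List[str]:
        try:
            return fetch(url)
        except Exception:
            return []

    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as url_list.
        results = list(pool.map(safe_fetch, url_list))
    return dict(zip(url_list, results))


def dedupe_titles(titles: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each title, ignoring case and extra whitespace."""
    seen = set()
    unique = []
    for title in titles:
        key = " ".join(title.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(title)
    return unique
```

Because feed fetching is I/O-bound, threads spend most of their time waiting on the network, which is why a thread pool helps here despite the GIL.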
- Edit config.yaml (copy from config.yaml.example if needed):
# IMPORTANT: Change default password for security!
admin:
  password: "YOUR_SECURE_PASSWORD_HERE"
  secret_key: "your-super-secret-key-change-this-in-production"
# Configure your domains
settings:
  allowed_domains:
    - "https://yourdomain.com"
    - "https://www.yourdomain.com"
- Configure Report Types: Edit *_report_settings.py files to customize RSS feeds and appearance for each report type.
- Production Deployment: Use the included httpd-vhosts-sample.conf for Apache configuration.
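Since the sample configuration ships with placeholder secrets and domains, a small pre-deployment check can catch values that were never changed. The sketch below is not part of the project. It assumes the configuration has already been loaded into a plain dictionary (for example with a YAML parser), and it walks the whole structure so that it does not depend on exactly where each key lives.

```python
from typing import Any, List

# Placeholder strings taken from the sample configuration in this README.
PLACEHOLDER_VALUES = {
    "YOUR_SECURE_PASSWORD_HERE",
    "CHANGE_THIS_DEFAULT_PASSWORD",
    "your-super-secret-key-change-this-in-production",
}
PLACEHOLDER_DOMAIN = "yourdomain.com"


def find_placeholders(node: Any, path: str = "") -> List[str]:
    """Return the dotted paths of all values that still hold sample placeholders."""
    problems: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            problems += find_placeholders(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            problems += find_placeholders(value, f"{path}[{index}]")
    elif isinstance(node, str):
        if node in PLACEHOLDER_VALUES or PLACEHOLDER_DOMAIN in node:
            problems.append(path)
    return problems
```

A deployment script could refuse to start when `find_placeholders(config)` returns a non-empty list.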
To add a new report category:
- Create {type}_report_settings.py with RSS feeds and configuration
- Add HTML template {type}reportabove.html for custom headlines
- Add logos and assets to static/images/
- Configure automatic updates in systemd (optional)
LinuxReport/
├── app.py                 # Flask application setup and configuration
├── routes.py              # Main routing and request handling
├── shared.py              # Shared utilities and constants
├── models.py              # Data models and configurations
├── workers.py             # Background feed processing
├── auto_update.py         # AI headline generation
├── caching.py             # Multi-layer caching system
├── *_report_settings.py   # Report-specific configurations
├── templates/             # Jinja2 templates + modular JavaScript
├── static/                # CSS, images, compiled assets
├── tests/                 # Test suite
└── config.yaml            # Configuration file
- Modular JavaScript: Source files in templates/ auto-bundle to static/
- Hot Reload: Development mode with unminified assets for debugging
- Type Safety: Type hints throughout the codebase
- Comprehensive Caching: See Caching.md for detailed documentation
- Test Suite: pytest-based testing in tests/ directory
- agents.md: Comprehensive guide for AI agents and developers
- Caching.md: Detailed caching system documentation
- ROADMAP.md: Future development plans
- Scaling.md: Performance optimization strategies
Admin functionality is protected by authentication:
# config.yaml
admin:
  password: "CHANGE_THIS_DEFAULT_PASSWORD"
- Rate Limiting: Configurable per-endpoint throttling
- Input Validation: Secure file uploads and form processing
- CORS Protection: Configurable domain allowlists
- Security Headers: XSS protection, content type validation
- IP Blocking: Persistent banned IP storage
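To make the rate limiting and IP blocking ideas above concrete, here is a minimal standalone sketch of a sliding-window limiter with a ban list. It is not the Flask-Limiter API and not the project's implementation: the class name and parameters are invented for this example, and the ban list is held in memory, whereas the project stores banned IPs persistently.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.banned: Set[str] = set()  # in-memory only in this sketch

    def ban(self, ip: str) -> None:
        self.banned.add(ip)

    def allow(self, ip: str) -> bool:
        """Return True and record the request if ip is under its limit."""
        if ip in self.banned:
            return False
        now = self._clock()
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

The injectable `clock` exists only so the behavior can be tested deterministically; production code would use the default.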
Use the included httpd-vhosts-sample.conf:
<VirtualHost *:443>
    ServerName yourdomain.com
    WSGIDaemonProcess linuxreport python-path=/path/to/LinuxReport
    WSGIProcessGroup linuxreport
    WSGIScriptAlias / /path/to/LinuxReport/wsgi.py
    # SSL and other configurations...
</VirtualHost>
For automatic headline updates:
# Copy service files
sudo cp update-headlines.service /etc/systemd/system/
sudo cp update-headlines.timer /etc/systemd/system/
# Enable and start
sudo systemctl enable update-headlines.timer
sudo systemctl start update-headlines.timer
We welcome contributions! Please:
- Fork the repository
- Create a feature branch
- Run tests: pytest tests/
- Submit a pull request
Feel free to request new RSS feeds or suggest improvements.
LinuxReport demonstrates that Python can be very fast when properly architected. The system typically starts returning pages after fewer than 10 lines of Python code, dispelling myths about Python's performance.
Key performance metrics:
- Ultra-fast response times: Averaged 0.01 seconds over a 4-hour production period (on AMD EPYC, standard Python without PyPy)
- Zero-read performance: Multi-layer caching (page, sitebox) eliminates most database reads despite constant background feed updates
- Concurrent processing of 20+ RSS feeds
- Automatic scaling via Apache process pools
- Intelligent caching reduces redundant processing by 95%+
The architecture achieves this performance through smart cache layering that serves most requests from memory while background workers continuously update feeds, proving that well-designed caching can deliver bare-metal speeds without requiring specialized hardware or runtime optimizations.
Multi-Process Scalability: LinuxReport elegantly sidesteps Python's GIL limitations by using multiple Apache processes with intelligent cache invalidation. Each process maintains its own memory cache but uses fast SQLite queries to detect when feeds have changed (checking last_render_time only when the page cache expires). This eliminates the need for complex message queues, Redis, or inter-process communication while maintaining perfect cache consistency across all processes.
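The invalidation scheme described above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the project's code: the `meta` table, the class name, and the single-page cache are invented for the example, and the project itself accesses SQLite through Diskcache rather than raw `sqlite3`.

```python
import sqlite3
import time
from typing import Callable, Optional


class ProcessLocalPageCache:
    """Per-process memory cache that revalidates against a shared SQLite timestamp."""

    def __init__(self, db_path: str, page_ttl: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value REAL)"
        )
        self._db.commit()
        self._ttl = page_ttl
        self._clock = clock
        self._page: Optional[str] = None
        self._rendered_at = 0.0  # when this process last rendered the page
        self._checked_at = 0.0   # when the current TTL window started

    def _last_render_time(self) -> float:
        row = self._db.execute(
            "SELECT value FROM meta WHERE key = 'last_render_time'"
        ).fetchone()
        return row[0] if row else 0.0

    def mark_feeds_updated(self) -> None:
        """Called by whichever process refreshed the feeds."""
        self._db.execute(
            "INSERT OR REPLACE INTO meta (key, value) VALUES ('last_render_time', ?)",
            (self._clock(),),
        )
        self._db.commit()

    def get(self, render: Callable[[], str]) -> str:
        now = self._clock()
        if self._page is not None and now - self._checked_at < self._ttl:
            return self._page  # still fresh: served from memory, no database read
        # TTL expired: one cheap query decides whether a re-render is needed.
        if self._page is None or self._last_render_time() > self._rendered_at:
            self._page = render()
            self._rendered_at = now
        self._checked_at = now
        return self._page
```

Each Apache process would hold its own instance pointing at the same database file, so the only shared state is one timestamp row.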
While FastAPI is a modern, high-performance framework with excellent async support, this project intentionally uses Flask for several reasons:
- Simplicity: Flask's synchronous model matches the project's needs perfectly
- Maturity: Battle-tested with vast ecosystem and community support
- Performance: Current thread pool + caching implementation achieves excellent performance
- Development Speed: Flask's simplicity enables rapid iteration and maintenance
FastAPI offers benefits like automatic API documentation and modern async support, but these are less relevant because:
- The site primarily serves HTML pages rather than JSON APIs
- Current synchronous code already performs excellently
- Existing thread pool implementation handles I/O efficiently
- The benefits of migrating wouldn't justify the effort for this use case
If considering a FastAPI migration, you would need to:
- Rewrite core application logic
- Modify Apache configuration
- Restructure the caching system
- Update all dependencies and extensions
This project is free and open source software released under the GNU Lesser General Public License v3.0 (LGPL v3). See the LICENSE file for complete details.
LinuxReport includes sophisticated CDN support for optimal performance:
- s3cmd Integration: Automated synchronization of static images to object storage
- Long Cache Headers: HTTP expiration headers set to instruct clients to cache images for extended periods
- Bandwidth Optimization: Significantly reduces server bandwidth usage and improves global load times
- Edge Delivery: Static assets served from CDN edge locations closest to users
The CDN configuration is easily managed through config.yaml and automatically handles cache-busting when needed.
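Long cache lifetimes only work if a changed image gets a new URL. The README does not say how LinuxReport implements cache-busting, so the following is a generic sketch of one common approach (a content hash in the query string), with a hypothetical base URL and function name and an assumed one-year cache header value.

```python
import hashlib

# Assumed header value for long-lived static assets; the project's actual value may differ.
LONG_CACHE_HEADER = "public, max-age=31536000, immutable"


def versioned_cdn_url(base_url: str, filename: str, content: bytes,
                      digest_len: int = 8) -> str:
    """Build a CDN URL whose query string changes whenever the file content changes."""
    version = hashlib.sha256(content).hexdigest()[:digest_len]
    return f"{base_url.rstrip('/')}/{filename}?v={version}"
```

With this scheme clients can cache each URL for a long time, because any edit to the image produces a different `v` value and therefore a fresh fetch.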