An asset management platform with AI-powered OCR processing, real-time anomaly detection, and a multi-dashboard analytics interface.
https://asset-management-backend-prod-5d1c206f2263.herokuapp.com/dashboard
Login with test account:
- username: [email protected]
- password: AdminPass123
How to use:
- Browse mock portfolio data.
- In the Statement Reader section of the dashboard, upload sample statements from /code/sample-statements.
- See that the data is reflected in Property Fund of the Performance Trends section, as well as in the Parsed Rental Statements section of the dashboard.
- See that uploaded PDFs influence property-related metrics across the dashboard.
- Log out and log in to see that uploaded data are preserved.
- Portfolio Management - Track and manage 5 investment portfolios including a dynamic Property Fund
- Real-Time Anomaly Detection - PostgreSQL triggers automatically detect 8 types of anomalies on data changes
- AI-Powered OCR - Groq API extracts data from rental statement PDFs with 90%+ accuracy
- Multi-Dashboard Interface - Three specialized UIs for different use cases
- Business Logic Layer - Clean architecture with separated concerns
- JWT Authentication - Secure token-based authentication
- Automated CI/CD - Heroku deployment with GitHub integration
┌─────────────────────────────────────────┐
│  Frontend Layer (React/Vue)             │
│  Dashboard | OCR Upload | Analytics     │
└──────────────────┬──────────────────────┘
                   │ HTTPS + JWT
                   ↓
┌─────────────────────────────────────────┐
│  API Layer (Flask - 6 Blueprints)       │
│  Auth | Portfolio | Dashboard | Anomaly │
│  Fee | OCR                              │
└──────────────────┬──────────────────────┘
                   │
                   ↓
┌─────────────────────────────────────────┐
│  Business Logic Layer (Services)        │
│  Statement | Anomaly | Dashboard | Auth │
└──────────────────┬──────────────────────┘
                   │
                   ↓
┌─────────────────────────────────────────┐
│  Data Layer (SQLAlchemy ORM)            │
│  User | Portfolio | Asset | Fee | ...   │
└──────────────────┬──────────────────────┘
                   │
                   ↓
┌─────────────────────────────────────────┐
│  Database (SQLite / PostgreSQL Prod)    │
└─────────────────────────────────────────┘
| Component | Technology | Version |
|---|---|---|
| Backend | Flask | 2.3.3+ |
| Database | PostgreSQL (prod) / SQLite (dev) | Latest |
| ORM | SQLAlchemy (via Flask-SQLAlchemy) | Flask-SQLAlchemy 3.0.5+ |
| Authentication | JWT + OAuth 2.0 | flask-jwt-extended 4.5+ |
| Frontend | React + TypeScript | + Material-UI |
| Alternative Frontend | Vue.js | financial-dashboard-ui |
| ML | Scikit-learn Isolation Forest | 1.4.0+ |
| AI/OCR | Groq AI API | llama-3.3-70b |
| PDF Processing | pdfplumber | 0.10+ |
| Deployment | Gunicorn + Heroku | Production-ready |
- Python 3.9+
- Node.js 16+ (for frontend development)
- PostgreSQL 12+ (production) or SQLite (development)
- Groq API Key (for OCR functionality)
- Google OAuth Credentials (optional, for social login)
- Clone and navigate to backend:

      cd code/backend

- Create and activate virtual environment:

      python -m venv venv
      # Linux/Mac:
      source venv/bin/activate
      # Windows:
      venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Configure environment variables (create .env):

      # Database
      DATABASE_URL=sqlite:///app.db  # Local SQLite
      # Or for PostgreSQL:
      # DATABASE_URL=postgresql://user:pass@localhost/dbname
      # Security
      SECRET_KEY=your-secret-key-here-change-in-production
      # OCR (Groq AI)
      GROQ_API_KEY=your-groq-api-key
      # Google OAuth (optional)
      GOOGLE_CLIENT_ID=your-google-client-id
      GOOGLE_CLIENT_SECRET=your-google-client-secret
      # CORS
      CORS_ORIGINS=http://localhost:3000,http://localhost:5173
      # Flask
      FLASK_ENV=development

- Initialize database (auto-creates tables):

      python create_parsed_statements_table.py

- Run Flask application:

      python app.py

Backend runs on: http://localhost:5000
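The variables in .env could be read at startup along the following lines. This is only a sketch: `Settings` and `load_settings` are hypothetical names chosen for illustration and are not taken from the project's `config/settings.py`; the SQLite default mirrors the sample .env above.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class Settings:
    """Hypothetical container for the variables listed in .env."""
    database_url: str
    secret_key: str
    groq_api_key: Optional[str] = None
    cors_origins: List[str] = field(default_factory=list)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping; fail fast if SECRET_KEY is missing."""
    secret_key = env.get("SECRET_KEY")
    if not secret_key:
        raise RuntimeError("SECRET_KEY must be set")
    # CORS_ORIGINS is a comma-separated list; strip whitespace and drop empty entries.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    return Settings(
        database_url=env.get("DATABASE_URL", "sqlite:///app.db"),
        secret_key=secret_key,
        groq_api_key=env.get("GROQ_API_KEY"),  # optional: only OCR needs it
        cors_origins=origins,
    )
```

Passing the environment as a plain mapping keeps the function easy to test without touching the real process environment.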
- Navigate to frontend:

      cd code/frontend

- Install dependencies:

      npm install

- Start development server:

      npm start

Frontend runs on: http://localhost:3000
- Navigate to Vue dashboard:

      cd code/financial-dashboard-ui

- Install and run:

      npm install
      npm run dev

- Register:

      POST /api/auth/register
      { "email": "[email protected]", "password": "securepassword", "first_name": "John", "last_name": "Doe" }

- Login:

      POST /api/auth/login
      { "email": "[email protected]", "password": "securepassword" }

- Response: JWT token

      { "access_token": "eyJhbGc..." }

Google OAuth login:
- Click "Login with Google" on frontend
- Authorize in Google OAuth dialog
- Automatically creates user account on first login
- Returns JWT token for API access
Include token in API requests:

    curl -H "Authorization: Bearer <token>" http://localhost:5000/api/auth/me

Authentication endpoints:

| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| POST | `/api/auth/register` | Register new user | No |
| POST | `/api/auth/login` | Login with email/password | No |
| GET | `/api/auth/me` | Get current user info | Yes |
| GET | `/api/auth/google` | Initiate Google OAuth | No |
| GET | `/api/auth/google/callback` | OAuth callback handler | No |
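
For scripted access, the login and authenticated-request steps can be sketched with the Python standard library. The helper names (`login_request`, `authed_request`, `fetch_current_user`) are hypothetical, and the sketch assumes the backend is running locally on port 5000 and returns `access_token` as shown in the login response.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # local backend address from the setup steps


def login_request(email: str, password: str) -> urllib.request.Request:
    """Build the POST /api/auth/login request with a JSON body."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def authed_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the JWT in the Authorization header."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_current_user(email: str, password: str) -> dict:
    """Log in, then call /api/auth/me (needs a running backend)."""
    with urllib.request.urlopen(login_request(email, password)) as resp:
        token = json.load(resp)["access_token"]
    with urllib.request.urlopen(authed_request("/api/auth/me", token)) as resp:
        return json.load(resp)
```

Building the requests separately from sending them makes the header and body easy to inspect before any network call is made.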

Portfolio endpoints:

| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| GET | `/api/portfolios` | List all portfolios | Yes |
| GET | `/api/portfolios/<portfolio_id>/details` | Portfolio details with fees | Yes |

Dashboard endpoints:

| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| GET | `/api/dashboard/overview` | Key metrics (AUM, portfolio count, performance) | Yes |
| GET | `/api/dashboard/fees/stats` | Fee analytics and management fee percentages | Yes |
| GET | `/api/dashboard/fees/stats?months=<num>` | Fee stats for specific months | Yes |
| GET | `/api/dashboard/top-performers` | Top 5 portfolios by returns | Yes |
| GET | `/api/dashboard/performance` | Performance trends with filtering | Yes |
| GET | `/api/dashboard/status` | System health and data counts | Yes |

Anomaly endpoints:

| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| GET | `/api/anomalies/<portfolio_id>` | Get anomalies for portfolio | Yes |
| POST | `/api/detect-anomalies/<portfolio_id>` | Run ML anomaly detection | Yes |
| GET | `/api/anomalies/recent` | Get 10 most recent anomalies | Yes |

Fee endpoints:

| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| GET | `/api/fees/monthly-summary` | Monthly fee summaries | Yes |

OCR endpoints:

| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| POST | `/api/ocr/parse` | Parse single PDF rental statement | Yes |
| POST | `/api/ocr/parse-batch` | Batch parse multiple PDFs | Yes |
| GET | `/api/ocr/statements` | Get all parsed statements | Yes |
| GET | `/api/ocr/statements/<portfolio_id>` | Get statements for portfolio | Yes |
| GET | `/api/ocr/health` | OCR system health check | No |
| DELETE | `/api/ocr/statements/<statement_id>` | Delete statement | Yes |

Detect Anomalies:

    curl -X POST \
      -H "Authorization: Bearer <token>" \
      http://localhost:5000/api/detect-anomalies/1

Upload PDF Statement:

    curl -X POST \
      -H "Authorization: Bearer <token>" \
      -F "file=@rental_statement.pdf" \
      -F "portfolio_id=1" \
      http://localhost:5000/api/ocr/parse

Get Dashboard Overview:

    curl -H "Authorization: Bearer <token>" \
      http://localhost:5000/api/dashboard/overview

Algorithm: Ensemble Isolation Forest
- 3 parallel models at different contamination levels (2%, 5%, 10%)
- Feature engineering: Scaled amounts, rolling mean/std, rate of change
- Deterministic output: Configurable random state for reproducibility
- Anomaly scoring: 0.0 (normal) to 1.0 (anomalous)
Model retraining: Automatically retrained with new fee data
Trained model cached at: code/backend/anomaly_model.pkl
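
To make the feature-engineering step more concrete, here is a pure-Python sketch of the rolling mean/std and rate-of-change features computed over a series of fee amounts. It is an illustration under assumptions, not the code in `ml/predict.py`: the function name `fee_features`, the window length of 3, and the use of a population standard deviation are all choices made for this example. Amount scaling and the Isolation Forest models themselves are left out.

```python
from statistics import mean, pstdev
from typing import Dict, List


def fee_features(amounts: List[float], window: int = 3) -> List[Dict[str, float]]:
    """Compute per-fee features: rolling mean/std and rate of change.

    `window` (an assumed value) is the number of trailing fees, including
    the current one, used for the rolling statistics.
    """
    features = []
    for i, amount in enumerate(amounts):
        recent = amounts[max(0, i - window + 1): i + 1]  # trailing window
        prev = amounts[i - 1] if i > 0 else amount
        # Relative change versus the previous fee; 0.0 when there is no usable baseline.
        rate = (amount - prev) / prev if prev else 0.0
        features.append({
            "amount": amount,
            "rolling_mean": mean(recent),
            "rolling_std": pstdev(recent),
            "rate_of_change": rate,
        })
    return features
```

A fee that jumps well above its recent history stands out in both `rolling_std` and `rate_of_change`, which is the kind of signal an outlier model can pick up.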
- PDF Upload → Temporary file created
- Text Extraction → pdfplumber extracts raw text
- PII Redaction → Emails and phone numbers removed
- AI Parsing → Groq AI (llama-3.3-70b) extracts structured data
- Database Storage → 10 essential fields stored
- Cleanup → Temporary PDF file deleted
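
The upload and cleanup steps at either end of this pipeline can be sketched as a small wrapper that guarantees the temporary PDF is removed even if parsing fails. `process_upload` and its `handler` callback are hypothetical names; in the real pipeline the handler would cover text extraction, redaction, AI parsing, and storage.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def process_upload(pdf_bytes: bytes, handler: Callable[[str], T]) -> T:
    """Write an upload to a temp file, run handler(path), and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(pdf_bytes)
        return handler(path)
    finally:
        # Runs on success and on error, so no uploaded PDF is left on disk.
        os.remove(path)
```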
The system automatically redacts:
- ✅ Email addresses → [EMAIL_REDACTED]
- ✅ Phone numbers (US format) → [PHONE_REDACTED]
Financial data is preserved for analysis.
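
A regex-based redaction pass along these lines could look as follows. The patterns are assumptions written for this example and are not copied from the project; real-world email and phone formats vary, so treat them as a starting point.

```python
import re

# Simplified email pattern (an assumption, not an RFC 5322 validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# US-style phone numbers: optional +1, optional parentheses, common separators.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def redact_pii(text: str) -> str:
    """Replace emails and US phone numbers with the placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL_REDACTED]", text)
    return PHONE_RE.sub("[PHONE_REDACTED]", text)
```

Note that the phone pattern also matches any bare 10-digit run, so an identifier such as a 10-digit account number would be redacted too. Currency amounts like `$2500.00` are left untouched.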
{
"portfolio_id": 1,
"property_address": "123 Main St, Springfield, IL",
"total_rent": 2500.00,
"management_fee": 50.00,
"confidence": 0.95,
"statement_period": "2025-01",
"original_filename": "rental_statement.pdf",
"raw_data_json": {...}
}

users
- id, email, password (hashed), first_name, last_name
- is_active, oauth_provider, oauth_id
- created_at
portfolios
- id, name, manager, total_assets, user_id, created_at
assets
- id, portfolio_id, symbol, name, quantity, purchase_price, current_price, purchase_date
fees
- id, portfolio_id, amount, date, fee_type, description
anomalies
- id, portfolio_id, fee_id, anomaly_score, detected_at, reviewed, severity
parsed_statements (from OCR)
- id, portfolio_id, property_address, total_rent, management_fee
- confidence, statement_period, original_filename, raw_data_json
- created_at, updated_at
asset_allocations, performance_history (and more)
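
As an illustration of the `parsed_statements` table, the sketch below creates it in an in-memory SQLite database and inserts one record with bound parameters. The column types, the `SCHEMA` string, and the `save_statement` helper are assumptions for this example; the project defines its tables through SQLAlchemy models in `models.py`.

```python
import json
import sqlite3

# Column names follow the list above; the types are assumed for this sketch.
SCHEMA = """
CREATE TABLE parsed_statements (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    portfolio_id INTEGER NOT NULL,
    property_address TEXT,
    total_rent REAL,
    management_fee REAL,
    confidence REAL,
    statement_period TEXT,
    original_filename TEXT,
    raw_data_json TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""

COLUMNS = (
    "portfolio_id", "property_address", "total_rent", "management_fee",
    "confidence", "statement_period", "original_filename", "raw_data_json",
)


def save_statement(conn: sqlite3.Connection, record: dict) -> int:
    """Insert one parsed statement and return its row id."""
    values = dict(record)
    # The raw extraction result is stored as serialized JSON text.
    values["raw_data_json"] = json.dumps(values.get("raw_data_json", {}))
    placeholders = ", ".join("?" for _ in COLUMNS)
    cur = conn.execute(
        f"INSERT INTO parsed_statements ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        [values.get(col) for col in COLUMNS],  # values are bound, never interpolated
    )
    conn.commit()
    return cur.lastrowid
```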
- Install Heroku CLI:

      brew install heroku  # Mac
      # Or download from heroku.com

- Login to Heroku:

      heroku login

- Create Heroku app:

      heroku create your-app-name

- Set environment variables:

      heroku config:set SECRET_KEY=your-secret-key
      heroku config:set GROQ_API_KEY=your-groq-key
      heroku config:set GOOGLE_CLIENT_ID=your-google-id
      heroku config:set GOOGLE_CLIENT_SECRET=your-google-secret

- Provision PostgreSQL:

      heroku addons:create heroku-postgresql:standard-0

- Deploy:

      git push heroku main

- View logs:

      heroku logs --tail

    # Install PostgreSQL
    brew install postgresql
    # Create database
    createdb asset_management
    # Set DATABASE_URL
    export DATABASE_URL=postgresql://localhost/asset_management
    # Run app
    python app.py

    # Run the full test suite
    cd code/backend
    python -m pytest tests/ -v

    # Authentication tests
    pytest tests/test_api_routes.py::TestAuthRoutes -v
    # OCR tests
    pytest tests/test_api_routes.py::TestOCRRoutes -v
    # Anomaly detection tests
    pytest tests/test_business_logic_integration.py::TestAnomalyServiceIntegration -v

    # Run with a coverage report
    pytest tests/ --cov=. --cov-report=term-missing

Current Coverage: ~55% (101 passed / 131 total tests)

- ✅ Authentication: JWT tokens + Google OAuth 2.0
- ✅ Password Security: bcrypt hashing with salt
- ✅ Network Security: CORS, HTTPS (production), HSTS
- ✅ Data Protection: SQL injection prevention (parameterized queries)
- ✅ Privacy: PII redaction before AI processing
- ✅ Configuration: Secrets in environment variables (no hardcoded values)
- ✅ API Security: Content Security Policy (CSP) with Talisman

- Architecture: Layered (API → Business Logic → Data)
- Type Hints: Present on all function signatures
- Error Handling: Comprehensive exception handling with logging
- Logging: Structured logging with context
- Code Style: PEP 8 compliant, Black formatting
- Linting: flake8/ruff compatible
code/backend/
├── app.py # Flask app entry point
├── models.py # SQLAlchemy ORM models
├── db.py # Database configuration
├── config/
│ └── settings.py # Configuration management
├── routes/ # API Blueprints
│ ├── auth_routes.py
│ ├── portfolio_routes.py
│ ├── dashboard_routes.py
│ ├── anomaly_routes.py
│ ├── fee_routes.py
│ └── ocr_routes.py
├── business_logic/ # Service classes
│ ├── auth.py
│ ├── statement.py
│ ├── anomaly.py
│ ├── dashboard.py
│ └── portfolio.py
├── ml/
│ └── predict.py # Isolation Forest models
├── ocr/ # OCR v1 module
│ ├── statement_parser.py
│ ├── pdf_processor.py
│ └── groq_client.py
├── ocr2/ # OCR v2 module (advanced)
├── frontend/ # React dashboard
│ └── src/
│ ├── App.tsx
│ ├── Login.tsx
│ ├── OCRUpload.tsx
│ └── ...
├── tests/ # Test suite
├── requirements.txt # Python dependencies
└── package.json # Frontend dependencies
    # Kill process on port 5000
    lsof -ti:5000 | xargs kill -9
    # Or use different port
    export FLASK_RUN_PORT=5001
    python app.py

    # Check DATABASE_URL
    echo $DATABASE_URL
    # For SQLite, ensure directory exists
    mkdir -p instance/
    # For PostgreSQL, verify connection
    psql $DATABASE_URL -c "SELECT 1"

- Tokens expire after 24 hours
- Use refresh token endpoint (if implemented)
- Or login again to get new token
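
When debugging a rejected token, it can help to check its expiry locally. The sketch below reads the `exp` claim from a JWT payload without verifying the signature, so it is suitable only for troubleshooting and never for authorization decisions. The function names are hypothetical, and it assumes the token carries a standard `exp` claim in Unix seconds.

```python
import base64
import json
import time
from typing import Optional


def token_expiry(token: str) -> Optional[int]:
    """Return the `exp` claim (Unix seconds) from a JWT payload, or None if absent."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """True if the token's `exp` is at or before `now` (defaults to the current time)."""
    exp = token_expiry(token)
    if exp is None:
        return False  # no expiry claim to compare against
    return (time.time() if now is None else now) >= exp
```
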
- Verify GROQ_API_KEY is set
- Check rate limits on Groq dashboard
- Ensure sufficient API credits

For issues or questions:
- Check the logs: heroku logs --tail (production)
- Review test output: pytest -v
- Check .env configuration
- Verify API endpoint accessibility

See LICENSE.txt in project root
Developed by CS 673 Team 1 - Fall 2025
Last Updated: December 2025
Version: 1.0
Status: Production-Ready