BUMETCS673/cs673f25a2project-cs673a2f25-team1

Asset Management Platform

An asset management platform with AI-powered OCR processing, real-time anomaly detection, and multi-dashboard analytics interface.

TLDR

https://asset-management-backend-prod-5d1c206f2263.herokuapp.com/dashboard

Login with test account:

How to use:

  • Browse mock portfolio data.
  • In the Statement Reader section of the dashboard, upload sample statements from /code/sample-statements.
  • See that the data is reflected in Property Fund of the Performance Trends section, as well as in the Parsed Rental Statements section of the dashboard.
  • See that uploaded PDFs influence property-related metrics across the dashboard.
  • Log out and log in to see that uploaded data are preserved.

Features

  • Portfolio Management - Track and manage 5 investment portfolios including a dynamic Property Fund
  • Real-Time Anomaly Detection - PostgreSQL triggers automatically detect 8 types of anomalies on data changes
  • AI-Powered OCR - Groq API extracts data from rental statement PDFs with 90%+ accuracy
  • Multi-Dashboard Interface - Three specialized UIs for different use cases
  • Business Logic Layer - Clean architecture with separated concerns
  • JWT Authentication - Secure token-based authentication
  • Automated CI/CD - Heroku deployment with GitHub integration

📋 Architecture

┌─────────────────────────────────────────┐
│      Frontend Layer (React/Vue)         │
│  Dashboard | OCR Upload | Analytics     │
└──────────────────┬──────────────────────┘
                   │ HTTPS + JWT
                   ↓
┌─────────────────────────────────────────┐
│     API Layer (Flask - 6 Blueprints)    │
│  Auth | Portfolio | Dashboard | Anomaly │
│  Fee | OCR                              │
└──────────────────┬──────────────────────┘
                   │
                   ↓
┌─────────────────────────────────────────┐
│   Business Logic Layer (Services)       │
│  Statement | Anomaly | Dashboard | Auth │
└──────────────────┬──────────────────────┘
                   │
                   ↓
┌─────────────────────────────────────────┐
│    Data Layer (SQLAlchemy ORM)          │
│  User | Portfolio | Asset | Fee | ...   │
└──────────────────┬──────────────────────┘
                   │
                   ↓
┌─────────────────────────────────────────┐
│  Database (SQLite / PostgreSQL Prod)    │
└─────────────────────────────────────────┘
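
The layering above can be illustrated with a small framework-free sketch. The names `InMemoryPortfolioRepo`, `PortfolioService`, and `list_portfolios_handler`, as well as the sample rows, are hypothetical and are not taken from the repository. In the real app the top layer is a Flask blueprint and the bottom layer is SQLAlchemy.

```python
from dataclasses import dataclass


@dataclass
class Portfolio:
    id: int
    name: str
    total_assets: float


class InMemoryPortfolioRepo:
    """Data layer stand-in (the real project uses SQLAlchemy models)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def all(self):
        return list(self._rows)


class PortfolioService:
    """Business logic layer: no HTTP or SQL details here."""

    def __init__(self, repo):
        self._repo = repo

    def summary(self):
        portfolios = self._repo.all()
        return {
            "count": len(portfolios),
            "total_aum": sum(p.total_assets for p in portfolios),
        }


def list_portfolios_handler(service):
    """API layer stand-in: turns a service result into a (body, status) pair."""
    return service.summary(), 200


# Illustrative data only.
repo = InMemoryPortfolioRepo([
    Portfolio(1, "Property Fund", 1_000_000.0),
    Portfolio(2, "Equity Fund", 2_500_000.0),
])
body, status = list_portfolios_handler(PortfolioService(repo))
print(status, body)
```

Because the service only depends on an object with an `all()` method, it can be tested without a database or an HTTP client.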

🔧 Technology Stack

| Component | Technology | Version |
|---|---|---|
| Backend | Flask | 2.3.3+ |
| Database | PostgreSQL (prod) / SQLite (dev) | Latest |
| ORM | SQLAlchemy | 3.0.5+ |
| Authentication | JWT + OAuth 2.0 | flask-jwt-extended 4.5+ |
| Frontend | React + TypeScript + Material-UI | |
| Alternative Frontend | Vue.js | financial-dashboard-ui |
| ML | Scikit-learn Isolation Forest | 1.4.0+ |
| AI/OCR | Groq AI API | llama-3.3-70b |
| PDF Processing | pdfplumber | 0.10+ |
| Deployment | Gunicorn + Heroku | Production-ready |

📦 Prerequisites

  • Python 3.9+
  • Node.js 16+ (for frontend development)
  • PostgreSQL 12+ (production) or SQLite (development)
  • Groq API Key (for OCR functionality)
  • Google OAuth Credentials (optional, for social login)

🚀 Quick Start

Backend Setup

  1. Clone and navigate to backend:

    cd code/backend
  2. Create and activate virtual environment:

    python -m venv venv
    
    # Linux/Mac:
    source venv/bin/activate
    
    # Windows:
    venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Configure environment variables (create .env):

    # Database
    DATABASE_URL=sqlite:///app.db                    # Local SQLite
    # Or for PostgreSQL:
    # DATABASE_URL=postgresql://user:pass@localhost/dbname
    
    # Security
    SECRET_KEY=your-secret-key-here-change-in-production
    
    # OCR (Groq AI)
    GROQ_API_KEY=your-groq-api-key
    
    # Google OAuth (optional)
    GOOGLE_CLIENT_ID=your-google-client-id
    GOOGLE_CLIENT_SECRET=your-google-client-secret
    
    # CORS
    CORS_ORIGINS=http://localhost:3000,http://localhost:5173
    
    # Flask
    FLASK_ENV=development
  5. Initialize database (auto-creates tables):

    python create_parsed_statements_table.py
  6. Run Flask application:

    python app.py

    Backend runs on: http://localhost:5000
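
For reference, here is a minimal sketch of how the variables from step 4 could be read and checked at startup. The function names are hypothetical, and the project's actual loader in config/settings.py may work differently.

```python
import os


def parse_cors_origins(raw):
    """Split a comma-separated CORS_ORIGINS value into a clean list."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


def load_settings(env=None):
    """Read the variables listed in step 4, applying development defaults."""
    env = os.environ if env is None else env
    return {
        "DATABASE_URL": env.get("DATABASE_URL", "sqlite:///app.db"),
        "SECRET_KEY": env.get("SECRET_KEY", ""),
        "GROQ_API_KEY": env.get("GROQ_API_KEY", ""),
        "CORS_ORIGINS": parse_cors_origins(env.get("CORS_ORIGINS", "")),
    }


def missing_required(settings):
    """Names of required settings that are empty (assumed required here)."""
    return [key for key in ("SECRET_KEY", "GROQ_API_KEY") if not settings[key]]
```

Calling `missing_required(load_settings())` before starting the server gives an early, readable failure instead of an error on the first OCR request.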

Frontend Setup (React)

  1. Navigate to frontend:

    cd code/frontend
  2. Install dependencies:

    npm install
  3. Start development server:

    npm start

    Frontend runs on: http://localhost:3000

Frontend Setup (Vue.js - Alternative)

  1. Navigate to Vue dashboard:

    cd code/financial-dashboard-ui
  2. Install and run:

    npm install
    npm run dev

🔐 Authentication

Traditional Login (Email/Password)

  1. Register: POST /api/auth/register

    {
      "email": "user@example.com",
      "password": "securepassword",
      "first_name": "John",
      "last_name": "Doe"
    }
  2. Login: POST /api/auth/login

    {
      "email": "user@example.com",
      "password": "securepassword"
    }
  3. Response: JWT token

    {
      "access_token": "eyJhbGc..."
    }

Google OAuth (Social Login)

  1. Click "Login with Google" on frontend
  2. Authorize in Google OAuth dialog
  3. Automatically creates user account on first login
  4. Returns JWT token for API access

Using JWT Tokens

Include token in API requests:

curl -H "Authorization: Bearer <token>" http://localhost:5000/api/auth/me
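
The same flow can be scripted from Python using only the standard library. This is a sketch: the helper names are hypothetical, the endpoints and the `access_token` field come from the sections above, and `fetch_json` needs a running backend.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # local backend from Quick Start


def build_login_request(email, password):
    """POST /api/auth/login with a JSON body."""
    payload = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/auth/login",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_authed_request(path, token):
    """GET request carrying the JWT in the Authorization header."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_json(request):
    """Send a request and decode the JSON response (requires a running backend)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Typical usage: `token = fetch_json(build_login_request(email, password))["access_token"]`, then `fetch_json(build_authed_request("/api/auth/me", token))`.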

📚 API Documentation

Authentication Endpoints

| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| POST | /api/auth/register | Register new user | No |
| POST | /api/auth/login | Login with email/password | No |
| GET | /api/auth/me | Get current user info | Yes |
| GET | /api/auth/google | Initiate Google OAuth | No |
| GET | /api/auth/google/callback | OAuth callback handler | No |

Portfolio Endpoints

| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| GET | /api/portfolios | List all portfolios | Yes |
| GET | /api/portfolios/<portfolio_id>/details | Portfolio details with fees | Yes |

Dashboard Endpoints

| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| GET | /api/dashboard/overview | Key metrics (AUM, portfolio count, performance) | Yes |
| GET | /api/dashboard/fees/stats | Fee analytics and management fee percentages | Yes |
| GET | /api/dashboard/fees/stats?months=<num> | Fee stats for specific months | Yes |
| GET | /api/dashboard/top-performers | Top 5 portfolios by returns | Yes |
| GET | /api/dashboard/performance | Performance trends with filtering | Yes |
| GET | /api/dashboard/status | System health and data counts | Yes |

Anomaly Detection Endpoints

| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| GET | /api/anomalies/<portfolio_id> | Get anomalies for portfolio | Yes |
| POST | /api/detect-anomalies/<portfolio_id> | Run ML anomaly detection | Yes |
| GET | /api/anomalies/recent | Get 10 most recent anomalies | Yes |

Fee Endpoints

| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| GET | /api/fees/monthly-summary | Monthly fee summaries | Yes |

OCR & Statement Endpoints

| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| POST | /api/ocr/parse | Parse single PDF rental statement | Yes |
| POST | /api/ocr/parse-batch | Batch parse multiple PDFs | Yes |
| GET | /api/ocr/statements | Get all parsed statements | Yes |
| GET | /api/ocr/statements/<portfolio_id> | Get statements for portfolio | Yes |
| GET | /api/ocr/health | OCR system health check | No |
| DELETE | /api/ocr/statements/<statement_id> | Delete statement | Yes |

Example API Requests

Detect Anomalies:

curl -X POST \
  -H "Authorization: Bearer <token>" \
  http://localhost:5000/api/detect-anomalies/1

Upload PDF Statement:

curl -X POST \
  -H "Authorization: Bearer <token>" \
  -F "file=@rental_statement.pdf" \
  -F "portfolio_id=1" \
  http://localhost:5000/api/ocr/parse

Get Dashboard Overview:

curl -H "Authorization: Bearer <token>" \
  http://localhost:5000/api/dashboard/overview

📊 Machine Learning

Anomaly Detection Engine

Algorithm: Ensemble Isolation Forest

  • 3 parallel models at different contamination levels (2%, 5%, 10%)
  • Feature engineering: Scaled amounts, rolling mean/std, rate of change
  • Deterministic output: Configurable random state for reproducibility
  • Anomaly scoring: 0.0 (normal) to 1.0 (anomalous)

Model retraining: Automatically retrained with new fee data
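
To make the feature engineering step concrete, here is a standard-library sketch of rolling mean/std and rate-of-change features over a series of fee amounts. The window size of 3, the function name, and the sample values are assumptions; the project's actual features live in ml/predict.py and may differ.

```python
from statistics import mean, pstdev


def rolling_features(amounts, window=3):
    """Return one feature row per fee amount.

    Each row holds the amount, the rolling mean and standard deviation over
    the trailing `window` values, and the rate of change versus the previous
    amount (0.0 for the first row or when the previous amount is zero).
    """
    rows = []
    for i, amount in enumerate(amounts):
        trailing = amounts[max(0, i - window + 1): i + 1]
        prev = amounts[i - 1] if i > 0 else None
        rate = (amount - prev) / prev if prev else 0.0
        rows.append({
            "amount": amount,
            "rolling_mean": mean(trailing),
            "rolling_std": pstdev(trailing),
            "rate_of_change": rate,
        })
    return rows


# Illustrative fee series: three steady months followed by a spike.
fees = [100.0, 100.0, 100.0, 400.0]
for row in rolling_features(fees):
    print(row)
```

Rows like these could then be passed to scikit-learn's `IsolationForest`, whose `contamination` and `random_state` parameters correspond to the contamination levels and reproducibility setting listed above.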

Model Location

Trained model cached at: code/backend/anomaly_model.pkl

🤖 OCR & Document Processing

How It Works

  1. PDF Upload → Temporary file created
  2. Text Extraction → pdfplumber extracts raw text
  3. PII Redaction → Emails and phone numbers removed
  4. AI Parsing → Groq AI (llama-3.3-70b) extracts structured data
  5. Database Storage → 10 essential fields stored
  6. Cleanup → Temporary PDF file deleted
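
Steps 1, 2, and 6 can be sketched as a helper that guarantees the temporary PDF is deleted even when extraction fails. The function name and the `extract_text` callback are hypothetical; in the real pipeline the callback would wrap pdfplumber.

```python
import os
import tempfile


def process_upload(pdf_bytes, extract_text):
    """Write the upload to a temp file, run `extract_text(path)`, always clean up."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(pdf_bytes)
        return extract_text(path)
    finally:
        # Step 6: remove the temporary PDF even if extraction raised.
        if os.path.exists(path):
            os.remove(path)
```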

PII Redaction

The system automatically redacts:

  • ✅ Email addresses → [EMAIL_REDACTED]
  • ✅ Phone numbers (US format) → [PHONE_REDACTED]

Financial data is preserved for analysis.
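
A minimal sketch of this kind of redaction with regular expressions follows. The patterns are assumptions written for illustration, not the project's actual rules, and they only cover common email and US phone formats.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# US-style numbers such as 555-123-4567, (555) 123-4567, +1 555.123.4567
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?(?:\(\d{3}\)\s?|\d{3}[\s.-])\d{3}[\s.-]\d{4}(?!\d)"
)


def redact_pii(text):
    """Replace emails and US phone numbers with the placeholders listed above."""
    text = EMAIL_RE.sub("[EMAIL_REDACTED]", text)
    return PHONE_RE.sub("[PHONE_REDACTED]", text)


print(redact_pii("Contact jane.doe@example.com or (555) 123-4567. Rent: $2500.00"))
```

The phone pattern requires a 3-3-4 digit grouping with separators, so amounts such as 2500.00 and periods such as 2025-01 are left untouched.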

Supported Fields

{
  "portfolio_id": 1,
  "property_address": "123 Main St, Springfield, IL",
  "total_rent": 2500.00,
  "management_fee": 50.00,
  "confidence": 0.95,
  "statement_period": "2025-01",
  "original_filename": "rental_statement.pdf",
  "raw_data_json": {...}
}
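
A record with these fields could be sanity-checked before storage along the following lines. This is a sketch: the class name, the validation rules (confidence between 0 and 1, `YYYY-MM` periods, non-negative amounts), and the `fee_ratio` helper are assumptions inferred from the example above, and `raw_data_json` is omitted for brevity.

```python
import re
from dataclasses import dataclass

PERIOD_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])$")


@dataclass
class ParsedStatement:
    portfolio_id: int
    property_address: str
    total_rent: float
    management_fee: float
    confidence: float
    statement_period: str
    original_filename: str

    def problems(self):
        """Return a list of human-readable validation problems (empty if none)."""
        issues = []
        if not 0.0 <= self.confidence <= 1.0:
            issues.append("confidence must be between 0 and 1")
        if not PERIOD_RE.match(self.statement_period):
            issues.append("statement_period must look like YYYY-MM")
        if self.total_rent < 0 or self.management_fee < 0:
            issues.append("amounts must be non-negative")
        return issues

    def fee_ratio(self):
        """Management fee as a fraction of total rent (0.0 when rent is zero)."""
        return self.management_fee / self.total_rent if self.total_rent else 0.0
```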

🗄️ Database Schema

Main Tables

users

  • id, email, password (hashed), first_name, last_name
  • is_active, oauth_provider, oauth_id
  • created_at

portfolios

  • id, name, manager, total_assets, user_id, created_at

assets

  • id, portfolio_id, symbol, name, quantity, purchase_price, current_price, purchase_date

fees

  • id, portfolio_id, amount, date, fee_type, description

anomalies

  • id, portfolio_id, fee_id, anomaly_score, detected_at, reviewed, severity

parsed_statements (from OCR)

  • id, portfolio_id, property_address, total_rent, management_fee
  • confidence, statement_period, original_filename, raw_data_json
  • created_at, updated_at

asset_allocations, performance_history (and more)
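
To show how fees and anomalies relate, here is a sketch that recreates three of the tables above in an in-memory SQLite database. Column names follow the lists above; the column types, constraints, the `'high'` severity value, and the `recent_anomalies` helper are assumptions, since the real schema is defined by the SQLAlchemy models in models.py.

```python
import sqlite3

SCHEMA = """
CREATE TABLE portfolios (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    manager TEXT,
    total_assets REAL,
    user_id INTEGER,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE fees (
    id INTEGER PRIMARY KEY,
    portfolio_id INTEGER NOT NULL REFERENCES portfolios(id),
    amount REAL NOT NULL,
    date TEXT,
    fee_type TEXT,
    description TEXT
);
CREATE TABLE anomalies (
    id INTEGER PRIMARY KEY,
    portfolio_id INTEGER NOT NULL REFERENCES portfolios(id),
    fee_id INTEGER REFERENCES fees(id),
    anomaly_score REAL,
    detected_at TEXT DEFAULT CURRENT_TIMESTAMP,
    reviewed INTEGER DEFAULT 0,
    severity TEXT
);
"""


def recent_anomalies(conn, limit=10):
    """Newest anomalies first, joined to the fee each one refers to."""
    return conn.execute(
        """
        SELECT a.id, a.anomaly_score, a.severity, f.amount
        FROM anomalies AS a
        LEFT JOIN fees AS f ON f.id = a.fee_id
        ORDER BY a.detected_at DESC, a.id DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO portfolios (id, name) VALUES (1, 'Property Fund')")
conn.execute("INSERT INTO fees (id, portfolio_id, amount) VALUES (1, 1, 400.0)")
conn.execute(
    "INSERT INTO anomalies (portfolio_id, fee_id, anomaly_score, severity) "
    "VALUES (1, 1, 0.91, 'high')"
)
print(recent_anomalies(conn))
```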

🚢 Deployment

Heroku Deployment

  1. Install Heroku CLI:

    brew install heroku  # Mac
    # Or download from heroku.com
  2. Login to Heroku:

    heroku login
  3. Create Heroku app:

    heroku create your-app-name
  4. Set environment variables:

    heroku config:set SECRET_KEY=your-secret-key
    heroku config:set GROQ_API_KEY=your-groq-key
    heroku config:set GOOGLE_CLIENT_ID=your-google-id
    heroku config:set GOOGLE_CLIENT_SECRET=your-google-secret
  5. Provision PostgreSQL:

    heroku addons:create heroku-postgresql:standard-0
  6. Deploy:

    git push heroku main
  7. View logs:

    heroku logs --tail

Local PostgreSQL (Alternative)

# Install PostgreSQL
brew install postgresql

# Create database
createdb asset_management

# Set DATABASE_URL
export DATABASE_URL=postgresql://localhost/asset_management

# Run app
python app.py

🧪 Testing

Run All Tests

cd code/backend
python -m pytest tests/ -v

Run Specific Test Suite

# Authentication tests
pytest tests/test_api_routes.py::TestAuthRoutes -v

# OCR tests
pytest tests/test_api_routes.py::TestOCRRoutes -v

# Anomaly detection tests
pytest tests/test_business_logic_integration.py::TestAnomalyServiceIntegration -v

Test Coverage

pytest tests/ --cov=. --cov-report=term-missing

Current Coverage: ~55%. Test results: 101 of 131 tests passing.

🔒 Security Features

  • ✅ Authentication: JWT tokens + Google OAuth 2.0
  • ✅ Password Security: bcrypt hashing with salt
  • ✅ Network Security: CORS, HTTPS (production), HSTS
  • ✅ Data Protection: SQL injection prevention (parameterized queries)
  • ✅ Privacy: PII redaction before AI processing
  • ✅ Configuration: Secrets in environment variables (no hardcoded values)
  • ✅ API Security: Content Security Policy (CSP) with Talisman

📈 Code Quality

  • Architecture: Layered (API → Business Logic → Data)
  • Type Hints: Present on all function signatures
  • Error Handling: Comprehensive exception handling with logging
  • Logging: Structured logging with context
  • Code Style: PEP 8 compliant, Black formatting
  • Linting: flake8/ruff compatible

📝 Project Structure

code/backend/
├── app.py                      # Flask app entry point
├── models.py                   # SQLAlchemy ORM models
├── db.py                       # Database configuration
├── config/
│   └── settings.py            # Configuration management
├── routes/                     # API Blueprints
│   ├── auth_routes.py
│   ├── portfolio_routes.py
│   ├── dashboard_routes.py
│   ├── anomaly_routes.py
│   ├── fee_routes.py
│   └── ocr_routes.py
├── business_logic/            # Service classes
│   ├── auth.py
│   ├── statement.py
│   ├── anomaly.py
│   ├── dashboard.py
│   └── portfolio.py
├── ml/
│   └── predict.py            # Isolation Forest models
├── ocr/                       # OCR v1 module
│   ├── statement_parser.py
│   ├── pdf_processor.py
│   └── groq_client.py
├── ocr2/                      # OCR v2 module (advanced)
├── frontend/                  # React dashboard
│   └── src/
│       ├── App.tsx
│       ├── Login.tsx
│       ├── OCRUpload.tsx
│       └── ...
├── tests/                     # Test suite
├── requirements.txt           # Python dependencies
└── package.json              # Frontend dependencies

🐛 Troubleshooting

Port Already in Use

# Kill process on port 5000
lsof -ti:5000 | xargs kill -9

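# Or use a different port. FLASK_RUN_PORT is read by the `flask run` command;
# `python app.py` picks it up only if app.py reads the variable itself.
export FLASK_RUN_PORT=5001
flask run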

Database Connection Issues

# Check DATABASE_URL
echo $DATABASE_URL

# For SQLite, ensure directory exists
mkdir -p instance/

# For PostgreSQL, verify connection
psql $DATABASE_URL -c "SELECT 1"
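
One common cause of PostgreSQL connection failures on Heroku is the URL scheme: Heroku Postgres has typically exposed DATABASE_URL with a `postgres://` prefix, while SQLAlchemy 1.4 and later only accept `postgresql://`. Whether this project already handles that is not stated here, so the helper below is a hypothetical sketch of the usual workaround.

```python
def normalize_database_url(url):
    """Rewrite the legacy `postgres://` scheme to `postgresql://`."""
    legacy_prefix = "postgres://"
    if url.startswith(legacy_prefix):
        return "postgresql://" + url[len(legacy_prefix):]
    return url


print(normalize_database_url("postgres://user:pass@localhost/dbname"))
```

SQLite URLs and URLs that already use `postgresql://` pass through unchanged.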

JWT Token Expired

  • Tokens expire after 24 hours
  • Use refresh token endpoint (if implemented)
  • Or login again to get new token
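
A client can check for expiry before sending a request by reading the token's `exp` claim. The sketch below decodes the payload without verifying the signature, so it is only suitable for client-side convenience checks. The function names and the 30-second leeway are assumptions.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the `exp` claim (Unix seconds) from a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments drop base64 padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims.get("exp")


def is_expired(token, now=None, leeway=30):
    """True when the token's `exp` is within `leeway` seconds of passing."""
    exp = jwt_expiry(token)
    now = time.time() if now is None else now
    return exp is not None and now >= exp - leeway
```

When `is_expired(token)` returns True, the client can log in again to obtain a fresh token.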

Groq API Errors

  • Verify GROQ_API_KEY is set
  • Check rate limits on Groq dashboard
  • Ensure sufficient API credits

📞 Support

For issues or questions:

  1. Check the logs: heroku logs --tail (production)
  2. Review test output: pytest -v
  3. Check .env configuration
  4. Verify API endpoint accessibility

📄 License

See LICENSE.txt in project root

👥 Team

Developed by CS 673 Team 1 - Fall 2025


Last Updated: December 2025
Version: 1.0
Status: Production-Ready

About

A web app that ingests raw rental statements in PDF format into structured tables for analysis and visualization.
