Prerequisites: This application requires Ollama to be installed and running for AI-powered analysis features.
An advanced document comparison tool that leverages semantic analysis and AI to help identify changes between policy documents, government memos, and other official documents.
- Document Upload & Processing: Support for PDF, TXT, and HTML documents
- Semantic Document Comparison: Advanced algorithms to match and compare document sections
- AI-Powered Analysis: LLM integration for intelligent change detection and summarization
- Structured Data Extraction: Automatically extract definitions, requirements, actions, and deadlines (see the sketch after this list)
- Visual Diff Generation: HTML-based diff views for easy change identification
- Change Impact Classification: Categorize changes by impact level and type
- Fallback Mechanisms: Robust error handling with simplified analysis when AI services are unavailable
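The structured extraction step can often start with plain pattern matching before any model is involved. Below is a minimal, hypothetical sketch of deadline extraction; the function name and patterns are illustrative, not the actual `structured_parser.py` implementation:

```python
import re

# Illustrative pattern: sentences containing a deadline-like phrase
DEADLINE_PATTERN = re.compile(
    r"[^.]*\b(?:no later than|within \d+\s+days?|by (?:January|February|March|April|May|June|"
    r"July|August|September|October|November|December)\s+\d{1,2},?\s+\d{4})\b[^.]*\.",
    re.IGNORECASE,
)

def extract_deadlines(text: str) -> list[str]:
    """Hypothetical helper: return sentences that mention a deadline."""
    return [m.group(0).strip() for m in DEADLINE_PATTERN.finditer(text)]

print(extract_deadlines(
    "Agencies must submit implementation plans no later than 90 days after issuance. "
    "All reports are due by March 1, 2026."
))
```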
- Backend: Flask (Python web framework)
- Database: SQLAlchemy with SQLite
- Document Processing: PyPDF2, pdfplumber, BeautifulSoup4
- AI/ML: Ollama for local LLM analysis
- Semantic Matching: Sentence transformers and scikit-learn for document similarity (see the sketch after this list)
- Frontend: Bootstrap 5 with vanilla JavaScript
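As an illustration of how the semantic matching layer might pair sections, here is a minimal sketch using sentence-transformers and scikit-learn; the model name and function are assumptions, not the code in `semantic_matcher.py`:

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def best_match(old_section: str, new_sections: list[str]) -> tuple[int, float]:
    """Return (index, similarity) of the new section closest to old_section."""
    embeddings = model.encode([old_section] + new_sections)
    scores = cosine_similarity(embeddings[:1], embeddings[1:])[0]
    idx = int(scores.argmax())
    return idx, float(scores[idx])

# Example: pair an old "Definitions" section with its best candidate
idx, score = best_match(
    "Definitions. 'Agency' means any executive department.",
    ["Purpose. This memo establishes reporting duties.",
     "Definitions. 'Agency' means any executive department or office."],
)
```

Matching on sentence embeddings rather than raw text lets reordered or reworded sections still pair up, which a plain line diff would miss.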
- Python 3.11+
- Ollama installed and running
Install dependencies: `pip install -r requirements.txt`

Run the application: `python main.py`

Or run with Gunicorn for production: `gunicorn --bind 0.0.0.0:5000 --reuse-port --reload main:app`
- Navigate to the home page
- Click "Choose File" and select a PDF, TXT, or HTML document
- Enter a descriptive title for the document
- Click "Upload Document"
- Upload at least two documents
- Navigate to the "Compare Documents" page
- Select two documents from the dropdown menus
- Click "Compare Documents"
- Review the detailed comparison results, including:
  - Section-by-section changes
  - Added/removed content
  - Modified sections with detailed analysis
  - Overall summary of changes
The comparison results include:
- Matched Sections: Sections that exist in both documents with similarity analysis
- Added Sections: Content that appears only in the newer document
- Removed Sections: Content that was present in the original but removed
- Change Statistics: Quantitative analysis of document changes
- Impact Classification: AI-powered categorization of change significance
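For orientation, here is a hypothetical sketch of how such a result might be represented in Python; the key names are assumptions and may not match the actual output of `diff_generator.py`:

```python
# Hypothetical shape of a comparison result; actual keys may differ.
comparison_result = {
    "matched_sections": [
        {
            "heading": "Section 2. Definitions",
            "similarity": 0.91,
            "analysis": "Definition of 'covered entity' broadened to include contractors.",
        }
    ],
    "added_sections": ["Section 7. Reporting Requirements"],
    "removed_sections": ["Appendix B"],
    "statistics": {"matched": 12, "added": 1, "removed": 1},
    "impact": "moderate",
}
```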
The application uses SQLite and automatically creates `instance/diffpolicy.db`.
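A minimal sketch of the kind of Flask-SQLAlchemy setup that produces this file, assuming the standard application-factory pattern; the actual `app.py` may differ:

```python
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

def create_app() -> Flask:
    app = Flask(__name__)
    # With Flask-SQLAlchemy 3.x a relative SQLite URL resolves to the
    # instance/ folder, which is where diffpolicy.db ends up.
    app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///diffpolicy.db"
    db.init_app(app)
    with app.app_context():
        db.create_all()  # create the database file and tables if missing
    return app
```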
- Maximum file size: 16MB
- Supported formats: PDF, TXT, HTML, HTM
- Files are stored in the `uploads/` directory
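These limits map onto standard Flask settings. A sketch of how they might be wired up (the variable and function names are assumptions):

```python
import os
from flask import Flask

ALLOWED_EXTENSIONS = {"pdf", "txt", "html", "htm"}

app = Flask(__name__)
app.config["MAX_CONTENT_LENGTH"] = 16 * 1024 * 1024  # Flask rejects larger uploads with HTTP 413
app.config["UPLOAD_FOLDER"] = "uploads"
os.makedirs(app.config["UPLOAD_FOLDER"], exist_ok=True)

def allowed_file(filename: str) -> bool:
    """Accept only the extensions listed above."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS
```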
The platform uses Ollama for AI-powered document analysis and includes fallback mechanisms when services are unavailable.
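A minimal sketch of that pattern, calling Ollama's local HTTP API and falling back to a canned summary when the service is unreachable; the prompt, model name, and function name are illustrative:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def analyze_change(old_text: str, new_text: str, model: str = "llama3") -> str:
    """Summarize a change with a local Ollama model, with a non-AI fallback."""
    prompt = (
        "Summarize the substantive differences between these two policy sections.\n\n"
        f"OLD:\n{old_text}\n\nNEW:\n{new_text}"
    )
    try:
        resp = requests.post(
            OLLAMA_URL,
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["response"]
    except requests.RequestException:
        # Fallback: Ollama not available, return a minimal placeholder summary.
        return "AI analysis unavailable; section text differs between versions."
```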
- `GET /`: Main upload page
- `POST /upload`: Document upload handler
- `GET /compare`: Document comparison form
- `POST /compare`: Process document comparison
- `GET /document/<id>`: View individual document
- `POST /analyze_section`: API endpoint for section analysis
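A hypothetical client-side example of exercising the upload and comparison endpoints with `requests`; the form field names (`file`, `title`, `doc1`, `doc2`) are assumptions and may not match the actual templates:

```python
import requests

BASE = "http://localhost:5000"

# Upload a document (field names are assumptions)
with open("policy_v2.pdf", "rb") as fh:
    requests.post(f"{BASE}/upload", files={"file": fh}, data={"title": "Policy v2"})

# Request a comparison between two previously uploaded documents
resp = requests.post(f"{BASE}/compare", data={"doc1": 1, "doc2": 2})
print(resp.status_code)
```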
├── app.py # Flask application factory
├── main.py # Application entry point
├── models.py # Database models
├── routes.py # URL routes and handlers
├── document_processor.py # Document parsing and extraction
├── semantic_matcher.py # Section matching algorithms
├── diff_generator.py # Comparison result generation
├── llm_analyzer.py # LLM integration for analysis
├── simple_analyzer.py # Fallback analysis without LLM
├── structured_parser.py # Structured data extraction
├── templates/ # HTML templates
├── static/ # CSS and JavaScript files
├── uploads/ # Uploaded document storage
└── instance/ # Database and instance files
To run in development mode:

- `export FLASK_ENV=development`
- `python main.py`
To support additional document formats:
- Update `ALLOWED_EXTENSIONS` in `routes.py`
- Add processing logic in `document_processor.py` (a sketch follows this list)
- Update the upload form validation in the templates
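For example, adding DOCX support might look like the following sketch, assuming the `python-docx` package; the function name and dispatch point are assumptions:

```python
from docx import Document  # python-docx, an assumed extra dependency

# In routes.py: add the new extension (illustrative)
ALLOWED_EXTENSIONS = {"pdf", "txt", "html", "htm", "docx"}

# In document_processor.py: extract text for the new format
def extract_docx_text(path: str) -> str:
    """Return the plain text of a .docx file, one paragraph per line."""
    return "\n".join(p.text for p in Document(path).paragraphs)
```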
To add new analysis features:
- Extend the `LLMAnalyzer` class in `llm_analyzer.py`
- Update the fallback logic in `simple_analyzer.py` (a sketch follows this list)
- Modify the comparison results structure in `diff_generator.py`
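As an example of the fallback step, the non-LLM path could gain a heuristic impact rating based on how much of the section text changed. This difflib-based sketch is illustrative, not the existing `simple_analyzer.py` code:

```python
import difflib

def heuristic_impact(old_text: str, new_text: str) -> str:
    """Rate a change without an LLM, based on how much of the text survived."""
    ratio = difflib.SequenceMatcher(None, old_text, new_text).ratio()
    if ratio > 0.95:
        return "minor"      # near-identical wording
    if ratio > 0.75:
        return "moderate"   # noticeable edits within the section
    return "major"          # section substantially rewritten
```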
- Import Errors: Ensure all dependencies are installed via `pip install -r requirements.txt`
- Database Errors: Check database permissions and connection string
- File Upload Errors: Verify the `uploads/` directory exists and is writable
- Memory Issues: Large documents may require increased memory allocation
- For large documents, consider implementing pagination
- Use database indexing for frequently queried fields
- Cache semantic embeddings for repeated comparisons (see the sketch after this list)
- Consider using a message queue for long-running analysis tasks
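Caching embeddings, as suggested above, could be as simple as keying on a hash of the section text; a sketch assuming a sentence-transformers model object:

```python
import hashlib

_embedding_cache: dict[str, list[float]] = {}

def cached_embedding(text: str, model) -> list[float]:
    """Embed a section once and reuse the vector across comparisons.

    `model` is assumed to be a sentence-transformers model.
    """
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _embedding_cache:
        _embedding_cache[key] = model.encode(text).tolist()
    return _embedding_cache[key]
```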
- Fork the repository
- Create a feature branch
- Make your changes with appropriate tests
- Submit a pull request with a clear description