Intelligent Information Retrieval System

Setup and Run Instructions

Prerequisites

  • Node.js (for frontend)
  • Python 3.13+ (for backend)
  • uv (recommended) or pip for Python package management

Step 1: Data Collection (Crawling)

First, crawl the publication data from the source website (note that the scripts in Steps 1-3 require the backend dependencies from Step 4, so install those first if you have not already):

cd backend/search_engine
python crawler.py

This will:

  • Crawl publication data from Coventry University's research portal
  • Respect robots.txt and implement polite crawling delays (see the sketch after this list)
  • Extract publication titles, authors, abstracts, and dates
  • Save the data to backend/search_engine/data/crawled_data.json
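
For reference, the polite-crawling behavior boils down to consulting robots.txt before each request and sleeping between fetches. A minimal sketch of that pattern (the portal URL and delay value are illustrative, not the crawler's actual settings):

import time
import urllib.robotparser
import requests

BASE_URL = "https://pureportal.coventry.ac.uk"  # assumed portal URL
CRAWL_DELAY = 2  # seconds between requests; illustrative value

# Parse robots.txt once, then consult it before every fetch
robots = urllib.robotparser.RobotFileParser(BASE_URL + "/robots.txt")
robots.read()

def polite_get(url: str) -> requests.Response | None:
    """Fetch a URL only if robots.txt allows it, then pause."""
    if not robots.can_fetch("*", url):
        return None  # disallowed by robots.txt
    response = requests.get(url, timeout=10)
    time.sleep(CRAWL_DELAY)  # polite delay between requests
    return response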

Step 2: Build Search Index

After crawling, build the search indexes:

cd backend/search_engine
python indexer.py

This will:

  • Process the crawled data and create field-based positional indexes
  • Build TF-IDF matrices for titles, authors, and abstracts (see the sketch after this list)
  • Save the indexes to backend/search_engine/data/index.joblib
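
Conceptually, the indexer fits one TF-IDF model per field so that titles, authors, and abstracts can be scored separately at query time. A minimal sketch of that idea with scikit-learn (it assumes crawled_data.json is a list of publication dicts; the actual schema may differ):

import json
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer

with open("data/crawled_data.json") as f:
    publications = json.load(f)

# One TF-IDF vectorizer and matrix per searchable field
index = {}
for field in ("title", "authors", "abstract"):
    texts = [str(pub.get(field, "")) for pub in publications]
    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
    matrix = vectorizer.fit_transform(texts)  # documents x vocabulary
    index[field] = {"vectorizer": vectorizer, "matrix": matrix}

joblib.dump(index, "data/index.joblib")  # same artifact path as indexer.py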

Step 3: Train Document Classifier

Train the document classification model:

cd backend/classification
python classifier.py

This will:

  • Load training data from backend/classification/data/cleaned_data.csv
  • Train a Naive Bayes classifier with TF-IDF features (see the sketch after this list)
  • Evaluate the model using K-fold cross-validation
  • Save the trained model to backend/classification/data/document_classifier.joblib
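
The training step pairs TF-IDF features with a Naive Bayes model and scores it by cross-validation. A minimal sketch, assuming cleaned_data.csv has text and label columns (the column names and k=5 are guesses, not the project's actual settings):

import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

df = pd.read_csv("data/cleaned_data.csv")
X, y = df["text"], df["label"]  # assumed column names

# TF-IDF features feeding a multinomial Naive Bayes classifier
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())

# K-fold cross-validation (k=5 here; the project's k may differ)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.3f}")

model.fit(X, y)  # retrain on all data before saving
joblib.dump(model, "data/document_classifier.joblib")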

Step 4: Install Dependencies

Backend Dependencies

# Using pip
pip install -r backend/requirements.txt

# Or using uv (recommended)
uv pip install -r backend/requirements.txt

Frontend Dependencies

npm install

Step 5: Run the Application

Start the Backend API

# Using the FastAPI CLI
fastapi dev backend/api.py

# Or using uv (recommended)
uv run fastapi dev backend/api.py

The backend will be available at http://127.0.0.1:8000, with interactive API docs at http://127.0.0.1:8000/docs.
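
Once the server is running, a quick smoke test confirms it is reachable. Only the docs URL from the line above is checked here; any search endpoints would depend on the project's actual routes:

import requests

# The interactive docs page should return HTTP 200 once the API is up
response = requests.get("http://127.0.0.1:8000/docs", timeout=5)
print(response.status_code)  # expect 200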

Start the Frontend

npm run dev

The frontend will be available at http://localhost:5173/

Complete Workflow Summary

  1. Crawl Data: cd backend/search_engine && python crawler.py
  2. Build Index: cd backend/search_engine && python indexer.py
  3. Train Classifier: cd backend/classification && python classifier.py
  4. Install Dependencies: pip install -r backend/requirements.txt and npm install
  5. Run Backend: fastapi dev backend/api.py (or uv run fastapi dev backend/api.py)
  6. Run Frontend: npm run dev

Features

  • Web Crawling: Automated data collection from research publications
  • Search Engine: Field-based search with TF-IDF ranking and phrase queries (phrase matching sketched below)
  • Document Classification: Automatic categorization of research documents
  • Interactive Web Interface: Modern React-based frontend for search and exploration
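
To illustrate the phrase-query feature: with a positional index (term -> document -> positions), a phrase matches when each successive term appears one position after the previous one. A minimal sketch of that check (the index layout here is an assumption, not necessarily how the project stores it):

# Positional index: term -> {doc_id: [positions]}  (assumed layout)
index = {
    "information": {1: [0, 7], 2: [3]},
    "retrieval":   {1: [1, 9], 2: [5]},
}

def phrase_match(terms: list[str], doc_id: int) -> bool:
    """True if the terms occur at consecutive positions in the document."""
    starts = index.get(terms[0], {}).get(doc_id, [])
    return any(
        all(pos + i in index.get(term, {}).get(doc_id, [])
            for i, term in enumerate(terms[1:], start=1))
        for pos in starts
    )

print(phrase_match(["information", "retrieval"], 1))  # True: positions 0, 1
print(phrase_match(["information", "retrieval"], 2))  # False: 3, then 5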

Data Files Structure

backend/
├── search_engine/
│   └── data/
│       ├── crawled_data.json     # Raw crawled publication data
│       └── index.joblib          # Search indexes and TF-IDF models
└── classification/
    └── data/
        ├── cleaned_data.csv      # Training data for classifier
        └── document_classifier.joblib  # Trained classification model
