
codebasics/project-nlp-log-classification


Log Classification With Hybrid Classification Framework

This project implements a hybrid log classification system that combines three complementary approaches, each suited to a different level of complexity in log patterns: regular expressions for predictable patterns, an embedding-based classifier for complex patterns with enough labeled data, and an LLM for complex patterns with little or no labeled data.


Classification Approaches

  1. Regular Expression (Regex):

    • Handles the simplest, most predictable patterns.
    • Useful for patterns that are easily captured using predefined rules.
  2. Sentence Transformer + Logistic Regression:

    • Manages complex patterns when there is sufficient training data.
    • Uses embeddings generated by a Sentence Transformer model as features and applies Logistic Regression as the classification layer.
  3. LLM (Large Language Model):

    • Used for handling complex patterns when sufficient labeled training data is not available.
    • Provides a fallback or complementary approach to the other methods.
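The regex stage can be sketched as an ordered list of rules where the first match wins and `None` means "hand off to the next stage". This is a minimal sketch under assumptions: the patterns, the labels (`User Action`, `System Notification`) and the function name `classify_with_regex` are hypothetical, since the actual rules in `training/` are not shown in this README.

```python
import re
from typing import Optional

# Hypothetical pattern -> label table; the project's real rules may differ.
REGEX_RULES = [
    (re.compile(r"user \S+ logged (in|out)", re.IGNORECASE), "User Action"),
    (re.compile(r"backup (started|completed)", re.IGNORECASE), "System Notification"),
    (re.compile(r"disk cleanup completed", re.IGNORECASE), "System Notification"),
]


def classify_with_regex(log_message: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if no rule matches."""
    for pattern, label in REGEX_RULES:
        if pattern.search(log_message):
            return label
    # Not predictable enough for a rule: let a later stage decide.
    return None


if __name__ == "__main__":
    print(classify_with_regex("User User123 logged in."))  # User Action
    print(classify_with_regex("Unexpected kernel fault"))  # None
```

Keeping the rules in an ordered list makes precedence explicit: a more specific pattern placed earlier always wins over a broader one later.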

[Architecture diagram]
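The three approaches can be sketched as a cascade. The routing rule here is an assumption, since the README does not spell it out: regex first, then the LLM for sources assumed to lack labeled training data, otherwise the Sentence Transformer + Logistic Regression classifier. `classify_log`, the stage callables and `LOW_DATA_SOURCES` (with its `LegacySystem` entry) are hypothetical placeholders.

```python
from typing import Callable, Container, Optional

# Hypothetical: sources assumed to have too little labeled data for the embedding model.
LOW_DATA_SOURCES = frozenset({"LegacySystem"})


def classify_log(
    source: str,
    log_message: str,
    regex_stage: Callable[[str], Optional[str]],
    embedding_stage: Callable[[str], str],
    llm_stage: Callable[[str], str],
    low_data_sources: Container[str] = LOW_DATA_SOURCES,
) -> str:
    """Route one log entry through the hybrid cascade and return its label."""
    # 1. Cheap, deterministic rules catch the predictable patterns first.
    label = regex_stage(log_message)
    if label is not None:
        return label
    # 2. Sources lacking training data fall back to the LLM.
    if source in low_data_sources:
        return llm_stage(log_message)
    # 3. Everything else goes to Sentence Transformer embeddings + Logistic Regression.
    return embedding_stage(log_message)
```

Passing each stage in as a callable keeps the router independent of any model library, so it can be exercised with simple stubs. Running regex first means the slower model-based stages only see logs the rules could not handle.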


Folder Structure

  1. training/:

    • Contains the code for training models using Sentence Transformer and Logistic Regression.
    • Includes the code for regex-based classification.
  2. models/:

    • Stores the saved models, including Sentence Transformer embeddings and the Logistic Regression model.
  3. resources/:

    • Contains resource files such as test CSV files, output files, and images.
  4. Root Directory:

    • Contains the FastAPI server code (server.py).
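To illustrate what the Logistic Regression layer trained in `training/` computes, here is a from-scratch sketch of multinomial (softmax) logistic regression fitted by batch gradient descent on embedding vectors. It is illustrative only: a library implementation would normally be used, the class name `SoftmaxRegression` and its hyperparameters are hypothetical, and the toy 2-D vectors stand in for Sentence Transformer embeddings, which typically have hundreds of dimensions.

```python
import math
from typing import List, Sequence


class SoftmaxRegression:
    """Multinomial logistic regression over fixed-length embedding vectors."""

    def __init__(self, lr: float = 0.5, epochs: int = 500) -> None:
        self.lr = lr
        self.epochs = epochs
        self.classes: List[str] = []
        self.weights: List[List[float]] = []  # one weight row per class
        self.bias: List[float] = []

    def _probs(self, x: Sequence[float]) -> List[float]:
        scores = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        m = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "SoftmaxRegression":
        self.classes = sorted(set(y))
        k, d, n = len(self.classes), len(X[0]), len(X)
        self.weights = [[0.0] * d for _ in range(k)]
        self.bias = [0.0] * k
        index = {c: i for i, c in enumerate(self.classes)}
        for _ in range(self.epochs):
            grad_w = [[0.0] * d for _ in range(k)]
            grad_b = [0.0] * k
            for x, label in zip(X, y):
                p = self._probs(x)
                for c in range(k):
                    # Cross-entropy gradient w.r.t. class c's score: p_c - 1[c is true class].
                    err = p[c] - (1.0 if index[label] == c else 0.0)
                    grad_b[c] += err
                    for j in range(d):
                        grad_w[c][j] += err * x[j]
            # Average the gradient over the batch and take one descent step.
            for c in range(k):
                self.bias[c] -= self.lr * grad_b[c] / n
                for j in range(d):
                    self.weights[c][j] -= self.lr * grad_w[c][j] / n
        return self

    def predict(self, x: Sequence[float]) -> str:
        p = self._probs(x)
        return self.classes[p.index(max(p))]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real ones would come from a Sentence Transformer.
    X = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
    y = ["HTTP Status", "HTTP Status", "Security Alert", "Security Alert"]
    model = SoftmaxRegression().fit(X, y)
    print(model.predict([0.95, 0.05]))  # HTTP Status
```

Because the heavy lifting (turning text into meaningful vectors) is done by the pretrained encoder, a linear classifier on top is often enough, and it is cheap to retrain when new labeled logs arrive.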

Setup Instructions

  1. Install Dependencies: Make sure Python is installed on your system, then install the required libraries by running:

    pip install -r requirements.txt
  2. Run the FastAPI Server: To start the server, use the following command:

    uvicorn server:app --reload

    Once the server is running, you can access the API at:

    • http://127.0.0.1:8000/ (Main endpoint)
    • http://127.0.0.1:8000/docs (Interactive Swagger documentation)
    • http://127.0.0.1:8000/redoc (Alternative API documentation)

Usage

Upload a CSV file containing logs to the FastAPI endpoint for classification. Ensure the file has the following columns:

  • source
  • log_message

The output will be a CSV file with an additional column target_label, which represents the classified label for each log entry.
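The input/output contract can be sketched with the standard library: check that `source` and `log_message` are present, then append a `target_label` column to every row. `classify_csv` and its `classify` callback are hypothetical names; in the server the callback would presumably be the hybrid classifier, but this README does not show how `server.py` implements it.

```python
import csv
import io
from typing import Callable

REQUIRED_COLUMNS = ("source", "log_message")


def classify_csv(csv_text: str, classify: Callable[[str, str], str]) -> str:
    """Return CSV text with a target_label column appended to every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []  # reading this consumes the header row
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[*fieldnames, "target_label"])
    writer.writeheader()
    for row in reader:
        # Classify each log entry from its source and message.
        row["target_label"] = classify(row["source"], row["log_message"])
        writer.writerow(row)
    return out.getvalue()
```

Validating the columns up front turns a malformed upload into one clear error instead of a `KeyError` partway through the file.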


Disclaimer

Copyrights Reserved:
@Codebasics Inc
@LearnerX Pvt Ltd

This project, including its code and resources, is intended solely for educational purposes and should not be used for any commercial purposes without proper authorization.

About

Log classification using hybrid classification framework
