TANGO

Setup Instructions

Prerequisites

  • Python 3.8 or higher
  • pip (Python package installer)
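
The version requirement above can be checked before installing anything. The following is a minimal sketch, not part of the repository; the helper name `python_is_supported` is hypothetical.

```python
import sys

# Minimum version stated in the prerequisites above.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter is 3.8 or newer."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= MIN_PYTHON


print("Python version OK:", python_is_supported())
```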

Installation

  1. Clone the repository:

    git clone https://github.com/mabonmn/TANGO.git
    cd TANGO
  2. Create a virtual environment and activate it:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the required dependencies:

    pip install -r requirements.txt

Running the Code

Jupyter Notebooks

  1. Launch Jupyter Notebook:

    jupyter notebook
  2. Open the desired notebook (e.g., Dev_Basic.ipynb, scraper.ipynb) and run the cells.

Python Scripts

  1. Run the Python script directly:
    python scraper.py

Main Scripts and Functionalities

MainNotebook_Eval.ipynb

  • Purpose: This notebook walks through development and testing of the core functionality.
  • Sections:
    • Data Loading
    • Data Preprocessing
    • Model Training
    • Evaluation

scraper.py

  • Purpose: This script scrapes data from the English Wikipedia corpus and saves it as a CSV file.
  • Usage:
    python scraper.py

Running runDataGen.py

  1. Place the original dataset in the dataset directory and name it train.csv.
  2. Run the script:
    python runDataGen.py
  3. The script will generate augmented datasets and save them to dataset/dataset_aug_train_all_new.csv.
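
Step 1 can be verified programmatically before launching the script. This is a minimal sketch assuming only the paths named above (dataset/train.csv); the helper `find_missing_inputs` is hypothetical and is not part of runDataGen.py.

```python
from pathlib import Path


def find_missing_inputs(dataset_dir="dataset", required=("train.csv",)):
    """Return the names of required files that are absent from dataset_dir."""
    base = Path(dataset_dir)
    return [name for name in required if not (base / name).is_file()]


missing = find_missing_inputs()
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All inputs present; runDataGen.py can be run.")
```

The same helper could check for dataset_aug_train_all_new.csv before running bertEval.py by passing a different `required` tuple.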

Running bertEval.py

  1. Make sure the augmented dataset generated by runDataGen.py is in the dataset directory and is named dataset_aug_train_all_new.csv.
  2. Run the script:
    python bertEval.py
  3. The script will evaluate the quality of sentence augmentations and save the results to dataset/BERTEval.csv.
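
Because bertEval.py consumes the output of runDataGen.py, the two scripts can be chained. The sketch below is one possible wrapper, assuming both scripts take no command-line arguments (as in the usage shown above); `run_pipeline` is a hypothetical helper, not something the repository provides.

```python
import subprocess
import sys

# Order matters: bertEval.py reads the file that runDataGen.py writes.
PIPELINE = ("runDataGen.py", "bertEval.py")


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order with the current interpreter.

    Stops at the first failure: check=True makes subprocess.run raise
    CalledProcessError when a script exits with a non-zero status.
    """
    for script in scripts:
        runner([sys.executable, script], check=True)
    return list(scripts)
```

Calling `run_pipeline()` from the repository root runs both steps. The `runner` parameter exists only so the function can be exercised without launching the real scripts.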

Dataset Files

dataset_Small/dataset_aug_train_all_new_clean.csv

  • Purpose: This file contains augmented training data for the model.
  • Usage: Load this CSV file into your data processing pipeline to train the model with augmented data.

dataset_Small/dataset_aug_train_all_new.csv

  • Purpose: This file contains the raw augmented training data, before cleaning.
  • Usage: Load it the same way as the clean version, but expect raw, unprocessed entries.

dataset_Small/dataset_train.csv

  • Purpose: This file contains the original training data.
  • Usage: Use this file for initial training and testing of the model.
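
The column layout of these CSV files is not documented here, so it helps to inspect a file before wiring it into a pipeline. The following is a minimal sketch using only the standard library; `summarize_csv` is a hypothetical helper, and it assumes each file is UTF-8 encoded with a header row.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)  # counts data rows only
    return header, row_count
```

For example, `summarize_csv("dataset_Small/dataset_train.csv")` returns the column names and the number of data rows. Running it on both augmented files shows how many entries the cleaning step removed.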

About

TANGO: Translational Augmentations - Necessary Gain or Overhead?
