- Python 3.8 or higher
- pip (Python package installer)
- Clone the repository:

  ```bash
  git clone https://github.com/mabonmn/TANGO.git
  cd TANGO
  ```
- Create a virtual environment and activate it:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```
- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Launch Jupyter Notebook:

  ```bash
  jupyter notebook
  ```
- Open the desired notebook (e.g., `Dev_Basic.ipynb`, `scraper.ipynb`) and run the cells.
- Run the Python script directly:

  ```bash
  python scraper.py
  ```
- Purpose: This notebook demonstrates basic development and testing of the core functionalities.
- Sections:
- Data Loading
- Data Preprocessing
- Model Training
- Evaluation
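The four sections above can be sketched as one minimal, self-contained pipeline. This is illustrative only, not the notebook's actual code: the in-memory sample data and the toy majority-class "model" are stand-ins for whatever `Dev_Basic.ipynb` really trains.

```python
import csv
import io

# --- Data Loading: parse CSV rows (an in-memory sample stands in for a file) ---
sample_csv = "text,label\nGood movie,1\nBad film,0\nGreat plot,1\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))

# --- Data Preprocessing: normalize the raw text ---
def preprocess(text):
    return text.lower().strip()

data = [(preprocess(r["text"]), int(r["label"])) for r in rows]

# --- Model Training: a toy majority-class baseline, for illustration only ---
labels = [y for _, y in data]
majority = max(set(labels), key=labels.count)

# --- Evaluation: accuracy of the baseline on the same data ---
accuracy = sum(y == majority for _, y in data) / len(data)
```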
- Purpose: This script is used to scrape data from the Wikipedia English corpus and save it as a CSV file.
- Usage:

  ```bash
  python scraper.py
  ```
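The general shape of the scraper's output step looks like the sketch below. The fetching itself (which `scraper.py` does against the English Wikipedia corpus) is stubbed out with sample data here, and the `save_articles_to_csv` helper is hypothetical, not taken from the script:

```python
import csv
import os
import tempfile

# Hypothetical output step: write (title, text) pairs to a CSV file.
# The actual Wikipedia fetching in scraper.py is replaced by sample data.
def save_articles_to_csv(articles, path):
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "text"])  # header row
        writer.writerows(articles)

# Stand-in for scraped content:
sample = [("Python (programming language)", "Python is a programming language.")]
out_path = os.path.join(tempfile.gettempdir(), "wiki_sample.csv")
save_articles_to_csv(sample, out_path)
```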
- Ensure you have the original dataset in the `dataset` directory with the name `train.csv`.
- Run the script:

  ```bash
  python runDataGen.py
  ```
- The script will generate augmented datasets and save them to `dataset/dataset_aug_train_all_new.csv`.
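As an illustration of what sentence augmentation means here, the toy transform below perturbs a sentence by swapping two adjacent words. It is only in the spirit of `runDataGen.py`; the script's actual augmentation methods are not shown in this README:

```python
import random

# Toy augmentation (illustrative, not runDataGen.py's method): swap two
# adjacent words to create a perturbed copy of a training sentence.
def swap_augment(sentence, rng):
    words = sentence.split()
    if len(words) < 2:
        return sentence  # nothing to swap
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

rng = random.Random(0)  # fixed seed for reproducibility
augmented = [swap_augment(s, rng) for s in ["the quick brown fox", "hello"]]
```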
- Ensure you have the augmented dataset generated by `runDataGen.py` in the `dataset` directory with the name `dataset_aug_train_all_new.csv`.
- Run the script:

  ```bash
  python bertEval.py
  ```
- The script will evaluate the quality of sentence augmentations and save the results to `dataset/BERTEval.csv`.
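The scoring pattern behind such an evaluation can be sketched with a simple similarity measure between an original sentence and its augmentation. The sketch below uses cosine similarity over bag-of-words counts purely for illustration; `bertEval.py` presumably uses contextual BERT embeddings instead:

```python
import math
from collections import Counter

# Illustrative stand-in for BERT-based scoring: cosine similarity over
# plain word counts (no embeddings, no model download needed).
def cosine_similarity(a, b):
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0

# A word-order augmentation keeps the same word counts, so it scores 1.0
# under this crude metric -- exactly the kind of blind spot a contextual
# BERT-based evaluation is meant to avoid.
score = cosine_similarity("the quick brown fox", "the brown quick fox")
```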
- Purpose: This file contains augmented training data for the model.
- Usage: Load this CSV file into your data processing pipeline to train the model with augmented data.
- Purpose: This file contains the original augmented training data.
- Usage: Similar to the clean version, but this file may contain raw, unprocessed entries.
- Purpose: This file contains the original training data.
- Usage: Use this file for initial training and testing of the model.
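Loading any of these CSVs into a pipeline follows the same pattern. In the sketch below the column names (`text`, `label`) are assumptions, so check the actual header of `dataset/train.csv`; an in-memory sample stands in for the real file:

```python
import csv
import io

# Assumed columns "text" and "label" -- verify against the real CSV header.
sample = io.StringIO("text,label\nAn example sentence,1\n")
rows = [(r["text"], int(r["label"])) for r in csv.DictReader(sample)]
```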