This project aims to make DeepMind's Language Perceiver easily usable for Multiclass Classification.
- Clone the repo
    git clone https://github.com/DvdNss/nlp-perceiver
- Install requirements
    pip install -r requirements.txt

Project structure:
- data/: contains torch data files
- model/: contains models
- resource/: contains readme images
- source/: contains main scripts
- databuilder.py: loads, transforms and saves datasets
- train.py: training script
- mapping.py: mapping functions
- evaluate.py: evaluation script
- pipeline.py: model pipeline (inference)
- inference_example.py: inference use case
- app.py: streamlit app script

- Set the correct mapping functions in source/mapping.py for a given dataset
    from typing import List

    # Map inputs
    def map_inputs(row: dict):
        """
        Map inputs with a given format.
        :param row: dataset row
        :return: text of the row
        """
        return row['text']

    # Map targets
    def map_targets(labels: List[int]):
        """
        Map targets with a given format.
        :param labels: list of labels
        :return: dict holding a multi-hot vector of length 28
        """
        targets = [0] * 28
        for label in labels:
            targets[label] = 1
        return {'targets': targets}
- Build the torch files using the source/databuilder.py script
    python source/databuilder.py --dataset go_emotions --split train+validation --output_dir data --max_size max_size
Once the script stops running, there should be a .pt file in the output_dir for each split you selected.
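
To make the mapping step more concrete, here is a minimal sketch of what map_inputs and map_targets return for a single row. It assumes a row shaped like a go_emotions example (a text field plus a list of label ids). The sample row, the NUM_LABELS constant and the num_labels parameter are illustrative additions, not part of source/mapping.py.

```python
from typing import Dict, List

# Assumption: 28 matches the size of the target vector in source/mapping.py.
NUM_LABELS = 28


def map_inputs(row: dict) -> str:
    """Return the raw text of a dataset row."""
    return row["text"]


def map_targets(labels: List[int], num_labels: int = NUM_LABELS) -> Dict[str, List[int]]:
    """Turn a list of label ids into a multi-hot vector of length num_labels."""
    targets = [0] * num_labels
    for label in labels:
        targets[label] = 1
    return {"targets": targets}


# Hypothetical row for illustration only (text + list of label ids).
row = {"text": "Thanks a lot, this made my day!", "labels": [15, 17]}

text = map_inputs(row)
encoded = map_targets(row["labels"])

print(text)
print(encoded["targets"])
```

Each position of the target vector corresponds to one label id, so a row carrying several labels gets several ones. For a dataset with a different number of labels, the vector length in map_targets would need to change accordingly.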
- Train your model using the source/train.py script
    python source/train.py --train_data train_data --validation_data validation_data --batch_size batch_size --lr lr --epochs epochs --output_dir output_dir
A model will be saved in output_dir at each epoch, named output_dir/perceiver-e<epoch>-acc<eval_acc>.pt.
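
Since one checkpoint is written per epoch, it can be handy to pick the best one by the accuracy encoded in its file name. The sketch below is one possible way to do that and is not part of the repository: the helper names are hypothetical, and the regular expression assumes that <epoch> is an integer and <eval_acc> is an integer or decimal number, which the naming scheme above does not spell out.

```python
import re
from typing import List, Optional, Tuple

# Assumed pattern for perceiver-e<epoch>-acc<eval_acc>.pt; the exact
# formatting of <eval_acc> is a guess.
CHECKPOINT_RE = re.compile(r"perceiver-e(\d+)-acc(\d+(?:\.\d+)?)\.pt$")


def parse_checkpoint(name: str) -> Optional[Tuple[int, float]]:
    """Return (epoch, eval_acc) for a checkpoint file name, or None if it does not match."""
    match = CHECKPOINT_RE.search(name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def best_checkpoint(names: List[str]) -> Optional[str]:
    """Return the name with the highest accuracy, preferring later epochs on ties."""
    scored = [(parse_checkpoint(name), name) for name in names]
    scored = [(info, name) for info, name in scored if info is not None]
    if not scored:
        return None
    return max(scored, key=lambda item: (item[0][1], item[0][0]))[1]


# Hypothetical file names for illustration only.
names = ["perceiver-e1-acc0.41.pt", "perceiver-e2-acc0.47.pt", "notes.txt"]

print(parse_checkpoint("perceiver-e2-acc0.pt"))
print(best_checkpoint(names))
```

In practice the list of names could come from os.listdir(output_dir), and the selected file could then be passed as model_path to the evaluation or inference steps.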
- Evaluate your model using the source/evaluate.py script
    python source/evaluate.py --model model_path --validation_data validation_data --batch_size batch_size
- Run inference using the source/pipeline.py script (see use case in inference_example.py)
    from pipeline import MultiLabelPipeline, inputs_to_dataset

    model_path = '../model/perceiver-e2-acc0.pt'

    # Load pipeline
    pipeline = MultiLabelPipeline(model_path=model_path)

    # Build a little dataset
    inputs = ['This is a test.', 'Another test.', 'The final test.']

    # Make inference
    outputs = pipeline(inputs_to_dataset(inputs), batch_size=3)
    print(outputs)
- Finally, run the streamlit app
    streamlit run app.py

David NAISSE - @LinkedIn