
Note: full code documentation will be available by EMNLP 2020 (Nov 16th).

Direct questions to Seraphina.

# story-gen-BART

Use the encoder.json and dict.txt already provided as part of the repo, since they contain additional delimiter tokens relevant for story generation.
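If you want to eyeball those extra tokens, appended special tokens usually sit at the end of a fairseq dict file (a quick sketch only; the exact token names are repo-specific, and `bart.large/dict.txt` assumes you have already unpacked the pretrained checkpoint from the download step below):

```
tail dict.txt                       # the added delimiter tokens should appear here
diff dict.txt bart.large/dict.txt   # shows how the repo dict differs from stock BART's
```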

```
cd fairseq
mkdir temp
```

Since this is a seq2seq task, you need source and target files:

- Put 4 files in the `temp` directory: `train.source`, `train.target`, `val.source`, `val.target` (see the quick check below)
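fairseq treats this as parallel data, so each `.source` file must be line-aligned with its `.target` counterpart (one target per source line). A quick sanity check, assuming the layout above:

```
# line counts of each source/target pair should match
for SPLIT in train val; do
  wc -l temp/$SPLIT.source temp/$SPLIT.target
done
```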

Now run the BPE preprocessing:

```
sh create_bpe.sh
```
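`create_bpe.sh` is provided in the repo. For orientation, scripts like it typically wrap fairseq's standard GPT-2 BPE encoding step; the sketch below is illustrative only (the `vocab.bpe` path is an assumption, not this repo's exact layout), but it writes the `temp/{train,val}.bpe.{source,target}` files that the binarization step below expects:

```
# Sketch of the usual fairseq BPE step a script like create_bpe.sh wraps
for SPLIT in train val; do
  for LANG in source target; do
    python -m examples.roberta.multiprocessing_bpe_encoder \
      --encoder-json encoder.json \
      --vocab-bpe vocab.bpe \
      --inputs "temp/$SPLIT.$LANG" \
      --outputs "temp/$SPLIT.bpe.$LANG" \
      --workers 60 \
      --keep-empty
  done
done
```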

Binarize the dataset:

```
fairseq-preprocess \
  --source-lang "source" \
  --target-lang "target" \
  --trainpref "temp/train.bpe" \
  --validpref "temp/val.bpe" \
  --destdir "temp/" \
  --workers 60 \
  --srcdict dict.txt \
  --tgtdict dict.txt
```
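If binarization succeeds, `temp/` should now contain the indexed data that training reads. Roughly (names follow fairseq's `<split>.<src>-<tgt>.<lang>` convention, with `val` written out as `valid`):

```
ls temp/
# dict.source.txt  dict.target.txt
# train.source-target.source.{bin,idx}  train.source-target.target.{bin,idx}
# valid.source-target.source.{bin,idx}  valid.source-target.target.{bin,idx}
```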

Download pretrained BART from here:

https://github.com/pytorch/fairseq/tree/4b986c51ed89f9895df2f30e57909301b4a4f19b/examples/bart
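For example, to grab `bart.large` (direct link taken from that fairseq page):

```
wget https://dl.fbaipublicfiles.com/fairseq/models/bart.large.tar.gz
tar -xzvf bart.large.tar.gz   # unpacks bart.large/model.pt and bart.large/dict.txt
```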

Train models:

```
sh run.sh
```

Update the field BART_PATH in run.sh to point to your pretrained model.pt file. You can customize MAX_TOKENS and UPDATE_FREQ based on GPU memory and the number of GPUs.
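For reference, `run.sh` presumably wraps a `fairseq-train` call along the lines of fairseq's standard BART fine-tuning recipe. The sketch below is illustrative only; the hyperparameter values are assumptions, not this repo's settings.

```
# Illustrative fine-tuning command modeled on fairseq's BART recipe
BART_PATH=/path/to/bart.large/model.pt
MAX_TOKENS=2048   # lower this if you hit GPU out-of-memory errors
UPDATE_FREQ=4     # raise this to accumulate gradients when you have fewer GPUs

fairseq-train temp/ \
    --restore-file $BART_PATH \
    --task translation \
    --source-lang source --target-lang target \
    --arch bart_large \
    --truncate-source \
    --layernorm-embedding \
    --share-all-embeddings \
    --share-decoder-input-output-embed \
    --reset-optimizer --reset-dataloader --reset-meters \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --optimizer adam --adam-betas "(0.9, 0.999)" --adam-eps 1e-08 \
    --clip-norm 0.1 --weight-decay 0.01 \
    --lr-scheduler polynomial_decay --lr 3e-05 \
    --total-num-update 20000 --warmup-updates 500 \
    --max-tokens $MAX_TOKENS \
    --update-freq $UPDATE_FREQ \
    --fp16
```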

For inference:

```
python inference.py
```