
Note: full code documentation will be available by EMNLP 2020 (Nov 16th).

Direct questions to Seraphina.

# story-gen-BART

Use the encoder.json and dict.txt already provided as part of the repo, since they contain additional delimiter tokens relevant for story generation.

```
cd fairseq
mkdir temp
```

Since this is a seq2seq task, you need source and target files:

  • Put four files in the temp directory: train.source, train.target, val.source, val.target
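These are plain-text files in which line i of each .source file pairs with line i of the corresponding .target file. A minimal sketch that writes toy files in that layout (the prompt/story strings here are invented placeholders, not data from the repo):

```python
from pathlib import Path

# Toy parallel data: line i of *.source pairs with line i of *.target.
# Replace these placeholders with your own prompts and stories.
pairs = {
    "train": [("a prompt about a dragon", "a short story about a dragon"),
              ("a prompt about the sea", "a short story about the sea")],
    "val": [("a prompt about rain", "a short story about rain")],
}

temp = Path("temp")
temp.mkdir(exist_ok=True)
for split, examples in pairs.items():
    (temp / f"{split}.source").write_text("\n".join(s for s, _ in examples) + "\n")
    (temp / f"{split}.target").write_text("\n".join(t for _, t in examples) + "\n")
```

The key invariant is that each split's .source and .target files have the same number of lines.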

Now run the BPE preprocessing:

```
sh create_bpe.sh
```

Binarize dataset:

```
fairseq-preprocess \
  --source-lang "source" \
  --target-lang "target" \
  --trainpref "temp/train.bpe" \
  --validpref "temp/val.bpe" \
  --destdir "temp/" \
  --workers 60 \
  --srcdict dict.txt \
  --tgtdict dict.txt
```

Download the pretrained BART checkpoint from here:

https://github.com/pytorch/fairseq/tree/4b986c51ed89f9895df2f30e57909301b4a4f19b/examples/bart

Train models:

```
sh run.sh
```

Update the field BART_PATH to point to where your pretrained model.pt file is. You can customize MAX_TOKENS and UPDATE_FREQ based on GPU memory and the number of GPUs.
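These two settings trade off against each other: in fairseq, each GPU accumulates UPDATE_FREQ micro-batches of up to MAX_TOKENS tokens before an optimizer step, so the effective batch size per update is roughly MAX_TOKENS × UPDATE_FREQ × number of GPUs. A small sketch of that arithmetic (the numbers are illustrative, not the repo's defaults):

```python
def effective_tokens_per_update(max_tokens, update_freq, num_gpus):
    """Approximate tokens contributing to one optimizer step in fairseq:
    each GPU accumulates update_freq micro-batches of up to max_tokens."""
    return max_tokens * update_freq * num_gpus

# Halving MAX_TOKENS (to fit in less GPU memory) while doubling
# UPDATE_FREQ keeps the effective batch size unchanged.
assert effective_tokens_per_update(2048, 4, 1) == effective_tokens_per_update(1024, 8, 1)
```

In practice this means that on a smaller GPU you can lower MAX_TOKENS and raise UPDATE_FREQ to approximate the original training batch size.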

For inference:

```
python inference.py
```
