Deep neural approach to Boundary and Disfluency Detection
This is part of my MSc project. More info:
- My dissertation (in Brazilian Portuguese)
- EACL paper
- STIL paper
- PROPOR paper
- LREC paper
First, clone this repository using git:
git clone https://github.com/mtreviso/deepbond.git
Then, cd to the DeepBond folder:
cd deepbond
Create a Python virtualenv and install all dependencies using:
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
Run the install command:
python3 setup.py install
Please note that since Python 3 is required, all of the above commands (pip/python) must be bound to a Python 3 installation.
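To confirm that the commands above are bound to Python 3 and that the virtualenv is active, a small check along these lines may help. This is only a sketch and not part of deepbond: the helper name `python3_venv_status` is hypothetical, and it assumes the environment was created with `python3 -m venv` as shown above.

```python
import sys


def python3_venv_status(version_info=sys.version_info,
                        prefix=sys.prefix,
                        base_prefix=getattr(sys, "base_prefix", sys.prefix)):
    """Return (is_python3, in_virtualenv) for the given interpreter details.

    Hypothetical helper for illustration; it is not part of deepbond.
    Inside an environment created by `python3 -m venv`, sys.prefix points
    at the environment while sys.base_prefix points at the base install.
    """
    is_python3 = version_info[0] == 3
    in_virtualenv = prefix != base_prefix
    return is_python3, in_virtualenv


is_py3, in_venv = python3_venv_status()
print("Python 3: {}, virtualenv active: {}".format(is_py3, in_venv))
```

If either value is reported as False, re-run `source env/bin/activate` and check which interpreter `python3` resolves to before installing.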
The data should be placed in a folder called data in the root directory. Here are the basic ingredients that you might need:
- Corpus (see license): https://github.com/nilc-nlp/DNLT-BP
 - Word embeddings (word2vec skipgram): https://www.dropbox.com/s/rw3ti4ebctufp4j/embeddings.zip?dl=1
 - Prosodic information (only for Control and MCI): https://www.dropbox.com/s/0gmt2o2xeah13xk/prosodic.zip?dl=1
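The two zip archives listed above could be fetched and unpacked into the data folder with a short script like the following. This is a sketch under assumptions: the function names `extract_archive` and `fetch_all` are hypothetical, the internal layout of the archives is not documented here, and the script does not know which subfolder structure deepbond expects inside data. The corpus is a git repository and would still need to be cloned separately.

```python
import urllib.request
import zipfile
from pathlib import Path

# URLs copied from the list above.
ARCHIVES = {
    "embeddings.zip": "https://www.dropbox.com/s/rw3ti4ebctufp4j/embeddings.zip?dl=1",
    "prosodic.zip": "https://www.dropbox.com/s/0gmt2o2xeah13xk/prosodic.zip?dl=1",
}


def extract_archive(zip_path, data_dir="data"):
    """Unpack zip_path into data_dir (created if missing).

    Returns the list of member names found in the archive.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
        return archive.namelist()


def fetch_all(data_dir="data"):
    """Download every archive in ARCHIVES into data_dir and unpack it there."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    for filename, url in ARCHIVES.items():
        target = data_dir / filename
        urllib.request.urlretrieve(url, str(target))
        extract_archive(target, data_dir)


# Uncomment to download (requires network access):
# fetch_all("data")
```

After running `fetch_all("data")` from the repository root, inspect the extracted files and rearrange them if the experiments expect a different layout.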
 
You can also send me an e-mail if you have any questions!
You can use deepbond in two ways: through its command-line interface (CLI) or as a Python library (lib).
The full list of arguments (CLI) and options (lib) can be seen via:
python3 -m deepbond --help
Take a look at the experiments folder for more examples.
License: MIT.
If you use deepbond, you can cite this paper:
@inproceedings{treviso2018sentence,
  author = "Marcos Vinícius Treviso and Sandra Maria Aluísio",
  title = "Sentence Segmentation and Disfluency Detection in Narrative Transcripts from Neuropsychological Tests",
  booktitle = "Computational Processing of the Portuguese Language (PROPOR)",
  year = "2018",
  publisher = "Springer International Publishing",
  pages = "409--418",
}
Or the more recent publication (results without prosodic information + CRF):
@inproceedings{casanova-etal-2020-evaluating,
    title = "Evaluating Sentence Segmentation in Different Datasets of Neuropsychological Language Tests in {B}razilian {P}ortuguese",
    author = {Casanova, Edresson  and
      Treviso, Marcos  and
      H{\"u}bner, Lilian  and
      Alu{\'\i}sio, Sandra},
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference (LREC)",
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    pages = "2605--2614",
    ISBN = "979-10-95546-34-4",
}