deepbond

Deep neural approach to Boundary and Disfluency Detection

This is part of my MSc project. More info: my dissertation (pt-BR), EACL paper, STIL paper, PROPOR paper, LREC paper.

Installation

First, clone this repository using git:

git clone https://github.com/mtreviso/deepbond.git

Then, cd to the DeepBond folder:

cd deepbond

Create a Python virtualenv and install all dependencies using:

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

Run the install command:

python3 setup.py install

Note that Python 3 is required, so the pip and python commands above must be bound to a Python 3 installation.
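
If you are unsure which interpreter your commands resolve to, a quick check along these lines may help. This is only a sketch and not part of deepbond: the function name `check_environment` is hypothetical, and the virtualenv test assumes a standard `venv` environment, where `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def check_environment(version_info=sys.version_info,
                      prefix=sys.prefix,
                      base_prefix=getattr(sys, "base_prefix", sys.prefix)):
    """Return a list of problems found; an empty list means the setup looks fine.

    Hypothetical helper for illustration only (not shipped with deepbond).
    """
    problems = []
    # The README states that Python 3 is required.
    if version_info[0] < 3:
        problems.append("Python 3 is required, found Python %d" % version_info[0])
    # Inside a venv, sys.prefix points at the env and differs from base_prefix.
    if prefix == base_prefix:
        problems.append("no virtualenv appears to be active")
    return problems


if __name__ == "__main__":
    found = check_environment()
    for problem in found:
        print("warning: " + problem)
    if not found:
        print("Python 3 inside a virtualenv: OK")
```

Run it with the same `python3` you used for the installation steps; a warning suggests that `pip` and `python` may not point at the environment you created.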

Data

The data should be placed in a folder called data in the root directory. Here are the basic ingredients that you might need:

You can also send me an e-mail if you have any questions!

Usage

You can use deepbond in two ways:

The full list of arguments (CLI) and options (lib) can be seen via:

python3 -m deepbond --help
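
The same help command can also be launched from a Python script, for example to confirm that the installation succeeded before running experiments. The sketch below relies only on the `python3 -m deepbond --help` invocation shown above and does not call any deepbond library function; the helper names `deepbond_help_command` and `is_installed` are hypothetical.

```python
import importlib.util
import subprocess
import sys


def deepbond_help_command(python=sys.executable):
    """Build the argument list equivalent to `python3 -m deepbond --help`."""
    return [python, "-m", "deepbond", "--help"]


def is_installed(module_name="deepbond"):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed():
        # Prints the arguments and options using the current interpreter.
        subprocess.run(deepbond_help_command(), check=False)
    else:
        print("deepbond is not importable; re-run the installation steps.")
```

Using `sys.executable` keeps the call bound to the interpreter of the active virtualenv rather than whatever `python3` happens to be first on the PATH.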

Take a look at the experiments folder for more examples.

License

MIT.

Cite

If you use deepbond, you can cite this paper:

@inproceedings{treviso2018sentence,
  author = "Marcos Vinícius Treviso and Sandra Maria Aluísio",
  title = "Sentence Segmentation and Disfluency Detection in Narrative Transcripts from Neuropsychological Tests",
  booktitle = "Computational Processing of the Portuguese Language (PROPOR)",
  year = "2018",
  publisher = "Springer International Publishing",
  pages = "409--418",
}

Or this more recent publication (results without prosodic information + CRF):

@inproceedings{casanova-etal-2020-evaluating,
    title = "Evaluating Sentence Segmentation in Different Datasets of Neuropsychological Language Tests in {B}razilian {P}ortuguese",
    author = {Casanova, Edresson  and
      Treviso, Marcos  and
      H{\"u}bner, Lilian  and
      Alu{\'\i}sio, Sandra},
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference (LREC)",
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    pages = "2605--2614",
    ISBN = "979-10-95546-34-4",
}