docs/source/usage.rst (+20 -20)
@@ -2,12 +2,12 @@ Usage
 =====


-Getting Start
+Getting Started
 -----

-After installation, you will be able to import and create an object of type Narrative. In this object,
+After installation, you will be able to import and create an object of the Narrative type. In this object,
 the text2story package will perform all automatic annotations. In the next section, we cover all the
-functionalities about the annotators.
+functionalities of the annotators.

 The Narrative Object
 -----
@@ -19,34 +19,34 @@ for English, 'pt' for Portuguese, and so on.), the text of the narrative (the le
 to the models applied in the pipeline), and the document creation date. The last one is especially important for news
 stories which usually present a publication date.

-The code bellow presents an example with a raw text English that is used to create a Narrative object.
+The code below presents an example with a raw text English that is used to create a Narrative object.

 .. literalinclude:: examples/narrative_object.py
     :language: python


 The narrative object is used then to process all the pipeline of annotators that will extract the narrative components.
-The Section Annotators Module details how to build such pipeline.
+The Section Annotators Module details how to build such a pipeline.

 The Readers Module
 -----

-If the user want to read an already humanannotated dataset, it is possible to do such a thing using some object of the
+If the user wants to read an already human-annotated dataset, it is possible to do such a thing using some object of the
 type reader. text2story readers module supports the following formats: ACE, BRAT, CSV, ECB, Framenet, Propbank.
-Each one of this module inherits the methods from the abstract class `Read`, which obliges all inherited classes to
+Each one of these modules inherits the methods from the abstract class `Read`, which obliges all inherited classes to
 implement the method `process` and `process_file`. The first method reads all files (text and annotations) from a
 given directory, and the second reads only one file (text and its annotations).

-It is assumed that both methods returns a list of `TokenCorpus`, which is type that contains a token and its
-annotations, if they exists. This is also a class defined in readers module.
+It is assumed that both methods return a list of `TokenCorpus`, which is the type that contains a token and its
+annotations if they exist. This is also a class defined in the reader's module.

 Next, a code example to read a directory with annotations in BRAT format.

 .. literalinclude:: examples/read_brat_dir.py
     :language: python


-The next code illustrate how to use `ReadBrat` to read only one file.
+The next code illustrates how to use `ReadBrat` to read only one file.

 .. literalinclude:: examples/read_brat_file.py
     :language: python
@@ -55,26 +55,26 @@ The next code illustrate how to use `ReadBrat` to read only one file.
 The Annotators Module
 -----

-There are two type of annotators in the text2tstory: the native ones and the customized ones.
-The first is composed by a set of pre-trained models that are part of the library, and
-are all naturally integrated in our pipeline. The second type is composed by annotators that
-anyone can built and integrate in our pipeline. For both, it is required to load the models for the
-language of the used examples. The code bellow is used to load the models for the English language.
+There are two types of annotators in the text2tstory: the native ones and the customized ones.
+The first is composed of a set of pre-trained models that are part of the library, and
+are all naturally integrated in our pipeline. The second type is composed of annotators that
+anyone can build and integrate into our pipeline. For both, the models must be loaded in the
+language of the examples used. The code below is used to load the models for the English language.

 .. literalinclude:: examples/load_models.py
     :language: python


 .. note::

-    Before load models, it is required to install the model for tei2go. For instance, if you are going to use english models. You should execute `pip install https://huggingface.co/hugosousa/en_tei2go/resolve/main/en_tei2go-any-py3-none-any.whl`.
+    Before loading models, the model for tei2go must be installed. For instance, if you are going to use English models. You should execute `pip install https://huggingface.co/hugosousa/en_tei2go/resolve/main/en_tei2go-any-py3-none-any.whl`.

 Next, we describe how the native and custom annotators work.

 Native Annotators
 ^^^^^^^^^^^^^^^^^

-The native annotators are the following modules: NLTK, PY_HEIDELTIME, BERTNERPT, TEI2GO, SPACY and ALLENNLP. Next, we detail
+The native annotators are the following modules: NLTK, PY_HEIDELTIME, BERTNERPT, TEI2GO, SPACY, and ALLENNLP. Next, we detail
 the usage of each one of these annotators.

 Participants
@@ -83,8 +83,8 @@ For participants, we have the following annotators SPACY ('pt','en'), NLTK ('en
 BERTNERPT ('pt'), and SRL ('pt').

 The NLTK module uses a Named Entity Recognition (NER) model trained in the ACE dataset to identify participants in
-the English language. So, after loading the english models, you can use the code bellow to extract
-participants using the NLTK module. Others modules that employs NER to identify participants are SPACY
+the English language. So, after loading the English models, you can use the code below to extract
+participants using the NLTK module. Other modules that employ NER to identify participants are SPACY
 (en_core_web_lg/'en', pt_core_news_lg/'pt') and BERTNERPT (https://huggingface.co/arubenruben/NER-PT-BERT-CRF-Conll2003).
 Bellow, an example of using only NLTK to extract participants from a narrative.

@@ -106,7 +106,7 @@ Time
 ''''''''

 For time expression, text2story has py_heideltime and tei2go to identify time expressions both in Portuguese and
-English languages. The code is similar to the extraction of participants. See the example bellow:
+English languages. The code is similar to the extraction of participants. See the example below: