Commit 2571a59

author
Evelin Amorim
committed
more documenation
1 parent d7c69e3 commit 2571a59

7 files changed

+484
-0
lines changed

docs/Makefile

+20
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
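
Both docs/Makefile above and docs/make.bat below forward the requested target to `sphinx-build -M`. As a rough illustration of that routing, the sketch below builds the same argument list in Python. The function name `sphinx_make_command` and its parameters are hypothetical and not part of this commit; the defaults mirror the `SOURCEDIR`, `BUILDDIR` and `SPHINXBUILD` values set in the Makefile.

```python
import shlex
from typing import List, Sequence


def sphinx_make_command(
    target: str,
    sourcedir: str = "source",
    builddir: str = "build",
    sphinxbuild: str = "sphinx-build",
    extra_opts: Sequence[str] = (),
) -> List[str]:
    """Return the argument list the catch-all rule would run for `target`.

    Hypothetical helper for illustration; extra_opts plays the role of
    $(SPHINXOPTS) / $(O) in the Makefile.
    """
    return [sphinxbuild, "-M", target, sourcedir, builddir, *extra_opts]


# Equivalent of `make html O=-W` (-W turns Sphinx warnings into errors).
cmd = sphinx_make_command("html", extra_opts=["-W"])
print(shlex.join(cmd))
```

If Sphinx is installed, the returned list could be passed to `subprocess.run(cmd, cwd="docs", check=True)` to run the build without `make`.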

docs/make.bat

+35
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd

docs/source/conf.py

+28
@@ -0,0 +1,28 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'text2story'
copyright = '2024, LIAAD'
author = 'LIAAD'
release = '1.5.0'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = []

templates_path = ['_templates']
exclude_patterns = []



# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'alabaster'
html_static_path = ['_static']

docs/source/custom_annotator.py

+102
@@ -0,0 +1,102 @@
import spacy

from text2story.core.exceptions import UninstalledModel, InvalidLanguage
from text2story.core.utils import normalize_tag, chunknize_actors

# this stores the pipeline of models used to extract narrative components
# for a given language (whose code is the key of this dictionary)
pipeline = {}

def load(lang: str):
    """
    Defining the load function is mandatory; otherwise the package raises errors.
    If there is nothing to load, define an empty function whose body is just pass.

    @param lang: The language code of the models to load, for instance pt, en or fr
    @return:
    """
    # general-purpose French model, used to extract participants
    if not (spacy.util.is_package('fr_core_news_lg')):
        spacy.cli.download('fr_core_news_lg')
    pipeline['fr'] = spacy.load('fr_core_news_lg')

    # temporal expression model, used to extract times
    try:
        pipeline['fr_time'] = spacy.load(lang + "_tei2go")
    except OSError:
        model_name = lang + "_tei2go"
        command = f"pip install https://huggingface.co/hugosousa/{lang}_tei2go/resolve/main/{lang}_tei2go-any-py3-none-any.whl"
        raise UninstalledModel(model_name, command)


def extract_participants(lang, text):
    """
    Parameters
    ----------
    lang : str
        the language of text to be annotated
    text : str
        the text to be annotated

    Returns
    -------
    list[tuple[tuple[int, int], str, str]]
        the list of actors identified where each actor is represented by a tuple

    Raises
    ------
    InvalidLanguage if the language given is invalid/unsupported
    """

    if lang not in ['fr']:
        raise InvalidLanguage(lang)

    doc = pipeline[lang](text)

    # one (character span, POS tag, IOB entity tag) tuple per token
    iob_token_list = []
    for token in doc:
        start_character_offset = token.idx
        end_character_offset = token.idx + len(token)
        character_span = (start_character_offset, end_character_offset)
        pos = normalize_tag(token.pos_)
        ne = token.ent_iob_ + "-" + normalize_tag(token.ent_type_) if token.ent_iob_ != 'O' else 'O'

        iob_token_list.append((character_span, pos, ne))

    actor_list = chunknize_actors(iob_token_list)

    return actor_list

def extract_times(lang, text, publication_time=None):
    """
    Parameters
    ----------
    lang : str
        the language of text to be annotated

    text : str
        the text to be annotated

    publication_time : str, optional
        the publication time of the text (not used in this example)

    Returns
    -------
    list[tuple[tuple[int, int], str, str]]
        a list consisting of the times identified, where each time is represented by a tuple
        with the start and end character offsets, its label and its text, respectively

    Raises
    ------
    InvalidLanguage if the language given is invalid/unsupported
    """
    if lang not in ["fr"]:
        raise InvalidLanguage(lang)

    # use the temporal model stored by load(); pipeline["fr"] would
    # return generic named entities instead of time expressions
    timex_lst = pipeline["fr_time"](text).ents

    ans = []
    for timex in timex_lst:

        start = timex.start_char
        end = timex.end_char
        label = timex.label_
        timex_text = timex.text  # avoid shadowing the `text` parameter

        ans.append(((start, end), label, timex_text))
    return ans
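
In custom_annotator.py, `extract_participants` builds one `(character_span, pos, ne)` tuple per token and hands the list to `chunknize_actors`, whose implementation is not part of this commit. The sketch below is a hypothetical, simplified stand-in that shows the general idea of merging consecutive IOB-tagged tokens into one span per entity. The name `merge_iob_tokens`, the tag strings, and the choice to keep the POS tag of the first token are assumptions for illustration, not text2story behavior.

```python
from typing import List, Tuple

Span = Tuple[int, int]
IobToken = Tuple[Span, str, str]  # (character span, POS tag, IOB entity tag)
Chunk = Tuple[Span, str, str]     # (merged span, POS of first token, entity type)


def merge_iob_tokens(tokens: List[IobToken]) -> List[Chunk]:
    """Merge consecutive B-/I- tokens of the same entity type into one span.

    Hypothetical stand-in for illustration only; it is not the
    text2story implementation of chunknize_actors.
    """
    chunks: List[Chunk] = []
    current = None  # [start, end, pos, entity_type] of the open chunk

    for (start, end), pos, tag in tokens:
        prefix, _, ent_type = tag.partition("-")
        continues = (
            prefix == "I" and current is not None and current[3] == ent_type
        )
        if continues:
            current[1] = end  # extend the open chunk to this token's end
            continue
        if current is not None:
            # close the open chunk before handling the current token
            chunks.append(((current[0], current[1]), current[2], current[3]))
            current = None
        if prefix in ("B", "I"):
            current = [start, end, pos, ent_type]

    if current is not None:
        chunks.append(((current[0], current[1]), current[2], current[3]))
    return chunks


# "Marie Curie habite Paris" with hand-written, made-up tags (no spaCy model needed).
example = [
    ((0, 5), "Noun", "B-Per"),
    ((6, 11), "Noun", "I-Per"),
    ((12, 18), "Verb", "O"),
    ((19, 24), "Noun", "B-Loc"),
]
print(merge_iob_tokens(example))
# → [((0, 11), 'Noun', 'Per'), ((19, 24), 'Noun', 'Loc')]
```

The two name tokens collapse into a single span covering characters 0 to 11, which matches the `list[tuple[tuple[int, int], str, str]]` shape documented in the docstring.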

docs/source/index.rst

+35
@@ -0,0 +1,35 @@
1+
.. text2story documentation master file, created by
2+
sphinx-quickstart on Fri May 31 15:28:44 2024.
3+
You can adapt this file completely to your liking, but it should at least
4+
contain the root `toctree` directive.
5+
6+
Welcome to text2story's documentation!
7+
======================================
8+
9+
.. .. toctree::
10+
.. :maxdepth: 2
11+
.. :caption: Contents:
12+
13+
**text2story** is Python library that intends to extract narrative components
14+
(events, participants, time expressions and their relations) in a easy and flexible way. In addition to
15+
that, it allows to visualize annotations of narratives (manual or automatic). Finally, a benchmark module
16+
is available for experiments.
17+
18+
Check out the :doc:`usage` section for further information, including
19+
how to :ref:`installation` the project.
20+
21+
.. note::
22+
23+
This project is under active development.
24+
25+
Contents
26+
==================
27+
28+
.. toctree::
29+
30+
usage
31+
32+
33+
.. * :ref:`genindex`
34+
.. * :ref:`modindex`
35+
.. * :ref:`search`

docs/source/installation.rst

+49
@@ -0,0 +1,49 @@
1+
Installation
2+
=====
3+
4+
Installation of text2story requires some libraries that are not python ones. These libraries are important to
5+
the visualization module. Next, we detail
6+
7+
Linux / Ubuntu
8+
-------
9+
10+
The installation requires graphviz software, the latex suite and the software poppler to convert pdf to png.
11+
In Linux, to install these software open a terminal and type the following commands:
12+
13+
.. code-block:: bash
14+
sudo apt-get install graphviz libgraphviz-dev texlive-latex-base texlive-latex-extra poppler-utils
15+
16+
17+
After that, create a virtual environment using venv or other tool of your preference. For instance,
18+
using the following command in the prompt line:
19+
20+
.. code-block:: bash
21+
$ python3 -m venv venv
22+
23+
Then, activate the virtual enviroment in the prompt line. Like, the following command:
24+
25+
.. code-block:: bash
26+
$ source venv/bin/activate
27+
28+
After that, you are ready to install
29+
30+
31+
Windows
32+
-------
33+
34+
First, make sure you have Microsoft C++ Build Tools. Then install graphviz software by download one suitable version
35+
in this [link](https://graphviz.org/download/#windows). Next, install the latex-suite like these
36+
[tutorial](https://www.tug.org/texlive/windows.html#install) explains. Then, install Popple packed for windows,
37+
which you download [here](https://github.com/oschwartz10612/poppler-windows).
38+
39+
Finnally, you can install text2story using pip. If it did not recognize the graphviz installation, then you can
40+
use the following command for pip (tested in pip == 21.1.1).
41+
42+
.. code-block:: powershell
43+
pip install text2story --global-option=build_ext --global-option="-IC:\Program Files\Graphviz\include" --global-option="-LC:\Program Files\Graphviz\lib\"
44+
45+
46+
For newer version of pip (tested in pip == 23.1.2), you can type the following command:
47+
48+
.. code-block:: powershell
49+
pip install --use-pep517 --config-setting="--global-option=build_ext" --config-setting="--global-option=-IC:\Program Files\Graphviz\include" --config-setting="--global-option=-LC:\Program Files\Graphviz\lib"
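
installation.rst lists three external programs needed by the visualization module: Graphviz, a LaTeX suite and Poppler. A quick way to confirm they are reachable before installing text2story is to look them up on `PATH`. The sketch below is not part of this commit; the executable names (`dot`, `pdflatex`, `pdftoppm`) are assumptions about what each package typically installs, and the helper names are hypothetical.

```python
import shutil
from typing import Dict, List, Optional

# Assumed executable names: Graphviz ships `dot`, TeX Live ships `pdflatex`,
# and Poppler ships `pdftoppm`. Adjust if your installation differs.
REQUIRED_TOOLS: Dict[str, str] = {
    "graphviz": "dot",
    "latex": "pdflatex",
    "poppler": "pdftoppm",
}


def find_tools(tools: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each tool name to the full path of its executable, or None."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names of tools that were not found on PATH."""
    return sorted(name for name, path in found.items() if path is None)


found = find_tools(REQUIRED_TOOLS)
for name, path in found.items():
    print(f"{name}: {path or 'NOT FOUND'}")

missing = missing_tools(found)
if missing:
    print("Missing external tools:", ", ".join(missing))
```

On Windows, a tool reported as missing usually means its `bin` directory has not been added to `PATH` yet.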
