Merge pull request #107 from stephenhky/docstring
Codes and documentation cleaned up
stephenhky authored Sep 23, 2020
2 parents 10fd558 + 6f4faef commit bc1f2eb
Showing 32 changed files with 4,084 additions and 3,068 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -87,6 +87,7 @@ If you would like to contribute, feel free to submit the pull requests. You can

## News

* 09/23/2020: `shorttext` 1.4.1 released.
* 09/02/2020: `shorttext` 1.4.0 released.
* 07/23/2020: `shorttext` 1.3.0 released.
* 06/05/2020: `shorttext` 1.2.6 released.
@@ -144,7 +145,6 @@ If you would like to contribute, feel free to submit the pull requests. You can

## Possible Future Updates

- [x] Including transformer-based models;
- [ ] Use of DASK;
- [ ] Dividing components to other packages;
- [ ] More available corpus.
103 changes: 10 additions & 93 deletions docs/codes.rst
@@ -1,87 +1,30 @@
API
===

Training Data Retrieval
-----------------------
APIs not covered in the tutorials are listed here.

Module `shorttext.data.data_retrieval`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.data.data_retrieval
   :members:

Text Preprocessing
------------------

Module `shorttext.utils.textpreprocessing`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.utils.textpreprocessing
   :members:

Topic Models
------------

Module `shorttext.generators.bow.LatentTopicModeling`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.generators.bow.LatentTopicModeling
   :members:

Module `shorttext.generators.bow.GensimTopicModeling`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Shorttext Models Smart Loading
------------------------------

.. automodule:: shorttext.generators.bow.GensimTopicModeling
   :members:

Module `shorttext.generators.bow.AutoEncodingTopicModeling`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.generators.bow.AutoEncodingTopicModeling
   :members:


Module `shorttext.classifiers.topic.TopicVectorDistanceClassification`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.classifiers.bow.topic.TopicVectorDistanceClassification
   :members:

Module `shorttext.classifiers.topic.SkLearnClassification`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.classifiers.bow.topic.SkLearnClassification
.. automodule:: shorttext.smartload
   :members:

Supervised Classification using Word Embedding
----------------------------------------------

Module `shorttext.classifiers.embed.sumvec.SumEmbedVecClassification`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Module `shorttext.generators.seq2seq.s2skeras`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.classifiers.embed.sumvec.SumEmbedVecClassification
.. automodule:: shorttext.generators.seq2seq.s2skeras
   :members:


Module `shorttext.classifiers.embed.sumvec.VarNNSumEmbedVecClassification`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.classifiers.embed.sumvec.VarNNSumEmbedVecClassification
   :members:

Module `shorttext.classifiers.embed.nnlib.VarNNEmbedVecClassification`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.classifiers.embed.nnlib.VarNNEmbedVecClassification
   :members:

Maximum Entropy Classifiers
---------------------------

Module `shorttext.classifiers.bow.maxent.MaxEntClassification`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.classifiers.bow.maxent.MaxEntClassification
   :members:

Neural Networks
---------------
@@ -92,11 +35,6 @@ Module `shorttext.classifiers.embed.sumvec.frameworks`
.. automodule:: shorttext.classifiers.embed.sumvec.frameworks
   :members:

Module `shorttext.classifiers.embed.nnlib.frameworks`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.classifiers.embed.nnlib.frameworks
   :members:

Utilities
---------
@@ -113,44 +51,27 @@ Module `shorttext.utils.gensim_corpora`
.. automodule:: shorttext.utils.gensim_corpora
   :members:

Module `shorttext.utils.wordembed`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.utils.wordembed
   :members:

Module `shorttext.utils.compactmodel_io`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.utils.compactmodel_io
   :members:

Stacked Generalization
----------------------

Module `shorttext.stack`
^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.stack.stacking
   :members:

Metrics
-------

Module `shorttext.metrics.dynprog`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.metrics.dynprog.dldist
   :members:

.. automodule:: shorttext.metrics.dynprog.jaccard
   :members:
   :members: soft_intersection_list

Module `shorttext.metrics.wasserstein`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: shorttext.metrics.wasserstein.wordmoverdist
   :members:
   :members: word_mover_distance_linprog

Spell Correction
----------------
@@ -161,11 +82,7 @@ Module `shorttext.spell`
.. automodule:: shorttext.spell.basespellcorrector
   :members:

.. automodule:: shorttext.spell.norvig
   :members:

.. automodule:: shorttext.spell.sakaguchi
   :members:



2 changes: 1 addition & 1 deletion docs/conf.py
@@ -58,7 +58,7 @@
# The short X.Y version.
version = u'1.4'
# The full version, including alpha/beta/rc tags.
release = u'1.4.0'
release = u'1.4.1'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
22 changes: 14 additions & 8 deletions docs/news.rst
@@ -1,6 +1,7 @@
News
====

* 09/23/2020: `shorttext` 1.4.1 released.
* 09/02/2020: `shorttext` 1.4.0 released.
* 07/23/2020: `shorttext` 1.3.0 released.
* 06/05/2020: `shorttext` 1.2.6 released.
@@ -60,6 +61,11 @@ News
What's New
----------

Release 1.4.1 (September 23, 2020)
----------------------------------

* Documentation and code cleaned up.

Release 1.4.0 (September 2, 2020)
---------------------------------

@@ -77,20 +83,20 @@ Release 1.2.6 (June 20, 2020)
* Removed Python-2 codes (`urllib2`).

Release 1.2.5 (May 20, 2020)
-----------------------------
----------------------------

* Update on `gensim` package usage and requirements;
* Removed some deprecated functions.

Release 1.2.4 (May 13, 2020)
-----------------------------
----------------------------

* Update on `scikit-learn` requirements to `>=0.23.0`;
* Direct dependence on `joblib`;
* Support for Python 3.8 added.

Release 1.2.3 (April 28, 2020)
-----------------------------
------------------------------

* PyUP scan implemented;
* Support for Python 3.5 decommissioned.
@@ -101,13 +107,13 @@ Release 1.2.2 (April 7, 2020)
* Removed dependence on `PyStemmer`, which is replaced by `snowballstemmer`.

Release 1.2.1 (March 23, 2020)
--------------------------------
------------------------------

* Added port number adjustability for word-embedding API;
* Removal of Spacy dependency.

Release 1.2.0 (March 21, 2020)
--------------------------------
------------------------------

* API for word-embedding algorithm for one-time loading.

@@ -141,7 +147,7 @@ Release 1.1.2 (June 5, 2019)
* Updated code for FastText model loading, as the previous function was deprecated.

Release 1.1.1 (April 23, 2019)
-----------------------------
------------------------------

* Bug fixed. (Acknowledgement: `Hamish Dickson
<https://github.com/hamishdickson>`_ )
@@ -154,7 +160,7 @@ Release 1.1.0 (March 3, 2019)


Release 1.0.8 (February 14, 2019)
--------------------------------
---------------------------------

* Minor bugs fixed.

@@ -185,7 +191,7 @@ Release 1.0.5 (January 13, 2019)


Release 1.0.4 (October 3, 2018)
------------------------------
-------------------------------

* Package `keras` requirement updated;
* Less dependence on `pandas`.
5 changes: 5 additions & 0 deletions docs/tutorial_charbaseonehot.rst
@@ -44,6 +44,11 @@ We can also convert a list of sentences by

You can decide whether or not to output a sparse matrix by specifying the parameter `sparse`.
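
To make the effect of a `sparse` switch concrete, here is a minimal standalone sketch. It is not the `shorttext` implementation: the helper `one_hot_chars`, its `alphabet` argument, and the dictionary-based sparse layout are hypothetical choices made only for illustration. The sparse form stores just the nonzero entries, while the dense form materializes the full `maxlen` by alphabet-size grid.

```python
def one_hot_chars(sentence, alphabet, maxlen, sparse=True):
    """Hypothetical helper: one-hot encode the first `maxlen` characters.

    sparse=True  -> dict {(position, char_index): 1} with nonzero entries only
    sparse=False -> dense list of lists of shape maxlen x len(alphabet)
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    nonzero = {}
    for pos, ch in enumerate(sentence[:maxlen]):
        if ch in index:  # characters outside the alphabet are skipped
            nonzero[(pos, index[ch])] = 1
    if sparse:
        return nonzero
    # Build the full grid and fill in the nonzero positions.
    dense = [[0] * len(alphabet) for _ in range(maxlen)]
    for pos, col in nonzero:
        dense[pos][col] = 1
    return dense


sparse_rep = one_hot_chars("cab", "abc", maxlen=4)
dense_rep = one_hot_chars("cab", "abc", maxlen=4, sparse=False)
```

For long sentences and large alphabets almost every entry is zero, which is why a sparse output is usually the more economical choice.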


.. automodule:: shorttext.generators.charbase.char2vec
   :members:


Reference
---------

12 changes: 12 additions & 0 deletions docs/tutorial_charbaseseq2seq.rst
@@ -16,6 +16,10 @@ To use it, create an instance of the class :class:`shorttext.generators.Sentence

The above code is the same as :doc:`tutorial_charbaseonehot` .

.. automodule:: shorttext.generators.charbase.char2vec
   :members: initSentenceToCharVecEncoder


Training
--------

@@ -30,6 +34,10 @@ And then train this neural network model:

This model takes several hours to train on a laptop.


.. autoclass:: shorttext.generators.seq2seq.charbaseS2S.CharBasedSeq2SeqGenerator
   :members:

Decoding
--------

@@ -51,6 +59,10 @@ And can be loaded by:

>>> seq2seqer2 = shorttext.generators.seq2seq.charbaseS2S.loadCharBasedSeq2SeqGenerator('/path/to/norvigtxt_iter5model.bin')

.. automodule:: shorttext.generators.seq2seq.charbaseS2S
   :members: loadCharBasedSeq2SeqGenerator


Reference
---------

37 changes: 13 additions & 24 deletions docs/tutorial_dataprep.rst
@@ -32,6 +32,10 @@ the subject keywords, as below:
'Holy Trinity', 'eschatology', 'scripture', 'ecclesiology', 'predestination',
'divine degree', 'creedal confessionalism', 'scholasticism', 'prayer', 'eucharist']}


.. automodule:: shorttext.data.data_retrieval
   :members: subjectkeywords

Example Training Data 2: NIH RePORT
-----------------------------------

@@ -55,31 +59,9 @@ randomly drawn from the original data.

However, there are other configurations:

nihreports(txt_col='PROJECT_TITLE', label_col='FUNDING_ICs', sample_size=512)
Return an example data set, sampled from NIH RePORT (Research Portfolio
Online Reporting Tools).

Return an example data set from NIH (National Institutes of Health),
data publicly available from their RePORT
website. (`link
<https://exporter.nih.gov/ExPORTER_Catalog.aspx>`_).
`txt_col` is either the project titles ('PROJECT_TITLE')
or the proposal abstracts ('ABSTRACT_TEXT'), and `label_col` is the name of the IC (Institute or Center),
with 'IC_NAME' the full form and 'FUNDING_ICs' the abbreviated form.

Dataset directly adapted from the NIH data from `R` package `textmineR
<https://cran.r-project.org/web/packages/textmineR/index.html>`_.
.. automodule:: shorttext.data.data_retrieval
   :members: nihreports

:param txt_col: column for the text (Default: 'PROJECT_TITLE')
:param label_col: column for the labels (Default: 'FUNDING_ICs')
:param sample_size: size of the sample. Set to None if all rows. (Default: 512)
:return: example data set
:type txt_col: str
:type label_col: str
:type sample_size: int
:rtype: dict

If `sample_size` is specified to be `None`, all the data will be retrieved without sampling.
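
The `sample_size` behaviour described above can be sketched in a few lines. This is only an illustration, not the code of `nihreports`: the helper `sample_rows` and its `seed` argument are hypothetical, and returning every row when `sample_size` exceeds the number of rows is an assumption of this sketch.

```python
import random


def sample_rows(rows, sample_size=512, seed=None):
    """Hypothetical helper: return all rows if sample_size is None,
    otherwise a random subset of sample_size rows without replacement."""
    if sample_size is None or sample_size >= len(rows):
        # Assumption of this sketch: an oversized request returns everything.
        return list(rows)
    rng = random.Random(seed)  # local generator, so global state is untouched
    return rng.sample(rows, sample_size)


# Toy stand-in for (project title, IC label) pairs.
rows = [("title %d" % i, "IC%d" % (i % 3)) for i in range(10)]
everything = sample_rows(rows, sample_size=None)
subset = sample_rows(rows, sample_size=4, seed=0)
```

Passing `None` keeps the full data set in its original order, while an integer draws that many distinct rows.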

Example Training Data 3: Inaugural Addresses
--------------------------------------------
@@ -95,6 +77,9 @@ Enter:

>>> trainclassdict = shorttext.data.inaugural()

.. automodule:: shorttext.data.data_retrieval
   :members: inaugural


User-Provided Training Data
---------------------------
@@ -130,4 +115,8 @@ To load this data file, just enter:

>>> trainclassdict = shorttext.data.retrieve_csvdata_as_dict('/path/to/file.csv')
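
To show the shape of the resulting dictionary, here is a rough standalone sketch that uses only the standard library. It is not the implementation of `retrieve_csvdata_as_dict`: the helper `csv_to_classdict` is hypothetical, and the file layout (a header row, class labels in the first column, short texts in the second) is an assumption made for this example.

```python
import csv
import io
from collections import defaultdict


def csv_to_classdict(fileobj, has_header=True):
    """Hypothetical helper: map each label (column 1) to its texts (column 2)."""
    reader = csv.reader(fileobj)
    if has_header:
        next(reader, None)  # skip the header row if present
    classdict = defaultdict(list)
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        label, text = row[0], row[1]
        classdict[label].append(text)
    return dict(classdict)


# In-memory stand-in for a CSV file on disk.
sample = io.StringIO(
    "subject,content\n"
    "mathematics,linear algebra\n"
    "physics,quantum mechanics\n"
    "mathematics,topology\n"
)
trainclassdict = csv_to_classdict(sample)
```

The result has the same form as the example data sets above: class labels as keys and lists of short texts as values.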

.. automodule:: shorttext.data.data_retrieval
   :members: retrieve_csvdata_as_dict


Home: :doc:`index`