Merge pull request #107 from stephenhky/docstring

Codes and documentation cleaned up
stephenhky · Sep 23, 2020 · bc1f2eb
2 parents 10fd558 + 6f4faef

Showing 32 changed files with 4,084 additions and 3,068 deletions.
diff --git a/README.md b/README.md
@@ -87,6 +87,7 @@ If you would like to contribute, feel free to submit the pull requests. You can
 
 ## News
 
+* 09/23/2020: `shorttext` 1.4.1 released.
 * 09/02/2020: `shorttext` 1.4.0 released.
 * 07/23/2020: `shorttext` 1.3.0 released.
 * 06/05/2020: `shorttext` 1.2.6 released.
@@ -144,7 +145,6 @@ If you would like to contribute, feel free to submit the pull requests. You can
 
 ## Possible Future Updates
 
-- [x] Including transformer-based models;
 - [ ] Use of DASK;
 - [ ] Dividing components to other packages;
 - [ ] More available corpus.
diff --git a/docs/codes.rst b/docs/codes.rst
@@ -1,87 +1,30 @@
 API
 ===
 
-Training Data Retrieval
------------------------
+API unlisted in tutorials are listed here.
 
-Module `shorttext.data.data_retrieval`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. automodule:: shorttext.data.data_retrieval
-   :members:
-
-Text Preprocessing
-------------------
-
-Module `shorttext.utils.textpreprocessing`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. automodule:: shorttext.utils.textpreprocessing
-   :members:
-
-Topic Models
-------------
-
-Module `shorttext.generators.bow.LatentTopicModeling`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. automodule:: shorttext.generators.bow.LatentTopicModeling
-   :members:
-
-Module `shorttext.generators.bow.GensimTopicModeling`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Shorttext Models Smart Loading
+------------------------------
 
-.. automodule:: shorttext.generators.bow.GensimTopicModeling
-   :members:
-
-Module `shorttext.generators.bow.AutoEncodingTopicModeling`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. automodule:: shorttext.generators.bow.AutoEncodingTopicModeling
-   :members:
-
-
-Module `shorttext.classifiers.topic.TopicVectorDistanceClassification`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. automodule:: shorttext.classifiers.bow.topic.TopicVectorDistanceClassification
-   :members:
-
-Module `shorttext.classifiers.topic.SkLearnClassification`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. automodule:: shorttext.classifiers.bow.topic.SkLearnClassification
+.. automodule:: shorttext.smartload
    :members:
 
 Supervised Classification using Word Embedding
 ----------------------------------------------
 
-Module `shorttext.classifiers.embed.sumvec.SumEmbedVecClassification`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Module `shorttext.generators.seq2seq.s2skeras`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-.. automodule:: shorttext.classifiers.embed.sumvec.SumEmbedVecClassification
+.. automodule:: shorttext.generators.seq2seq.s2skeras
    :members:
 
+
 Module `shorttext.classifiers.embed.sumvec.VarNNSumEmbedVecClassification`
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. automodule:: shorttext.classifiers.embed.sumvec.VarNNSumEmbedVecClassification
    :members:
 
-Module `shorttext.classifiers.embed.nnlib.VarNNEmbedVecClassification`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. automodule:: shorttext.classifiers.embed.nnlib.VarNNEmbedVecClassification
-   :members:
-
-Maximum Entropy Classifiers
----------------------------
-
-Module `shorttext.classifiers.bow.maxent.MaxEntClassification`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. automodule:: shorttext.classifiers.bow.maxent.MaxEntClassification
-   :members:
 
 Neural Networks
 ---------------
@@ -92,11 +35,6 @@ Module `shorttext.classifiers.embed.sumvec.frameworks`
 .. automodule:: shorttext.classifiers.embed.sumvec.frameworks
    :members:
 
-Module `shorttext.classifiers.embed.nnlib.frameworks`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. automodule:: shorttext.classifiers.embed.nnlib.frameworks
-   :members:
 
 Utilities
 ---------
@@ -113,44 +51,27 @@ Module `shorttext.utils.gensim_corpora`
 .. automodule:: shorttext.utils.gensim_corpora
    :members:
 
-Module `shorttext.utils.wordembed`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. automodule:: shorttext.utils.wordembed
-   :members:
-
 Module `shorttext.utils.compactmodel_io`
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. automodule:: shorttext.utils.compactmodel_io
    :members:
 
-Stacked Generalization
-----------------------
-
-Module `shorttext.stack`
-^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. automodule:: shorttext.stack.stacking
-   :members:
 
 Metrics
 -------
 
 Module `shorttext.metrics.dynprog`
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-.. automodule:: shorttext.metrics.dynprog.dldist
-   :members:
-
 .. automodule:: shorttext.metrics.dynprog.jaccard
-   :members:
+   :members: soft_intersection_list
 
 Module `shorttext.metrics.wassersterin`
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. automodule:: shorttext.metrics.wasserstein.wordmoverdist
-   :members:
+   :members: word_mover_distance_linprog
 
 Spell Correction
 ----------------
@@ -161,11 +82,7 @@ Module `shorttext.spell`
 .. automodule:: shorttext.spell.basespellcorrector
    :members:
 
-.. automodule:: shorttext.spell.norvig
-   :members:
 
-.. automodule:: shorttext.spell.sakaguchi
-   :members:
 
 
 

diff --git a/docs/conf.py b/docs/conf.py
@@ -58,7 +58,7 @@
 # The short X.Y version.
 version = u'1.4'
 # The full version, including alpha/beta/rc tags.
-release = u'1.4.0'
+release = u'1.4.1'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.

diff --git a/docs/news.rst b/docs/news.rst
@@ -1,6 +1,7 @@
 News
 ====
 
+* 09/23/2020: `shorttext` 1.4.1 released.
 * 09/02/2020: `shorttext` 1.4.0 released.
 * 07/23/2020: `shorttext` 1.3.0 released.
 * 06/05/2020: `shorttext` 1.2.6 released.
@@ -60,6 +61,11 @@ News
 What's New
 ----------
 
+Release 1.4.0 (September 23, 2020)
+----------------------------------
+
+* Documentation and codes cleaned up.
+
 Release 1.4.0 (September 2, 2020)
 ---------------------------------
 
@@ -77,20 +83,20 @@ Release 1.2.6 (June 20, 2020)
 * Removed Python-2 codes (`urllib2`).
 
 Release 1.2.5 (May 20, 2020)
------------------------------
+----------------------------
 
 * Update on `gensim` package usage and requirements;
 * Removed some deprecated functions.
 
 Release 1.2.4 (May 13, 2020)
------------------------------
+----------------------------
 
 * Update on `scikit-learn` requirements to `>=0.23.0`.
 * Directly dependence on `joblib`;
 * Support for Python 3.8 added.
 
 Release 1.2.3 (April 28, 2020)
------------------------------
+------------------------------
 
 * PyUP scan implemented;
 * Support for Python 3.5 decommissioned.
@@ -101,13 +107,13 @@ Release 1.2.2 (April 7, 2020)
 * Removed dependence on `PyStemmer`, which is replaced by `snowballstemmer`.
 
 Release 1.2.1 (March 23, 2020)
---------------------------------
+------------------------------
 
 * Added port number adjustability for word-embedding API;
 * Removal of Spacy dependency.
 
 Release 1.2.0 (March 21, 2020)
---------------------------------
+------------------------------
 
 * API for word-embedding algorithm for one-time loading.
 
@@ -141,7 +147,7 @@ Release 1.1.2 (June 5, 2019)
 * Updated codes for Fasttext moddel loading as the previous function was deprecated.
 
 Release 1.1.1 (April 23, 2019)
------------------------------
+------------------------------
 
 * Bug fixed. (Acknowledgement: `Hamish Dickson
   <https://github.com/hamishdickson>`_ )
@@ -154,7 +160,7 @@ Release 1.1.0 (March 3, 2019)
 
 
 Release 1.0.8 (February 14, 2019)
---------------------------------
+---------------------------------
 
 * Minor bugs fixed.
 
@@ -185,7 +191,7 @@ Release 1.0.5 (January 13, 2019)
 
 
 Release 1.0.4 (October 3, 2018)
-------------------------------
+-------------------------------
 
 * Package `keras` requirement updated;
 * Less dependence on `pandas`.

diff --git a/docs/tutorial_charbaseonehot.rst b/docs/tutorial_charbaseonehot.rst
@@ -44,6 +44,11 @@ We can also convert a list of sentences by
 
 You can decide whether or not to output a sparse matrix by specifiying the parameter `sparse`.
 
+
+.. automodule:: shorttext.generators.charbase.char2vec
+   :members:
+
+
 Reference
 ---------
 

diff --git a/docs/tutorial_charbaseseq2seq.rst b/docs/tutorial_charbaseseq2seq.rst
@@ -16,6 +16,10 @@ To use it, create an instance of the class :class:`shorttext.generators.Sentence
 
 The above code is the same as :doc:`tutorial_charbaseonehot` .
 
+.. automodule:: shorttext.generators.charbase.char2vec
+   :members: initSentenceToCharVecEncoder
+
+
 Training
 --------
 
@@ -30,6 +34,10 @@ And then train this neural network model:
 
 This model takes several hours to train on a laptop.
 
+
+.. autoclass:: shorttext.generators.seq2seq.charbaseS2S.CharBasedSeq2SeqGenerator
+   :members:
+
 Decoding
 --------
 
@@ -51,6 +59,10 @@ And can be loaded by:
 
 >>> seq2seqer2 = shorttext.generators.seq2seq.charbaseS2S.loadCharBasedSeq2SeqGenerator('/path/to/norvigtxt_iter5model.bin')
 
+.. automodule:: shorttext.generators.seq2seq.charbaseS2S
+   :members: loadCharBasedSeq2SeqGenerator
+
+
 Reference
 ---------
 

diff --git a/docs/tutorial_dataprep.rst b/docs/tutorial_dataprep.rst
@@ -32,6 +32,10 @@ the subject keywords, as below:
       'Holy Trinity', 'eschatology', 'scripture', 'ecclesiology', 'predestination',
       'divine degree', 'creedal confessionalism', 'scholasticism', 'prayer', 'eucharist']}
 
+
+.. automodule:: shorttext.data.data_retrieval
+    :members: subjectkeywords
+
 Example Training Data 2: NIH RePORT
 -----------------------------------
 
@@ -55,31 +59,9 @@ randomly drawn from the original data.
 
 However, there are other configurations:
 
-nihreports(txt_col='PROJECT_TITLE', label_col='FUNDING_ICs', sample_size=512)
-    Return an example data set, sampled from NIH RePORT (Research Portfolio
-    Online Reporting Tools).
-
-    Return an example data set from NIH (National Institutes of Health),
-    data publicly available from their RePORT
-    website. (`link
-    <https://exporter.nih.gov/ExPORTER_Catalog.aspx>`_).
-    The data is with `txt_col` being either project titles ('PROJECT_TITLE')
-    or proposal abstracts ('ABSTRACT_TEXT'), and label_col being the names of the ICs (Institutes or Centers),
-    with 'IC_NAME' the whole form, and 'FUNDING_ICs' the abbreviated form).
-
-    Dataset directly adapted from the NIH data from `R` package `textmineR
-    <https://cran.r-project.org/web/packages/textmineR/index.html>`_.
+.. automodule:: shorttext.data.data_retrieval
+    :members: nihreports
 
-    :param txt_col: column for the text (Default: 'PROJECT_TITLE')
-    :param label_col: column for the labels (Default: 'FUNDING_ICs')
-    :param sample_size: size of the sample. Set to None if all rows. (Default: 512)
-    :return: example data set
-    :type txt_col: str
-    :type label_col: str
-    :type sample_size: int
-    :rtype: dict
-
-If `sample_size` is specified to be `None`, all the data will be retrieved without sampling.
 
 Example Training Data 3: Inaugural Addresses
 --------------------------------------------
@@ -95,6 +77,9 @@ Enter:
 
 >>> trainclassdict = shorttext.data.inaugural()
 
+.. automodule:: shorttext.data.data_retrieval
+    :members: inaugural
+
 
 User-Provided Training Data
 ---------------------------
@@ -130,4 +115,8 @@ To load this data file, just enter:
 
 >>> trainclassdict = shorttext.data.retrieve_csvdata_as_dict('/path/to/file.csv')
 
+.. automodule:: shorttext.data.data_retrieval
+    :members: retrieve_csvdata_as_dict
+
+
 Home: :doc:`index`