Commit

Merge pull request #63 from stephenhky/gcp
Moved corpus datasets and testing word2vec model to Google Cloud
stephenhky authored Mar 3, 2019
2 parents 340f645 + 22a5978 commit f739cc0
Showing 9 changed files with 1,524 additions and 1,537 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -85,7 +85,7 @@ If you would like to contribute, feel free to submit the pull requests. You can

## News

-* 03/01/2019: `shorttext` 1.1.0 released.
+* 03/03/2019: `shorttext` 1.1.0 released.
* 02/14/2019: `shorttext` 1.0.8 released.
* 01/30/2019: `shorttext` 1.0.7 released.
* 01/29/2019: `shorttext` 1.0.6 released.
Binary file removed data/USInaugural.zip
Binary file not shown.
Binary file removed data/nih_full.csv.zip
Binary file not shown.
7 changes: 4 additions & 3 deletions docs/news.rst
@@ -1,7 +1,7 @@
News
====

-* 03/01/2019: `shorttext` 1.1.0 released.
+* 03/03/2019: `shorttext` 1.1.0 released.
* 02/14/2019: `shorttext` 1.0.8 released.
* 01/30/2019: `shorttext` 1.0.7 released.
* 01/29/2019: `shorttext` 1.0.6 released.
@@ -45,10 +45,11 @@ News
What's New
----------

-Release 1.1.0 (March 1, 2019)
+Release 1.1.0 (March 3, 2019)
-----------------------------

-* Size of embedded vectors set to 300 again when necessary. (Possibly break compatibility)
+* Size of embedded vectors set to 300 again when necessary; (possibly break compatibility)
+* Moving corpus data from Github to Google Cloud Storage.


Release 1.0.8 (February 14, 2019)
2 changes: 1 addition & 1 deletion setup.py
@@ -85,6 +85,6 @@ def package_description():
scripts=['bin/ShortTextCategorizerConsole',
'bin/ShortTextWordEmbedSimilarity',
'bin/switch_kerasbackend'],
-# include_package_data=False,
+#include_package_data=False,
test_suite="test",
zip_safe=False)
4 changes: 2 additions & 2 deletions shorttext/data/data_retrieval.py
@@ -103,7 +103,7 @@ def inaugural():
:rtype: dict
"""
zfile = zipfile.ZipFile(get_or_download_data("USInaugural.zip",
-"https://github.com/stephenhky/PyShortTextCategorization/blob/master/data/USInaugural.zip?raw=true",
+"http://storage.googleapis.com/pyshorttext/USPresidentInaugural/USInaugural.zip",
asbytes=True),
)
address_jsonstr = zfile.open("addresses.json").read()
@@ -144,7 +144,7 @@ def nihreports(txt_col='PROJECT_TITLE', label_col='FUNDING_ICs', sample_size=512
raise KeyError('Undefined label column: '+label_col+'. Must be FUNDING_ICs or IC_NAME.')

zfile = zipfile.ZipFile(get_or_download_data('nih_full.csv.zip',
-'https://github.com/stephenhky/PyShortTextCategorization/blob/master/data/nih_full.csv.zip?raw=true',
+'http://storage.googleapis.com/pyshorttext/nih_grant_public/nih_full.csv.zip',
asbytes=True),
'r',
zipfile.ZIP_DEFLATED)
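
The body of `get_or_download_data` is not part of this diff; only its two call sites change. As a rough illustration of the download-once-then-cache pattern those call sites appear to rely on, here is a minimal sketch. The name `fetch_cached`, the `cache_dir` parameter, and the whole body are assumptions made for illustration, not the library's actual implementation.

```python
import os
from urllib.request import urlopen


def fetch_cached(filename, url, cache_dir, asbytes=False):
    """Hypothetical helper: return an open handle to `filename` in `cache_dir`.

    The file is downloaded from `url` only if it is not cached yet, so
    moving the hosted data (as this commit does) only changes `url`.
    """
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, filename)

    # Download on first use; later calls read the local copy.
    if not os.path.exists(target):
        with urlopen(url) as response, open(target, "wb") as out:
            out.write(response.read())

    # Binary mode is what zipfile.ZipFile needs; text mode otherwise.
    return open(target, "rb" if asbytes else "r")
```

With a helper shaped like this, the handle returned with `asbytes=True` can be passed straight to `zipfile.ZipFile`, which is how the two call sites above use the result. A copy that is already cached locally would be reused, so under this sketch the URL change only affects the first download.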