Releases: facebookresearch/fastText
v0.9.2
We are happy to announce the release of version 0.9.2.
WebAssembly
We are excited to release fastText bindings for WebAssembly. Classification tasks are widely used in web applications, and we believe that giving access to the complete fastText API from the browser will help our community build useful tools. See our documentation to learn more.
Autotune: automatic hyperparameter optimization
Finding the best hyperparameters is crucial for building efficient models. However, searching for the best hyperparameters manually is difficult. This release includes the autotune feature, which automatically finds the best hyperparameters for your dataset.
You can find more information on how to use it here.
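For example, from the Python API (a minimal sketch; `cooking.train` and `cooking.valid` are placeholder file names in the usual fastText `__label__` format, and `autotuneDuration` is an optional time budget):

```python
import fasttext

# Train a text classifier and let autotune search for the best
# hyperparameters against a held-out validation file.
model = fasttext.train_supervised(
    input="cooking.train",
    autotuneValidationFile="cooking.valid",
    autotuneDuration=300,   # search budget in seconds
)

# The returned model already uses the best hyperparameters that were found.
print(model.test("cooking.valid"))
```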
Python
fastText loves Python. In this release, we have:
- several bug fixes for prediction functions
- nearest neighbors and analogies for Python
- a memory leak fix
- website tutorials with Python examples
The autotune feature is fully integrated with our Python API. This allows us to have a more stable autotune optimization loop from Python and to synchronize the best hyperparameters with the `_FastText` model object.
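A short sketch of the nearest-neighbor and analogy queries now available from Python (the model file name is a placeholder for any pre-trained word-vector model):

```python
import fasttext

# Load a pre-trained word-vector model (placeholder file name).
model = fasttext.load_model("cc.en.300.bin")

# Nearest neighbors and analogies, previously CLI-only, from Python:
print(model.get_nearest_neighbors("asparagus"))
print(model.get_analogies("berlin", "germany", "france"))
```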
Pre-trained models tool
We are releasing two helper scripts:
- download_model.py to automatically download pre-trained vectors from our website
- reduce_model.py to reduce the dimension of pre-trained word vectors using PCA.
They can also be used directly from our Python API.
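A sketch using the `fasttext.util` helpers (the language code and target dimension are arbitrary example values):

```python
import fasttext
import fasttext.util

# Download the English vectors (skipped if already present) and load them.
fasttext.util.download_model('en', if_exists='ignore')  # writes cc.en.300.bin
ft = fasttext.load_model('cc.en.300.bin')

# Reduce the vectors from 300 to 100 dimensions with PCA and save the
# smaller model.
fasttext.util.reduce_model(ft, 100)
ft.save_model('cc.en.100.bin')
```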
More metrics
When you test a trained model, you can now get more detailed precision and recall results for a specific label or for all labels.
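A minimal sketch from Python, assuming a previously trained classifier and a test file in fastText format (file names are placeholders):

```python
import fasttext

model = fasttext.load_model("cooking_model.bin")

# Aggregate number of samples, precision and recall, as before.
print(model.test("cooking.valid"))

# Per-label precision, recall and f1 score, returned per label.
print(model.test_label("cooking.valid"))
```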
Paper source code
This release contains the source code of the unsupervised multilingual alignment paper.
Community feedback and contributions
We want to thank our community for giving us feedback on Facebook and on GitHub.
v0.9.1
We are happy to announce the release of version 0.9.1.
New release of python module
The main goal of this release is to merge two existing python modules: the official `fastText` module, which was available on our github repository, and the unofficial `fasttext` module, which was available on pypi.org.
You can find an overview of the new API here, and more insight in our blog post.
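As a quick sketch of the merged module (the corpus and file names are placeholders):

```python
import fasttext

# Train an unsupervised skipgram model on a placeholder corpus.
model = fasttext.train_unsupervised("data.txt", model="skipgram")

# Save, reload, and query the model.
model.save_model("model.bin")
model = fasttext.load_model("model.bin")

print(model.words[:10])               # first words of the vocabulary
print(model.get_word_vector("king"))  # vector for a single word
```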
Refactoring
This version includes a massive rewrite of the internal classes. Training and testing are now split across three classes: `Model`, which takes care of the computational aspect; `Loss`, which handles the loss and applies gradients to the output matrix; and `State`, which is responsible for holding the model's state inside each thread.
This makes the code more straightforward to read and also gives a smaller memory footprint, because the data needed for loss computation is now held only once instead of once per thread.
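The sketch below is an illustrative Python analogue of this split (the class and method bodies are our own simplification, not the actual C++ internals): a single shared `Loss` owns the output matrix, while each training thread carries its own lightweight `State`.

```python
import numpy as np

class State:
    """Per-thread buffers: hidden vector and gradient, one instance per thread."""
    def __init__(self, dim):
        self.hidden = np.zeros(dim)
        self.grad = np.zeros(dim)

class Loss:
    """Shared between threads: owns the output matrix and applies gradients to it."""
    def __init__(self, output_matrix):
        self.wo = output_matrix  # held only once, not copied per thread

    def forward_backward(self, state, target, lr):
        # Softmax cross-entropy as a simple stand-in for fastText's losses.
        scores = self.wo @ state.hidden
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        probs[target] -= 1.0                            # gradient of the loss w.r.t. scores
        state.grad -= lr * (self.wo.T @ probs)          # gradient w.r.t. the hidden layer
        self.wo -= lr * np.outer(probs, state.hidden)   # update the shared output matrix

class Model:
    """Computational glue: builds the hidden layer and delegates to Loss."""
    def __init__(self, input_matrix, loss):
        self.wi = input_matrix
        self.loss = loss

    def update(self, input_ids, target, lr, state):
        state.hidden = self.wi[input_ids].mean(axis=0)
        state.grad[:] = 0.0
        self.loss.forward_backward(state, target, lr)
        self.wi[input_ids] += state.grad / len(input_ids)
```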
Misc
- Fixed compilation issues on recent versions of Mac OS X.
- Better unicode handling:
  - a new `on_unicode_error` argument that helps handle the unicode issues one can face with some datasets (see the example after this list)
  - a bug fix related to the different behaviour of pybind11's `py::str` class between python2 and python3
- script for unsupervised alignment
- public file hosting changed from `aws` to `fbaipublicfiles`
- we added a Code of Conduct file.
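For example (a sketch; the model file name is a placeholder and `"replace"` is one of the standard Python decoding error handlers):

```python
import fasttext

model = fasttext.load_model("model.bin")

# Instead of raising on malformed bytes, replace them with U+FFFD.
words = model.get_words(on_unicode_error="replace")
labels = model.get_labels(on_unicode_error="replace")
```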
Thank you!
As always, we want to thank you for your help and your precious feedback, which helps make this project better.
0.2.0
We are happy to announce the change of the license from BSD+patents to MIT and the release of fastText 0.2.0.
The main purpose of this release is to establish a beta C++ API for the `FastText` class. The class now behaves as a computational library: we moved the display code and some usage error handling out of it (mainly to `main.cc` and `fasttext_pybind.cc`). It is still compatible with older versions of the class, but some methods are now marked as deprecated and will probably be removed in the next release.
In this respect, we also introduce official support for python. The python binding of fastText is a client of the `FastText` class.
Here is a short summary of the 104 commits since 0.1.0:
New:
- Introduction of the "OneVsAll" loss function for multi-label classification, which corresponds to the sum of the binary cross-entropy computed independently for each label. This new loss can be used with the `-loss ova` or `-loss one-vs-all` command line option (8850c51). See the Python sketch after this list.
- Computation of the precision and recall metrics for each label (be1e597).
- Removed printing functions from the `FastText` class (256032b).
- Better default for the number of threads (501b9b1).
- Python support (f10ec1f).
- More tests for circleci/python (eb9703a, 97fcde8, 1de0624).
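As an illustrative Python sketch of the one-vs-all loss mentioned above (file names and parameter values are placeholders; on the command line the same option is `-loss ova`):

```python
import fasttext

# Multi-label classification with the one-vs-all loss. "tags.train" is a
# placeholder training file where each line may carry several __label__ prefixes.
model = fasttext.train_supervised(input="tags.train", loss="ova", epoch=25)

# With an independent binary decision per label, predict every label whose
# probability exceeds a threshold rather than a fixed top-k.
print(model.predict("Which baking dish is best for banana bread?",
                    k=-1, threshold=0.5))
```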
Bug fixes:
- Normalize buffer vector in analogy queries.
- Typo fixes and clarifications on website.
- Improvements on python install issues: `setup.py`, OS X compiler flags, pybind11 include.
- Fix: getSubwords for EOS.
- Fix: ETA time.
- Fix: division by 0 in word analogy evaluation.
- Fix for the infinite loop on ARM CPUs.
Operations:
Worth noting:
- We added circleci build badges to the `README.md`.
- We modified the style to be in compliance with the Facebook C++ style.
- We added a coverage option to the `Makefile` and `setup.py` in order to build for coverage measurement.
Thank you fastText community!
We want to thank you all for being a part of this community and sharing your passion with us. Some of these improvements would not have been possible without your help.
v0.1.0
First official 0.1 release