see bert-sentence-similarity.ipynb
Using quora dataset of duplicated question pairs
see aai_semantic_similarity_for_short_sentences.ipynb
- Download trained model from my google.com/drive
embeddings_map500k.pklthe model trained with cross/cosine similarityxgboost_pair.pkl: the model trained with xgboost
- Run server with
docker-compose
version: '3'
services:
oxford3k-flashcard:
image: vuthailinh/short-sentences-similarity:latest
volumes:
- "./embeddings_map500k.pkl:/data/embeddings_map500k.pkl"
- "./xgboost_pair.pkl:/data/xgboost_pair.pkl"
ports:
- "5000:5000"- Start UI application, see vitalivu/semantic-similarity-search-ui
2021-05-24 18:49:52,506 [INFO] sentence_transformers.SentenceTransformer Load pretrained SentenceTransformer: paraphrase-distilroberta-base-v1
2021-05-24 18:49:52,507 [INFO] sentence_transformers.SentenceTransformer Did not find folder paraphrase-distilroberta-base-v1
2021-05-24 18:49:52,507 [INFO] sentence_transformers.SentenceTransformer Search model on server: http://sbert.net/models/paraphrase-distilroberta-base-v1.zip
2021-05-24 18:49:52,507 [INFO] sentence_transformers.SentenceTransformer Load SentenceTransformer from folder: ~/.cache/torch/sentence_transformers/sbert.net_models_paraphrase-distilroberta-base-v1
2021-05-24 18:49:53,217 [INFO] sentence_transformers.SentenceTransformer Use pytorch device: cpu
2021-05-24 18:50:12,319 [INFO] sentence_transformers.cross_encoder.CrossEncoder Use pytorch device: cpu
[18:50:17] WARNING: ../src/gbm/gbtree.cc:348: Loading from a raw memory buffer on CPU only machine. Changing predictor to auto.
[18:50:17] WARNING: ../src/gbm/gbtree.cc:355: Loading from a raw memory buffer on CPU only machine. Changing tree_method to hist.
[18:50:17] WARNING: ../src/learner.cc:223: No visible GPU is found, setting `gpu_id` to -1
2021-05-24 18:50:17,892 [INFO] lvnet.nlp.sssearch Preload took 19.8146 seconds
2021-05-24 18:50:17,893 [INFO] lvnet.nlp.sssearch Took 5.5641 seconds to import model
2021-05-24 18:50:17,893 [INFO] lvnet.nlp.sssearch SSSearch started in 25.3862
* Serving Quart app 'main'
* Environment: production
* Please use an ASGI server (e.g. Hypercorn) directly in production
* Debug mode: False
* Running on http://127.0.0.1:5000 (CTRL + C to quit)
[2021-05-24 18:50:17,961] Running on http://127.0.0.1:5000 (CTRL + C to quit)
2021-05-24 18:50:17,961 [INFO] quart.serving Running on http://127.0.0.1:5000 (CTRL + C to quit)
See main.py