This project is based on a Kaggle competition to mine research papers so that people trying to keep up with coronavirus research can search them easily.
- The `tf_idf_similarity` file runs a TF-IDF analysis on the titles of the papers in the `biorxiv` folder. Given a query, it returns the articles most relevant to that query.
- Doc2Vec implementation: the input is a document from a paper (an abstract, a body, or a sentence) and the output is an embedding vector for that document. `Doc2Vec.ipynb` implements Doc2Vec from scratch in PyTorch.
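The TF-IDF title search described above can be sketched in plain Python. This is a minimal illustration, not the code in `tf_idf_similarity`. The tokenizer, the smoothed IDF formula (the one scikit-learn uses by default), ranking by cosine similarity, the function names `build_index` and `search`, and the sample titles are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on runs of letters/digits (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(titles):
    """Return one TF-IDF vector (a term -> weight dict) per title, plus the IDF table."""
    docs = [Counter(tokenize(t)) for t in titles]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1, so no term gets weight 0.
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def search(query, titles, vectors, idf, top_k=3):
    """Rank titles by cosine similarity to the query; unknown query terms are ignored."""
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(cosine(q, v), title) for v, title in zip(vectors, titles)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    # Hypothetical titles, not taken from the biorxiv folder.
    titles = [
        "Spike protein binding in novel coronavirus",
        "Vaccine trial design for respiratory viruses",
        "Transmission dynamics of coronavirus in households",
    ]
    vectors, idf = build_index(titles)
    # The household-transmission title matches both query terms, so it ranks first.
    for score, title in search("coronavirus transmission", titles, vectors, idf):
        print(f"{score:.3f}  {title}")
```

Terms that appear in many titles (here "coronavirus" and "in") receive a lower IDF weight, so a rarer matching term such as "transmission" contributes more to the score.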
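Doc2Vec maps a document to a vector, as described in the Doc2Vec item above, and has two standard variants: PV-DM and PV-DBOW. The source does not say which one `Doc2Vec.ipynb` uses. The sketch below assumes PV-DBOW trained with negative sampling in PyTorch. The names `PVDBOW`, `train`, and `infer_vector`, the hyperparameters, and the uniform negative sampling (word2vec samples from a unigram^0.75 distribution instead) are illustrative assumptions. To embed an unseen document, the word vectors are frozen and a fresh document vector is fitted by gradient descent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PVDBOW(nn.Module):
    """PV-DBOW Doc2Vec: each document vector is trained to predict
    the words that occur in that document (negative-sampling loss)."""

    def __init__(self, num_docs, vocab_size, dim=50):
        super().__init__()
        self.doc_vectors = nn.Embedding(num_docs, dim)  # one row per training document
        self.word_out = nn.Embedding(vocab_size, dim)   # output vectors for predicted words

    def forward(self, doc_ids, pos_words, neg_words):
        d = self.doc_vectors(doc_ids)                                  # (B, dim)
        pos_score = (d * self.word_out(pos_words)).sum(-1)             # (B,)
        neg_score = torch.bmm(self.word_out(neg_words), d.unsqueeze(-1)).squeeze(-1)  # (B, k)
        # Raise scores of words in the document, lower scores of sampled words.
        loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(-1))
        return loss.mean()


def train(docs, dim=50, epochs=100, k=5, lr=0.05, seed=0):
    """docs: list of token lists. Returns the model and a word -> id vocabulary."""
    torch.manual_seed(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    # Every (document id, word id) occurrence is one training example.
    pairs = torch.tensor([(d, vocab[w]) for d, doc in enumerate(docs) for w in doc])
    model = PVDBOW(len(docs), len(vocab), dim)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        # Uniform negative sampling (a simplification; a sample may even
        # equal the positive word).
        neg = torch.randint(len(vocab), (len(pairs), k))
        loss = model(pairs[:, 0], pairs[:, 1], neg)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model, vocab


def infer_vector(model, vocab, words, epochs=100, k=5, lr=0.05):
    """Embed an unseen document: freeze the word vectors, fit a new doc vector."""
    ids = torch.tensor([vocab[w] for w in words if w in vocab], dtype=torch.long)
    dim = model.word_out.embedding_dim
    if len(ids) == 0:
        return torch.zeros(dim)  # no known words: nothing to fit
    W = model.word_out.weight.detach()  # frozen output word vectors
    vec = (torch.randn(dim) * 0.01).requires_grad_()
    opt = torch.optim.Adam([vec], lr=lr)
    for _ in range(epochs):
        neg = torch.randint(len(vocab), (len(ids), k))
        pos_score = W[ids] @ vec  # (n,)
        neg_score = W[neg] @ vec  # (n, k)
        loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(-1)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return vec.detach()
```

For example, `train([["spike", "protein", "binding"], ["vaccine", "trial", "dose"]], dim=16)` learns a 16-dimensional vector per document, and `infer_vector(model, vocab, ["spike", "protein"])` returns a tensor of shape `(16,)` for a new document. PV-DBOW is chosen here because it needs only document vectors and output word vectors, which keeps the sketch short; PV-DM additionally averages or concatenates context word vectors.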