cluster_outlier

This repository contains the code and embeddings for Outlier Detection and Word Similarity estimation.

The script for the outlier detection task is outlier_detection/prototype.py, and it can be run on the Camacho-Collados_Dataset.
A preprocessed version of the Camacho-Collados_Dataset can be found in preprocessed_datasets/Camacho-Collados_Dataset.txt.
The embeddings are filtered to contain only the required vectors and are provided in embeddings/.

The basic command to run the script:

    python prototype.py --input $DIR_DATA --embedding $DIR_EMBEDDING

To learn about the options provided by the script, run the following command:
```
  python prototype.py --help
```
Tip: With minor changes, the script can also be run on the preprocessed Blair datasets, which are provided in the folder preprocessed_datasets/. The script must be adapted to accommodate the varying number of items in each Blair cluster.
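
As a rough illustration of what handling clusters of varying size can look like, here is a minimal sketch. It is not the code in prototype.py: the function names, the toy vectors, and the scoring rule (flag the item with the lowest mean cosine similarity to the rest of its cluster) are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_outlier(words, embeddings):
    """Return (outlier, scores) for a cluster of any size >= 3.

    Hypothetical helper: each word is scored by its mean cosine similarity
    to every other word in the cluster, and the lowest-scoring word is
    reported as the outlier. Nothing here depends on a fixed cluster size.
    """
    if len(words) < 3:
        raise ValueError("need at least 3 items to pick an outlier")
    scores = {}
    for i, word in enumerate(words):
        others = [w for j, w in enumerate(words) if j != i]
        total = sum(cosine(embeddings[word], embeddings[o]) for o in others)
        scores[word] = total / len(others)
    return min(scores, key=scores.get), scores


# Toy 2-dimensional vectors, invented for illustration only.
toy_embeddings = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "horse": [0.8, 0.3],
    "car": [0.1, 1.0],
}

outlier, _ = find_outlier(["cat", "dog", "horse", "car"], toy_embeddings)
print(outlier)
```

Because the score is an average over however many other items the cluster contains, the same function works for clusters of four items or forty.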

The script for the word similarity task is word_sim/wordsim.py, and it can be run on the MEN, SimLEX, and WordSIM353 datasets.

The command to run the script:

    python wordsim.py --input $DIR_DATA --embedding $DIR_EMBEDDING

To learn about the options provided by the script, run the following command:
```
   python wordsim.py --help
```
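
For readers new to the task, the sketch below shows one common way to score embeddings on word similarity datasets: compute the cosine similarity of each word pair and report the Spearman rank correlation with the human ratings. Whether wordsim.py does exactly this is not stated in this README, so treat the function names, the toy vectors, and the choice of metric as assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def evaluate(pairs, embeddings):
    """pairs: iterable of (word1, word2, gold_score). Out-of-vocabulary pairs are skipped."""
    gold, predicted = [], []
    for w1, w2, score in pairs:
        if w1 in embeddings and w2 in embeddings:
            gold.append(score)
            predicted.append(cosine(embeddings[w1], embeddings[w2]))
    return spearman(gold, predicted)


# Toy vectors and ratings, invented for illustration only.
toy_embeddings = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "car": [0.1, 1.0]}
toy_pairs = [("cat", "dog", 9.0), ("cat", "car", 2.0), ("dog", "car", 3.0)]
print(evaluate(toy_pairs, toy_embeddings))
```

Skipping pairs with missing words is one possible policy; assigning them a default similarity is another, and the two can give different scores on the same dataset.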
