For each pair of sentences (sentence1, sentence2), we need to determine whether sentence2 can be logically inferred from sentence1.
- sentence1: String column of human-entered text (the first sentence)
- sentence2: String column of human-entered text (the second sentence)
- gold_label: Categorical column describing the logical relation between sentence1 and sentence2
- Document length in sentence1:
- Document length in sentence2:
- Heatmap of correlation between the features:
- Bidirectional LSTM model performance (poor, due to the small amount of data):
- Selected model's performance at predicting gold_label on the test set:
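The document-length and correlation figures listed above can be reproduced numerically before any plotting. A minimal sketch, assuming each row is a dict with `sentence1` and `sentence2` keys and that "length" means a whitespace token count; the feature names `len_s1`, `len_s2` and `len_diff` and the toy rows are hypothetical, not necessarily the features used in this project.

```python
from math import sqrt


def doc_length(text):
    """Number of whitespace-separated tokens in a sentence."""
    return len(text.split())


def length_features(rows):
    """Hypothetical length features for each (sentence1, sentence2) pair."""
    feats = {"len_s1": [], "len_s2": [], "len_diff": []}
    for row in rows:
        n1 = doc_length(row["sentence1"])
        n2 = doc_length(row["sentence2"])
        feats["len_s1"].append(n1)
        feats["len_s2"].append(n2)
        feats["len_diff"].append(n1 - n2)
    return feats


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_matrix(feats):
    """All pairwise correlations, i.e. the values a correlation heatmap displays."""
    names = list(feats)
    return {(a, b): pearson(feats[a], feats[b]) for a in names for b in names}


# Toy rows standing in for the real dataset.
rows = [
    {"sentence1": "A man is playing a guitar on stage.", "sentence2": "A man plays music."},
    {"sentence1": "Two dogs run through a field.", "sentence2": "Animals are outside."},
    {"sentence1": "A woman reads a book.", "sentence2": "A woman is sleeping in her bed at home."},
]

feats = length_features(rows)
corr = correlation_matrix(feats)
print(feats["len_s1"], feats["len_s2"])
print(round(corr[("len_s1", "len_diff")], 2))
```

The `corr` dictionary holds the values a heatmap would color; a plotting library such as matplotlib can then render it.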
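The selected model's test performance can be summarized with accuracy and a confusion matrix. A minimal sketch; the label set (`entailment`, `neutral`, `contradiction`) and the toy predictions are assumptions, since the actual gold_label categories are not listed here.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of test rows whose predicted gold_label matches the true one."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in the order of `labels`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


labels = ["entailment", "neutral", "contradiction"]  # hypothetical label set
y_true = ["entailment", "neutral", "contradiction", "entailment", "neutral"]
y_pred = ["entailment", "contradiction", "contradiction", "neutral", "neutral"]

acc = accuracy(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, labels)
```

Reading the confusion matrix row by row shows which true labels are most often confused with which, something a single accuracy figure hides.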
- Since the dataset was very small, training a neural network was not a good idea, so I chose to move ahead with classical ML algorithms.
- Working with a larger dataset could therefore improve learning.
- More advanced NLP techniques could be used to capture the semantic relationship between the two sentences and get better results.
- Due to lack of time I followed this approach, but further iterations during development could improve the model's performance significantly.
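Classical ML algorithms need each sentence pair turned into a fixed-length numeric vector. A minimal sketch of a few plausible pair features; the features themselves (word overlap, Jaccard similarity, length difference, a negation-mismatch flag), the `NEGATIONS` word list and the helper names are illustrative assumptions, not necessarily the ones used in this project.

```python
import re

# Hypothetical negation cue list; tune it against the real data.
NEGATIONS = {"no", "not", "never", "nobody", "nothing", "none"}


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def has_negation(tokens):
    """True if any token is a negation cue or a contraction such as "isn't"."""
    return any(t in NEGATIONS or t.endswith("n't") for t in tokens)


def pair_features(sentence1, sentence2):
    """Turn one (sentence1, sentence2) pair into a small numeric feature vector."""
    t1, t2 = tokenize(sentence1), tokenize(sentence2)
    s1, s2 = set(t1), set(t2)
    overlap = len(s1 & s2)
    union = len(s1 | s2)
    return [
        overlap / len(s2) if s2 else 0.0,   # share of sentence2's words found in sentence1
        overlap / union if union else 0.0,  # Jaccard similarity of the two word sets
        len(t1) - len(t2),                  # length difference in tokens
        float(has_negation(t1) != has_negation(t2)),  # negation in only one sentence
    ]


features = pair_features("A man is playing a guitar.", "A man is not playing music.")
```

Stacking these vectors into a matrix `X`, with the gold_label values as `y`, gives input that any scikit-learn classifier accepts through its `fit(X, y)` method.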
- Data cleaning was done quite well, but other approaches could also be tried.
- Feature engineering is an important part that requires good knowledge of NLP; it can be worked on in the future.
- Dimensionality reduction, based on experiments with PCA (or t-SNE, which is mainly a visualization method), can be performed to optimize model performance and remove useless or redundant features.
- Hypothesis testing can help decide whether each feature actually contributes to predicting the correct gold_label.
- Word embeddings can be used to better capture the semantic relationship between words.
- Better neural network architectures would be a stronger choice for this kind of problem, although a bidirectional LSTM should perform well with a larger dataset.
- Finally, once the model performs well on the data, hyperparameter tuning can adjust the small knobs of the bidirectional LSTM model to extract the best performance from it.
- For any suggestions, contact me at [email protected]
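Dimensionality reduction with PCA amounts to finding the directions of largest variance in the feature matrix. A minimal, dependency-free sketch using power iteration with deflation on the covariance matrix; the toy matrix and function names are hypothetical, and in practice scikit-learn's `PCA` class does the same job more robustly. t-SNE is left out: standard t-SNE learns no mapping that can be applied to new rows, so it suits visualization better than feature reduction for a classifier.

```python
from math import sqrt


def mean_center(X):
    """Subtract each column's mean so the data is centred at the origin."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of already-centred data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]


def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def top_eigenpair(C, iters=500):
    """Power iteration: converges to the eigenvector with the largest eigenvalue."""
    v = [1.0 + 0.1 * i for i in range(len(C))]  # arbitrary non-zero start vector
    norm = sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = matvec(C, v)
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v, 0.0
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(C, v)))
    return v, eigenvalue


def pca(X, k):
    """Return the top-k principal directions, their variances and the projected rows."""
    Xc = mean_center(X)
    C = covariance(Xc)
    d = len(C)
    components, variances = [], []
    for _ in range(k):
        v, lam = top_eigenpair(C)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the found direction so the next pass finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(a * b for a, b in zip(row, comp)) for comp in components] for row in Xc]
    return components, variances, projected


# Toy feature matrix: column 1 is exactly twice column 0, column 2 is almost constant.
X = [[1, 2, 0.1], [2, 4, 0.0], [3, 6, 0.1], [4, 8, 0.0]]
components, variances, projected = pca(X, k=2)
total_variance = sum(covariance(mean_center(X))[j][j] for j in range(3))
explained = variances[0] / total_variance
```

Here one component explains almost all the variance, so the three features could be reduced to one. Power iteration is fine for a handful of features; with many features or nearly equal eigenvalues, a library SVD is the safer choice.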
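One way to test whether a feature helps predict gold_label is a permutation test: shuffle the labels, recompute how strongly the per-label means differ, and count how often the shuffled statistic is at least as large as the observed one. A minimal sketch; the statistic (between-group sum of squares), the number of permutations, and the toy labels and values are assumptions. `scipy.stats.f_oneway` (ANOVA) and `scipy.stats.chi2_contingency` are common parametric alternatives.

```python
import random


def between_group_ss(values, labels):
    """Between-group sum of squares: how far each label's mean sits from the overall mean."""
    overall = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return sum(len(g) * (sum(g) / len(g) - overall) ** 2 for g in groups.values())


def permutation_test(values, labels, n_permutations=2000, seed=0):
    """p-value for H0: the feature's values are unrelated to gold_label."""
    rng = random.Random(seed)
    observed = between_group_ss(values, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if between_group_ss(values, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical labels and feature values, six rows per label.
labels = ["entailment"] * 6 + ["contradiction"] * 6 + ["neutral"] * 6
overlap_score = [0.90, 0.85, 0.95, 0.80, 0.90, 0.88,
                 0.10, 0.15, 0.05, 0.20, 0.10, 0.12,
                 0.50, 0.45, 0.55, 0.50, 0.48, 0.52]
unrelated = [1, 2, 3, 4, 5, 6] * 3  # same mean (3.5) within every label

p_overlap = permutation_test(overlap_score, labels)
p_unrelated = permutation_test(unrelated, labels)
```

A small p-value (as for `overlap_score`) suggests the feature carries information about gold_label; a large one (as for `unrelated`) marks it as a candidate for removal.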
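A simple way to use word embeddings for sentence pairs is to average the word vectors of each sentence and compare the averages with cosine similarity; the result can serve as one more feature for the classical models. A minimal sketch; the 3-dimensional `EMBEDDINGS` table is a hypothetical toy, whereas pretrained embeddings such as GloVe or word2vec would be loaded from disk in practice.

```python
import re
from math import sqrt

# Hypothetical 3-dimensional vectors; pretrained embeddings such as GloVe or
# word2vec have hundreds of dimensions and much larger vocabularies.
EMBEDDINGS = {
    "man": [0.9, 0.1, 0.0],
    "person": [0.8, 0.2, 0.1],
    "guitar": [0.1, 0.9, 0.2],
    "instrument": [0.2, 0.8, 0.3],
    "playing": [0.3, 0.5, 0.7],
    "plays": [0.3, 0.4, 0.8],
    "sleeps": [0.0, 0.1, -0.9],
}


def sentence_vector(text, embeddings):
    """Average the vectors of known words; out-of-vocabulary words are skipped."""
    words = re.findall(r"[a-z']+", text.lower())
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity in [-1, 1]; 1 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


premise = sentence_vector("A man is playing a guitar.", EMBEDDINGS)
paraphrase = sentence_vector("A person plays an instrument.", EMBEDDINGS)
different = sentence_vector("The man sleeps.", EMBEDDINGS)

sim_paraphrase = cosine(premise, paraphrase)
sim_different = cosine(premise, different)
```

Averaging ignores word order and treats "not" like any other word, so it captures topical similarity rather than full entailment; a bidirectional LSTM over the same embeddings can model order.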
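Hyperparameter tuning can start with a plain grid search: train once per combination and keep the best validation score. A minimal sketch; `fake_train_and_evaluate` is a hypothetical stand-in for training the bidirectional LSTM and returning its validation accuracy, and the search space (hidden size, dropout, learning rate) is an assumed example rather than the project's actual settings.

```python
from itertools import product

# Assumed search space for the bidirectional LSTM's "knobs".
SEARCH_SPACE = {
    "hidden_size": [64, 128],
    "dropout": [0.2, 0.5],
    "learning_rate": [1e-3, 1e-4],
}


def grid_search(train_and_evaluate, space):
    """Try every combination and keep the one with the highest validation score."""
    names = list(space)
    best_params, best_score = None, float("-inf")
    for values in product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def fake_train_and_evaluate(hidden_size, dropout, learning_rate):
    """Hypothetical stand-in: replace with real training plus validation accuracy."""
    return hidden_size / 128 - abs(dropout - 0.2) - abs(learning_rate - 1e-3) * 100


best_params, best_score = grid_search(fake_train_and_evaluate, SEARCH_SPACE)
```

The number of training runs is the product of the list lengths (2 × 2 × 2 = 8 here), so for larger spaces random search is a cheaper alternative.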