sonalgaud12/Quora_QuestionPair


Quora_QuestionPair

• Prepared the dataset (404,290 rows) with natural language processing techniques: removed stopwords, punctuation, and hyperlinks, then applied tokenization and stemming.
• Extracted basic features and advanced features, including fuzzy-matching ("fuzz") features, and explored feature importance.
• Transformed the text into numerical vectors with a TF-IDF vectorizer, fitted Logistic Regression and XGBoost models, and tuned the XGBoost hyperparameters to reach an AUC score of 0.91 and an accuracy of 83%.
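
The cleaning and feature steps described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the repository's actual code: the stopword list, the `naive_stem` helper, the feature names, and the use of `difflib.SequenceMatcher` as a stand-in for a dedicated fuzzy-matching library are all assumptions made for the example.

```python
import re
import string
from difflib import SequenceMatcher

# Tiny illustrative stopword list (assumption; the project's real list is not shown).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "what", "how", "do", "i"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def naive_stem(token):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, drop hyperlinks and punctuation, tokenize, remove stopwords, stem."""
    text = URL_RE.sub(" ", text.lower())
    text = text.translate(PUNCT_TABLE)
    tokens = text.split()  # whitespace tokenization
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def pair_features(q1, q2):
    """Basic features plus one fuzz-style similarity score for a question pair."""
    t1, t2 = preprocess(q1), preprocess(q2)
    s1, s2 = set(t1), set(t2)
    common = len(s1 & s2)
    total = len(s1) + len(s2)
    similarity = SequenceMatcher(None, " ".join(t1), " ".join(t2)).ratio()
    return {
        "q1_len": len(t1),                                # tokens left in question 1
        "q2_len": len(t2),                                # tokens left in question 2
        "common_words": common,                           # shared unique tokens
        "word_share": common / total if total else 0.0,   # shared / total unique tokens
        "fuzz_like_ratio": round(100 * similarity),       # 0-100 similarity score
    }


if __name__ == "__main__":
    print(preprocess("How do I learn Python? See https://example.com!"))
    print(pair_features("What is the best way to learn Python?",
                        "How do I learn Python quickly?"))
```

Features of this kind would then be combined with the TF-IDF vectors before fitting the models; in practice a library stemmer and a dedicated fuzzy-matching package would likely replace the simplified helpers used here.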
