SalihCanAydogdu/NLP_MedicalTranscriptions

Project Purpose: This project aims to predict the medical specialty of a medical transcript from its text.

Dataset: The medical transcriptions dataset from Kaggle was used. (Kaggle link: https://www.kaggle.com/tboyle10/medicaltranscriptions)

Preprocessing: Noise reduction, data cleaning, and lemmatization were applied to the transcripts.
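As a rough illustration of the cleaning step, the sketch below lowercases a transcript, strips digits and punctuation, and drops a few stop words. The function name `clean_transcript` and the tiny stop-word list are hypothetical and not taken from the project. Lemmatization is left out here because it needs an external library (for example NLTK or spaCy); the project's actual preprocessing may differ.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list used only for this illustration.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "was", "is", "in", "with"}


def clean_transcript(text: str) -> List[str]:
    """Lowercase the text, remove non-letter characters, and drop stop words."""
    text = text.lower()
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    # Keep tokens that are not stop words and are longer than one character.
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


tokens = clean_transcript("The patient was admitted with chest pain, 2/10 severity.")
print(tokens)  # ['patient', 'admitted', 'chest', 'pain', 'severity']
```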

Text Vectorization: Texts were converted to vectors using the Bag-of-Words and TF-IDF methods. The dimensionality of the resulting vectors was then reduced with PCA.
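The vectorization and reduction steps might look like the following minimal sketch, assuming scikit-learn is used (the README does not say which library the project relies on). The four example sentences and the choice of two components are invented for illustration. The TF-IDF matrix is converted to a dense array before PCA, which is fine for a toy example but may be too memory-hungry for a full corpus.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up example sentences, not rows from the Kaggle dataset.
docs = [
    "patient reports chest pain and shortness of breath",
    "mri of the knee shows a torn meniscus",
    "echocardiogram reveals reduced ejection fraction",
    "knee arthroscopy performed without complications",
]

# Bag-of-Words: raw term counts per document.
bow = CountVectorizer().fit_transform(docs)

# TF-IDF: term counts reweighted by how rare each term is across documents.
tfidf = TfidfVectorizer().fit_transform(docs)

# PCA on the dense TF-IDF matrix; n_components=2 is an arbitrary choice here.
pca = PCA(n_components=2, random_state=0)
reduced = pca.fit_transform(tfidf.toarray())

print(bow.shape, tfidf.shape, reduced.shape)
```

With default settings both vectorizers build the same vocabulary, so the two sparse matrices have the same shape. Only the cell values differ.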

Algorithms: Multinomial Naïve Bayes, Random Forest, XGBoost, LightGBM, and CNN + LSTM (Ensemble Learning) were applied.
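To show how one of these classifiers can be trained on vectorized text, here is a minimal sketch of a TF-IDF plus Multinomial Naïve Bayes pipeline, again assuming scikit-learn. The training sentences and specialty labels are invented, and the project's real configuration is not described in this README. PCA is omitted in this sketch because `MultinomialNB` expects non-negative features, which PCA output does not guarantee.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy training data; the labels stand in for medical specialties.
train_texts = [
    "chest pain and abnormal ecg findings",
    "echocardiogram shows reduced ejection fraction",
    "torn meniscus seen on knee mri",
    "fracture of the left femur after a fall",
]
train_labels = ["Cardiology", "Cardiology", "Orthopedic", "Orthopedic"]

# Vectorize with TF-IDF, then fit a Multinomial Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Predict the specialty of an unseen (also invented) transcript snippet.
prediction = model.predict(["patient has knee pain after a fall"])
print(prediction[0])
```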

Conclusion: XGBoost and the Ensemble Learning approach gave the best results. The project could be improved further by training on more data.

NLP project for medical transcriptions. For more details, see the project report and the README.