Project Purpose: This project aims to classify medical transcripts by medical specialty.
Dataset: A medical transcript dataset from Kaggle was used. (Kaggle link: https://www.kaggle.com/tboyle10/medicaltranscriptions)
Preprocessing: Noise reduction, data cleaning, and lemmatization were applied to the transcripts.
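The preprocessing steps might look roughly like the following minimal Python sketch. It is an illustration under stated assumptions, not the project's actual code: the function name preprocess, the tiny TOY_LEMMAS table, and the STOP_WORDS list are all hypothetical, and a real pipeline would most likely use a full lemmatizer from an NLP library rather than a lookup table.

```python
import re

# Hypothetical, deliberately tiny lemma table. It stands in for a real
# lemmatizer, which the project would need for full coverage.
TOY_LEMMAS = {
    "patients": "patient",
    "was": "be",
    "were": "be",
    "complaints": "complaint",
    "denies": "deny",
}

# Small illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "with", "on", "be"}


def preprocess(text):
    """Lowercase, strip non-letter noise, tokenize, lemmatize, drop stop words."""
    text = text.lower()
    # Noise reduction: replace digits, punctuation, and symbols with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    # Lemmatization via the toy lookup; unknown tokens pass through unchanged.
    lemmas = [TOY_LEMMAS.get(tok, tok) for tok in tokens]
    return [tok for tok in lemmas if tok not in STOP_WORDS]


if __name__ == "__main__":
    sample = "The patients were seen on 03/12 with complaints of chest pain."
    print(preprocess(sample))
    # ['patient', 'seen', 'complaint', 'chest', 'pain']
```

The output is a list of cleaned tokens per transcript, which is the form a vectorization step would typically consume.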
Text Vectorization: Texts were converted to vectors using Bag-of-Words and TF-IDF methods. Dimensionality reduction was done with PCA.
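To make the two vectorization methods concrete, here is a small dependency-free Python sketch. The function names bag_of_words and tf_idf and the sample documents are hypothetical. The idf formula used, log(N / df), is one common textbook variant and may differ from the weighting the project actually used. The PCA step is left out because it would need a numerical library.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count_vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def tf_idf(count_vectors):
    """Weight raw term counts by idf = log(N / df)."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0]) if count_vectors else 0
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for vec in count_vectors if vec[j] > 0) for j in range(n_terms)]
    # A term that appears in no document gets weight 0 instead of dividing by 0.
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    return [[vec[j] * idf[j] for j in range(n_terms)] for vec in count_vectors]


if __name__ == "__main__":
    docs = [["chest", "pain", "chest"], ["knee", "pain"]]
    vocab, counts = bag_of_words(docs)
    print(vocab)   # ['chest', 'knee', 'pain']
    print(counts)  # [[2, 0, 1], [0, 1, 1]]
    print(tf_idf(counts))
```

In this toy example "pain" occurs in every document, so its TF-IDF weight is zero, while "chest" and "knee" keep positive weights. That is the effect that separates TF-IDF from raw Bag-of-Words counts.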
Algorithms: Multinomial Naïve Bayes, Random Forest, XGBoost, LightGBM, and CNN + LSTM (Ensemble Learning) algorithms were applied.
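As an illustration of the simplest model in this list, the following is a minimal pure-Python sketch of Multinomial Naïve Bayes with add-alpha (Laplace) smoothing. It is not the project's implementation, which presumably relied on a machine learning library. The class name TinyMultinomialNB, the specialty labels, and the training documents are made up for the example.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Minimal multinomial Naive Bayes over tokenized documents."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing strength

    def fit(self, docs, labels):
        self.vocab = {tok for doc in docs for tok in doc}
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.n_docs = len(docs)
        return self

    def _log_posterior(self, doc, label):
        # log P(class) + sum over tokens of log P(token | class)
        log_prob = math.log(self.class_counts[label] / self.n_docs)
        total = sum(self.token_counts[label].values())
        denom = total + self.alpha * len(self.vocab)
        for tok in doc:
            if tok not in self.vocab:
                continue  # ignore tokens never seen during training
            count = self.token_counts[label][tok]
            log_prob += math.log((count + self.alpha) / denom)
        return log_prob

    def predict(self, doc):
        """Return the label with the highest (unnormalized) log posterior."""
        return max(self.class_counts, key=lambda lb: self._log_posterior(doc, lb))


if __name__ == "__main__":
    # Made-up toy data; the real dataset has many more specialties.
    train_docs = [
        ["chest", "pain", "ecg"],
        ["heart", "murmur", "ecg"],
        ["knee", "pain", "fracture"],
        ["hip", "fracture", "xray"],
    ]
    train_labels = ["Cardiology", "Cardiology", "Orthopedics", "Orthopedics"]
    model = TinyMultinomialNB().fit(train_docs, train_labels)
    print(model.predict(["ecg", "heart"]))      # Cardiology
    print(model.predict(["knee", "fracture"]))  # Orthopedics
```

Working in log space avoids numeric underflow on long transcripts, and the smoothing term keeps a token that never appeared in a class from forcing that class's probability to zero.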
Conclusion: XGBoost and the Ensemble Learning methods gave the best results. The project could be improved by training on more data.