GitHub - SalihCanAydogdu/NLP_MedicalTranscriptions: NLP project for medical transcriptions. For more details, see the report and the README.

Project Purpose: This project aims to predict the medical specialty of a transcription from its text.

Dataset: The medical transcriptions dataset from Kaggle was used. (Kaggle link: https://www.kaggle.com/tboyle10/medicaltranscriptions)

Preprocessing: Noise reduction, data cleaning, and lemmatization were applied.
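
As a rough illustration of these preprocessing steps, here is a minimal sketch that uses only the Python standard library. The names clean_text, preprocess, LEMMA_TABLE and STOPWORDS are hypothetical and are not taken from the project notebook. The small lookup table is only a stand-in for a real lemmatizer, and stop-word removal is an assumption about what "noise reduction" covers.

```python
import re

# Hypothetical stand-in for a real lemmatizer: a tiny lookup table.
# The actual project may use a library lemmatizer instead (assumption).
LEMMA_TABLE = {
    "patients": "patient",
    "were": "be",
    "complaining": "complain",
    "pains": "pain",
    "days": "day",
}

# Assumed stop-word list, kept deliberately small for the example.
STOPWORDS = {"the", "a", "an", "of", "and", "with", "to", "be"}


def clean_text(text: str) -> str:
    """Lowercase, drop tag-like markup, keep letters only, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)   # remove HTML-like tags
    text = re.sub(r"[^a-z\s]", " ", text)  # remove digits and punctuation
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text: str) -> list[str]:
    """Clean the text, map tokens to lemmas, then drop stop words."""
    tokens = clean_text(text).split()
    lemmas = [LEMMA_TABLE.get(token, token) for token in tokens]
    return [lemma for lemma in lemmas if lemma not in STOPWORDS]


tokens = preprocess("The PATIENTS were complaining of chest pains, 2 days.")
```

Here tokens holds the cleaned, lemmatized words of the sample sentence, ready to be joined back into a string for vectorization.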

Text Vectorization: Texts were converted to vectors using Bag-of-Words and TF-IDF methods. Dimensionality reduction was done with PCA.
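
The vectorization step could look roughly like the sketch below. It assumes scikit-learn, which the README does not name, and the four example sentences are invented for illustration and are not rows from the dataset. PCA in scikit-learn expects a dense array, so the sparse TF-IDF matrix is converted with toarray() first. That is fine for a toy example but may need a different approach on the full dataset.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Invented toy documents (assumption), standing in for preprocessed transcripts.
docs = [
    "patient complain chest pain",
    "patient knee surgery recovery",
    "chest xray show clear lung",
    "knee pain after surgery",
]

# Bag-of-Words: each column counts how often a vocabulary word occurs.
bow = CountVectorizer()
X_bow = bow.fit_transform(docs)

# TF-IDF: the same vocabulary, but counts are reweighted so that words
# appearing in many documents contribute less.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)

# PCA needs dense input; n_components=2 is an arbitrary choice for the example.
pca = PCA(n_components=2, random_state=0)
X_reduced = pca.fit_transform(X_tfidf.toarray())
```

After this runs, X_bow and X_tfidf have one row per document and one column per vocabulary word, while X_reduced has one row per document and two columns.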

Algorithms: Multinomial Naïve Bayes, Random Forest, XGBoost, LightGBM, and CNN + LSTM (Ensemble Learning) algorithms were applied.
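
To show how two of the listed classifiers might be trained on TF-IDF features, here is a minimal sketch that assumes scikit-learn. The texts, labels, and hyperparameters are invented for illustration and do not come from the project. XGBoost, LightGBM, and the CNN + LSTM model are left out to keep the example short. PCA is also skipped here because Multinomial Naïve Bayes requires non-negative features, which PCA output does not guarantee.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy training data (assumption), not rows from the dataset.
texts = [
    "patient complain chest pain shortness breath",
    "ecg show atrial fibrillation palpitation",
    "knee replacement surgery joint pain",
    "fracture femur cast orthopedic follow up",
]
labels = ["Cardiology", "Cardiology", "Orthopedic", "Orthopedic"]

# Each pipeline vectorizes the text with TF-IDF, then fits a classifier.
models = {
    "multinomial_nb": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "random_forest": make_pipeline(
        TfidfVectorizer(),
        RandomForestClassifier(n_estimators=50, random_state=0),
    ),
}

new_note = ["patient report chest pain and palpitation"]

# Fit every model and collect its predicted specialty for the new note.
predictions = {}
for name, model in models.items():
    model.fit(texts, labels)
    predictions[name] = model.predict(new_note)[0]
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary learned from the training texts tied to the model, so new notes are transformed the same way at prediction time.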

Conclusion: XGBoost and the Ensemble Learning approach gave the best results. The project could be improved by training on more data.

.gitattributes
CSE431_NLP_SalihCanAydogdu.pdf
README.md
mtsamples.csv
nlo_seconPart_second.ipynb
