📌 A deep learning-based personality classification model using LSTMs, Bi-Directional LSTMs, and BERT to predict MBTI personality types from text data.

This project classifies MBTI personality types from text data and is structured into three key steps:
- **Data Visualization & Preprocessing** → Cleaning and preparing the text data.
- **Model Training** → Training LSTM, Bi-Directional LSTM, and BERT models.
- **Model Evaluation** → Comparing model accuracy, loss, and overall performance.
✅ **LSTM Model** → Sequential model with embeddings and LSTM layers
✅ **Bi-Directional LSTM Model** → Enhances sequence learning with bidirectional LSTMs
✅ **BERT Model** → Transformer-based NLP model for improved contextual understanding
✅ **Performance Comparison** → Evaluation of all models based on accuracy and loss
## Installation

```bash
git clone https://github.com/JaspreetSingh-exe/Personality-Prediction-Using-Deep-Learning.git
cd Personality-Prediction-Using-Deep-Learning
pip install -r requirements.txt
```
## 📊 Step 1: Data Visualization & Preprocessing

```bash
jupyter notebook data_visualization.ipynb
```

## 🏋️‍♂️ Step 2: Model Training

```bash
jupyter notebook training.ipynb
```

## 📈 Step 3: Model Evaluation & Comparison

```bash
jupyter notebook evaluate_model.ipynb
```
## Dataset

The dataset consists of text samples labeled with Myers-Briggs Type Indicator (MBTI) personality types. Each entry contains a series of posts written by a single user together with that user's personality type. The dataset is preprocessed by:
- Removing stopwords and special characters to clean text data.
- Tokenizing and padding sequences for uniform input.
- Splitting into training and testing sets for model evaluation.
These steps yield the text inputs used to train the models to classify personality types; a minimal preprocessing sketch is shown below.
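As an illustration of these steps, here is a minimal preprocessing sketch using Keras and scikit-learn utilities; the column names `posts` and `type` are assumptions for illustration and may differ from the actual notebook:

```python
import re
import pandas as pd
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the dataset (column names are assumed for illustration)
df = pd.read_csv("cleaned_mbti_data.csv")

# Remove special characters and stopwords
df["posts"] = (df["posts"].str.lower()
               .apply(lambda t: re.sub(r"[^a-z\s]", " ", t))
               .apply(lambda t: " ".join(w for w in t.split() if w not in ENGLISH_STOP_WORDS)))

# Tokenize and pad so every sample has the same length (matches the models' input_length)
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(df["posts"])
X = pad_sequences(tokenizer.texts_to_sequences(df["posts"]), maxlen=1500)

# Encode the 16 MBTI types as integers and split into train/test sets
y = df["type"].astype("category").cat.codes.values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```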
```
📦 Personality Prediction Using Deep Learning
├── data_visualization.ipynb       # Exploratory data analysis & preprocessing
├── training.ipynb                 # Model training (LSTM, Bi-LSTM, BERT)
├── evaluate_model.ipynb           # Model evaluation & comparison
├── cleaned_mbti_data.csv          # Preprocessed dataset
├── README.md                      # Project documentation
├── requirements.txt               # Dependencies list
├── model_comparison_results.csv   # Performance metrics
└── models/
    ├── lstm_model.h5              # Trained LSTM model
    ├── bilstm_model.h5            # Trained Bi-LSTM model
    └── bert_model.h5              # Trained BERT model
```
## LSTM Model

LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) that is well suited to processing sequential data such as text.
```python
from keras.models import Sequential
from keras.layers import LSTM, Embedding, Dense

model = Sequential([
    # Map the 10,000 most frequent tokens to 256-dimensional vectors
    Embedding(input_dim=10000, output_dim=256, input_length=1500),
    # Single LSTM layer; dropout regularizes inputs and recurrent connections
    LSTM(100, dropout=0.2, recurrent_dropout=0.2),
    # One softmax output per MBTI type (16 classes)
    Dense(16, activation='softmax')
])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```
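Training is not shown above; as a minimal sketch, assuming `X_train`/`y_train` come from a preprocessing step like the one sketched earlier, `categorical_crossentropy` requires one-hot encoded labels:

```python
from tensorflow.keras.utils import to_categorical

# One-hot encode the 16 integer MBTI class labels for categorical_crossentropy
y_train_oh = to_categorical(y_train, num_classes=16)
y_test_oh = to_categorical(y_test, num_classes=16)

# Hyperparameters here (epochs, batch size) are illustrative, not the project's settings
history = model.fit(X_train, y_train_oh,
                    validation_data=(X_test, y_test_oh),
                    epochs=5, batch_size=64)
```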
📄 Paper Link
## Bi-Directional LSTM Model

A Bi-Directional LSTM (Bi-LSTM) processes input sequences both forward and backward, improving how much context the model captures.
```python
from keras.models import Sequential
from keras.layers import LSTM, Embedding, Dense, Dropout, Bidirectional

model = Sequential([
    Embedding(input_dim=10000, output_dim=256, input_length=1500),
    # First Bi-LSTM returns the full sequence so a second recurrent layer can follow
    Bidirectional(LSTM(100, return_sequences=True)),
    Dropout(0.3),
    # Second Bi-LSTM collapses the sequence into a single vector
    Bidirectional(LSTM(50)),
    Dense(16, activation='softmax')
])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```
📄 Paper Link
## BERT Model

BERT is a transformer-based NLP model pretrained on large corpora that captures context from both the left and the right of each token.
```python
import tensorflow as tf
from transformers import TFBertModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert_layer = TFBertModel.from_pretrained("bert-base-uncased")

# NOTE: bert-base-uncased accepts at most 512 tokens, so the input length is
# capped at 512 here (1500 would exceed BERT's position embeddings)
input_word_ids = tf.keras.layers.Input(shape=(512,), dtype=tf.int32, name="input_word_ids")

# [0] is the last hidden state; [:, 0, :] selects the [CLS] token representation
bert_outputs = bert_layer(input_word_ids)[0]
output = tf.keras.layers.Dense(16, activation="softmax")(bert_outputs[:, 0, :])

bert_model = tf.keras.models.Model(inputs=input_word_ids, outputs=output)
bert_model.compile(loss="categorical_crossentropy",
                   optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                   metrics=["accuracy"])
```
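The snippet above creates a tokenizer but never uses it; as a minimal, hedged sketch (the example texts and variable names are illustrative, not from the original notebooks), raw posts could be converted into `input_word_ids` like this:

```python
# Tokenize raw posts into fixed-length token-ID tensors for the model above
texts = ["I love quiet evenings with a good book.",
         "Let's get everyone together this weekend!"]

encoded = tokenizer(texts,
                    padding="max_length",
                    truncation=True,
                    max_length=512,
                    return_tensors="tf")

# The model defined above takes only the token IDs as input
probs = bert_model.predict(encoded["input_ids"])
print(probs.shape)  # (2, 16): one probability per MBTI type for each post
```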
📄 Paper Link
## Performance Comparison

| Model | Accuracy |
|---|---|
| LSTM | 25.4 % |
| Bi-Directional LSTM | 53.0 % |
| BERT | 85.8 % |
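The same metrics are saved to `model_comparison_results.csv`; a small sketch for inspecting them (the column names `model` and `accuracy` are assumptions):

```python
import pandas as pd

# Load the saved performance metrics and rank the models by accuracy
results = pd.read_csv("model_comparison_results.csv")
print(results.sort_values("accuracy", ascending=False))
```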
Want to improve this project? Contributions are welcome!
- Fork the repo
- Create a new branch
- Submit a pull request
This project is licensed under the MIT License.
For queries, reach out to: ✉️ jaspreetsingh01110@gmail.com