Titanic Passenger Survival Prediction

This repository contains a machine learning pipeline developed to predict the survival of passengers aboard the Titanic. It is part of a series of projects undertaken during a Data Science internship at CodSoft.

Project Overview

The objective of this project is to build a supervised machine learning model that predicts whether a passenger survived the Titanic disaster. This problem is a classic classification task using structured data. The solution employs a Random Forest Classifier and incorporates robust preprocessing and feature engineering techniques.

Dataset

Source: Kaggle - Titanic: Machine Learning from Disaster
Files Used:
- train.csv: Labeled training data
- test.csv: Unlabeled test data for prediction
- gender_submission.csv: Sample submission format

Project Workflow

Data Preprocessing

Handling Missing Values:
- Age: Mean imputation
- Fare: Median imputation
- Embarked: Mode imputation
Encoding:
- Categorical variables encoded using OneHotEncoder
Feature Scaling:
- Standardization of numerical features using StandardScaler

Feature Engineering

New informative features were created to improve model performance:

FamilySize: Combines SibSp and Parch to reflect total family size
IsAlone: Binary indicator of whether the passenger was traveling alone
Title: Extracted from the name to capture social status

Model Building

Model Used: RandomForestClassifier
Pipeline: Preprocessing and classification are combined using Pipeline
Model Evaluation:
- Accuracy on training data
- Cross-validation scores to check generalization
- Feature importance analysis

Output

Predictions are saved as:
- titanic_predictions.csv: Basic submission format
- titanic_predictions_detailed.csv: Includes prediction probabilities and confidence

Technologies Used

Python 3.x
Jupyter Notebook
Pandas, NumPy
Scikit-learn
Matplotlib, Seaborn

Acknowledgments

This project is one of the three required tasks for the Data Science Internship at CodSoft.
I have actively used the Data Science course by Krish Naik and various YouTube tutorials to guide my learning and implementation throughout this project.

Upcoming Tasks

As part of the CodSoft internship, I will also complete two additional data science projects:

Movie Rating Prediction
Iris Flower Classification (or another task from the provided list)

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
gender_submission.xls		gender_submission.xls
test.xls		test.xls
titanic prediction.ipynb		titanic prediction.ipynb
titanic_predictions.csv		titanic_predictions.csv
titanic_predictions_detailed.xls		titanic_predictions_detailed.xls
train.xls		train.xls

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Titanic Passenger Survival Prediction

Project Overview

Dataset

Project Workflow

Data Preprocessing

Feature Engineering

Model Building

Output

Technologies Used

Acknowledgments

Upcoming Tasks

About

Uh oh!

Releases

Packages

Languages

Trishund/Titanic-Survival-Prediction-Project

Folders and files

Latest commit

History

Repository files navigation

Titanic Passenger Survival Prediction

Project Overview

Dataset

Project Workflow

Data Preprocessing

Feature Engineering

Model Building

Output

Technologies Used

Acknowledgments

Upcoming Tasks

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages