From August to October 2022, I undertook a significant data cleaning project in Melbourne, using Python as the primary tool for data processing and preparation.
The first challenge was inconsistent date formats across the various data sets. To restore uniformity, I wrote Python routines that parsed the mixed formats and converted every date to a single canonical representation. This not only improved data coherence but also made downstream tasks such as analysis and modeling more efficient.
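A minimal sketch of this kind of standardization, using only the standard library: the three formats and the `standardize_dates` helper below are hypothetical stand-ins for whatever formats actually appeared in the source files, and parsing each value by trying a list of known patterns is one common approach, not necessarily the exact routine used in the project.

```python
from datetime import datetime

# Hypothetical set of formats observed across the source files
FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")

def standardize_dates(values):
    """Try each known format in turn and return ISO-8601 (YYYY-MM-DD) strings."""
    out = []
    for v in values:
        for fmt in FORMATS:
            try:
                out.append(datetime.strptime(v, fmt).strftime("%Y-%m-%d"))
                break
            except ValueError:
                continue  # this pattern did not match; try the next one
        else:
            # No known pattern matched: fail loudly rather than guess
            raise ValueError(f"Unrecognised date format: {v!r}")
    return out

mixed = ["2022-08-01", "03/10/2022", "Oct 3, 2022"]
print(standardize_dates(mixed))  # ['2022-08-01', '2022-10-03', '2022-10-03']
```

Failing loudly on an unrecognised format is deliberate: silently guessing a date (for example, swapping day and month) is exactly the kind of quiet corruption a cleaning pass is meant to prevent.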
Handling outliers is a crucial step in data cleaning. For this task, I used the interquartile range (IQR) method, a statistical technique widely used for outlier detection. By computing the IQR for each numeric variable in our data sets, I could identify and handle outliers, enhancing the accuracy and reliability of the data.
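The IQR method can be sketched in a few lines of standard-library Python: values falling outside the conventional fences of Q1 − 1.5·IQR and Q3 + 1.5·IQR are flagged. The `iqr_outliers` function and the sample data are illustrative, not taken from the project.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lo or x > hi]

data = [10, 12, 11, 13, 12, 11, 95]  # 95 is an obvious outlier
print(iqr_outliers(data))  # [95]
```

The multiplier `k = 1.5` is the conventional default; what to do with a flagged value (drop it, cap it, or investigate it) depends on the variable and remains a per-case judgment.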
Another significant issue in any data project is handling missing data. Ignoring or mishandling missing values can lead to skewed or inaccurate results. I therefore implemented linear regression models to impute the missing values from related variables in the data sets. This technique proved effective in ensuring the completeness of our data, allowing for more accurate and meaningful analysis.
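Regression imputation in its simplest form fits a line on the complete cases and predicts the gaps. The sketch below assumes a single predictor and uses an ordinary least-squares fit written out by hand; the `linear_impute` helper and the toy data are hypothetical, and in practice the project's models may have used more predictors or a library such as scikit-learn.

```python
def linear_impute(x, y):
    """Fit y = a*x + b on complete pairs, then fill missing y (None) values."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    sx = sum(xi for xi, _ in pairs)
    sy = sum(yi for _, yi in pairs)
    sxx = sum(xi * xi for xi, _ in pairs)
    sxy = sum(xi * yi for xi, yi in pairs)
    # Ordinary least-squares slope and intercept
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return [yi if yi is not None else a * xi + b for xi, yi in zip(x, y)]

x = [1, 2, 3, 4, 5]
y = [2.0, 4.0, None, 8.0, 10.0]
print(linear_impute(x, y))  # [2.0, 4.0, 6.0, 8.0, 10.0]
```

One caveat worth noting: regression imputation preserves the mean trend but understates variance, so imputed columns should be flagged rather than treated as observed data.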
The project was successful, resulting in clean, reliable data ready for downstream use in various business analyses and modeling tasks. This effort highlighted the importance of careful data cleaning and preparation in any data-driven decision-making process.
I'm a data analyst.

- Bob Mai