Welcome!
This document contains your project tasks and guidelines.
Each participant should complete the task for their selected track only:
- Data Analyst
- Data Scientist
- Machine Learning Engineer
Deadline: Submit your project by 26 June 2025. Push your work to GitHub and paste your public repo link in the submission form.
Data Analyst: Download Dataset (Google Drive)
You are to analyze the supermarket sales dataset to uncover trends and key business insights.
- Data Cleaning
  - Handle nulls and fix data types
  - Convert date/time columns
  - Create new columns (e.g., Total Sales = Unit Price × Quantity)
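If you want to prototype the cleaning steps in Python before loading the data into Excel, Power BI, or Tableau, they could look roughly like the sketch below. It uses pandas. The column names (`Date`, `Unit price`, `Quantity`), the date format, and the inline sample rows are assumptions for illustration only, so adjust them to the headers in the actual dataset.

```python
import pandas as pd

# Tiny stand-in for the real file (assumed columns); in practice you would
# load the downloaded dataset with pd.read_csv(...) instead.
raw = pd.DataFrame({
    "Date": ["01/05/2019", "03/08/2019", "bad date"],
    "Unit price": ["74.69", "15.28", None],
    "Quantity": [7, 5, 2],
})


def clean_sales(df):
    """Fix types, drop unusable rows, and add a Total Sales column."""
    out = df.copy()
    # Fix data types: values that cannot be parsed become NaN / NaT
    out["Unit price"] = pd.to_numeric(out["Unit price"], errors="coerce")
    out["Quantity"] = pd.to_numeric(out["Quantity"], errors="coerce")
    # Convert the date column (month/day/year is an assumed format)
    out["Date"] = pd.to_datetime(out["Date"], format="%m/%d/%Y", errors="coerce")
    # Handle nulls: here rows missing any required field are dropped
    out = out.dropna(subset=["Date", "Unit price", "Quantity"])
    # New column: Total Sales = Unit Price x Quantity
    out["Total Sales"] = out["Unit price"] * out["Quantity"]
    return out.reset_index(drop=True)


cleaned = clean_sales(raw)
print(cleaned)
```

Dropping rows is only one way to handle nulls. Filling them with a default or a median may suit some columns better, so note whichever choice you make in your README.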
- Data Visualization. Create at least 3 visuals (Excel / Power BI / Tableau):
  - Top cities / product lines
  - Payment method insights
  - Gender & customer type behavior
  - Sales trend over time
  - Gross Income vs Quantity
- Insights Summary. Briefly answer:
  - What drives the most sales?
  - Underperforming areas?
  - Gender/customer-type behavior differences?
  - Month with highest/lowest sales and why?
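As a cross-check for the month question, monthly totals can be computed directly. The sketch below uses pandas and assumes a parsed `Date` column and a `Total Sales` column; both names and the sample figures are made up for illustration.

```python
import pandas as pd

# Illustrative rows only; replace with your cleaned dataset.
sales = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-01-05", "2019-01-20", "2019-02-03", "2019-03-10", "2019-03-15",
    ]),
    "Total Sales": [500.0, 250.0, 300.0, 400.0, 600.0],
})

# Group by calendar month and sum the sales in each month
monthly = sales.groupby(sales["Date"].dt.to_period("M"))["Total Sales"].sum()

best_month = monthly.idxmax()   # month with the highest total
worst_month = monthly.idxmin()  # month with the lowest total

print(monthly)
print("Highest:", best_month, "| Lowest:", worst_month)
```

The numbers only tell you which months stand out. The "why" still needs your own reasoning about what was happening in those months.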
- Push to GitHub. Create folder: supermarket-sales-analysis/
  Include:
  - Visual dashboard file
  - 2–3 screenshots
  - README.md with summary of findings
Data Scientist: Download Dataset (Google Drive)
You are to build a predictive model that determines whether a customer is likely to default on their loan based on the dataset provided.
- Preprocessing & Cleaning
  - Handle missing values
  - Convert categorical features to numeric
  - Remove/impute outliers
  - Create meaningful derived features (optional)
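One possible way to approach the preprocessing steps above is sketched below with pandas. The column names (`income`, `home_ownership`, `default`) and the sample values are assumptions, not the real schema. Median imputation, 1.5 × IQR clipping, and one-hot encoding are just one reasonable combination of techniques.

```python
import pandas as pd

# Illustrative rows; the column names are assumptions, not the real schema.
loans = pd.DataFrame({
    "income": [40000.0, 52000.0, None, 61000.0, 1_000_000.0],
    "home_ownership": ["RENT", "OWN", "RENT", "MORTGAGE", "OWN"],
    "default": [1, 0, 0, 0, 1],
})


def preprocess(df):
    out = df.copy()
    # Missing values: fill numeric gaps with the column median
    out["income"] = out["income"].fillna(out["income"].median())
    # Outliers: clip values outside the 1.5 * IQR fences
    q1, q3 = out["income"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["income"] = out["income"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    # Categorical -> numeric: one-hot encode the category column
    out = pd.get_dummies(out, columns=["home_ownership"], dtype=int)
    return out


prepared = preprocess(loans)
print(prepared)
```

In a real pipeline, compute the median and the clipping fences on the training split only, then apply them to the test split, so that no information leaks from the test data.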
- Exploratory Data Analysis (EDA). Create at least 2–3 visualizations:
  - Income vs loan default
  - Age group vs loan default
  - Loan grade vs interest rate
  Explain observed patterns in your notebook or README.
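For the age-group comparison, a small aggregation can feed the chart. The sketch below assumes an `age` column and a binary `default` column (1 = defaulted). The bin edges, labels, and sample rows are arbitrary illustrations.

```python
import pandas as pd

# Made-up rows; the real dataset's column names may differ.
people = pd.DataFrame({
    "age": [22, 27, 29, 35, 41, 58, 63],
    "default": [1, 1, 0, 1, 0, 0, 0],
})

# Bucket ages into groups; right=False makes each bin [low, high)
bins = [18, 30, 45, 60, 120]
labels = ["18-29", "30-44", "45-59", "60+"]
people["age_group"] = pd.cut(people["age"], bins=bins, labels=labels, right=False)

# Mean of a 0/1 column is the share of defaults in each group
default_rate = people.groupby("age_group", observed=True)["default"].mean()
print(default_rate)
```

If matplotlib is installed, `default_rate.plot(kind="bar")` turns this into a bar chart for the notebook.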
- Model Building
  - Use Logistic Regression, Decision Tree, or Random Forest
  - Evaluate using:
    - Accuracy, Precision, Recall
    - Confusion Matrix
  - Use train/test split or cross-validation
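To make the evaluation metrics concrete, here is a plain-Python sketch that derives accuracy, precision, recall, and a 2×2 confusion matrix from true and predicted labels. The label lists are invented. In a notebook you would more likely call the equivalents in `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`, `confusion_matrix`). This version only shows what those numbers mean.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Rows are true classes, columns are predicted classes: [negative, positive]
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


# Invented labels: 1 = default, 0 = no default
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

For loan default, recall on the default class often matters more than raw accuracy, because a missed defaulter is usually costlier than a false alarm. Say which metric you prioritized in your results explanation.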
- Results Explanation. Summarize:
  - Key predictive features
  - Business insights from the model
- Push to GitHub. Create folder: loan-default-prediction/
  Include:
  - notebook.ipynb
  - README.md
  - requirements.txt (optional)
Machine Learning Engineer: Intel Image Dataset (Kaggle)
Use the Train folder only.
Pick any 3 classes (e.g., forest, mountain, street).
Do not use the "Test" or "Predict" folders provided.
Perform your own train/test split in your code.
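A do-it-yourself split could be as simple as the sketch below, which shuffles the file paths of each class separately so every class keeps the same train/test ratio. The function name `stratified_split`, the 20% test fraction, and the fake file names are assumptions for illustration.

```python
import random


def stratified_split(paths_by_class, test_fraction=0.2, seed=42):
    """Split {class_name: [paths]} into (train, test) lists of (path, label)."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label, paths in sorted(paths_by_class.items()):
        shuffled = list(paths)
        rng.shuffle(shuffled)
        # At least one test image per class (a class with a single image
        # would end up entirely in the test set)
        n_test = max(1, round(len(shuffled) * test_fraction))
        test += [(path, label) for path in shuffled[:n_test]]
        train += [(path, label) for path in shuffled[n_test:]]
    return train, test


# Fake file names standing in for images from the Train folder
files = {
    cls: [f"{cls}/{i}.jpg" for i in range(10)]
    for cls in ["forest", "mountain", "street"]
}
train, test = stratified_split(files)
print(len(train), "training images,", len(test), "test images")
```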
- Preprocessing
  - Resize images (e.g., 64×64)
  - Normalize pixel values
  - Encode labels
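The normalization and label-encoding steps might look like the NumPy sketch below. It assumes each image has already been loaded and resized to 64×64 RGB with an image library (for example Pillow's `Image.resize((64, 64))`), and it uses a random array in place of a real photo. The helper names are hypothetical.

```python
import numpy as np


def normalize_pixels(image_uint8):
    """Scale 0-255 integer pixels to floats in [0, 1]."""
    return image_uint8.astype("float32") / 255.0


def encode_labels(labels):
    """Map class names to integer ids, in alphabetical order."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    encoded = np.array([index[name] for name in labels], dtype="int64")
    return encoded, classes


# Random stand-in for an already resized 64x64 RGB image
fake_image = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
x = normalize_pixels(fake_image)
y, classes = encode_labels(["street", "forest", "mountain", "forest"])

print(x.shape, x.dtype, float(x.min()), float(x.max()))
print(classes, y)
```

Keep the `classes` list alongside your saved model, since the app needs it to turn a predicted id back into a class name.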
- Model Building
  - Train a CNN (or use transfer learning: VGG, MobileNet, etc.)
  - Evaluate performance (accuracy or confusion matrix)
- Streamlit Deployment
  - Build an app that lets users upload an image
  - Show the predicted class
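A minimal shape for the app is sketched below. It assumes Streamlit and Pillow are installed and that you supply your own `predict_fn`, a hypothetical callable that takes a PIL image and returns one probability per class. The class list and the `run_app` / `top_class` names are placeholders too, not part of any required interface.

```python
CLASS_NAMES = ["forest", "mountain", "street"]  # assumed: the 3 classes you picked


def top_class(probabilities, class_names=CLASS_NAMES):
    """Return (label, probability) for the highest-scoring class."""
    if len(probabilities) != len(class_names):
        raise ValueError("expected one probability per class")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], float(probabilities[best])


def run_app(predict_fn):
    """Streamlit page: upload an image, display it, show the predicted class.

    predict_fn is a hypothetical callable (PIL image -> list of class
    probabilities) that wraps your trained model and its preprocessing.
    """
    # Imported here so top_class can be used and tested without Streamlit
    import streamlit as st
    from PIL import Image

    st.title("Intel image classifier")
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded is not None:
        image = Image.open(uploaded).convert("RGB")
        st.image(image, caption="Uploaded image")
        label, probability = top_class(predict_fn(image))
        st.write(f"Predicted class: {label} ({probability:.0%})")
```

In `streamlit_app.py` you would call `run_app(your_predict_function)` at the bottom of the file and start it with `streamlit run streamlit_app.py`. The prediction logic itself can live in `predict.py` so the notebook and the app share it.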
- Push to GitHub. Create folder: intel-image-classification/
  Include:
  - notebook.ipynb
  - streamlit_app.py
  - predict.py
  - README.md describing your classes and setup
- Only submit the GitHub link for your assigned track
- Make sure your repo is public
- Include screenshots or dashboards if needed
- You may add a short video walkthrough (optional)
Good luck! Show us your data skills.