DSTI ML Labs Project

📚 Project Overview

This project predicts book ratings from the Goodreads Books dataset on Kaggle. It covers data exploration, feature engineering, model training, and evaluation, with the aim of producing accurate rating predictions.

🚀 How to Run the Project

⚠️ Note: Bertrandt’s IT policy prevents the use of Anaconda and direct access to Google Drive from my laptop. The steps below work around these restrictions by running the notebook in Google Colab:

Clone the Repository
• Clone the main_branch of this GitHub repository to your local computer, or download the zip file.
Upload to Google Drive
• Add the repository folder to your Google Drive account to make the file structure accessible in Google Colab.
Open in Google Colab
• Launch a Google Colab session.
• Navigate to the repository folder in Colab’s file browser.
Run the Notebook
• Execute the notebook main.ipynb to start the project.
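
The Colab steps above can be sketched as a first notebook cell. This is a minimal sketch, not the project's actual setup code: it assumes the repository was uploaded to the top level of My Drive under the folder name DSTI_ML_with_Python_Project (a zip download may unpack under a different name), and `project_dir` and the two constants are illustrative names.

```python
import os

# Assumed values: adjust to wherever the repository was uploaded.
DRIVE_MOUNT_POINT = "/content/drive"
REPO_FOLDER = "DSTI_ML_with_Python_Project"  # hypothetical folder name in My Drive

try:
    # Only importable inside a Google Colab runtime.
    from google.colab import drive
except ImportError:
    drive = None


def project_dir(mount_point=DRIVE_MOUNT_POINT, repo_folder=REPO_FOLDER):
    """Return the expected path of the repository inside a mounted Drive."""
    return f"{mount_point}/MyDrive/{repo_folder}"


if drive is not None:
    drive.mount(DRIVE_MOUNT_POINT)  # asks for authorization in the browser
    os.chdir(project_dir())         # so relative paths used by main.ipynb resolve
```

Outside Colab the cell does nothing beyond defining the helper, so it can stay in the notebook when running locally.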

🎯 Project Objectives

Using the dataset books.csv, the task is to:
1. Train a machine learning model to predict book ratings.
2. Conduct exploratory data analysis (EDA), feature engineering, and selection.
3. Build, train, and evaluate models using appropriate metrics.
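
Every objective starts with reading books.csv into memory. The sketch below uses only the standard library and skips rows whose field count does not match the header or whose target is not numeric. The target column name `average_rating`, the other column names, and the sample values are assumptions for illustration, not taken from the project notebook.

```python
import csv
import io


def load_rows(handle, target="average_rating"):
    """Read CSV rows, keeping those with a full set of fields and a numeric target."""
    reader = csv.reader(handle)
    header = [name.strip() for name in next(reader)]
    rows, skipped = [], 0
    for fields in reader:
        if len(fields) != len(header):
            skipped += 1  # e.g. an unquoted comma inside a field
            continue
        record = dict(zip(header, fields))
        try:
            record[target] = float(record[target])
        except ValueError:
            skipped += 1  # target value is not a number
            continue
        rows.append(record)
    return rows, skipped


# Tiny in-memory stand-in for books.csv (made-up values).
sample = io.StringIO(
    "title,authors,average_rating,num_pages\n"
    "Book A,Author One,4.10,320\n"
    "Book B,Author Two, Jr.,3.75,210\n"  # extra comma gives 5 fields: skipped
    "Book C,Author Three,3.90,150\n"
)
rows, skipped = load_rows(sample)
print(len(rows), "rows kept,", skipped, "skipped")
```

For the real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("data/raw/books.csv", newline="", encoding="utf-8")`, assuming the raw data lives at that path.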

📝 Project Evaluation Criteria

The project will be evaluated based on the following rubric (score: 5 points total):

Data Analysis
• Data cleaning, exploratory analysis, and visualizations of relevant attributes (1 point).
Feature Selection
• Feature engineering, pruning, and justification for the choices made (1 point).
Model Training
• Explanation for selected model(s), and comparison of performance across models (1 point).
Model Evaluation
• Evaluation metric, results interpretation, and discussion (1 point).
Project Report
• A concise report summarizing the approach, results, and key insights (1 point).
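
For the model comparison and evaluation criteria, one common choice for a numeric target such as a rating is to report RMSE and MAE against a constant mean baseline. The sketch below is an illustration under that assumption, not the metric actually used in the notebook, and all ratings and predictions in it are made up.

```python
import math
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more heavily."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error: average size of a miss, in rating points."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Made-up ratings for illustration only.
train_ratings = [4.0, 3.5, 4.0, 3.5]
y_true = [4.0, 3.5, 4.5, 3.0]
model_pred = [3.9, 3.6, 4.2, 3.3]

# Baseline: always predict the mean rating seen during training.
baseline_pred = [mean(train_ratings)] * len(y_true)

for name, pred in [("baseline", baseline_pred), ("model", model_pred)]:
    print(f"{name}: RMSE={rmse(y_true, pred):.3f}  MAE={mae(y_true, pred):.3f}")
```

A model is only worth discussing if it beats the baseline on held-out data; reporting both numbers side by side makes that comparison explicit.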

Bonus Points (up to 1 point):
• Reproducibility: A complete requirements.txt and README (0.5 point).
• Hosting: Hosting on platforms like GitHub, Docker, AWS, or Heroku (0.5 point).

📂 Directory Structure (inspired by CookieCutter)

The project structure is inspired by the Cookiecutter Data Science template, which favors reproducibility and a predictable organization:

├── LICENSE                   <- Project license.
├── README.md                 <- This README file.
├── data
│   ├── processed             <- Processed data ready for modeling.
│   └── raw                   <- Original, unmodified data files.
│
├── models                    <- Serialized models and predictions.
│
├── notebooks                 <- Jupyter notebooks for experimentation.
│
├── reports                   <- Generated analyses and reports.
│   └── figures               <- Graphics and figures for reporting.
│
└── requirements.txt          <- List of dependencies for reproducing the environment.
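
After cloning or uploading the repository, the layout above can be checked with a few lines of Python. This is a hedged sketch: `EXPECTED_DIRS`, `missing_dirs`, and `create_layout` are hypothetical helpers written for this README, and the demo runs against a temporary directory rather than the real repository.

```python
import tempfile
from pathlib import Path

# Directories listed in the tree above.
EXPECTED_DIRS = [
    "data/processed",
    "data/raw",
    "models",
    "notebooks",
    "reports/figures",
]


def missing_dirs(root="."):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


def create_layout(root="."):
    """Create any expected directory that is missing under root."""
    for d in missing_dirs(root):
        (Path(root) / d).mkdir(parents=True, exist_ok=True)


# Demo on an empty temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_dirs(tmp)
    create_layout(tmp)
    after = missing_dirs(tmp)

print("missing before:", before)
print("missing after:", after)
```

Calling `missing_dirs()` from the repository root gives a quick check that an upload to Google Drive preserved the folder structure.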

License

MIT

clemcoste/DSTI_ML_with_Python_Project