This project builds a movie recommendation system using a content-based machine learning technique. The system processes and analyzes movie data to recommend movies similar to a given input movie.
- Data Collection
- Data Preprocessing
- Building the ML Model
- Creating the Website
- Data Import - Data is imported from the TMDB 5000 movie dataset (available on Kaggle).
- Data Cleaning - Remove attributes such as revenue, release date, and runtime.
- Data Preprocessing
  - Convert all text to lowercase.
  - Remove all punctuation and non-word characters (special symbols, etc.).
  - Remove stop words (are, the, is, etc.).
  - Apply stemming to reduce words to their root form.
- Building the Model
  - Concatenate tags from the dataset.
  - Build a dataframe that records the frequency of each tag.
  - Transform each row into a vector.
  - Compute the cosine similarity between these vectors in the N-dimensional vector space to measure how close two movies are.
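
The preprocessing and model steps above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the project's actual code: the `MOVIES` data, the stop-word list, and the helper names (`naive_stem`, `preprocess`, `recommend`) are all hypothetical, a crude suffix stripper stands in for a real stemmer, and `collections.Counter` stands in for the tag-frequency dataframe.

```python
import math
import re
from collections import Counter

# Hypothetical toy data standing in for each movie's concatenated tags.
MOVIES = {
    "Space Quest": "Astronauts exploring a distant planet fight aliens!",
    "Alien Planet": "An astronaut explores the planet of the aliens.",
    "Love in Paris": "Two strangers fall in love during a summer in Paris.",
}

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "during"}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, strip non-word characters, drop stop words, then stem."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# One term-frequency vector per movie (the "row into a vector" step).
VECTORS = {title: Counter(preprocess(tags)) for title, tags in MOVIES.items()}


def recommend(title, top_n=5):
    """Return the titles most similar to `title`, best match first."""
    query = VECTORS[title]
    scores = [
        (cosine_similarity(query, vec), other)
        for other, vec in VECTORS.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scores[:top_n]]


if __name__ == "__main__":
    print(recommend("Space Quest", top_n=2))
```

On the real dataset, a library stemmer and vectorizer would likely replace the hand-written helpers; the structure (preprocess, count, compare, rank) stays the same.
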
- Python: The primary programming language for data processing and model building.
- Streamlit: Used for creating the web application.
- Dataset: TMDB 5000 movie dataset from Kaggle.
- Development Tools: Jupyter Notebook and PyCharm.
Feel free to submit issues and pull requests. Contributions are welcome!