A content-based movie recommendation system built with Python, Streamlit, and scikit-learn. It recommends movies by comparing similarity scores computed from movie metadata: genres, keywords, cast, director, and overview.
- Content-Based Filtering: Recommends movies based on similarity in content (genres, keywords, cast, director)
- Interactive Web Interface: User-friendly Streamlit application
- Movie Posters: Displays movie posters using TMDB API
- Real-time Recommendations: Instant recommendations for any movie in the dataset
- Visual Display: Shows 5 recommended movies in a clean column layout
- Python: Core programming language
- Pandas: Data manipulation and analysis
- NumPy: Numerical computations
- Scikit-learn: Machine learning and text processing
- Streamlit: Web application framework
- TMDB API: Movie poster fetching
- Pickle: Model and data serialization
The system uses the TMDB 5000 Movie Dataset:
- tmdb_5000_movies.csv: Contains movie information (title, genres, keywords, overview, etc.)
- tmdb_5000_credits.csv: Contains cast and crew information
- Total Movies: ~4800 movies
- Features Used: Genres, keywords, cast, director, overview
- Python 3.8+
- Git
- Clone the repository:
    git clone https://github.com/Code-With-Samuel/Movie-Recommendation-System.git
    cd Movie-Recommendation-System
- Create and activate a virtual environment:
    python -m venv .venv
    # On Windows
    .venv\Scripts\activate
    # On macOS/Linux
    source .venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Run the application:
    streamlit run app.py
Movie-Recommendation-System/
├── app.py # Main Streamlit application
├── movie-recommender-system.ipynb # Jupyter notebook with data processing
├── movies.pkl # Processed movies data
├── movie_dict.pkl # Movie dictionary for quick lookup
├── similarity.pkl # Cosine similarity matrix
├── tmdb_5000_movies.csv # Raw movies dataset
├── tmdb_5000_credits.csv # Raw credits dataset
├── requirements.txt # Python dependencies
├── images/
│ └── image.png # Application screenshot
└── README.md # This file
- Loads movies and credits datasets
- Merges datasets on movie titles
- Extracts and cleans features:
  - Genres (top 3)
  - Keywords (top 3)
  - Cast (top 3 actors)
  - Director
  - Overview
- Combines all text features into tags
- Applies text preprocessing:
  - Lowercase conversion
  - Stop word removal
  - Stemming
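The feature-extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the raw CSV columns hold JSON-encoded lists of objects with a `name` field (and a `job` field for crew), the sample `row` is made up, the helper names are hypothetical, and the stop-word list is a tiny placeholder. Stemming is left out because it needs a third-party library such as NLTK.

```python
import json

# Tiny placeholder stop-word list (an assumption; a real pipeline uses a fuller one).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is"}


def top_names(json_text, limit=3):
    """Return up to `limit` 'name' values from a JSON-encoded list of objects."""
    return [item["name"] for item in json.loads(json_text)[:limit]]


def director_of(crew_json):
    """Return the names of crew members whose job is 'Director'."""
    return [m["name"] for m in json.loads(crew_json) if m.get("job") == "Director"]


def build_tags(row):
    """Combine overview, genres, keywords, cast, and director into one tag string."""
    tokens = row["overview"].split()
    for column in ("genres", "keywords", "cast"):
        tokens += top_names(row[column])
    tokens += director_of(row["crew"])
    # Lowercase, strip trailing punctuation, then drop stop words.
    lowered = [t.lower().strip(".,!?") for t in tokens]
    return " ".join(t for t in lowered if t and t not in STOP_WORDS)


# Made-up row for illustration only; it is not taken from the dataset.
row = {
    "overview": "A pilot is sent to the distant moon.",
    "genres": '[{"id": 1, "name": "Action"}, {"id": 2, "name": "Adventure"}, '
              '{"id": 3, "name": "Fantasy"}, {"id": 4, "name": "Drama"}]',
    "keywords": '[{"id": 10, "name": "space"}, {"id": 11, "name": "moon"}]',
    "cast": '[{"name": "Ann Lee", "character": "Pilot"}, '
            '{"name": "Bo Kim", "character": "Medic"}]',
    "crew": '[{"name": "Cy Doe", "job": "Editor"}, {"name": "Di Roe", "job": "Director"}]',
}

tags = build_tags(row)
```

Only the first three genres survive, so "Drama" is dropped from the tags for this sample row.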
- Uses TF-IDF Vectorizer to convert text to numerical vectors
- Calculates cosine similarity between all movie pairs
- Stores similarity matrix for fast lookup
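To make the vectorization and similarity step concrete, here is a dependency-free sketch of TF-IDF weighting and cosine similarity. It is a conceptual stand-in, not the project's code: the project uses scikit-learn, whose `TfidfVectorizer` applies its own IDF smoothing and normalization, so exact scores will differ. The function names and the sample tag strings are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, weight = tf * idf."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # idf = ln(N / df) + 1, so terms found in every document keep a nonzero weight.
        vectors.append({
            term: count * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(documents):
    """Compute the full pairwise cosine similarity matrix as a list of lists."""
    vectors = tfidf_vectors(documents)
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Made-up tag strings standing in for three movies.
sample_tags = [
    "space moon action adventure",
    "space station action thriller",
    "romance paris comedy",
]
matrix = similarity_matrix(sample_tags)
```

In this toy example the first two movies share the terms "space" and "action", so they score higher with each other than with the third, which shares no terms with either and scores zero.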
- Takes user-selected movie
- Finds similarity scores with all other movies
- Sorts by similarity (descending)
- Returns top 5 most similar movies (excluding the input movie)
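The lookup described above amounts to sorting one row of the similarity matrix. A minimal sketch, assuming the titles are held in a list aligned with the matrix rows; the function name, the toy titles, and the scores are hypothetical and not taken from `app.py`:

```python
def recommend(title, titles, similarity, top_n=5):
    """Return the `top_n` titles most similar to `title`, excluding the title itself."""
    index = titles.index(title)  # raises ValueError for an unknown title
    scores = similarity[index]
    # Rank every movie index by its similarity to the selected movie, highest first.
    ranked = sorted(range(len(titles)), key=lambda i: scores[i], reverse=True)
    return [titles[i] for i in ranked if i != index][:top_n]


# Toy data: four placeholder titles and a made-up symmetric similarity matrix.
titles = ["A", "B", "C", "D"]
similarity = [
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
]

top_two = recommend("A", titles, similarity, top_n=2)
```

Filtering by index rather than dropping the first result avoids returning the input movie when another movie ties with it at the top score.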
- Streamlit provides interactive UI
- Dropdown for movie selection
- Displays recommendations with posters
- Uses TMDB API to fetch movie posters
The application features:
- Movie Selection Dropdown: Choose from ~4800 movies
- Recommend Button: Get instant recommendations
- Visual Display: 5-column layout showing movie titles and posters
- Responsive Design: Works on desktop and mobile devices
- Dataset Size: ~4800 movies
- Vocabulary Size: 5000+ text features after TF-IDF vectorization
- Similarity Matrix: 4800×4800 cosine similarity matrix
- Response Time: <1 second for recommendations
- Memory Usage: ~200MB (mainly similarity matrix)
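The memory figure can be sanity-checked with a little arithmetic. The sketch below assumes a dense matrix of exactly 4800×4800 values stored as 64-bit floats (NumPy's default float type); whether the project actually stores float64 is an assumption, and the helper name is made up.

```python
def matrix_bytes(n_movies, bytes_per_value=8):
    """Size in bytes of a dense n x n matrix (8 bytes per value for float64)."""
    return n_movies * n_movies * bytes_per_value


float64_mb = matrix_bytes(4800) / 1e6      # 184.32 MB
float32_mb = matrix_bytes(4800, 4) / 1e6   # 92.16 MB
```

Under these assumptions the matrix alone takes about 184 MB, which is consistent with the ~200MB estimate. Storing it as 32-bit floats would roughly halve that.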
To use movie posters, you need a TMDB API key:
- Sign up at TMDB
- Get your API key from Settings > API
- Replace the API key in app.py (line 7):
    response = requests.get('https://api.themoviedb.org/3/movie/{}?api_key=YOUR_API_KEY&language=en-US'.format(movie_id))
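As an alternative to editing the key into the source, it can be read from an environment variable. The following is a hedged sketch, not what `app.py` does: the variable name `TMDB_API_KEY` and the function names are hypothetical, and it uses the standard library's `urllib` instead of `requests` to stay dependency-free. It assumes the movie details response carries a `poster_path` field that is appended to TMDB's image base URL.

```python
import json
import os
import urllib.request

API_URL = "https://api.themoviedb.org/3/movie/{}?api_key={}&language=en-US"
IMAGE_BASE = "https://image.tmdb.org/t/p/w500"  # w500 selects a 500px-wide poster


def details_url(movie_id, api_key):
    """Build the movie details URL for a given movie id and API key."""
    return API_URL.format(movie_id, api_key)


def poster_url(details):
    """Return the full poster URL, or None when the response has no poster_path."""
    path = details.get("poster_path")
    return IMAGE_BASE + path if path else None


def fetch_poster(movie_id):
    """Fetch movie details from TMDB and return the poster URL (needs network access)."""
    api_key = os.environ["TMDB_API_KEY"]  # hypothetical variable name
    with urllib.request.urlopen(details_url(movie_id, api_key), timeout=10) as response:
        return poster_url(json.load(response))
```

Keeping the key out of the source file also keeps it out of version control.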
- Large File Error: If you encounter Git LFS issues:
    git lfs install
    git lfs track "*.pkl"
    git add .gitattributes
    git add *.pkl
    git commit -m "Add LFS tracking"
    git push
- Missing Dependencies: Ensure all packages are installed:
    pip install streamlit pandas numpy scikit-learn requests
- API Key Issues: Verify your TMDB API key is valid and has sufficient quota
- Memory Issues: The similarity matrix is large (~200MB). Ensure sufficient RAM is available.
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- TMDB for providing the movie dataset and API
- Scikit-learn for machine learning tools
- Streamlit for the web framework
- Pandas for data manipulation
Created by Code-With-Samuel
If you have any questions or suggestions, feel free to:
- Open an issue on GitHub
- Reach out via the repository discussions
⭐ If you find this project helpful, please give it a star!
