COMPANY: CODETECH IT SOLUTIONS
NAME: MOHIT DAMLE
INTERN ID: CT12WSJC
DOMAIN: BACKEND WEB DEVELOPMENT
DURATION: 12 WEEKS
MENTOR: NEELA SANTOSH
A powerful movie recommendation engine that leverages natural language processing and machine learning to provide personalized movie suggestions based on semantic understanding of user preferences, movie content, and sentiment analysis.
This project implements a semantic movie recommender system. It combines movie datasets from KaggleHub, Hugging Face Large Language Models (LLMs) and embedding models, LangChain and LangChain Community components for orchestration and integration, Chroma as the vector database for similarity search, and a Gradio dashboard for user interaction. It uses data exploration, text classification, vector search, and sentiment analysis to produce personalized movie recommendations.
- Description
- Project Overview
- Output Website Screenshot
- Features
- How It Works
- Technologies Used
- Setup and Installation
- Gradio Dashboard
- Contributing
- License
- Acknowledgements
The goal of this project is to build a movie recommender system that goes beyond traditional collaborative filtering by understanding the semantic meaning of movie descriptions and user preferences. It utilizes:
- KaggleHub Datasets: Movie metadata, including plots, genres, and user reviews.
- Hugging Face LLMs: For tasks like text classification and sentiment analysis.
- Hugging Face Embedding Models: To create vector representations of movie descriptions and user queries.
- LangChain: For orchestrating LLM interactions, managing prompts, and creating conversational interfaces.
- Vector Search: To find movies with similar semantic meanings.
- Chroma: For efficient vector storage and retrieval.
- Gradio: To create an interactive web interface.
- Semantic Understanding: Utilizes Hugging Face language models to comprehend movie plots, genres, and themes
- Content-Based Filtering: Recommends movies based on content similarity using vector embeddings
- Sentiment Analysis: Analyzes user reviews to factor emotional responses into recommendations
- Interactive Dashboard: Built with Gradio for easy exploration and testing
- Text Classification: Categorizes movies based on multiple attributes
- Vector Search: Fast retrieval of similar movies using vector embeddings
- Data Processing:
  - Movies are cleaned and processed from the Kaggle dataset
  - Text fields (plot, synopsis, reviews) are normalized
- Feature Extraction:
  - Hugging Face embedding models convert text to vector representations
  - Sentiment analysis classifies review sentiment
- Recommendation Engine:
  - Vector similarity search finds semantically similar movies
  - Results are filtered based on user preferences
  - Recommendations are scored using multiple factors
- User Interface:
  - Gradio provides an interactive dashboard
  - Users can search by movie title or describe preferences
  - System displays recommendations with explanation
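The recommendation step described above (similarity search, preference filtering, multi-factor scoring) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: the three-dimensional vectors stand in for Hugging Face embeddings, the linear scan stands in for a Chroma similarity query, and the names `MOVIES`, `cosine_similarity`, `recommend`, the `joy` field, and the `sentiment_weight` parameter are all hypothetical.

```python
import math

# Toy catalogue: hand-written 3-d vectors stand in for real text embeddings,
# and "joy" stands in for a sentiment score produced by a classifier.
MOVIES = [
    {"title": "Space Quest", "genre": "Sci-Fi", "joy": 0.2, "vector": [0.9, 0.1, 0.0]},
    {"title": "Star Drift", "genre": "Sci-Fi", "joy": 0.7, "vector": [0.8, 0.2, 0.1]},
    {"title": "Love in Paris", "genre": "Romance", "joy": 0.9, "vector": [0.0, 0.1, 0.9]},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_vector, genre=None, sentiment_weight=0.0, top_k=2):
    """Filter by genre, then rank by a blend of similarity and sentiment."""
    # Step 1: keep only movies that match the user's preference, if one is given.
    candidates = [m for m in MOVIES if genre is None or m["genre"] == genre]

    # Step 2: score each candidate as a weighted mix of two factors.
    scored = []
    for movie in candidates:
        similarity = cosine_similarity(query_vector, movie["vector"])
        score = (1 - sentiment_weight) * similarity + sentiment_weight * movie["joy"]
        scored.append((score, movie["title"]))

    # Step 3: highest score first, then keep the top_k titles.
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


print(recommend([1.0, 0.0, 0.0]))
print(recommend([1.0, 0.0, 0.0], sentiment_weight=0.5))
```

Raising `sentiment_weight` shifts the ranking toward movies with a higher `joy` value, which is one simple way to let sentiment influence the final order. The actual weighting used by the project is not specified here.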
- Data Source: Kaggle Movies Dataset via KaggleHub
- NLP Models: Hugging Face Transformers
- Embeddings: Hugging Face Sentence Transformers
- UI: Gradio Dashboard
- Analysis: Python, Pandas, NumPy
- Visualization: Matplotlib, Seaborn
- Clone the repository:
    git clone <repository_url>
    cd movie_recommender_system
- Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # On Linux/macOS
    venv\Scripts\activate     # On Windows
- Install dependencies:
    pip install -r requirements.txt
- Download the necessary datasets and place them in the data/ directory.
- Download the Hugging Face models and configure API access, if necessary, or configure the notebooks to download the models on demand.
- Download and prepare the dataset:
    import kagglehub

    # Download the latest version of the dataset
    path = kagglehub.dataset_download("path/to/movie/dataset")
    print("Path to dataset files:", path)
- Run the notebooks. Execute the notebooks in the notebooks/ directory in the order listed:
  - data_exploration.ipynb: Analyzes and preprocesses the dataset. It finds missing data, creates a tagged description for each movie by combining its unique ID with its overview, and writes a cleaned movie CSV file for text classification.
  - vector_search.ipynb: Splits the documents into meaningful chunks, converts them into embeddings with a Hugging Face embedding model, and stores them in the Chroma vector database.
  - text_classification.ipynb: Classifies movies by genre or other relevant categories using zero-shot classification.
  - sentiment_analysis.ipynb: Analyzes the emotional tone of each movie, classifies it, and makes the result available as an additional filter for the search engine.
- Launch the Gradio dashboard:
    python gradio_dashboard.py
  This will start the Gradio web interface, which you can access in your browser.
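Returning to the data exploration notebook in the steps above, the idea of a tagged description (a movie's unique ID followed by its overview) can be pictured with a short sketch. The column names `id`, `title`, and `overview`, the sample rows, and the helper names are assumptions made for illustration; the real dataset and notebook may differ. One plausible reason for the tag is that the ID can be read back from any text a vector search returns, so the full movie record can be looked up afterwards.

```python
# Hypothetical rows; the real dataset's column names may differ.
movies = [
    {"id": 101, "title": "Space Quest", "overview": "  A crew drifts\nbeyond Mars. "},
    {"id": 102, "title": "Lost Reel", "overview": None},
    {"id": 103, "title": "Love in Paris", "overview": "Two strangers meet on a train."},
]


def clean_text(text):
    """Collapse runs of whitespace (including newlines) and trim the ends."""
    return " ".join(text.split())


def build_tagged_descriptions(rows):
    """Skip rows with a missing overview and prefix each overview with its movie ID."""
    tagged = []
    for row in rows:
        overview = row.get("overview")
        if not overview or not overview.strip():
            continue  # missing data: nothing to embed for this movie
        tagged.append(f'{row["id"]} {clean_text(overview)}')
    return tagged


def movie_id_from_tagged(tagged_description):
    """Recover the movie ID from the start of a tagged description."""
    return int(tagged_description.split(" ", 1)[0])


tagged = build_tagged_descriptions(movies)
for line in tagged:
    print(movie_id_from_tagged(line), "->", line)
```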
The gradio_dashboard.py script creates an interactive web interface using Gradio. Users can:
- Enter a movie description or query.
- Receive personalized movie recommendations.
- View movie details and sentiment analysis results.
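A dashboard of this kind can be wired up with very little code. The sketch below is not the contents of gradio_dashboard.py; it only shows the general shape, assuming a text-in, text-out interface. The function names `recommend_movies` and `build_dashboard`, the placeholder catalogue, and the labels are hypothetical, and a real version would query the vector database instead of a dictionary.

```python
def recommend_movies(query):
    """Placeholder lookup; a real version would run a semantic search."""
    catalogue = {
        "space": ["Space Quest", "Star Drift"],
        "love": ["Love in Paris"],
    }
    words = query.lower().split()
    # Collect titles for every catalogue keyword that appears in the query.
    titles = [t for key, found in catalogue.items() if key in words for t in found]
    return "\n".join(titles) if titles else "No matches found."


def build_dashboard():
    """Wrap recommend_movies in a simple Gradio text interface."""
    # Imported here so recommend_movies can be used and tested without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=recommend_movies,
        inputs=gr.Textbox(label="Describe a movie"),
        outputs=gr.Textbox(label="Recommendations"),
        title="Semantic Movie Recommender",
    )


# To serve the page locally, call: build_dashboard().launch()
print(recommend_movies("a space adventure"))
```

Keeping the recommendation function separate from the interface code makes it possible to test the logic on its own before launching the web page.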
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them.
- Push your changes to your fork.
- Submit a pull request.
This project is licensed under the MIT License.
- Kaggle for providing the movie datasets
- Hugging Face for transformers and embedding models
- LangChain for the LLM application framework
- LangChain Community for integration components
- Chroma for the vector database
- Gradio for the interactive UI framework

