Welcome to Celebrity Face Matcher 🎭, a revolutionary project fusing cutting-edge AI with star-powered recognition. Using state-of-the-art deep learning models, this system transforms your selfies into celebrity connections!
📚 Table of Contents
Celebrity Face Matcher revolutionizes the way we connect with celebrity culture. Through advanced AI algorithms and vector search technology, we create meaningful connections between everyday faces and their celebrity counterparts.
"In every face lies a story waiting to be told through the lens of stardom."
Our mission is to:
- Empower Individuality: Show users that every face is unique and special.
- Boost Self-Confidence: Help users see themselves as the stars they are.
- Democratize Celebrity Culture: Break traditional barriers by demonstrating that anyone can resemble a celebrity.
- Enable Real-Time Storytelling: Transform everyday images into interactive, cinematic narratives.
- Explain the Match: Provide in-depth, technical explanations of why the system believes a particular celebrity is a match.
- Python 3.12+: The primary programming language.
- Deep Learning Models:
- MTCNN: For robust face detection and landmark extraction.
- InceptionResnetV1: A CNN trained on VGGFace2 that generates a 512-dimensional facial embedding.
- CLIP: Trained on millions of image-text pairs, this model learns a shared embedding space.
- Vector Database:
- Pinecone: A managed vector database that stores high-dimensional embeddings.
- Database:
- MongoDB Atlas: Stores celebrity metadata and image paths from the CelebA dataset.
- APIs and Tools:
- Kaggle API: For downloading the CelebA dataset.
- PyTorch: For model inference.
- Hugging Face InferenceClient: Powers our LLM (e.g., Mistral-7B) which generates natural language explanations.
- Other Libraries:
- OpenCV: For image processing and visualization.
- Pandas: For CSV processing.
- dotenv: For environment variable management.
The project is organized into several key components:
-
Data Ingestion:
- The
app/data/celeba_ingestion.pymodule downloads a subset of the CelebA dataset using the Kaggle API, processes the CSV files, and stores celebrity records in MongoDB.
- The
-
Image Processing Pipeline:
- Face Detection & Alignment (MTCNN): Detects faces and extracts facial landmarks, then aligns the face into a canonical view.
- Embedding Extraction (InceptionResnetV1): Generates a robust 512-dimensional embedding capturing facial features.
- Pipeline Orchestration: Integrates detection, alignment, and encoding, generating composite images that juxtapose the query image with its best match.
-
Vector Storage & Retrieval:
- Pinecone Vector Database: Manages the index containing over 10,000 precomputed celebrity embeddings, using ANN search with cosine similarity to retrieve the most similar embeddings.
-
Query & Inference:
- Query Pipeline: Processes a user-uploaded image through the entire pipeline and queries Pinecone to retrieve top matches.
- LLM Explanation Module: Uses CLIP to generate captions and a Hugging Face LLM to generate a detailed explanation of the match.
- Python 3.12 or later
- MongoDB Atlas account
- Pinecone account
- Kaggle account
-
Clone the Repository:
git clone https://github.com/maxhartml/Celebrity-Face-Matcher.git cd Celebrity-Face-Matcher -
Set Up a Virtual Environment:
python -m venv env source env/bin/activate # On Windows: env\Scripts\activate
-
Install the Dependencies:
pip install -r requirements.txt
-
Configure Environment Variables: Create a
.envfile in the project root with the following:# MongoDB MONGODB_URI=your_mongodb_connection_string MONGODB_DB_NAME=celebrityDB MONGODB_COLLECTION=celebrities # Pinecone PINECONE_API_KEY=your_pinecone_api_key PINECONE_INDEX_NAME=celebrity-embeddings PINECONE_CLOUD=aws PINECONE_REGION=us-east-1 # HuggingFace HF_API_TOKEN=your_hf_api_totken
-
Set Up Kaggle Credentials: Follow Kaggle’s instructions to place your
kaggle.jsonfile in~/.kaggle/or set theKAGGLE_CONFIG_DIRenvironment variable.
To download and ingest a subset of the CelebA dataset into MongoDB, run:
python -m app.data.celeba_ingestionTo process all celebrity records from the database, extract their embeddings, and store them in Pinecone, run:
python -m app.orchestration.orchestratorTo query the Pinecone index with an image and view the results via a user-friendly interface:
-
Launch the Streamlit UI:
streamlit run streamlit_app.py
Contributions are welcome! Please fork the repository and submit pull requests. Adhere to the code style guidelines and ensure your changes are well tested.
This project is licensed under the MIT License.
We hope you find Celebrity Face Matcher both inspiring and technically intriguing. Enjoy exploring your unique star quality and the sophisticated AI powering your match!
If you have any questions or need further assistance, please open an issue or contact us.
Happy coding! 🌟