🌟 Celebrity Face Matcher 🌟

Where AI Meets Your Inner Star ✨

Welcome to Celebrity Face Matcher 🎭, a project that uses deep learning to find the celebrities you most resemble. Upload a selfie, and the system detects your face, encodes it as an embedding, and retrieves the closest celebrity matches along with an explanation of each one!

📚 Table of Contents

Overview
Vision
Technology Stack
Architecture and Pipeline
Installation and Setup
Usage
Streamlit Interface
Project Structure
Contributing
License

🌟 Overview

Celebrity Face Matcher compares the face in a user-supplied photo against celebrity faces from the CelebA dataset. Face embeddings and vector search retrieve the closest matches, and a language model explains why each match was chosen.

🎯 Vision

"In every face lies a story waiting to be told through the lens of stardom."

Our mission is to:

Empower Individuality: Show users that every face is unique and special.
Boost Self-Confidence: Help users see themselves as the stars they are.
Democratize Celebrity Culture: Break traditional barriers by demonstrating that anyone can resemble a celebrity.
Enable Real-Time Storytelling: Transform everyday images into interactive, cinematic narratives.
Explain the Match: Provide in-depth, technical explanations of why the system believes a particular celebrity is a match.

🛠️ Technology Stack

Python 3.12+: The primary programming language.
Deep Learning Models:
- MTCNN: For robust face detection and landmark extraction.
- InceptionResnetV1: A CNN trained on VGGFace2 that generates a 512-dimensional facial embedding.
- CLIP: A vision-language model trained on hundreds of millions of image-text pairs; it maps images and text into a shared embedding space so the two can be compared directly.
Vector Database:
- Pinecone: A managed vector database that stores high-dimensional embeddings.
Database:
- MongoDB Atlas: Stores celebrity metadata and image paths from the CelebA dataset.
APIs and Tools:
- Kaggle API: For downloading the CelebA dataset.
- PyTorch: For model inference.
- Hugging Face InferenceClient: Powers our LLM (e.g., Mistral-7B) which generates natural language explanations.
Other Libraries:
- OpenCV: For image processing and visualization.
- Pandas: For CSV processing.
- dotenv: For environment variable management.
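
The shared embedding space described for CLIP above can be illustrated with a small, dependency-free sketch. It is not the project's code: `rank_captions`, the toy 3-dimensional vectors, and the `temperature` default are assumptions made for illustration, and real CLIP embeddings would come from the model itself. The idea is only that an image vector and several caption vectors live in one space, so captions can be ranked by similarity to the image.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_captions(image_embedding, caption_embeddings, temperature=0.07):
    """Return (caption, probability) pairs, most likely caption first.

    caption_embeddings maps caption text to a vector in the same space as
    image_embedding. Cosine similarities are divided by `temperature`
    (an illustrative value) and turned into probabilities with a softmax.
    """
    if not caption_embeddings:
        raise ValueError("at least one caption is required")
    image = l2_normalize(image_embedding)
    scores = {}
    for caption, embedding in caption_embeddings.items():
        text = l2_normalize(embedding)
        scores[caption] = sum(a * b for a, b in zip(image, text)) / temperature
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {caption: math.exp(score - top) for caption, score in scores.items()}
    total = sum(exps.values())
    pairs = [(caption, value / total) for caption, value in exps.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real CLIP image and text embeddings.
toy_image = [0.9, 0.1, 0.0]
toy_captions = {
    "a person with a broad smile": [1.0, 0.0, 0.0],
    "a person wearing glasses": [0.0, 1.0, 0.0],
}
ranked = rank_captions(toy_image, toy_captions)
print(ranked[0][0])
```

Because both vectors are normalized first, the dot product is exactly the cosine similarity, which keeps the ranking independent of vector length.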

🏗️ Architecture and Pipeline

The project is organized into several key components:

Data Ingestion:
- The app/data/celeba_ingestion.py module downloads a subset of the CelebA dataset using the Kaggle API, processes the CSV files, and stores celebrity records in MongoDB.
Image Processing Pipeline:
- Face Detection & Alignment (MTCNN): Detects faces and extracts facial landmarks, then aligns the face into a canonical view.
- Embedding Extraction (InceptionResnetV1): Generates a robust 512-dimensional embedding capturing facial features.
- Pipeline Orchestration: Integrates detection, alignment, and encoding, generating composite images that juxtapose the query image with its best match.
Vector Storage & Retrieval:
- Pinecone Vector Database: Manages the index containing over 10,000 precomputed celebrity embeddings and uses approximate nearest neighbor (ANN) search with cosine similarity to retrieve the stored embeddings closest to the query.
Query & Inference:
- Query Pipeline: Processes a user-uploaded image through the entire pipeline and queries Pinecone to retrieve top matches.
- LLM Explanation Module: Uses CLIP to caption the images (CLIP ranks candidate captions by image-text similarity rather than writing free text) and a Hugging Face LLM to turn the match into a detailed explanation.
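
The retrieval step above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: it scores every stored vector exhaustively, whereas Pinecone uses approximate nearest neighbor search to avoid a full scan. The names `cosine_similarity`, `top_k_matches`, and the toy 4-dimensional index are hypothetical; the real embeddings are 512-dimensional.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(query, index, k=3):
    """Score every stored vector against the query and keep the k best.

    index maps a celebrity identifier to its embedding. Returns a list of
    (identifier, score) pairs sorted from most to least similar.
    """
    scored = ((cosine_similarity(query, vec), name) for name, vec in index.items())
    return [(name, score) for score, name in heapq.nlargest(k, scored)]


# Toy index standing in for the precomputed celebrity embeddings.
toy_index = {
    "celebrity_a": [1.0, 0.0, 0.0, 0.0],
    "celebrity_b": [0.7, 0.7, 0.0, 0.0],
    "celebrity_c": [0.0, 0.0, 1.0, 0.0],
}
matches = top_k_matches([0.9, 0.1, 0.0, 0.0], toy_index, k=2)
for name, score in matches:
    print(f"{name}: {score:.3f}")
```

An exhaustive scan like this is linear in the number of stored vectors, which is why a dedicated vector index becomes worthwhile as the collection grows.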

⚙️ Installation and Setup

Prerequisites

Python 3.12 or later
MongoDB Atlas account
Pinecone account
Kaggle account

Steps

Clone the Repository:

git clone https://github.com/maxhartml/Celebrity-Face-Matcher.git
cd Celebrity-Face-Matcher

Set Up a Virtual Environment:

python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate

Install the Dependencies:
```
pip install -r requirements.txt
```

Configure Environment Variables: Create a .env file in the project root with the following:

# MongoDB
MONGODB_URI=your_mongodb_connection_string
MONGODB_DB_NAME=celebrityDB
MONGODB_COLLECTION=celebrities

# Pinecone
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX_NAME=celebrity-embeddings
PINECONE_CLOUD=aws
PINECONE_REGION=us-east-1

# HuggingFace
HF_API_TOKEN=your_hf_api_token

Set Up Kaggle Credentials: Follow Kaggle’s instructions to place your kaggle.json file in ~/.kaggle/ or set the KAGGLE_CONFIG_DIR environment variable.
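
Before running anything, it can help to check that every variable from the .env file above is actually set. The following is a sketch under assumptions: `Settings`, `REQUIRED_KEYS`, and `load_settings` are hypothetical names, not part of the repository, and in the real project the values would presumably be loaded into the environment first (for example with python-dotenv's `load_dotenv()`).

```python
import os
from dataclasses import dataclass

# Variable names taken from the .env example in this README.
REQUIRED_KEYS = (
    "MONGODB_URI",
    "MONGODB_DB_NAME",
    "MONGODB_COLLECTION",
    "PINECONE_API_KEY",
    "PINECONE_INDEX_NAME",
    "PINECONE_CLOUD",
    "PINECONE_REGION",
    "HF_API_TOKEN",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; field names are the lowercased variable names."""
    mongodb_uri: str
    mongodb_db_name: str
    mongodb_collection: str
    pinecone_api_key: str
    pinecone_index_name: str
    pinecone_cloud: str
    pinecone_region: str
    hf_api_token: str


def load_settings(env=None):
    """Build Settings from a mapping (os.environ by default).

    Raises RuntimeError naming every missing or empty variable at once.
    """
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(**{key.lower(): env[key] for key in REQUIRED_KEYS})


# Demonstration with placeholder values instead of the real environment.
demo_env = {key: "placeholder" for key in REQUIRED_KEYS}
demo_settings = load_settings(demo_env)
print(demo_settings.pinecone_index_name)
```

Reporting all missing variables in one error avoids fixing them one failed run at a time.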

🚀 Usage

Data Ingestion

To download and ingest a subset of the CelebA dataset into MongoDB, run:

python -m app.data.celeba_ingestion
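
As a rough picture of what the CSV processing step involves, the sketch below turns attribute rows into dictionaries of the kind that could be stored in MongoDB. It is an assumption-laden illustration, not the module's code: `rows_to_documents` is a hypothetical helper, the column layout (an `image_id` column followed by attribute columns holding 1 or -1) is assumed from the Kaggle CelebA attribute file, and the document fields are invented for the example.

```python
import csv
import io
from pathlib import PurePosixPath


def rows_to_documents(csv_text, image_dir, limit=None):
    """Convert attribute CSV text into a list of record dictionaries.

    Each record keeps the image id, a path built from image_dir, and the
    names of the attributes flagged with "1". `limit` caps the number of
    records, mirroring the idea of ingesting only a subset.
    """
    documents = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        if limit is not None and len(documents) >= limit:
            break
        image_id = row.pop("image_id")
        documents.append({
            "image_id": image_id,
            "image_path": str(PurePosixPath(image_dir) / image_id),
            "attributes": sorted(
                name for name, value in row.items() if (value or "").strip() == "1"
            ),
        })
    return documents


# Tiny inline sample in the assumed layout; the real file is much wider.
sample_csv = (
    "image_id,Smiling,Eyeglasses,Young\n"
    "000001.jpg,1,-1,1\n"
    "000002.jpg,-1,1,-1\n"
)
docs = rows_to_documents(sample_csv, "data/celeba/images", limit=10)
print(docs[0])
# With pymongo, a list like this could be passed to collection.insert_many(docs).
```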

Running the Image Processing Pipeline

To process all celebrity records from the database, extract their embeddings, and store them in Pinecone, run:

python -m app.orchestration.orchestrator
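
Storing thousands of embeddings is usually done in batches rather than one request per vector. The sketch below shows that batching idea only; `chunked` and `build_vector_records` are hypothetical helpers, the batch size is arbitrary, and the actual orchestrator may work differently. The record shape (`id`, `values`, `metadata`) follows the dictionary format the Pinecone client accepts for upserts.

```python
from itertools import islice


def chunked(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def build_vector_records(embeddings):
    """Turn {image_id: (vector, metadata)} into upsert-style dictionaries."""
    return [
        {"id": image_id, "values": list(vector), "metadata": metadata}
        for image_id, (vector, metadata) in embeddings.items()
    ]


# Toy data: five short vectors standing in for 512-dimensional embeddings.
toy_embeddings = {
    f"{i:06d}.jpg": ([float(i), 0.0, 1.0], {"image_path": f"images/{i:06d}.jpg"})
    for i in range(1, 6)
}
records = build_vector_records(toy_embeddings)
for batch in chunked(records, batch_size=2):
    # In a real run, each batch would be sent with index.upsert(vectors=batch).
    print(len(batch), [record["id"] for record in batch])
```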

Querying the Pinecone Index & Running the Streamlit UI

To query the Pinecone index with an image and view the matches in your browser, launch the Streamlit UI:
```
streamlit run streamlit_app.py
```

🤝 Contributing

Contributions are welcome! Please fork the repository and submit pull requests. Adhere to the code style guidelines and ensure your changes are well tested.

📜 License

This project is licensed under the MIT License.

We hope you find Celebrity Face Matcher both inspiring and technically intriguing. Enjoy exploring your unique star quality and the sophisticated AI powering your match!

If you have any questions or need further assistance, please open an issue or contact us.

Happy coding! 🌟
