🎨 Diffusion Style Transfer

Production-grade text-to-image generation with style conditioning and content safety guardrails.

Built with Stable Diffusion XL, IP-Adapter, and CLIP — designed for creative pipelines where brand consistency and content safety are non-negotiable.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Generation Pipeline                       │
│                                                             │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────────┐  │
│  │  Prompt   │───▶│ Safety Gate  │───▶│   SDXL Base      │  │
│  │  Input    │    │ (pre-screen) │    │   + Refiner      │  │
│  └──────────┘    └──────────────┘    │   + IP-Adapter    │  │
│                                      └────────┬─────────┘  │
│                                               │             │
│  ┌──────────────────────────────────────────────┐           │
│  │            Post-Generation Safety            │           │
│  │  ┌────────────┐ ┌───────────┐ ┌───────────┐ │           │
│  │  │   NSFW     │ │  Content  │ │   Brand   │ │           │
│  │  │ Classifier │ │  Rating   │ │Consistency│ │           │
│  │  │(diffusers) │ │(G/PG/PG13)│ │  (CLIP)   │ │           │
│  │  └────────────┘ └───────────┘ └───────────┘ │           │
│  └──────────────────────────┬───────────────────┘           │
│                             │                               │
│                    ┌────────▼────────┐                      │
│                    │  Safe Output /  │                      │
│                    │  Blur+Block     │                      │
│                    └─────────────────┘                      │
└─────────────────────────────────────────────────────────────┘

Style Conditioning:
  Reference Image ──▶ IP-Adapter ──▶ Style-Conditioned Latents
  Reference Image ──▶ CLIP Encoder ──▶ Style Similarity Score
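
The style similarity score in the second path can be read as a cosine similarity between two image embeddings. The sketch below is a minimal illustration under that assumption, not the code in src/style.py: the function name style_similarity is hypothetical, and the toy 3-dimensional vectors stand in for real CLIP embeddings.

```python
import math
from typing import Sequence


def style_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for CLIP image embeddings (assumption).
reference = [1.0, 0.0, 0.0]
generated = [1.0, 1.0, 0.0]
score = style_similarity(reference, generated)
```

Because the score depends only on the angle between the vectors, scaling either embedding leaves it unchanged, which is why a fixed threshold can be applied across images.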

Features

Feature	Description
SDXL Base + Refiner	Ensemble of expert denoisers: base model for the high-noise steps, refiner for the final low-noise steps
IP-Adapter Style Transfer	Image-prompted style conditioning with tunable strength
NSFW Detection	Diffusers safety_checker + CLIP zero-shot fallback
Content Rating	Automated G / PG / PG-13 classification via CLIP
Brand Consistency	CLIP cosine similarity to reference brand images
Prompt Safety Screening	Pre-generation blocked concept filtering
Batch Generation	Generate + evaluate + save with full audit trail
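
As a rough illustration of the zero-shot content rating listed above: once an image has been compared against one text description per rating, the similarities can be turned into probabilities with a softmax and the most likely label picked. This is a sketch under stated assumptions, not the project's implementation. The function names, the similarity values, and the temperature of 0.01 (chosen to resemble the roughly 100x logit scaling common in CLIP models) are all illustrative.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Turn per-label similarity scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp((s - peak) / temperature) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def rate_content(similarities: Dict[str, float], temperature: float = 0.01) -> str:
    """Return the rating label with the highest probability."""
    probs = softmax(similarities, temperature)
    return max(probs, key=probs.get)


# Hypothetical image-to-text similarities, one per rating description.
similarities = {"G": 0.31, "PG": 0.24, "PG-13": 0.19}
probs = softmax(similarities, temperature=0.01)
rating = rate_content(similarities)
```

A low temperature sharpens the distribution, so small differences in raw similarity translate into a clear winner. A production gate would likely also check that the winning probability clears a configured threshold.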

Tech Stack

Stable Diffusion XL — State-of-the-art text-to-image
HuggingFace Diffusers — Pipeline framework
IP-Adapter — Image-prompted style conditioning
OpenCLIP ViT-H-14 — Style similarity & content classification
PyTorch 2.2+ with CUDA / float16

Quick Start

# Clone and install
git clone https://github.com/YOUR_USERNAME/diffusion-style-transfer.git
cd diffusion-style-transfer
pip install -r requirements.txt

# Run generation notebook
cd notebooks
python generate.py          # or open as Jupyter notebook

# Run style transfer
python style_transfer.py

See GUIDE.md for detailed setup instructions.

Project Structure

diffusion-style-transfer/
├── README.md                  # This file
├── GUIDE.md                   # Step-by-step execution guide
├── requirements.txt           # Python dependencies
├── configs/
│   └── model_config.yaml      # Model, scheduler, safety thresholds
├── notebooks/
│   ├── generate.py            # Text-to-image generation (percent script)
│   └── style_transfer.py      # Style transfer & scoring (percent script)
├── src/
│   ├── __init__.py
│   ├── pipeline.py            # End-to-end generation pipeline
│   ├── safety.py              # NSFW, content rating, brand consistency
│   └── style.py               # Style encoding, similarity, IP-Adapter utils
└── sample_outputs/            # Generated images (git-ignored)
    └── .gitkeep

Sample Outputs

Generated images are git-ignored; run the scripts in notebooks/ on a GPU to produce them. The prompts used:

Prompt	Rating	Style
Enchanted castle at golden hour	G	Baseline
Underwater kingdom with coral palaces	G	Watercolor
Woodland creatures tea party	G	Storybook
Young astronaut discovering alien garden	G	Cel-shaded

Content Safety Design

This project treats content safety as a first-class concern, not an afterthought:

Pre-generation: Prompts are screened against a configurable blocklist before any GPU time is spent
Post-generation: Every image passes through NSFW detection, content rating, and (optionally) brand consistency scoring
Fail-safe: Unsafe images are automatically blurred and flagged — never silently passed through
Configurable thresholds: All safety parameters are in configs/model_config.yaml
Audit trail: Every GenerationResult includes full safety metadata
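
The flow described above can be sketched as a single function: screen the prompt first, generate only if it passes, then decide between passing and blurring based on a threshold. Everything here is an assumption made for illustration. SafetyReport is a hypothetical stand-in and not the project's GenerationResult, the blocklist terms and the 0.5 threshold are made up, and generate_and_score is a placeholder for the real generation plus NSFW scoring step.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SafetyReport:
    """Hypothetical audit record; not the project's GenerationResult."""
    prompt: str
    blocked_terms: List[str] = field(default_factory=list)
    nsfw_score: Optional[float] = None
    action: str = "pending"  # becomes "blocked", "blur", or "pass"


def find_blocked_terms(prompt: str, blocklist: List[str]) -> List[str]:
    """Return blocklist entries that occur as whole words or phrases."""
    return [
        term for term in blocklist
        if re.search(r"\b" + re.escape(term) + r"\b", prompt, re.IGNORECASE)
    ]


def run_with_safety(
    prompt: str,
    blocklist: List[str],
    nsfw_threshold: float,
    generate_and_score: Callable[[str], float],
) -> SafetyReport:
    report = SafetyReport(prompt=prompt)
    report.blocked_terms = find_blocked_terms(prompt, blocklist)
    if report.blocked_terms:
        # Pre-generation gate: the generator is never called.
        report.action = "blocked"
        return report
    report.nsfw_score = generate_and_score(prompt)
    # Fail-safe: anything at or above the threshold is blurred, never passed.
    report.action = "blur" if report.nsfw_score >= nsfw_threshold else "pass"
    return report


calls: List[str] = []


def fake_generate_and_score(prompt: str) -> float:
    """Placeholder for generation + NSFW scoring; records each call."""
    calls.append(prompt)
    return 0.9 if "risky" in prompt else 0.02


blocklist = ["gore", "graphic violence"]  # illustrative terms only
safe = run_with_safety("Enchanted castle at golden hour", blocklist, 0.5, fake_generate_and_score)
blurred = run_with_safety("A risky costume party", blocklist, 0.5, fake_generate_and_score)
blocked = run_with_safety("Graphic violence in a castle", blocklist, 0.5, fake_generate_and_score)
```

Whole-word matching keeps a prompt such as "Portrait of Tagore" from tripping the "gore" entry, and the blocked prompt never reaches the generator, so calls holds only two entries after the three runs.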

This approach is directly applicable to entertainment and media production pipelines where brand integrity and audience-appropriate content are paramount.

License

MIT

Built as a portfolio demonstration of production AI image generation with safety-first design.
