
Transparent TCP Proxy - Complete Technical Guide

Table of Contents

  1. Overview
  2. Network Stack Fundamentals
  3. TCP vs Application Layer
  4. Dual-Proxy Architecture: Proxy-Aware vs Proxy-Unaware Apps
  5. How Transparent Proxying Works
  6. HTTP vs HTTPS Handling
  7. TLS Handshake Deep Dive
  8. SNI Extraction
  9. CONNECT Tunnel Mechanism
  10. Implementation Approaches
  11. Wireshark vs Proxy View
  12. Troubleshooting & Edge Cases

Overview

This transparent TCP proxy intercepts pfctl-redirected traffic on macOS and forwards it through Privoxy. It handles both HTTP and HTTPS traffic without requiring client configuration.

Key Features

  • ✅ HTTP traffic: Full content inspection and rewriting
  • ✅ HTTPS traffic: SNI-based routing with CONNECT tunnels
  • ✅ No client configuration required (transparent)
  • ✅ Preserves end-to-end encryption for HTTPS
  • ❌ Cannot route protocols that carry no destination hint (no HTTP Host header or TLS SNI), such as FTP, SMTP or custom protocols

Network Stack Fundamentals

OSI Layer Model

┌─────────────────────────────────────────────────────────────┐
│ Layer 7: APPLICATION (HTTP, TLS, FTP, SMTP)                │
│ ├─ What your proxy sees and works with                     │
│ ├─ HTTP requests, TLS handshakes, SNI extraction           │
│ └─ Business logic, content filtering, protocol parsing     │
├─────────────────────────────────────────────────────────────┤
│ Layer 4: TRANSPORT (TCP, UDP)                              │
│ ├─ Handled automatically by kernel                         │
│ ├─ Sequence numbers, acknowledgments, flow control         │
│ └─ Connection management, reliability, retransmission      │
├─────────────────────────────────────────────────────────────┤
│ Layer 3: NETWORK (IP)                                      │
│ ├─ Routing, addressing, fragmentation                      │
│ └─ Source/destination IP addresses                         │
├─────────────────────────────────────────────────────────────┤
│ Layer 2: DATA LINK (Ethernet, WiFi)                        │
│ └─ MAC addresses, frame formatting                         │
├─────────────────────────────────────────────────────────────┤
│ Layer 1: PHYSICAL (Cables, Radio)                          │
│ └─ Electrical signals, wireless transmission               │
└─────────────────────────────────────────────────────────────┘

What Each Layer Sees

Raw Network Packet (Wireshark View):

[Ethernet Header][IP Header: src=192.168.1.100, dst=1.2.3.4][TCP Header: seq=12345, ack=67890][Application Data]

Application Layer (Proxy View):

[Application Data] ← Only this part!

TCP vs Application Layer

TCP Layer Responsibilities

# TCP handles automatically (you never code this):
if packet_lost():
    retransmit_packet()
    
if out_of_order():
    reorder_and_buffer()
    
if receiver_buffer_full():
    pause_sending()  # Flow control
    
if network_congested():
    slow_down()  # Congestion control

Application Layer Responsibilities

# Your proxy handles application concerns:
if is_http_request(data):
    host = extract_host_from_http(data)
    rewrite_for_proxy(data, host)
    
elif is_tls_handshake(data):
    sni = extract_sni_from_tls(data)
    create_connect_tunnel(sni)

Key Differences

| Aspect | TCP Layer | Application Layer |
|---|---|---|
| Purpose | Reliable byte delivery | Business logic & protocols |
| Handles | Connection management | Content parsing & routing |
| Sees | Seq numbers, ACKs | HTTP headers, TLS handshakes |
| Guarantees | Ordered, reliable delivery | Protocol semantics |
| Your proxy | Kernel handles this | Your code works here |

Dual-Proxy Architecture: Proxy-Aware vs Proxy-Unaware Apps

The Core Problem

When iOS devices have a global proxy configuration, different apps behave differently:

  • 📱 Proxy-Aware Apps (Safari, Chrome): Honor the proxy settings and send requests directly to the configured proxy
  • 🚫 Proxy-Unaware Apps (Netflix, Games, System Services): Ignore proxy settings and try to connect directly to servers

This creates a challenge: How do you filter traffic from proxy-unaware apps while not interfering with proxy-aware apps?

The Solution: Selective pfctl Rules

The solution lies in strategic pfctl rule ordering that creates two separate paths:

# 1. EXEMPT proxy-aware traffic (to Mac/Privoxy)
no rdr on bridge100 inet proto tcp from 192.168.2.4 to 194.165.185.154

# 2. INTERCEPT everything else (proxy-unaware apps) 
rdr pass on bridge100 inet proto tcp from 192.168.2.4 to any -> 127.0.0.1 port 8001

Network Topology

                                    ┌─────────────────┐
                                    │   Mac (Host)    │
                                    │ 194.165.185.154 │
                                    │                 │
                                    │  ┌─────────────┐│
                                    │  │   Privoxy   ││
                                    │  │ Port: 48080 ││ ← Global Proxy
                                    │  └─────────────┘│
                                    │                 │
                                    │  ┌─────────────┐│
                                    │  │Transparent  ││
                                    │  │   Proxy     ││ ← Transparent Proxy
                                    │  │ Port: 8001  ││
                                    │  └─────────────┘│
                                    └─────────────────┘
                                              │
                                              │ Internet Sharing
                                              │ bridge100
                                              │
                                    ┌─────────────────┐
                                    │  iOS Device     │
                                    │  192.168.2.4    │
                                    │                 │
                                    │ Global Proxy:   │
                                    │ 194.165.185.154 │
                                    │ Port: 48080     │
                                    └─────────────────┘

Traffic Flow Analysis

Path 1: Proxy-Aware Apps (Safari, Chrome)

┌─────────────┐    ┌──────────┐    ┌─────────────┐    ┌─────────┐
│ Safari/     │    │  pfctl   │    │   Privoxy   │    │Internet │
│ Chrome      │    │  Rules   │    │ 48080       │    │Servers  │
└─────────────┘    └──────────┘    └─────────────┘    └─────────┘
        │                 │                │               │
        │─── Request ─────>│                │               │
        │   to 194.165.   │                │               │
        │   185.154:48080 │                │               │
        │                 │                │               │
        │                 │─── NO REDIRECT ─── (Rule 1)    │
        │                 │   (no rdr rule  │               │
        │                 │    matches)     │               │
        │                 │                │               │
        │─────── Direct connection ─────────>│               │
        │                                   │─── Forward ──>│
        │<────── Filtered response ──────────│<── Response ──│

Key Points:

  • ✅ Apps already know about the proxy (configured in iOS settings)
  • ✅ They directly connect to 194.165.185.154:48080 (Privoxy)
  • Rule 1 exempts this traffic from redirection
  • Normal proxy flow - requests go straight to Privoxy
  • No transparent proxy involvement

Path 2: Proxy-Unaware Apps (Netflix, Games)

┌─────────────┐    ┌──────────┐    ┌─────────────┐    ┌─────────┐    ┌─────────┐
│ Netflix/    │    │  pfctl   │    │Transparent  │    │ Privoxy │    │Internet │
│ Games       │    │  Rules   │    │  Proxy      │    │  48080  │    │Servers  │
└─────────────┘    └──────────┘    └─────────────┘    └─────────┘    └─────────┘
        │                 │                │               │               │
        │─── Request ─────>│                │               │               │
        │   to external   │                │               │               │
        │   server        │                │               │               │
        │   (e.g., 1.2.3.4:443)           │               │               │
        │                 │                │               │               │
        │                 │─── INTERCEPT ──── (Rule 2)     │               │
        │                 │   (rdr rule    │               │               │
        │                 │    redirects)  │               │               │
        │                 │                │               │               │
        │                 │                │─── Parse ─────>│               │
        │                 │                │   & Forward    │               │
        │                 │                │               │─── Request ──>│
        │<──────────────── Response chain ─────────────────│<── Response ──│

Key Points:

  • ❌ Apps don't know about proxy settings (ignore them)
  • ❌ They try to connect directly to external servers (e.g., 1.2.3.4:443)
  • Rule 2 intercepts this traffic and redirects to transparent proxy
  • Transparent proxy extracts destination (SNI/Host) and forwards via Privoxy
  • Forced filtering through Privoxy without app knowledge

pfctl Rule Logic Deep Dive

Rule Processing Order

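pf evaluates translation rules (nat, rdr, binat) in order and the first matching rule wins. (Filter rules work the other way: the last match wins unless a rule uses quick.)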

# Rule 1: EXEMPT - Traffic TO the Mac (proxy-aware apps)
no rdr on bridge100 inet proto tcp from 192.168.2.4 to 194.165.185.154
#    ↑     ↑                        ↑                 ↑
#    │     │                        │                 └─ Destination: Mac IP
#    │     │                        └─ Source: iOS device
#    │     └─ Interface: bridge100 (Internet Sharing)
#    └─ Action: NO redirect (let it pass through)

# Rule 2: INTERCEPT - Everything else (proxy-unaware apps)  
rdr pass on bridge100 inet proto tcp from 192.168.2.4 to any -> 127.0.0.1 port 8001
#   ↑    ↑                        ↑                 ↑     ↑
#   │    │                        │                 │     └─ Redirect target
#   │    │                        │                 └─ Destination: ANY (catch-all)
#   │    │                        └─ Source: iOS device
#   │    └─ Interface: bridge100
#   └─ Action: Redirect to transparent proxy

Why This Works

  1. Safari wants to reach 194.165.185.154:48080 (Privoxy)

    • Matches Rule 1: no rdr ... to 194.165.185.154
    • Result: No redirection, direct connection to Privoxy ✅
  2. Netflix wants to reach netflix.com (e.g., 52.84.253.71:443)

    • Does NOT match Rule 1 (destination is not 194.165.185.154)
    • Matches Rule 2: rdr ... to any
    • Result: Redirected to transparent proxy ✅
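The first-match logic above can be modeled in a few lines. This is a simplified, hypothetical sketch (it is not how pf is implemented and only looks at source and destination addresses); RULES and translate() are illustrative names standing in for the two rdr lines. The last call swaps the rule order to show why the exemption must come first.

```python
from ipaddress import ip_address, ip_network
from typing import Tuple

# Hypothetical model of the two rules: (action, source net, destination net, redirect target)
RULES = [
    ("no rdr", ip_network("192.168.2.4/32"), ip_network("194.165.185.154/32"), None),
    ("rdr", ip_network("192.168.2.4/32"), ip_network("0.0.0.0/0"), ("127.0.0.1", 8001)),
]

def translate(src: str, dst: str, dport: int, rules=RULES) -> Tuple[str, int]:
    """Return where a TCP connection ends up after first-match rdr evaluation."""
    s, d = ip_address(src), ip_address(dst)
    for action, src_net, dst_net, target in rules:
        if s in src_net and d in dst_net:
            # The first matching translation rule decides; later rules are ignored
            return (dst, dport) if action == "no rdr" else target
    return (dst, dport)  # no rule matched: the original destination is kept

print(translate("192.168.2.4", "194.165.185.154", 48080))  # ('194.165.185.154', 48080)
print(translate("192.168.2.4", "52.84.253.71", 443))       # ('127.0.0.1', 8001)
# Catch-all first: even traffic to the Mac (e.g. the PAC file on 8088) gets redirected
print(translate("192.168.2.4", "194.165.185.154", 8088, rules=RULES[::-1]))  # ('127.0.0.1', 8001)
```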

Protocol-Level Analysis

Proxy-Aware App Traffic (Safari)

# Safari's HTTP request to proxy
POST http://api.example.com/data HTTP/1.1
Host: api.example.com
Proxy-Connection: keep-alive
User-Agent: Safari/17.0
...

# Safari's HTTPS via CONNECT
CONNECT api.example.com:443 HTTP/1.1
Host: api.example.com:443
Proxy-Connection: keep-alive

Characteristics:

  • Absolute URLs in HTTP requests
  • CONNECT method for HTTPS
  • Proxy-Connection headers
  • Direct socket connection to 194.165.185.154:48080

Proxy-Unaware App Traffic (Netflix)

# Netflix's HTTP request (looks normal)
GET /api/v1/movies HTTP/1.1
Host: netflix.com
User-Agent: Netflix/8.0
...

# Netflix's HTTPS (raw TLS)
16 03 03 01 fc 01 00 01 f8 03 03 ...  ← TLS ClientHello with SNI

Characteristics:

  • Relative URLs in HTTP requests
  • Raw TLS handshakes for HTTPS
  • Host headers and SNI present
  • No proxy awareness - tries to connect directly to destination
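The differences above show up in the very first bytes of a connection, so they can be told apart with a simple heuristic. The function below is a hypothetical sketch, not necessarily how transparent_tcp_proxy.py classifies traffic; it only inspects the TLS record header or the HTTP request line.

```python
def classify_first_bytes(data: bytes) -> str:
    """Heuristic: return 'tls', 'connect', 'absolute-form', 'origin-form' or 'unknown'."""
    if len(data) >= 3 and data[0] == 0x16 and data[1] == 0x03:
        return "tls"  # raw TLS handshake record: proxy-unaware HTTPS client
    # Only the request line (up to the first CRLF) matters here
    parts = data.split(b"\r\n", 1)[0].split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return "unknown"
    method, target = parts[0], parts[1]
    if method == b"CONNECT":
        return "connect"        # proxy-aware HTTPS: CONNECT host:port
    if target.startswith((b"http://", b"https://")):
        return "absolute-form"  # proxy-aware HTTP: full URL in the request line
    if target.startswith(b"/"):
        return "origin-form"    # proxy-unaware HTTP: path only, host in the Host header
    return "unknown"
```

On port 8001 only 'origin-form' and 'tls' should normally appear; seeing 'connect' or 'absolute-form' there would suggest proxy-aware traffic is being redirected by mistake (for example, the exemption rule is missing or ordered after the catch-all).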

The PAC File Problem Solved

The Original Problem

# Original rule (TOO BROAD)
rdr pass on bridge100 inet proto tcp from 192.168.2.4 to any -> 127.0.0.1 port 8001

What happened:

  1. Safari (proxy-aware) tried to fetch PAC file from 194.165.185.154:8088/proxy.pac
  2. pfctl redirected this to transparent proxy (127.0.0.1:8001)
  3. Transparent proxy couldn't handle PAC requests properly
  4. Result: Safari couldn't get proxy configuration → "No Internet"

The Solution

# New rules (SELECTIVE)
no rdr on bridge100 inet proto tcp from 192.168.2.4 to 194.165.185.154  # EXEMPT
rdr pass on bridge100 inet proto tcp from 192.168.2.4 to any -> 127.0.0.1 port 8001  # INTERCEPT

What happens now:

  1. Safari tries to fetch PAC file from 194.165.185.154:8088/proxy.pac
  2. Rule 1 exempts this (destination is 194.165.185.154)
  3. Direct connection to Mac succeeds
  4. Safari gets PAC → configures proxy → works normally ✅

Traffic Distribution

┌─────────────────────────────────────────────────────────────────┐
│                       iOS Device Traffic                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────┐    ┌─────────────────────────────────┐ │
│  │   Proxy-Aware Apps  │    │    Proxy-Unaware Apps          │ │
│  │                     │    │                                 │ │
│  │ • Safari            │    │ • Netflix                       │ │
│  │ • Chrome            │    │ • Instagram                     │ │
│  │ • App Store         │    │ • Games                         │ │
│  │ • Mail              │    │ • System Services               │ │
│  │ • Settings          │    │ • Background Updates            │ │
│  │                     │    │                                 │ │
│  │ Destination:        │    │ Destination:                    │ │
│  │ 194.165.185.154     │    │ External IPs                    │ │
│  │ (Mac)               │    │ (Internet)                      │ │
│  └─────────────────────┘    └─────────────────────────────────┘ │
│           │                                    │                │
│           │                                    │                │
│           ▼                                    ▼                │
│  ┌─────────────────────┐    ┌─────────────────────────────────┐ │
│  │ Rule 1: NO REDIRECT │    │ Rule 2: REDIRECT TO 8001        │ │
│  │ (Direct to Privoxy) │    │ (Via Transparent Proxy)         │ │
│  └─────────────────────┘    └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Benefits of This Architecture

1. Seamless Operation

  • Zero configuration required for proxy-unaware apps
  • Normal operation for proxy-aware apps
  • No conflicts between the two approaches

2. Optimal Performance

  • Direct connection for proxy-aware apps (one less hop)
  • Transparent interception only when needed
  • Reduced load on transparent proxy

3. Robust Filtering

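  • All HTTP/HTTPS traffic the device sends over TCP goes through Privoxy (other TCP protocols are dropped, and because the rules match only TCP, UDP traffic such as QUIC/HTTP/3 and DNS is not intercepted)
  • Consistent filtering regardless of app proxy awareness
  • Apps cannot bypass TCP filtering simply by ignoring the proxy settings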

4. Easy Debugging

  • Clear separation of traffic types in logs
  • Predictable behavior based on destination IP
  • Simple troubleshooting - check which rule matched

Configuration Best Practices

1. Rule Ordering

# ALWAYS put exemption rules BEFORE catch-all rules
no rdr on bridge100 inet proto tcp from 192.168.2.4 to 194.165.185.154  # Specific exemption
rdr pass on bridge100 inet proto tcp from 192.168.2.4 to any -> 127.0.0.1 port 8001  # General catch-all

2. Interface Specificity

# Be specific about interfaces to avoid unintended redirects
rdr pass on bridge100 inet proto tcp from 192.168.2.4 to any -> 127.0.0.1 port 8001
#           ↑
#           └─ Only Internet Sharing bridge, not en0 (main network)

3. Service Port Exclusions

# Exclude proxy service ports to prevent loops (place this before the catch-all rdr rule)
no rdr on bridge100 proto tcp to 127.0.0.1 port { 8001, 48080, 8088, 8080 }
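Because rule order is easy to break when editing by hand, one option is to generate the rules. The helper below is a hypothetical sketch (render_pf_rules is not part of pfctl or the proxy) that emits only the rules shown in this section, exemptions first and the catch-all last.

```python
from typing import Sequence

def render_pf_rules(iface: str, device_ip: str, mac_ip: str,
                    proxy_port: int, service_ports: Sequence[int]) -> str:
    """Build rdr rules for one device with exemptions ahead of the catch-all."""
    ports = ", ".join(str(p) for p in service_ports)
    rules = [
        # Exemptions first: pf uses the first matching rdr rule
        f"no rdr on {iface} inet proto tcp from {device_ip} to {mac_ip}",
        f"no rdr on {iface} proto tcp to 127.0.0.1 port {{ {ports} }}",
        # Catch-all redirect to the transparent proxy goes last
        f"rdr pass on {iface} inet proto tcp from {device_ip} to any -> 127.0.0.1 port {proxy_port}",
    ]
    return "\n".join(rules) + "\n"

print(render_pf_rules("bridge100", "192.168.2.4", "194.165.185.154", 8001, [8001, 48080, 8088, 8080]))
```

The output can be reviewed and written to a file; loading it with `sudo pfctl -f <file>` replaces the currently loaded main ruleset, so any existing rules would need to be merged in first.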

Real-World Testing

Verify Proxy-Aware App Path

# Check direct connection to Privoxy
sudo lsof -i :48080
# Should show direct connections from iOS device

# Monitor Privoxy logs
tail -f /opt/homebrew/var/log/privoxy/logfile
# Should show Safari/Chrome requests with absolute URLs

Verify Proxy-Unaware App Path

# Check transparent proxy connections
sudo lsof -i :8001
# Should show intercepted connections

# Monitor transparent proxy logs
python3 transparent_tcp_proxy.py --verbose
# Should show Netflix/app traffic with SNI extraction

Test pfctl Rules

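# Show active NAT/rdr rules
sudo pfctl -s nat

# Show the same rules with per-rule evaluation and packet counters
sudo pfctl -v -s nat

# Show global pf status and counters
sudo pfctl -s info

# Inspect current state entries for the device
sudo pfctl -s state | grep 192.168.2.4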

This dual-proxy architecture is the key insight that makes transparent proxying work seamlessly with iOS devices that have global proxy configurations. By understanding app behavior and strategically exempting proxy-aware traffic, we achieve the best of both worlds: compatibility and transparency.


How Transparent Proxying Works

Overall Flow

iOS App → pfctl redirect → Transparent Proxy → Privoxy → Internet

Detailed Connection Flow

┌─────────┐    ┌─────────┐    ┌─────────────┐    ┌─────────┐    ┌─────────┐
│ iOS App │    │  pfctl  │    │ Transproxy  │    │ Privoxy │    │ Server  │
└─────────┘    └─────────┘    └─────────────┘    └─────────┘    └─────────┘
      │              │                │               │              │
      │──TCP SYN─────>│                │               │              │
      │              │──redirect──────>│               │              │
      │<─TCP SYN-ACK──│<───────────────│               │              │
      │──TCP ACK─────>│───────────────>│               │              │
      │              │                │               │              │
      │──HTTP/TLS────>│───────────────>│               │              │
      │              │                │──parse & ─────>│              │
      │              │                │  route         │──forward────>│
      │              │                │<──response─────│<─response────│
      │<─response────│<───────────────│               │              │

pfctl Configuration

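# Port-based variant: redirect only HTTP/HTTPS arriving on en0.
# rdr matches packets entering the named interface, so traffic the Mac itself
# originates is not redirected; use bridge100 for Internet Sharing clients.
# Redirect HTTP traffic to proxy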
rdr pass on en0 inet proto tcp from any to any port 80 -> 127.0.0.1 port 8001

# Redirect HTTPS traffic to proxy  
rdr pass on en0 inet proto tcp from any to any port 443 -> 127.0.0.1 port 8001

HTTP vs HTTPS Handling

HTTP (Port 80) - Plain Text

Client                                Server
  │                                     │
  │── TCP Connection ──────────────────>│
  │                                     │
  │── HTTP Request ────────────────────>│  ← Readable
  │   GET /api HTTP/1.1                 │
  │   Host: example.com                 │
  │   Content-Type: application/json    │
  │   {"user_id": 123}                  │
  │                                     │
  │<── HTTP Response ───────────────────│  ← Readable
  │   200 OK                            │
  │   {"name": "John"}                  │

What Proxy Can Extract:

  • ✅ HTTP method (GET, POST, etc.)
  • ✅ URL path (/api)
  • ✅ Host header (example.com)
  • ✅ All headers (Content-Type, etc.)
  • ✅ Request/response body
  • ✅ Complete content inspection
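The guide calls is_http_request() and extract_host_from_http() without showing them. A minimal sketch of what they might look like, assuming the request head arrives in the first read; to_absolute_form() is a hypothetical illustration of how a rewrite_for_proxy step could turn an origin-form request into the absolute-form request an upstream proxy such as Privoxy expects.

```python
from typing import Optional

HTTP_METHODS = (b"GET", b"POST", b"PUT", b"DELETE", b"HEAD",
                b"OPTIONS", b"PATCH", b"CONNECT", b"TRACE")

def is_http_request(data: bytes) -> bool:
    """True if data starts with '<METHOD> ' for a known HTTP method."""
    method = data.split(b" ", 1)[0]
    return method in HTTP_METHODS and data.startswith(method + b" ")

def extract_host_from_http(data: bytes) -> Optional[str]:
    """Return the raw Host header value (may include :port), or None."""
    head = data.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:          # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", "replace")
    return None

def to_absolute_form(data: bytes, host: str) -> bytes:
    """Rewrite 'GET /path HTTP/1.1' into 'GET http://host/path HTTP/1.1'."""
    request_line, sep, rest = data.partition(b"\r\n")
    method, target, version = request_line.split(b" ", 2)
    if target.startswith(b"/"):
        target = b"http://" + host.encode("ascii") + target
    return b" ".join((method, target, version)) + sep + rest
```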

HTTPS (Port 443) - Encrypted

Client                                Server
  │                                     │
  │── TCP Connection ──────────────────>│
  │                                     │
  │── TLS ClientHello ─────────────────>│  ← Readable (handshake)
  │   SNI: example.com                  │
  │<── TLS ServerHello ────────────────│
  │── TLS Key Exchange ───────────────>│
  │<── TLS Finished ───────────────────│
  │                                     │
  │== ENCRYPTED TUNNEL ESTABLISHED ====│
  │                                     │
  │── [Encrypted HTTP] ───────────────>│  ← Unreadable
  │   [encrypted: GET /api HTTP/1.1]    │
  │   [encrypted: {"user_id": 123}]     │
  │                                     │
  │<── [Encrypted Response] ───────────│  ← Unreadable
  │   [encrypted: 200 OK]               │
  │   [encrypted: {"name": "John"}]     │

What Proxy Can Extract:

  • ✅ SNI hostname (example.com) from handshake
  • ✅ TLS version and cipher info
  • ❌ HTTP method (encrypted)
  • ❌ URL path (encrypted)
  • ❌ Headers (encrypted)
  • ❌ Request/response body (encrypted)
  • ❌ No content inspection possible

TLS Handshake Deep Dive

TLS ClientHello Structure

┌─────────────────────────────────────────────────────────────┐
│ TLS Record Header (5 bytes)                                 │
│ ├─ Record Type: 0x16 (Handshake)                           │
│ ├─ Version: 0x0303 (TLS 1.2)                              │
│ └─ Length: 252 bytes                                       │
├─────────────────────────────────────────────────────────────┤
│ Handshake Header (4 bytes)                                  │
│ ├─ Type: 0x01 (ClientHello)                               │
│ └─ Length: 248 bytes                                       │
├─────────────────────────────────────────────────────────────┤
│ Client Version (2 bytes): 0x0303                           │
├─────────────────────────────────────────────────────────────┤
│ Random (32 bytes): 52b1a2c3d4e5f6...                      │
├─────────────────────────────────────────────────────────────┤
│ Session ID Length (1 byte) + Session ID (variable)         │
├─────────────────────────────────────────────────────────────┤
│ Cipher Suites Length (2 bytes) + Cipher Suites (variable)  │
├─────────────────────────────────────────────────────────────┤
│ Compression Methods Length (1 byte) + Methods (variable)   │
├─────────────────────────────────────────────────────────────┤
│ Extensions Length (2 bytes)                                │
├─────────────────────────────────────────────────────────────┤
│ Extension 1: Type (2) + Length (2) + Data (variable)       │
│ Extension 2: Type (2) + Length (2) + Data (variable)       │
│ ...                                                         │
│ SNI Extension: Type=0x0000 + Length + SNI Data            │ ← Target!
│ ...                                                         │
└─────────────────────────────────────────────────────────────┘

Byte-Level Example

Hex: 16 03 03 00 fc 01 00 00 f8 03 03 52 b1 a2 c3...
     │  │  │  │  │  │  │  │  │  │  │  │
     │  │  │  │  │  │  │  │  │  │  │  └─ Random (32 bytes) starts
     │  │  │  │  │  │  │  │  │  │  └─ Client version (minor)
     │  │  │  │  │  │  │  │  │  └─ Client version (major): 0x0303 = TLS 1.2
     │  │  │  │  │  │  │  │  └─ Handshake length (low)
     │  │  │  │  │  │  │  └─ Handshake length (mid)
     │  │  │  │  │  │  └─ Handshake length (high): 0x0000f8 = 248
     │  │  │  │  │  └─ Handshake type: 0x01 (ClientHello)
     │  │  │  │  └─ Record length (low)
     │  │  │  └─ Record length (high): 0x00fc = 252
     │  │  └─ Record version (minor)
     │  └─ Record version (major): 0x0303 = TLS 1.2
     └─ Record type: 0x16 (Handshake)
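The header fields can be checked by decoding the example bytes 16 03 03 00 fc 01 00 00 f8 03 03 with struct. parse_tls_headers() and TlsHeaders are illustrative names, not part of the proxy; note the handshake length is a 24-bit field, which struct has no format code for, so it is read with int.from_bytes.

```python
import struct
from typing import NamedTuple

class TlsHeaders(NamedTuple):
    record_type: int
    record_version: int
    record_length: int
    handshake_type: int
    handshake_length: int
    client_version: int

def parse_tls_headers(data: bytes) -> TlsHeaders:
    """Decode the record header, handshake header and client version (first 11 bytes)."""
    if len(data) < 11:
        raise ValueError("need at least 11 bytes")
    record_type, record_version, record_length = struct.unpack("!BHH", data[:5])
    handshake_type = data[5]
    handshake_length = int.from_bytes(data[6:9], "big")  # 24-bit big-endian length
    (client_version,) = struct.unpack("!H", data[9:11])
    return TlsHeaders(record_type, record_version, record_length,
                      handshake_type, handshake_length, client_version)

hdr = parse_tls_headers(bytes.fromhex("16030300fc010000f80303"))
print(hdr.record_length, hdr.handshake_length)  # 252 248
```

When the ClientHello fits in a single record, the handshake length is the record length minus the 4-byte handshake header (252 − 4 = 248).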

Detection Code

def is_tls_handshake(data):
    # Need the 5-byte record header plus the handshake type byte
    if len(data) < 6:
        return False

    # Check TLS record type (0x16 = Handshake)
    if data[0] != 0x16:
        return False

    # Check record version major byte (0x03 for SSL 3.0 and TLS 1.0-1.3)
    if data[1] != 0x03:
        return False

    # Check handshake type (0x01 = ClientHello); any other handshake
    # message is not the start of a new client connection
    return data[5] == 0x01

SNI Extraction

Why SNI Matters

  • Problem: HTTPS encrypts the Host header
  • Solution: SNI (Server Name Indication) in TLS handshake
  • Purpose: Tells server which certificate to use
  • Benefit: Proxy can extract destination hostname

SNI Extension Format

SNI Extension Structure:
┌─────────────────────────────────────┐
│ Extension Type: 0x0000              │
├─────────────────────────────────────┤
│ Extension Length: 16                │
├─────────────────────────────────────┤
│ Server Name List Length: 14         │
├─────────────────────────────────────┤
│ Name Type: 0 (hostname)             │
├─────────────────────────────────────┤
│ Name Length: 11                     │
├─────────────────────────────────────┤
│ Server Name: "example.com"          │
└─────────────────────────────────────┘
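Each length field above counts only what follows it, so the values nest. A short sketch that encodes the server_name extension (RFC 6066) for a single hostname; encode_sni_extension() is an illustrative helper, not part of the proxy. For "example.com" (11 bytes) the name, list and extension lengths come out as 11, 14 and 16.

```python
import struct

def encode_sni_extension(hostname: str) -> bytes:
    """Encode a server_name extension carrying one host_name entry."""
    name = hostname.encode("ascii")
    # ServerName entry: name_type (1 byte, 0 = host_name) + name length (2) + name
    entry = struct.pack("!BH", 0, len(name)) + name
    # ServerNameList: 2-byte list length + entries
    server_name_list = struct.pack("!H", len(entry)) + entry
    # Extension header: type 0x0000 (2) + extension data length (2)
    return struct.pack("!HH", 0x0000, len(server_name_list)) + server_name_list

print(encode_sni_extension("example.com").hex(" "))
# 00 00 00 10 00 0e 00 00 0b 65 78 61 6d 70 6c 65 2e 63 6f 6d
```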

SNI Extraction Code

import struct

def extract_sni_from_tls(data):
    """Return the SNI hostname from a buffered ClientHello record, or None."""
    try:
        # Skip record header (5), handshake header (4), client version (2), random (32)
        pos = 5 + 4 + 2 + 32

        # Skip session ID (1-byte length prefix)
        session_id_len = data[pos]
        pos += 1 + session_id_len

        # Skip cipher suites (2-byte length prefix)
        cipher_suites_len = struct.unpack('!H', data[pos:pos+2])[0]
        pos += 2 + cipher_suites_len

        # Skip compression methods (1-byte length prefix)
        compression_len = data[pos]
        pos += 1 + compression_len

        # Walk the extensions block: type (2) + length (2) + data
        extensions_len = struct.unpack('!H', data[pos:pos+2])[0]
        pos += 2
        end_pos = min(pos + extensions_len, len(data))
        while pos + 4 <= end_pos:
            ext_type, ext_len = struct.unpack('!HH', data[pos:pos+4])
            pos += 4

            # server_name extension (type 0)
            if ext_type == 0 and pos + ext_len <= len(data):
                sni_data = data[pos:pos+ext_len]
                # list length (2) + name type (1) + name length (2) + name
                if len(sni_data) >= 5 and sni_data[2] == 0:  # 0 = host_name
                    name_len = struct.unpack('!H', sni_data[3:5])[0]
                    if len(sni_data) >= 5 + name_len:
                        return sni_data[5:5+name_len].decode('utf-8')
            pos += ext_len

    except (IndexError, struct.error, UnicodeDecodeError):
        # Truncated or malformed ClientHello: treat as "no SNI"
        pass
    return None
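To exercise an SNI parser without a real client, a minimal ClientHello can be synthesized. build_client_hello() is a hypothetical test fixture: it contains a single cipher suite and only the server_name extension, so it is meant for parser tests, not as a handshake a real server is expected to accept.

```python
import os
import struct

def build_client_hello(hostname: str) -> bytes:
    """Build a minimal ClientHello record whose only extension is server_name."""
    name = hostname.encode("ascii")
    entry = struct.pack("!BH", 0, len(name)) + name               # host_name entry
    sni_list = struct.pack("!H", len(entry)) + entry              # ServerNameList
    extensions = struct.pack("!HH", 0x0000, len(sni_list)) + sni_list

    body = (
        b"\x03\x03"                            # client_version: TLS 1.2
        + os.urandom(32)                       # random
        + b"\x00"                              # session_id: empty
        + struct.pack("!H", 2) + b"\xc0\x2f"   # cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
        + b"\x01\x00"                          # compression_methods: [null]
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body     # type 1 = ClientHello
    return struct.pack("!BHH", 0x16, 0x0303, len(handshake)) + handshake

hello = build_client_hello("example.com")
print(len(hello), hello[:6].hex())  # 72 160303004301
```

Passing the result to extract_sni_from_tls() should return "example.com"; truncating it at various offsets is a cheap way to check that the parser returns None instead of raising.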

SNI Limitations

Not always present:

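  • Legacy clients that predate SNI (first specified in RFC 3546, 2003)
  • Direct IP connections (RFC 6066 does not allow IP literals in SNI)
  • Misconfigured applications
  • Clients using Encrypted Client Hello (ECH), where the visible SNI is only a public outer name, not the real destination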

Usually present:

  • Modern browsers
  • Most HTTP libraries
  • Mobile apps
  • API clients

CONNECT Tunnel Mechanism

HTTP CONNECT Method

The CONNECT method creates a TCP tunnel through an HTTP proxy:

CONNECT example.com:443 HTTP/1.1
Host: example.com:443
Proxy-Connection: keep-alive

HTTPS Tunneling Flow

Client               Proxy                Privoxy              Target Server
  │                    │                     │                      │
  │── TLS ClientHello ─>│                     │                      │
  │   SNI: example.com  │                     │                      │
  │                    │── CONNECT ─────────>│                      │
  │                    │   example.com:443   │                      │
  │                    │                     │── TCP connect ─────>│
  │                    │<── HTTP/1.1 200 ───│                      │
  │                    │   Connection OK     │                      │
  │                    │── TLS ClientHello ─>│── TLS ClientHello ──>│
  │<── TLS ServerHello ─│<── TLS ServerHello ─│<── TLS ServerHello ──│
  │                    │                     │                      │
  │<══ Encrypted ══════│<══ Encrypted ══════│<══ Encrypted ════════│
  │   TLS Traffic      │   TLS Traffic      │   TLS Traffic       │

Detailed Steps

  1. Client sends TLS ClientHello to proxy
  2. Proxy extracts SNI from ClientHello
  3. Proxy sends CONNECT request to Privoxy
  4. Privoxy connects to real server
  5. Privoxy responds with 200 Connection established
  6. Proxy forwards buffered ClientHello
  7. Bidirectional tunneling begins

CONNECT Implementation

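import socket

# PRIVOXY_IP, PRIVOXY_PORT, read_http_response_headers() and forward_data()
# are assumed to be defined elsewhere in the proxy script.
def handle_https_traffic(client_sock, initial_tls_data, host, port=443):
    # Connect to Privoxy
    privoxy_sock = socket.create_connection((PRIVOXY_IP, PRIVOXY_PORT))
    try:
        # Ask Privoxy to open a tunnel to the host named in the SNI
        connect_request = f'CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}:{port}\r\n\r\n'
        privoxy_sock.sendall(connect_request.encode('utf-8'))

        # Any 2xx status means the tunnel is open; check the status code
        # rather than matching one proxy's exact reason phrase
        response = read_http_response_headers(privoxy_sock)
        status = response.split(b'\r\n', 1)[0].split(b' ')
        if len(status) < 2 or not status[1].startswith(b'2'):
            return  # tunnel refused (e.g. blocked by Privoxy)

        # Replay the buffered ClientHello, then relay bytes both ways
        privoxy_sock.sendall(initial_tls_data)
        forward_data(client_sock, privoxy_sock)
    finally:
        privoxy_sock.close()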
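handle_https_traffic() relies on read_http_response_headers() and forward_data(), which the guide does not show. Below is one hypothetical way to write them for blocking sockets, with one thread per direction; the names match the calls above, but the bodies are assumptions.

```python
import socket
import threading

def read_http_response_headers(sock: socket.socket, max_size: int = 65536) -> bytes:
    """Read up to and including the blank line that ends the HTTP headers.

    Reads one byte at a time so it never consumes bytes past the header block.
    """
    buf = b""
    while not buf.endswith(b"\r\n\r\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("proxy closed the connection before finishing headers")
        buf += chunk
        if len(buf) > max_size:
            raise ValueError("response headers too large")
    return buf

def _pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src hits EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # one side reset or closed the connection
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

def forward_data(a: socket.socket, b: socket.socket) -> None:
    """Relay bytes in both directions until both sides have finished sending."""
    t = threading.Thread(target=_pipe, args=(b, a), daemon=True)
    t.start()
    _pipe(a, b)
    t.join()
```

Half-closing with shutdown(SHUT_WR) instead of closing lets a response still flow back after one side has finished sending.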

Why This Works

  • No MITM: Proxy doesn't decrypt TLS
  • Transparent: Client thinks it's talking to real server
  • Privoxy integration: Benefits from Privoxy's filtering
  • End-to-end security: TLS encryption preserved

Implementation Approaches

1. Current Approach: Application Layer

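# Pros: Simple, portable, secure
# Cons: Limited to HTTP/HTTPS

def handle_client(client_sock, client_addr):
    try:
        data = client_sock.recv(4096)

        if is_http_request(data):
            # Plain HTTP: the destination is in the Host header
            host = extract_host_from_http(data)
            handle_http_traffic(client_sock, data, host)

        elif is_tls_handshake(data):
            # HTTPS: the destination is the SNI in the ClientHello
            host = extract_sni_from_tls(data)
            if host:
                handle_https_traffic(client_sock, data, host)
            else:
                log(f"No SNI from {client_addr}, dropping connection")

        else:
            # Unknown protocol: no way to learn the destination, so drop it
            log("Cannot handle unknown protocol")
    finally:
        client_sock.close()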

2. Alternative: TCP Layer (Raw Sockets)

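# Pros: Protocol agnostic, true transparency
# Cons: Complex, requires root, platform-specific
# Pseudocode: is_syn_packet(), send_fake_syn_ack(), accept_client_stream(),
# connect_to_real_server() and bridge_tcp_streams() are placeholders; a real
# implementation would also have to track TCP sequence numbers itself.

def intercept_tcp_packets():
    # AF_PACKET is Linux-only (macOS would need BPF via /dev/bpf*);
    # ETH_P_ALL (0x0003) requests frames of every protocol
    raw_sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(0x0003))

    while True:
        packet = raw_sock.recv(65535)

        if is_syn_packet(packet):
            # Send a forged SYN-ACK to the client
            send_fake_syn_ack(packet)
            client_stream = accept_client_stream(packet)

            # Create the real connection to the server
            server_sock = connect_to_real_server(packet)

            # Bridge the two connections
            bridge_tcp_streams(client_stream, server_sock)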

Comparison

| Approach | Complexity | Root Required | Protocol Support | Portability |
|---|---|---|---|---|
| Application Layer | Low | No | HTTP/HTTPS only | High |
| TCP Layer | High | Yes | All TCP protocols | Low |

Wireshark vs Proxy View

What Wireshark Captures (Complete Packets)

Frame 4: 307 bytes on wire
Ethernet II, Src: aa:bb:cc:dd:ee:ff, Dst: 11:22:33:44:55:66
Internet Protocol Version 4, Src: 192.168.1.100, Dst: 1.2.3.4
    Version: 4
    Header Length: 20 bytes
    Total Length: 293
    Identification: 0x1234
    Source: 192.168.1.100
    Destination: 1.2.3.4
Transmission Control Protocol, Src Port: 12345, Dst Port: 443
    Source Port: 12345
    Destination Port: 443
    Sequence number: 3824992001    ← TCP details visible
    Acknowledgment number: 1847293847
    Flags: 0x018 (PSH, ACK)
    Window size: 65535
Transport Layer Security
    TLSv1.3 Record Layer: Handshake Protocol: Client Hello
        Content Type: Handshake (22)
        Version: TLS 1.2 (0x0303)
        Handshake Protocol: Client Hello
            Server Name: example.com    ← SNI visible

What Your Proxy Sees (Application Data Only)

# Only the TLS record data:
data = b'\x16\x03\x03\x00\xfc\x01\x00\x00\xf8\x03\x03...'

# No access to:
# - Ethernet headers
# - Raw IP headers (only the peer address/port returned by accept())
# - TCP sequence numbers
# - Network timing

TCP Handshake Example

Wireshark shows complete handshake:

12:34:56.789 192.168.1.100:54321 → 1.2.3.4:443 [SYN] Seq=0 Win=65535
12:34:56.790 1.2.3.4:443 → 192.168.1.100:54321 [SYN, ACK] Seq=0 Ack=1 Win=65535  
12:34:56.791 192.168.1.100:54321 → 1.2.3.4:443 [ACK] Seq=1 Ack=1 Win=65535
12:34:56.792 192.168.1.100:54321 → 1.2.3.4:443 [PSH, ACK] Seq=1 Ack=1 Len=253
    TLS 1.3 Client Hello [SNI: example.com]

Your proxy only sees the last frame's data:

[2025-08-14 12:34:56] New connection from ('192.168.1.100', 54321)
[2025-08-14 12:34:56] Received data (253 bytes)
[2025-08-14 12:34:56] Detected HTTPS traffic
[2025-08-14 12:34:56] HTTPS SNI: example.com

Troubleshooting & Edge Cases

Common Issues

1. Incomplete TLS Handshake

Problem: Large ClientHello split across multiple recv() calls

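# Solution: read the complete TLS record using its length field
import struct

def recv_exact(sock, n):
    # Loop until n bytes arrive; recv() may return fewer, or b'' on EOF
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('client closed before sending a full TLS record')
        buf += chunk
    return buf

def receive_complete_tls_handshake(sock, timeout=10):
    # Bound the wait so a silent client cannot pin the thread
    sock.settimeout(timeout)
    try:
        # TLS record header: type (1) + version (2) + length (2)
        header = recv_exact(sock, 5)
        record_len = struct.unpack('!H', header[3:5])[0]
        # Read the whole record payload, however many segments it arrives in
        # (a ClientHello spanning several records would need a loop over records)
        return header + recv_exact(sock, record_len)
    finally:
        sock.settimeout(None)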

2. Missing SNI

Problem: Some clients don't include SNI

# Options:
# 1. Look up the original destination in the pf state table (platform-specific)
# 2. Fall back to a configured default host
# 3. Log and drop the connection

import os

DEFAULT_HOST = os.getenv('TP_DEFAULT_HOST')  # None when unset
sni = extract_sni_from_tls(data)
if not sni and DEFAULT_HOST:
    log(f"No SNI found, using default: {DEFAULT_HOST}")
    sni = DEFAULT_HOST

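The fallback options listed above can be combined into a single decision. In this sketch, `pf_lookup` is a hypothetical callable standing in for a platform-specific original-destination lookup. It is not an existing API.

```python
from typing import Callable, Optional

def choose_upstream_host(sni: Optional[str],
                         pf_lookup: Optional[Callable[[], Optional[str]]] = None,
                         default_host: Optional[str] = None) -> Optional[str]:
    """Pick a CONNECT target: SNI first, then pf state, then a default.

    Returns None when nothing is known; the caller should log and close.
    """
    if sni:
        return sni
    if pf_lookup is not None:
        original = pf_lookup()  # e.g. '1.2.3.4' recovered from pf state
        if original:
            return original
    return default_host or None
```

A None result corresponds to option 3: log the connection and drop it.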
3. Non-HTTP/HTTPS Protocols

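Problem: FTP, SMTP, and custom protocols carry neither an HTTP Host header nor SNI. FTP and SMTP are also server-speaks-first. The client sends nothing until it receives the server's 220 greeting, so a proxy waiting for initial client data stalls.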

# Current behavior: log and drop the connection
def handle_unknown_protocol(client_sock, data, addr):
    log(f"Unknown protocol from {addr}: {data[:20].hex()}")
    client_sock.close()
    # Alternatives: protocol-specific handlers, or forwarding
    # based on pfctl destination info

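FTP and SMTP clients wait for the server's greeting, so a blocking initial recv() on such a connection never returns. The sketch below bounds the first read with a timeout and treats an empty result as "client sent nothing". That can mean a closed connection or a possible server-speaks-first protocol. The function name and the 2-second default are illustrative assumptions.

```python
import socket

def read_initial(sock: socket.socket, size: int = 4096, timeout: float = 2.0) -> bytes:
    """Return the client's first bytes, or b'' if it sends nothing in time.

    An empty result means the client closed, or is waiting for the
    server to speak first (as FTP and SMTP clients do).
    """
    previous = sock.gettimeout()
    sock.settimeout(timeout)
    try:
        return sock.recv(size)
    except socket.timeout:
        return b''
    finally:
        sock.settimeout(previous)  # restore the caller's blocking mode
```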
Performance Considerations

1. Threading Model

import threading

# Current: one thread per connection
t = threading.Thread(target=handle_client, args=(sock, addr), daemon=True)
t.start()

# Alternative: async I/O with asyncio streams
async def handle_client_async(reader, writer):
    # One coroutine per connection, all on a single event-loop thread
    ...

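For comparison with the thread-per-connection model, here is a minimal asyncio sketch that relays each client to Privoxy on one event loop. It assumes Privoxy is at 127.0.0.1:8118 (Privoxy's usual default port). It is a plain byte relay that omits the HTTP/HTTPS detection and CONNECT handshake the real proxy performs. The names `pipe`, `relay`, and `main` are hypothetical.

```python
import asyncio

PRIVOXY_HOST, PRIVOXY_PORT = '127.0.0.1', 8118  # assumed upstream

async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    """Copy bytes one way until EOF, then half-close the other side."""
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    finally:
        if writer.can_write_eof():
            writer.write_eof()

async def relay(client_reader, client_writer, host=PRIVOXY_HOST, port=PRIVOXY_PORT):
    """Handle one client: open an upstream connection and copy both ways."""
    up_reader, up_writer = await asyncio.open_connection(host, port)
    try:
        await asyncio.gather(pipe(client_reader, up_writer),
                             pipe(up_reader, client_writer))
    finally:
        up_writer.close()
        client_writer.close()

async def main(listen_port=8001):
    server = await asyncio.start_server(relay, '127.0.0.1', listen_port)
    async with server:
        await server.serve_forever()
```

`asyncio.run(main())` would start the listener. Every connection then costs a coroutine rather than an OS thread.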
2. Buffer Sizes

# Default: 4096 bytes for initial read
data = client_sock.recv(4096)

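# recv() returns whatever has arrived, up to 4096 bytes: a larger
# ClientHello, or one split across TCP segments, arrives incomplete.
# Reading by TLS record length is more reliable than a bigger buffer,
# which also increases memory usage per connection.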

3. Connection Pooling

# Current: New connection to Privoxy for each client
privoxy_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
privoxy_sock.connect((PRIVOXY_IP, PRIVOXY_PORT))

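# Possible improvement: keep pre-connected spare sockets to Privoxy.
# A socket used for a CONNECT tunnel is dedicated to that tunnel and
# cannot be returned to a pool afterwards.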

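A CONNECT tunnel consumes its upstream socket, so a "pool" for this proxy is really a set of pre-connected spares that are handed out once and never returned. Below is a minimal sketch under that assumption. `WarmPool` is hypothetical, and it does not detect spares that Privoxy has already closed as idle.

```python
import socket
import threading
from collections import deque

class WarmPool:
    """Keep up to `size` pre-connected sockets to an upstream proxy."""

    def __init__(self, host: str, port: int, size: int = 4, timeout: float = 5.0):
        self.addr = (host, port)
        self.size = size
        self.timeout = timeout
        self._spares = deque()
        self._lock = threading.Lock()

    def _connect(self) -> socket.socket:
        sock = socket.create_connection(self.addr, timeout=self.timeout)
        sock.settimeout(None)  # timeout applies to connect only; relay blocks
        return sock

    def fill(self) -> None:
        """Top up to `size` spares (e.g. from a background thread)."""
        while True:
            with self._lock:
                if len(self._spares) >= self.size:
                    return
            sock = self._connect()
            with self._lock:
                if len(self._spares) >= self.size:
                    sock.close()  # another filler got there first
                    return
                self._spares.append(sock)

    def get(self) -> socket.socket:
        """Hand out a spare if one is ready, else connect on demand."""
        with self._lock:
            if self._spares:
                return self._spares.popleft()
        return self._connect()
```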
Monitoring & Debugging

1. Logging Levels

# Verbose mode (default)
TP_VERBOSE=1 python transparent_tcp_proxy.py

# Quiet mode
python transparent_tcp_proxy.py --quiet

2. Traffic Analysis

# Log first bytes of unknown protocols
log(f"Data (hex): {initial_data[:50].hex()}")
log(f"Data (ascii): {repr(initial_data[:50])}")

# Track connection counts
# Monitor Privoxy response codes
# Measure latency
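Monitoring Privoxy's response codes mostly means reading the status line it returns to each CONNECT or HTTP request. Below is a small sketch that extracts the code so it can feed a counter. `parse_status_code` is a hypothetical helper.

```python
from collections import Counter
from typing import Optional

def parse_status_code(response_head: bytes) -> Optional[int]:
    """Return the numeric status from an HTTP/1.x status line, or None."""
    first_line = response_head.split(b'\r\n', 1)[0]
    # e.g. [b'HTTP/1.1', b'200', b'Connection established']
    parts = first_line.split(b' ', 2)
    if len(parts) < 2 or not parts[0].startswith(b'HTTP/'):
        return None
    try:
        return int(parts[1])
    except ValueError:
        return None

status_counts = Counter()
status_counts[parse_status_code(b'HTTP/1.1 200 Connection established\r\n\r\n')] += 1
status_counts[parse_status_code(b'HTTP/1.1 403 Forbidden\r\n\r\n')] += 1
print(dict(status_counts))  # → {200: 1, 403: 1}
```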

3. Testing

# Test HTTP
curl -v http://example.com/

# Test HTTPS  
curl -v https://example.com/

# Test with different TLS clients
openssl s_client -connect example.com:443 -servername example.com

Configuration

Environment Variables

export TP_LISTEN_IP=0.0.0.0       # Listen interface
export TP_LISTEN_PORT=8001        # Listen port
export TP_PRIVOXY_IP=127.0.0.1    # Privoxy host
export TP_PRIVOXY_PORT=48082      # Privoxy port
export TP_VERBOSE=1               # Enable logging
export TP_DEFAULT_HOST=example.com # Optional: fallback host when SNI is missing

Command Line Options

# Basic usage
python transparent_tcp_proxy.py

# Custom configuration
python transparent_tcp_proxy.py \
  --listen-ip 127.0.0.1 \
  --listen-port 8001 \
  --privoxy-ip 127.0.0.1 \
  --privoxy-port 8118 \
  --improved-tls \
  --quiet

# Help
python transparent_tcp_proxy.py --help
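One way the environment variables and flags above could fit together is for each flag to default to its `TP_*` variable. This is a sketch, not the script's actual parser. The fallback values (0.0.0.0, 8001, 127.0.0.1, 8118) are assumptions, and `--improved-tls` is accepted here without any behavior attached.

```python
import argparse
import os

def env_flag(name: str, default: bool) -> bool:
    """Interpret TP_* booleans such as TP_VERBOSE=1 / TP_VERBOSE=0."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ('1', 'true', 'yes', 'on')

def parse_config(argv=None):
    env = os.environ
    p = argparse.ArgumentParser(description='Transparent TCP proxy (sketch)')
    p.add_argument('--listen-ip', default=env.get('TP_LISTEN_IP', '0.0.0.0'))
    p.add_argument('--listen-port', type=int,
                   default=int(env.get('TP_LISTEN_PORT', '8001')))
    p.add_argument('--privoxy-ip', default=env.get('TP_PRIVOXY_IP', '127.0.0.1'))
    p.add_argument('--privoxy-port', type=int,
                   default=int(env.get('TP_PRIVOXY_PORT', '8118')))
    p.add_argument('--improved-tls', action='store_true')
    p.add_argument('--quiet', action='store_true',
                   help='disable logging (overrides TP_VERBOSE)')
    args = p.parse_args(argv)
    # Flags win over environment; --quiet switches verbose logging off
    args.verbose = env_flag('TP_VERBOSE', True) and not args.quiet
    return args
```

With this layout, a command-line flag always overrides the matching environment variable.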

pfctl Rules

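# Create pfctl rules file.
# rdr rules match packets arriving on en0 (e.g. LAN clients that route
# through this Mac); the Mac's own outbound traffic is not redirected.
cat > /tmp/proxy_rules.conf << EOF
rdr pass on en0 inet proto tcp from any to any port 80 -> 127.0.0.1 port 8001
rdr pass on en0 inet proto tcp from any to any port 443 -> 127.0.0.1 port 8001
EOF

# Load rules (this replaces the main ruleset) and enable pf
sudo pfctl -f /tmp/proxy_rules.conf -e

# Check status
sudo pfctl -s nat

# Disable pf (restore the default ruleset with: sudo pfctl -f /etc/pf.conf)
sudo pfctl -d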

Security Considerations

1. No MITM for HTTPS

  • ✅ TLS encryption preserved end-to-end
  • ✅ Client validates server certificates directly
  • ✅ No fake certificates needed
  • ❌ Cannot inspect HTTPS content

2. Privilege Requirements

  • ✅ No root privileges needed for the proxy process (it listens on an unprivileged port such as 8001)
  • ✅ Standard socket permissions sufficient
  • ⚠️ pfctl rules require admin access

3. Privacy

  • ✅ HTTPS content remains private
  • ⚠️ SNI hostnames are logged
  • ⚠️ HTTP content is visible
  • ⚠️ Connection metadata tracked

4. Attack Surface

  • ⚠️ Proxy can be DoS target
  • ⚠️ Malformed TLS could crash parser
  • ✅ Limited blast radius (userspace only)
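A parser that reads attacker-controlled bytes should fail closed. Below is a sketch of bounds-checked SNI extraction that returns None for any malformed or truncated ClientHello instead of raising. It follows the ClientHello layout in RFC 8446 and the server_name extension in RFC 6066. It assumes the whole ClientHello fits in one record, and `extract_sni` is an illustrative name rather than the proxy's own function.

```python
from typing import Optional

class _Reader:
    """Cursor over bytes; every read is bounds-checked."""

    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise ValueError('truncated')
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def uint(self, n: int) -> int:
        return int.from_bytes(self.take(n), 'big')

def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from a TLS ClientHello record, or None."""
    try:
        r = _Reader(record)
        if r.uint(1) != 22:                    # content type: handshake
            return None
        r.take(2)                              # record version
        body = _Reader(r.take(r.uint(2)))      # this record's payload
        if body.uint(1) != 1:                  # handshake type: ClientHello
            return None
        hello = _Reader(body.take(body.uint(3)))
        hello.take(2 + 32)                     # client_version + random
        hello.take(hello.uint(1))              # session_id
        hello.take(hello.uint(2))              # cipher_suites
        hello.take(hello.uint(1))              # compression_methods
        if hello.pos == len(hello.data):
            return None                        # no extensions at all
        exts = _Reader(hello.take(hello.uint(2)))
        while exts.pos < len(exts.data):
            ext_type = exts.uint(2)
            ext = _Reader(exts.take(exts.uint(2)))
            if ext_type != 0:                  # 0 = server_name
                continue
            names = _Reader(ext.take(ext.uint(2)))
            while names.pos < len(names.data):
                name_type = names.uint(1)
                name = names.take(names.uint(2))
                if name_type == 0:             # host_name
                    return name.decode('ascii')
        return None
    except ValueError:  # truncation, or UnicodeDecodeError (a ValueError)
        return None
```

A None result falls through to the missing-SNI handling rather than killing the connection's handler thread.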

Future Improvements

1. Platform-Specific Destination Extraction

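import socket
import struct

# macOS: parse pf state tables
def get_original_destination_macos(sock):
    # Idea: match the client's (ip, port) from sock.getpeername()
    # against `pfctl -s state` output to recover the original destination
    raise NotImplementedError

# Linux: use SO_ORIGINAL_DST (IPv4, iptables REDIRECT/DNAT)
def get_original_destination_linux(sock):
    SO_ORIGINAL_DST = 80
    raw = sock.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    # raw is a struct sockaddr_in: family (2), port (2, big-endian), addr (4), padding (8)
    port = struct.unpack('!H', raw[2:4])[0]
    ip = socket.inet_ntoa(raw[4:8])
    return ip, port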

2. Protocol Support

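# Add handlers for other protocols. FTP and SMTP are server-speaks-first,
# so these handlers need the original destination (from pf) before any
# client data arrives.
def handle_ftp_traffic(client_sock, data):
    # Parse FTP control channel
    # Extract PASV/PORT commands
    raise NotImplementedError

def handle_smtp_traffic(client_sock, data):
    # Parse SMTP EHLO/HELO
    # Note: the envelope (RCPT TO) names recipients, not the server the client dialed
    raise NotImplementedError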

3. Performance Optimizations

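# Async I/O for better scaling
import asyncio

async def handle_client_async(reader, writer):
    data = await reader.read(4096)
    # Non-blocking processing: detect protocol, connect upstream, relay
    writer.close()
    await writer.wait_closed()

# Connection pooling for Privoxy
class PrivoxyPool:
    def get_connection(self):
        # Reuse an idle pre-connected socket when one is available
        raise NotImplementedError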

4. Enhanced Monitoring

# Metrics collection
class ProxyMetrics:
    def __init__(self):
        self.http_requests = 0
        self.https_requests = 0
        self.unknown_protocols = 0
        self.errors = 0

    def export_metrics(self):
        # Prometheus/StatsD integration would publish these values
        return dict(vars(self))

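The current design runs one thread per connection, so counters updated from many threads are safer behind a lock. Below is a minimal sketch of a locked variant whose `export_metrics` emits Prometheus text exposition format by hand. It assumes no client library, and the `tp_` metric names are illustrative.

```python
import threading

class LockedProxyMetrics:
    """Thread-safe counters with a Prometheus-style text export."""

    FIELDS = ('http_requests', 'https_requests', 'unknown_protocols', 'errors')

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = dict.fromkeys(self.FIELDS, 0)

    def inc(self, field: str, amount: int = 1) -> None:
        with self._lock:
            self._counts[field] += amount

    def export_metrics(self) -> str:
        with self._lock:
            snapshot = dict(self._counts)  # copy so the lock is held briefly
        lines = []
        for field, value in snapshot.items():
            name = f'tp_{field}_total'
            lines.append(f'# TYPE {name} counter')
            lines.append(f'{name} {value}')
        return '\n'.join(lines) + '\n'
```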
Conclusion

This transparent proxy demonstrates the fundamental differences between network layers and how application-layer proxying can handle the HTTP and HTTPS traffic that makes up most web browsing. It has limitations: no HTTPS content inspection and no support for non-HTTP protocols. Even so, it offers a good balance of functionality, simplicity, and security for transparent proxying scenarios.

The key insight is that, apart from the pf redirect itself, transparency doesn't require low-level packet manipulation. By working at the application layer and extracting destination information from protocol data (the HTTP Host header and the TLS SNI), we can achieve effective transparent proxying with much simpler and more maintainable code.