
Troubleshooting Guide

Solutions for common cachekit issues and error messages


Common Errors

Circuit Breaker Errors

Issue: Circuit breaker is open and requests are failing

What it means:

  • Too many transient errors (ConnectionError, TimeoutError) detected
  • Circuit breaker protection is preventing cascading failures
  • Cache is temporarily disabled to avoid overwhelming backend

Solutions:

  1. Check Redis availability:
redis-cli ping
# Should output: PONG
  2. Verify Redis connection string:
# Check what URL is being used
env | grep REDIS
export CACHEKIT_REDIS_URL=redis://localhost:6379/0
  3. Wait for circuit breaker to reset:
  • Circuit breaker automatically resets after timeout
  • Default: 60 seconds (configurable)
  • During recovery: requests execute the function without caching
  4. Increase timeouts if the network is slow:
export CACHEKIT_SOCKET_TIMEOUT=5.0
export CACHEKIT_SOCKET_CONNECT_TIMEOUT=5.0
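The open/closed/recovery cycle described above can be sketched in plain Python. This is a simplified model of the pattern, not cachekit's actual implementation; the class and parameter names are illustrative:

```python
import time

class CircuitBreaker:
    """Simplified circuit breaker: opens after N consecutive
    transient failures, then resets after a cooldown period."""

    def __init__(self, failure_threshold=5, reset_timeout=60.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def is_open(self):
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.reset_timeout:
            # Cooldown elapsed: close the circuit and try the cache again
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

    def record_success(self):
        self.failures = 0


breaker = CircuitBreaker(failure_threshold=3, reset_timeout=60.0)
for _ in range(3):
    breaker.record_failure()
print(breaker.is_open())  # True: cache calls are skipped until the timeout
```

While the circuit is open, a caching layer following this pattern executes the wrapped function directly instead of touching the backend, which is why requests keep succeeding (just uncached) during an outage.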

Example handling:

import logging
from cachekit import cache

logger = logging.getLogger(__name__)

@cache()
def safe_computation(data):
    try:
        return expensive_operation(data)
    except Exception as e:
        logger.error(f"Computation error: {e}")
        raise  # Let circuit breaker catch it
Serialization Failures

Issue: "Could not serialize data" or TypeError during caching

What it means:

  • Cache attempted to serialize function result
  • Data type is not compatible with chosen serializer
  • MessagePack (default) only supports basic types

Solutions:

  1. For custom objects, convert to a supported type before caching:
from cachekit import cache

# Convert to dict before caching
@cache()
def get_custom_object():
    obj = MyCustomClass()
    return obj.__dict__  # or use obj.to_dict() / dataclasses.asdict(obj)
  2. For DataFrames, use ArrowSerializer:
from cachekit import cache
from cachekit.serializers import ArrowSerializer
import pandas as pd

@cache(serializer=ArrowSerializer())
def get_dataframe():
    return pd.DataFrame({"a": [1, 2, 3]})
  3. For JSON-compatible data, use the default (MessagePack):
# Default serializer handles: dict, list, str, int, float, bool, None
@cache()
def get_json_data():
    return {"key": "value", "count": 42}
  4. For Pydantic models, convert to dict first:
from pydantic import BaseModel
from cachekit import cache

class User(BaseModel):
    id: int
    name: str

# Convert model to dict before caching
@cache()
def get_user(user_id: int) -> dict:
    user = fetch_user_model(user_id)  # Returns Pydantic model
    return user.model_dump()  # Explicit conversion

Why not auto-detect Pydantic models? See Serializer Guide - Caching Pydantic Models for the detailed rationale.

  5. Check which serializer is active:
from cachekit.serializers import DEFAULT_SERIALIZER
print(f"Active serializer: {DEFAULT_SERIALIZER}")
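When results come from mixed sources (dataclasses, plain objects, already-serializable values), a small normalizing helper can centralize the conversions above. A sketch of that pattern; the helper name and exact behavior are illustrative, not part of cachekit:

```python
import dataclasses

def to_cacheable(value):
    """Best-effort conversion to types the default serializer supports
    (dict, list, str, int, float, bool, None)."""
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return dataclasses.asdict(value)
    if hasattr(value, "model_dump"):  # Pydantic v2 model
        return value.model_dump()
    if hasattr(value, "__dict__") and not isinstance(value, type):
        return dict(vars(value))  # plain object: shallow attribute dict
    return value  # assume already serializable


@dataclasses.dataclass
class Point:
    x: int
    y: int

print(to_cacheable(Point(1, 2)))  # {'x': 1, 'y': 2}
```

Calling such a helper as the last line of every cached function keeps the conversion logic in one place instead of scattered across decorators.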
Connection Issues

Issue: Redis connection timeout or refused

Error messages:

ConnectionError: Error -2 connecting to localhost:6379
TimeoutError: Connection timeout
ConnectionRefusedError: [Errno 111] Connection refused

Solutions:

  1. Start Redis locally:
# Using Docker (recommended)
docker run -d -p 6379:6379 redis:latest

# Verify connection
redis-cli ping
# Output: PONG
  2. Verify the connection URL:
import redis

# Test connection before using decorator
try:
    r = redis.from_url("redis://localhost:6379/0")
    print(r.ping())
except Exception as e:
    print(f"Connection failed: {e}")
  3. Check firewall/network:
# On same machine
redis-cli -h localhost -p 6379 ping

# Across network (replace host)
redis-cli -h redis-server.example.com -p 6379 ping
  4. For timeout issues, increase timeout values:
export CACHEKIT_SOCKET_TIMEOUT=5.0
export CACHEKIT_SOCKET_CONNECT_TIMEOUT=5.0
  5. Verify Redis is listening:
# Check if port 6379 is listening
netstat -tulpn | grep 6379
# or
lsof -i :6379
Encryption Issues

Issue: Decryption failures or key-related errors

Error messages:

"CACHEKIT_MASTER_KEY not set"
"CACHEKIT_MASTER_KEY must be hex-encoded, minimum 32 bytes"
"Decryption failed: authentication tag verification failed"

Solutions:

See Zero-Knowledge Encryption - Troubleshooting

Common causes:

  1. Master key not set when using @cache.secure()
  2. Master key format invalid (not hex-encoded)
  3. Master key rotated (can't decrypt old cached data)
  4. Data corruption during storage/retrieval

Quick fix:

# Generate valid encryption key
export CACHEKIT_MASTER_KEY=$(openssl rand -hex 32)

# Clear cache if key was rotated
redis-cli FLUSHDB

# Restart application
python app.py
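Before restarting, you can sanity-check the key format from Python. This is a standalone check of the documented constraints (hex-encoded, at least 32 bytes), not a cachekit API:

```python
import os
import secrets

def is_valid_master_key(key):
    """A master key must be hex-encoded and decode to at least 32 bytes."""
    try:
        raw = bytes.fromhex(key)
    except ValueError:
        return False  # not valid hex
    return len(raw) >= 32

# Check the currently exported key (False if unset or malformed)
print(is_valid_master_key(os.getenv("CACHEKIT_MASTER_KEY", "")))

# A freshly generated 32-byte key always passes
print(is_valid_master_key(secrets.token_hex(32)))  # True
```

`secrets.token_hex(32)` is the stdlib equivalent of `openssl rand -hex 32`: both emit 64 hex characters encoding 32 random bytes.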

CachekitIO Backend Issues

Can't Connect to cachekit.io

Issue: Requests to cachekit.io fail immediately or time out

What it means:

  • API key not configured
  • Wrong endpoint URL
  • Network/firewall blocking outbound HTTPS

Solutions:

  1. Verify the API key is set:
echo $CACHEKIT_API_KEY
# Should output your key; if blank, set it:
export CACHEKIT_API_KEY=your_api_key_here
  2. Check the API URL:
# Default: leave unset unless self-hosting
echo $CACHEKIT_API_URL
# Expected: unset or https://api.cachekit.io
  3. Test network connectivity:
curl -sf https://api.cachekit.io/healthz
# Should return 200 OK; if it hangs, check firewall/proxy
401 Unauthorized

Issue: cachekit.io returns 401 Unauthorized

What it means:

  • API key is invalid, revoked, or expired
  • Key is set but doesn't match the project

Solutions:

  1. Confirm the key is correct:
# Compare against the key shown in your cachekit.io dashboard
echo $CACHEKIT_API_KEY
  2. Request a new key at cachekit.io and rotate:
export CACHEKIT_API_KEY=new_key_here
  3. Check for trailing whitespace or newlines if the key was copy-pasted:
python -c "import os; k=os.getenv('CACHEKIT_API_KEY',''); print(repr(k))"
# Key must not start/end with spaces or \n
429 Rate Limited

Issue: cachekit.io returns 429 Too Many Requests

What it means:

  • Request rate exceeds your plan's limit
  • Burst traffic spike hitting per-second cap

Solutions:

  1. Reduce cache miss rate (more hits = fewer upstream calls):
# Increase TTL to reduce backend round-trips
@cache(ttl=3600)  # 1-hour TTL instead of short TTL
def expensive_query(id):
    return fetch(id)
  2. Rely on the built-in circuit breaker, which handles backoff automatically; no manual retry logic is needed. If you're hitting 429 consistently, reduce request concurrency or upgrade your plan.

  3. Check your current usage in the cachekit.io dashboard.

SSRF Rejection

Issue: Request rejected with SSRF protection error

What it means:

  • A custom CACHEKIT_API_URL points to an internal/private host
  • SSRF protection blocks requests to non-allowlisted destinations
  • Only api.cachekit.io is permitted by default

Solutions:

  1. Use the default endpoint (unset any custom URL):
unset CACHEKIT_API_URL
  2. If self-hosting, confirm your host is correctly configured and reachable:
export CACHEKIT_API_URL=https://your-self-hosted-endpoint.example.com
curl -sf $CACHEKIT_API_URL/healthz
  3. Never point CACHEKIT_API_URL at localhost or internal IPs; SSRF protection blocks these regardless of environment.
Connection Timeout

Issue: Requests to cachekit.io hang and eventually time out

What it means:

  • High network latency between your environment and api.cachekit.io
  • Timeout configured too low for your network conditions
  • Transient outage or overloaded backend

Solutions:

  1. Check configured timeout:
echo $CACHEKIT_SOCKET_TIMEOUT
echo $CACHEKIT_SOCKET_CONNECT_TIMEOUT
  2. Increase timeouts for high-latency environments:
export CACHEKIT_SOCKET_TIMEOUT=10.0
export CACHEKIT_SOCKET_CONNECT_TIMEOUT=5.0
  3. Measure actual latency:
curl -o /dev/null -s -w "Connect: %{time_connect}s  Total: %{time_total}s\n" \
    https://api.cachekit.io/healthz
  4. Let the circuit breaker engage: after repeated timeouts it opens automatically, allowing your application to continue running without blocking on the cache backend.

Error Code Reference

E001: CACHEKIT_MASTER_KEY not set

Message: "CACHEKIT_MASTER_KEY environment variable must be set"

Cause: Using @cache.secure() without encryption key configured

When it occurs:

# WRONG - will raise E001
@cache.secure(ttl=300)
def get_sensitive_data():
    return secrets

Solution:

# Generate and export master key
export CACHEKIT_MASTER_KEY=$(openssl rand -hex 32)
E002: Invalid Key Format

Message: "CACHEKIT_MASTER_KEY must be hex-encoded, minimum 32 bytes"

Cause: Master key is not valid hex or too short

Invalid examples:

export CACHEKIT_MASTER_KEY="my-secret-key"  # Not hex
export CACHEKIT_MASTER_KEY="abcd1234"  # Too short

Solution:

# Generate valid 64-character hex string (32 bytes)
export CACHEKIT_MASTER_KEY=$(openssl rand -hex 32)

# Verify length
python -c "import os; print(len(os.getenv('CACHEKIT_MASTER_KEY', '')))"
# Output: 64
E003: Decryption Failed - Authentication Tag Mismatch

Message: "Decryption failed: authentication tag verification failed"

Cause:

  • Master key was changed (can't decrypt old data)
  • Data corruption during storage or retrieval
  • Encrypted data was modified

Solutions:

  1. Key was rotated (most common):
# Clear Redis to remove incompatible cached data
redis-cli FLUSHDB

# Use the new key going forward and restart the application
export CACHEKIT_MASTER_KEY=$(openssl rand -hex 32)
python app.py
  2. Wrong key still in use:
# Verify current key (prints a prefix only)
python -c "import os; print(os.getenv('CACHEKIT_MASTER_KEY', '')[:16] + '...')"

# Revert to original key if available
export CACHEKIT_MASTER_KEY=<original-key>
  3. Data corruption:
# If data is corrupted, clearing cache is safe
redis-cli FLUSHDB

# Function will recompute and re-cache with current key
E004: Serialization Compatibility Error

Message: "Could not serialize object of type X"

Cause: Data type not supported by serializer

When it occurs:

from cachekit import cache
import datetime

# WRONG - datetime not serializable by default serializer
@cache()
def get_timestamp():
    return datetime.datetime.now()

Solution:

from cachekit import cache
from cachekit.serializers import OrjsonSerializer
import datetime

# OrjsonSerializer handles datetime natively (converts to ISO-8601 string)
@cache(serializer=OrjsonSerializer(), backend=None)
def get_timestamp():
    return {"ts": datetime.datetime.now()}
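The same conversion can also be done by hand if you want to stay on the default serializer: encode datetimes as ISO-8601 strings before returning. A sketch of that approach, using json here only as a stand-in for any serializer limited to basic types; the helper name is illustrative:

```python
import datetime
import json

def encode_timestamps(obj):
    """Recursively convert datetime values to ISO-8601 strings so the
    result contains only types the default serializer supports."""
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    if isinstance(obj, dict):
        return {k: encode_timestamps(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [encode_timestamps(v) for v in obj]
    return obj

payload = {"ts": datetime.datetime(2024, 1, 1, 12, 0), "count": 3}
encoded = encode_timestamps(payload)
print(json.dumps(encoded))  # {"ts": "2024-01-01T12:00:00", "count": 3}
```

The trade-off versus OrjsonSerializer is that callers receive strings rather than datetime objects and must parse them back with `datetime.datetime.fromisoformat` if needed.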


---

## Recovery Strategies

Cache Invalidation

**Clear entire cache**:
```bash
redis-cli FLUSHDB
```

Clear by namespace (if implemented):

from cachekit import cache

@cache(namespace="users")
def get_user(user_id):
    return fetch_user(user_id)

# Manual invalidation
# Note: current cachekit doesn't provide built-in invalidation;
# clear Redis (redis-cli FLUSHDB) and let the next call re-cache

Per-function cache clearing (workaround):

from cachekit import cache
import redis

r = redis.from_url("redis://localhost:6379/0")

def invalidate_user_cache(user_id):
    key = f"users:get_user:{user_id}"
    r.delete(key)

@cache(namespace="users")
def get_user(user_id):
    return fetch_user(user_id)

# Invalidate when user data changes
user = update_user(user_id, data)
invalidate_user_cache(user_id)
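A more targeted variant of the workaround above deletes only the keys under a namespace prefix. The sketch below models SCAN-style pattern deletion against an in-memory dict so it runs standalone; with a real client, redis-py's `scan_iter(match=...)` plays the same role. The key format shown is illustrative:

```python
import fnmatch

def invalidate_namespace(store, namespace):
    """Delete every cached entry whose key falls under `namespace`.
    `store` stands in for Redis here; with redis-py you would iterate
    r.scan_iter(match=f"{namespace}:*") and delete each matching key."""
    pattern = f"{namespace}:*"
    doomed = [k for k in store if fnmatch.fnmatch(k, pattern)]
    for key in doomed:
        del store[key]
    return len(doomed)

cache_store = {
    "users:get_user:1": b"...",
    "users:get_user:2": b"...",
    "orders:get_order:1": b"...",
}
removed = invalidate_namespace(cache_store, "users")
print(removed, sorted(cache_store))  # 2 ['orders:get_order:1']
```

Prefer SCAN-based iteration over `KEYS pattern` in production: KEYS blocks the Redis event loop on large keyspaces, while SCAN walks incrementally.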
Graceful Degradation

Fallback when cache fails:

from cachekit import cache
import logging

logger = logging.getLogger(__name__)

@cache(ttl=3600)
def expensive_operation(x):
    try:
        return compute_expensive_result(x)
    except Exception as e:
        logger.warning(f"Computation failed: {e}")
        # Return fallback value or raise
        return fallback_value(x)

Check cache health:

import redis

def is_redis_healthy():
    try:
        r = redis.from_url("redis://localhost:6379/0")
        r.ping()
        return True
    except Exception:
        return False

# Use in monitoring
if not is_redis_healthy():
    logger.warning("Redis unavailable - cache disabled")
Health Monitoring

Monitor cache hits/misses:

from cachekit import cache
import time

cache_stats = {"hits": 0, "misses": 0}

@cache(ttl=3600)
def monitored_function(x):
    return expensive_operation(x)

# Manual tracking: increment cache_stats["hits"]/["misses"] yourself
# (built-in metrics coming soon)
def get_hit_rate():
    total = cache_stats["hits"] + cache_stats["misses"]
    if total == 0:
        return 0
    return cache_stats["hits"] / total
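One way to actually populate those counters is a wrapper around an in-process memoizer. This is a standalone sketch of the tracking pattern, not a cachekit API:

```python
import functools

cache_stats = {"hits": 0, "misses": 0}

def tracked_cache(func):
    """Memoize func in-process and count hits/misses in cache_stats."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args in store:
            cache_stats["hits"] += 1
            return store[args]
        cache_stats["misses"] += 1
        store[args] = func(*args)
        return store[args]

    return wrapper

@tracked_cache
def square(x):
    return x * x

square(4)   # miss: computed and stored
square(4)   # hit: served from the in-process store
print(cache_stats)  # {'hits': 1, 'misses': 1}
```

The same counting logic can be layered on top of `@cache()` by wrapping the decorated function, though that only distinguishes hits from misses if a miss has an observable side effect (such as a log line or latency difference).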

Health check endpoint:

import redis
from flask import jsonify

@app.route("/health/cache")
def cache_health():
    try:
        r = redis.from_url("redis://localhost:6379/0")
        r.ping()
        return jsonify({"status": "healthy"}), 200
    except Exception as e:
        return jsonify({"status": "unhealthy", "error": str(e)}), 503

Debugging

Enable detailed logging
import logging

# Set cachekit to DEBUG level
logging.getLogger("cachekit").setLevel(logging.DEBUG)

# Set Redis client to DEBUG level
logging.getLogger("redis").setLevel(logging.DEBUG)

# View logs
logging.basicConfig(level=logging.DEBUG)
Check cache key format
from cachekit.core import generate_cache_key

# See what key is generated for function
key = generate_cache_key("get_user", (123,), {})
print(f"Cache key: {key}")
# Output: cachekit:get_user:abc123def456...
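If `generate_cache_key` isn't importable in your version, the general technique is a stable hash over the function name and call arguments. A sketch of that pattern; the key layout shown is illustrative, not necessarily cachekit's exact format:

```python
import hashlib

def make_cache_key(func_name, args, kwargs, prefix="cachekit"):
    """Derive a deterministic cache key from the call signature.
    repr() keeps this sketch simple; a real implementation needs a
    canonical serialization so equal calls always hash identically."""
    payload = repr((args, sorted(kwargs.items()))).encode()
    digest = hashlib.sha256(payload).hexdigest()[:16]
    return f"{prefix}:{func_name}:{digest}"

key = make_cache_key("get_user", (123,), {})
print(key)

# Identical calls always map to the same key
assert key == make_cache_key("get_user", (123,), {})
```

Sorting the kwargs items is what makes `f(a=1, b=2)` and `f(b=2, a=1)` hash to the same key.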
Test serializer independently
from cachekit.serializers import StandardSerializer

serializer = StandardSerializer()

# Test serialization
data = {"key": "value"}
encoded = serializer.serialize(data)
print(f"Encoded: {encoded[:50]}...")

# Test deserialization
decoded = serializer.deserialize(encoded)
print(f"Decoded matches: {decoded == data}")

See Also