Implement distributed storage for Caddy certificates #31

@psviderski

When multiple machines run Caddy behind a TCP load balancer or a DNS record with multiple IPs, certificates can take a long time to issue. The ACME challenge request bounces between machines until it happens to reach the instance that started the challenge. Moreover, if a Cloudflare proxied record is used with multiple addresses and one instance successfully issues a certificate, Cloudflare may send ACME challenge requests to the other instances only rarely, or not at all. In practice, I have seen instances fail to issue certificates for days or weeks.

Possible solutions:

  • Share the certificate storage (file system) between Caddy containers
  • Use an alternative storage module for certificates, or implement our own, e.g. backed by Corrosion: https://caddyserver.com/docs/json/storage/

As a first step toward distributed storage, we can fork the default file_system storage module, which uses certmagic's file storage, and override the Store and Load methods only for keys that hold challenge tokens. Those tokens can be stored in Corrosion, which will share them among all Caddy instances.
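To make the routing idea concrete, here is a minimal sketch in Python of a storage wrapper that sends challenge-token keys to a shared backend and everything else to the local one. It only models the logic: a real Caddy storage module would be written in Go against certmagic's storage interface. The names `DictStore`, `HybridStorage`, and `is_challenge_key` are hypothetical, and the assumption that challenge-token keys contain a `challenge_tokens` path segment should be checked against certmagic before relying on it.

```python
class DictStore:
    """In-memory stand-in for a storage backend (local disk or a shared table)."""

    def __init__(self):
        self.data = {}

    def store(self, key, value):
        self.data[key] = value

    def load(self, key):
        # Raises KeyError when the key is missing.
        return self.data[key]


def is_challenge_key(key):
    # ASSUMPTION: challenge tokens live under a "challenge_tokens" path segment.
    return "challenge_tokens" in key.split("/")


class HybridStorage:
    """Routes challenge-token keys to shared storage, all other keys to local."""

    def __init__(self, local, shared):
        self.local = local
        self.shared = shared

    def _backend(self, key):
        return self.shared if is_challenge_key(key) else self.local

    def store(self, key, value):
        self._backend(key).store(key, value)

    def load(self, key):
        return self._backend(key).load(key)
```

With two `HybridStorage` instances that share one `shared` backend but have separate `local` backends, a token stored by one instance can be loaded by the other, while certificates stay on the instance that obtained them.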

We can perhaps start without a distributed lock. If multiple instances start issuing a certificate for the same domain at the same time, they may overwrite each other's tokens, but after a few retries they should all succeed. We could also add optimistic locking: fail the write if a record for the key already exists, and ignore an existing record once it is older than a reasonable timeout.
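The optimistic locking described above could look roughly like the following Python sketch. It models a single table in memory. `TokenTable`, `KeyExistsError`, and the 60-second default are hypothetical choices for illustration. Whether the check-and-write step is atomic across nodes depends on the backing store, and this sketch does not address that.

```python
import time


class KeyExistsError(Exception):
    """Raised when a fresh record for the key already exists."""


class TokenTable:
    """Insert-if-absent table in which records older than stale_after are ignored."""

    def __init__(self, stale_after=60.0, clock=time.monotonic):
        self._rows = {}  # key -> (value, time the record was written)
        self._stale_after = stale_after
        self._clock = clock

    def store_if_absent(self, key, value):
        now = self._clock()
        row = self._rows.get(key)
        # Fail only while the existing record is still fresh.
        if row is not None and now - row[1] < self._stale_after:
            raise KeyExistsError(key)
        # No record, or a stale one: take over the key.
        self._rows[key] = (value, now)

    def load(self, key):
        return self._rows[key][0]
```

An instance that gets `KeyExistsError` would back off and retry later. The staleness timeout keeps a crashed instance from blocking issuance for that domain indefinitely.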

Labels: enhancement (New feature or request)
