[–] viraptor link

With a short enough expiry you can actually treat them the same, where short means hours. If your response time, pushing updates, etc. is going to take hours anyway, revocation starts to lose meaning. And that's before we start talking about methods of revocation, which virtually don't exist on the public internet, or about internal systems, where barely any library supports anything better than manually distributed CRLs.

[–] cestith link

I think the more common case than a revocation is replacing an expiring certificate. I don't have hard data. It sure seems to me that short-lived certificates tend to rotate out far more often than they need to be revoked.

[–] viraptor link

> It sure seems to me that short-lived certificates tend to rotate out far more often than they need to be revoked.

For the large majority of companies, would they even spot that their keys have been stolen? That's a few steps before revocation itself.

[–] colmmacc link

This is a good write up, and it's awesome to see on-line rotation of certificates.

But (there was always a but coming) ... the word "rotation" is over-used here and very dangerous, because it doesn't emphasize what's important. To many it means "deploying a new credential". That's not that important at all: at best it's a means to an end, at worst it's make-work. What's important is that credentials are revoked. It's exactly like the important part of backup systems being that we can restore (and we should really call them "restore" systems).

When a credential becomes compromised, what you want to do is revoke it and make sure it stays revoked, otherwise the attacker's goal is complete. So think of it as a "Revocation" system, and call it that.

Viewed in that context, it becomes more apparent that the write-up doesn't mention, test, or check that the credential actually is revoked and doesn't work any more. But that's the most critical step. Even if you're relying only on expiration times (which seems unsafe!) it's important to check for broken checks (like fail-open configurations that let everything in), broken clocks, etc.

[–] stanleydrew link

I'm actively considering what it would take to set up a for-profit ACME CA, and pricing based on rate limits might be the key business model insight I needed. Thanks!

[–] jvehent link

LE needs $2MM/year to run and bootstrapped under an existing CA, so there's your starting point ;)

It might be easier to resell someone else's certificates.

[–] zalmoxes link

Nice writeup!

I wish there was a CA out there that could let you request new certs more frequently.

Yes, there's Let's Encrypt, which is amazing and works great, but the rate limits[1] really kill you if you're not careful. I've had a few issues where I've triggered the LE rate limit with a production domain and got locked out of making new certs for a whole week. I would gladly pay for an ACME CA which does not enforce these rate limits.

[1] https://letsencrypt.org/docs/rate-limits/

[–] amenghra link

https://github.com/square/ghostunnel/ is written in Go and does hitless cert reloading too.

[–] diogomonicapt link

Author here, I actually added a footnote exactly because of that fact: https://diogomonica.com/2017/01/11/hitless-tls-certificate-r...

[–] kyrra link

So I'm not 100% certain on this, but this flow seems like it would be a good candidate for atomic.Value[0]? Then the mutexes could be removed entirely. That way you don't need to get a lock on every config read.

[0] https://golang.org/pkg/sync/atomic/#example_Value_config
