Why you should consider WS-Trust…
…even if you plan to play only within the boundaries of your garden.
Even if you think that you will never need to deal with the concept of federation, there are still very good reasons for setting up an identity provider. Instead of repeating the official ones, of which I’m sure every developer is completely convinced by now, here I’m going to state a simple fact that will also give the operations guys something to think about.
The “Hello World” of WS-Security lingo is the evergreen webmethod call with X509 certificates playing the part of the cryptographic goodies: a certificate with a private key in the local machine store, its public key, and another certificate with a private key in the current user store. Once you get the hang of it, everything goes swiftly and it’s only natural to think of extending it to a real system. However, history teaches that not everything that works well at small scale retains this good property when the game gets harder. Remember: certificates need to be
- Released. You need one or more working CAs, properly configured for issuing the right kind of certificates (suitable for the intended usages, with the right business logic behind the informative fields). The operations related to the creation of a new client or server need to be extended by an issuing step: suitable logic has to be determined and implemented.
- Distributed. Certificates must reside in the correct place. That means packing and moving. Different kinds of endpoints imply differences in the deployment. There are “critical sections” that must be taken into account. (When I’m distributing a new certificate, when should I dispose of the old one without breaking clients? Should I have a timeframe with the old and the new both valid? How do I know if there is a long-running transaction going on that is still using the old one?) And so on.
- Maintained. Surprise: after some time, certificates expire! :-) You usually don’t think of it when you play with the examples… but when it happens, believe me, you can really feel the sparks in the air! If you used a static distribution system, tracking down loosely coupled clients that may still have the old public key for a service can be a real pain. You keep finding them for months after the renewal, especially if the client lives outside of your network. There’s the option of not changing the key pair when renewing a certificate, but besides not being exactly best practice, it doesn’t help when the certificate was compromised and, instead of renewing it, you are actually replacing it completely.
Don’t get me wrong: I’m NOT suggesting that you should not use certificates. Certificates are an incredibly valuable asset, and they are absolutely a cornerstone of distributed systems. I always fight for the chance to use certificates. Active Directory on W2K3 does wonders with them: CA templates, autoenrollment and roaming profiles are things of beauty.
What I’m suggesting is that managing a PKI at enterprise scale IS A SERIOUS MATTER, which gives you great powers but demands careful planning and good care.
That said, let’s imagine extending the simple example to a full enterprise-scale ecosystem. A simplifying hypothesis could be that the certificates will be used to identify machines, rather than users and specific web services. In fact, since what interests us is the order of magnitude rather than the specific numbers, even the dream of the most dedicated watchmaker (every user with a personal smartcard with a full certificate on it, every service with its own certificate) would still make my point. In fact, don’t take the numbers below too strictly: I’ll make explicit and implicit simplifications all the time.
Let Nc be the number of client machines, and Ns the number of server machines.
Every client machine will own a full (public + private) certificate, and the same holds for every server machine.
Assuming that we want to enable every client to securely invoke every service on every server, we must deliver the full list of all public keys of all servers to every client store. That can be done in many ways: for example, by fetching a key only when a call needs it, so that each client acquires only the keys it actually uses. Anyway, in the worst (full exhaustion) case all public keys will eventually be cached on every client. So the full number of deployed certificate instances will be
Ncert = Nc+Ns+(Nc*Ns)
Note that some servers can be clients as well, but even in the degenerate case (the set of clients and the set of servers fully overlap, so Nc=Ns=N) our result will still be quadratic (N+(N*(N-1)) = N+N^2-N = N^2).
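To put some numbers on the formula, here is a minimal Python sketch. The function names are made up for illustration, and the counting follows the same simplifications as the text (one full certificate per machine, every server public key cached on every client).

```python
def direct_trust_cert_count(n_clients: int, n_servers: int) -> int:
    """Ncert = Nc + Ns + (Nc * Ns), as in the formula above."""
    own_certs = n_clients + n_servers            # one full certificate per machine
    cached_public_keys = n_clients * n_servers   # worst case: every client caches every server key
    return own_certs + cached_public_keys


def overlapping_cert_count(n_machines: int) -> int:
    """Degenerate case: every machine is both a client and a server."""
    own_certs = n_machines
    cached_public_keys = n_machines * (n_machines - 1)  # everybody else's public key
    return own_certs + cached_public_keys               # simplifies to N^2


for n in (10, 100, 1000):
    print(n, direct_trust_cert_count(n, n), overlapping_cert_count(n))
```

Multiplying the number of machines by ten multiplies the number of certificate instances by roughly a hundred in both variants.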
Now: if you take into account all the things that you have to do to keep the certificate party running smoothly, I’d say that a quadratic trend should raise some eyebrows…
Let’s try to introduce an authority entity to the scenario and let’s see what happens.
Imagine we dedicate one machine to the authority service. Let’s install on that machine a full certificate, plus the list of all public keys of all servers. Practically, on that machine we will have the same situation we had on all clients in the former scenario.
Suppose that, every time a client C wants to contact a web service S, it first asks the authority A (via a WS-Trust request) for an intermediate token containing a session key K, say a SAML token with surprise. Since the authority owns the public key of S, the token can contain a copy of the session key encrypted for S, call it Ks (and since C will have signed its request, A will have the chance to encrypt a copy of the session key for C as well, call it Kc).
Once C owns the session key, it can use it to secure its first call to S: Ks will be attached to the call, so S will be able to obtain K and read the message (typically the negotiation for yet another session key, maybe a WS-SecureConversation context).
Below is a sketch, courtesy of OneNote & M200 (disregard most of the S/E containment hierarchy, it’s just for giving the idea. It doesn’t literally mean that you encrypt and sign the same XML parts all the time).
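The same exchange can be mimicked in a few lines of toy Python. This is only a model of who can read what: the `Sealed` class stands in for "encrypted with the recipient's public key" and performs no real cryptography, the signature on C's request is not modelled, and every class and method name here is hypothetical rather than part of any WS-Trust library.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Toy stand-in for 'encrypted for `recipient`'. No real cryptography."""
    recipient: str
    payload: bytes

    def open(self, who: str) -> bytes:
        if who != self.recipient:
            raise PermissionError(f"{who} cannot open a box sealed for {self.recipient}")
        return self.payload


@dataclass(frozen=True)
class Token:
    key_for_client: Sealed    # Kc in the text
    key_for_service: Sealed   # Ks in the text


class Authority:
    def __init__(self, known_services):
        # Stands in for A's list of server public keys.
        self.known_services = set(known_services)

    def issue(self, client: str, service: str) -> Token:
        if service not in self.known_services:
            raise KeyError(f"no public key on file for {service}")
        k = secrets.token_bytes(32)  # fresh session key K
        return Token(Sealed(client, k), Sealed(service, k))


authority = Authority({"S"})
token = authority.issue(client="C", service="S")

k_at_client = token.key_for_client.open("C")    # C recovers K from Kc
k_at_service = token.key_for_service.open("S")  # S recovers K from Ks, attached to the call
print(k_at_client == k_at_service)              # both ends now share K
```

The point of the model is that C never needs the public key of S: only the authority does.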
Now: the details of how it works are not the main point here. What I’d like to remark is that, at the price of an extra roundtrip, we have a system that not only is as secure as before (and it could even be faster, since we could use symmetric algorithms), but also has WAY fewer certificates scattered around. Let’s do the math:
Ncert = 2*Nc + 2*Ns + 1 + Ns
Not baaaaad at all. Especially if you watch the comparative graph (where Nc and Ns collapse into N, just to show the trend). UPDATE: 3N+1 should be 4N+1, but what counts is the trend and I’m too lazy to change the image 🙂
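For readers without the graph at hand, the trend can be tabulated with a short Python sketch. The two functions simply transcribe the formulas given in the text and their names are invented here. Taken literally, the second formula collapses to 5N+1 when Nc=Ns=N rather than the 4N+1 of the update note; it is linear either way, which is the point.

```python
def direct_trust(n_clients: int, n_servers: int) -> int:
    # Ncert = Nc + Ns + (Nc * Ns)
    return n_clients + n_servers + n_clients * n_servers


def with_authority(n_clients: int, n_servers: int) -> int:
    # Ncert = 2*Nc + 2*Ns + 1 + Ns, transcribed as written in the text
    return 2 * n_clients + 2 * n_servers + 1 + n_servers


print(f"{'N':>5} {'direct':>9} {'authority':>9}")
for n in (1, 10, 100, 1000):
    print(f"{n:>5} {direct_trust(n, n):>9} {with_authority(n, n):>9}")
```

With a handful of machines the authority actually costs a few more certificates, so the payoff only shows once the garden grows.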
And as stated at the very beginning, here we are purposefully forgetting perimeter services, performance improvements (guess what? symmetric keys not only work faster, they are cacheable), chances of maintaining the authorization logic in a single point, ease of integration with other federations…
If your backoffice team is not fond of gardening, they’ll love you…