URL, URN, URI: Oh My!

[Does this sound like common sense to you? Fantastic, please skip this post in its entirety. But lately I have been stumbling on people confused about this every day, hence I am going to write a post and start pointing them here.]

Let me get this straight right away: “URL” and “URI” are NOT interchangeable terms. Given that those are cornerstone concepts of the entire Web-based authentication castle, confusion and misunderstandings about their respective roles and functions can come at a steep price. For the same reason, in this post I am going to cut my usual language flourish in favor of pretty blunt statements. I’ll also avoid trivia like expanding acronyms and similar; you can read the original specs for more details.
That said:

A URI is an identifier. It is meant to identify a resource of any kind, literally anything you can think of. It’s just a name following some syntax (more below).
A URL is a special kind of URI which is meant to specify the location of a resource available on the Internet.

Concretely, the key difference between the two is: a URL always refers to something that is network-addressable, whereas a URI is a logical identifier which might have no endpoint or network-level manifestation whatsoever.

The common rule of thumb is that if you paste a URL in a browser, you always get something back other than an error. That is not really true, given that you might not be authorized or the corresponding resource might not support a straight GET, but it should give the idea. Examples of URLs? Below:

All pretty concrete stuff, which your browser (or Web API client, FTP client, mail client, etc.) would use to obtain the corresponding resource. Not much to explain here.
Want to see some examples of URIs? Well, technically all of the above were URIs, given that URLs are a subset of the larger set of URIs. So, how about some “pure” URIs meant to identify, as opposed to locate?

I guess that for some of you the entries in this list look less familiar. Let’s look at each of those in some detail, and it will become clear why identification and location are radically different concepts.

Entry #1 is the identifier of a SAML protocol binding. I took it from a Windows Azure AD metadata file, from the line below:

<SingleSignOnService 
  Location="https://accounts.accesscontrol.windows.net/929bfe53-8d2d-4d9e-a94d-dd3c121183b4/v2/saml2" 
  Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"/>

That line is saying that if you want to sign on in your app using SAML, you should use the endpoint specified in Location (yes, it’s a URL) and you should obey the SAML binding HTTP-Redirect. A binding type is clearly an abstract concept, something that lives in our heads but has no network manifestation. Also, the form used here (URN-based, more about that below) takes all ambiguity out, given that it does not even look like something that would work in a browser.
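
To make the distinction concrete, here is a minimal Python sketch that parses the metadata line above and inspects its two attributes. It assumes the element is exactly as quoted here, without the XML namespace a full metadata document would carry, and the helper name `has_network_authority` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# The metadata line quoted above, without any XML namespace (an assumption
# made to keep the sketch self-contained).
METADATA_LINE = """<SingleSignOnService
  Location="https://accounts.accesscontrol.windows.net/929bfe53-8d2d-4d9e-a94d-dd3c121183b4/v2/saml2"
  Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"/>"""


def has_network_authority(uri: str) -> bool:
    """Return True if the URI carries a host part (the bit after '//')."""
    return urlsplit(uri).netloc != ""


element = ET.fromstring(METADATA_LINE)
location = element.get("Location")
binding = element.get("Binding")

# Print the scheme of each value and whether it names a host.
print(urlsplit(location).scheme, has_network_authority(location))
print(urlsplit(binding).scheme, has_network_authority(binding))
```

The Location value names a host a client can connect to; the Binding value does not. Keep in mind that the presence of a host is only a syntactic hint: an identifier can look like a URL and still not be a location.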

Entry #2 is also from the metadata doc, and is also about a very abstract concept: it indicates the signature algorithm that the STS described by the metadata will use for signing the tokens it issues. Interestingly enough, this does look like a URL; and if you paste it in a browser you don’t get an error! However, you don’t get the resource either; you are presented with some documentation and contextual information about the spec where this particular URI is defined. So, still not a location.

Entry #3 represents the Name claim type. Once again, it is an abstract concept (an attribute type) which is not locatable in itself. The STS uses this identifier in its metadata and in the tokens it issues to indicate that a given element should be interpreted as the subject’s name.

Entry #4 represents the identifier of the STS itself: it is the value in the token that indicates the issuer that originated it. Applications validating incoming tokens will verify that this value corresponds to the identifier of the issuer they trust. As with all the other URIs, it is not network addressable (though the use of “https” is a bit unfortunate IMHO, given that it suggests a transport feature that is totally not relevant here).
Here’s an interesting observation: the issuer URI remains the same regardless of which protocol is used for obtaining a token, while there are as many issuer URLs as there are supported protocols.
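
A rough Python sketch of that observation follows. All the values are hypothetical (they do not come from a real STS), and `is_trusted_issuer` is an illustrative helper rather than the API of any token validation library.

```python
# Hypothetical values, for illustration only: neither the issuer identifier
# nor the endpoint URLs come from a real STS.
ISSUER_URI = "https://sts.example.com/contoso/"
ENDPOINT_URLS = {
    "saml2": "https://login.example.com/contoso/saml2",
    "wsfed": "https://login.example.com/contoso/wsfed",
}


def is_trusted_issuer(token_issuer: str, trusted_issuer: str = ISSUER_URI) -> bool:
    """Compare the issuer value found in a token with the trusted identifier.

    This is a plain string comparison: nothing is fetched and no endpoint
    is contacted, because the issuer is a name, not a location.
    """
    return token_issuer == trusted_issuer


# Whichever endpoint URL was used to obtain the token, the issuer value
# inside the token is expected to be the same identifier.
for protocol, url in ENDPOINT_URLS.items():
    print(protocol, url, is_trusted_issuer(ISSUER_URI))
```

One identifier, several endpoints: the check never looks at the URL the token was obtained from.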

Entry #5 starts to get more interesting. That URI is what I have used to indicate the realm of the Web application I described in this post. The function of the realm in authentication protocols is to indicate to the STS the identity of the application for which a token is being requested. Furthermore, it is used by the STS to scope the tokens it issues to the application: by including the app identifier back in the token (does AudienceRestriction ring a bell?) it guarantees that the token cannot be reused against another app in a different realm. Long story short, that is an absolutely necessary piece of information, without which the SSO flow cannot take place: which leads me to the confusing part.
For usability reasons, development stacks don’t force you to enter a realm; they usually pick a reasonable default, so that you can click-click-click and F5. The most common default value for the realm is… the URL of the Web project you are developing. It is a good default choice, but unfortunately it created such a tight correlation between two different concepts – the URL at which the application is hosted and the URI identifying the app – that now a lot of people think that they are one and the same. By now you know that they really aren’t.
There is another reason for which it is convenient to think about the realm as an identifier. The boundaries of your applications/solutions are not necessarily defined by the application artifact per se: you decide if the security realm is limited to your frontend or if it includes other apps or tiers. Using the URL of one of the artifacts in your solution would make it awkward to access the other components, given that the resulting token would be seemingly scoped to the wrong granularity. Don’t worry too much if this last part is not too clear; it is not strictly required for getting the more general point here. However, if you want to know more, see this.
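
The scoping role of the realm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `App` class, the realm names and the addresses are all hypothetical, and a real stack would perform this check inside its token validation pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    url: str    # where the component is hosted (a location)
    realm: str  # which security realm it belongs to (an identifier)

    def accepts(self, token_audience: str) -> bool:
        """Accept a token only if it was scoped to this app's realm."""
        return token_audience == self.realm


# Hypothetical names and addresses, for illustration only.
frontend = App(url="https://www.example.com", realm="urn:example:expenses")
backend = App(url="https://api.example.com", realm="urn:example:expenses")
payroll = App(url="https://payroll.example.com", realm="urn:example:payroll")

# The audience value the STS placed in the token.
token_audience = "urn:example:expenses"
for app in (frontend, backend, payroll):
    print(app.url, app.accepts(token_audience))
```

Two components hosted at different URLs share one realm and accept the same token, while the app in a different realm rejects it. None of the URLs takes part in the decision.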

I saved the most philosophical case for last. #6 represents the realm of a Web app targeting Windows Azure Web Sites. Interestingly, it looks exactly like the last URL in the first list; in this case, however, it plays a completely different role: it identifies the app rather than providing its location. Want proof? Here’s a thought experiment. Think of the moment in which I create the Visual Studio project for the application. At first, the app will be created on IIS Express and will have a URL of the form http://localhost:34534 or similar. Now, say that I want to set up single sign-on for the app while I am still doing development (hence on IIS Express); also, say that I already know that once I am done with development I will deploy it to vibro.azurewebsites.net. As part of the SSO configuration process, I can already assign “https://vibro.azurewebsites.net” as the realm, even if my current URL is localhost based and the network addressable endpoint for the WAWS site does not exist yet!
Note that, as you know by now, I don’t *have* to use the final URL of the app as the realm; I could accept the default realm value at creation time (which would be “http://localhost:34534”) and keep the same value when pushing the app to production in the cloud. However, it would not be a very meaningful name, and if I did that often I would end up with a bunch of apps in production all named localhost:something, which would not be great for managing them. Having the realm value reflect something that refers to the app is simply a good naming strategy, but you could just as well apply it to urn-based URIs.
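
Here is how the thought experiment might look in Python, assuming a WS-Federation-style sign-in request in which the realm travels in the `wtrealm` query parameter and the return address in `wreply`. The STS endpoint and the `build_sign_in_url` helper are hypothetical, introduced only for this sketch.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical STS endpoint, for illustration only.
STS_SIGN_IN_URL = "https://sts.example.com/wsfed"


def build_sign_in_url(realm: str, reply_url: str) -> str:
    """Build a WS-Federation-style sign-in request URL."""
    query = urlencode({
        "wa": "wsignin1.0",   # sign-in action
        "wtrealm": realm,     # identifier of the app (a name)
        "wreply": reply_url,  # where the STS should send the token (a location)
    })
    return f"{STS_SIGN_IN_URL}?{query}"


REALM = "https://vibro.azurewebsites.net"

# Same realm in both requests; only the location of the app changes.
dev_request = build_sign_in_url(REALM, "http://localhost:34534/")
prod_request = build_sign_in_url(REALM, "https://vibro.azurewebsites.net/")

for request in (dev_request, prod_request):
    params = parse_qs(urlsplit(request).query)
    print(params["wtrealm"][0], "->", params["wreply"][0])
```

The realm is identical during development and in production; the only value that follows the app around is the reply URL.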

——–

Still with me? Great! I believe that the above made my point, hence I could fold and go to sleep (it’s 2:30am on the night between Friday and Saturday: what’s wrong with me?) but I would just like to spend a couple of words on explaining what the heck those “urn:…” thingies are.

All URIs (hence also all URLs) follow the same syntax:

<scheme>:<scheme-specific-part>

You have seen this pattern often: the head part (the scheme) tells a consumer that understands that particular scheme how to interpret the information that follows.
Now, the above is not especially prescriptive. For URLs, the scheme is often a protocol which will provide its own guidance about how to work with the scheme-specific part. To provide some normative guidance for names as well, the RFC describes a URI-based naming scheme (URN), which is what I used in the various examples beginning with “urn:”.
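
The generic syntax above is easy to try out. The following Python sketch splits a few of the URIs mentioned in this post at the first colon; `split_uri` is an illustrative helper, not a full URI parser.

```python
from typing import Tuple


def split_uri(uri: str) -> Tuple[str, str]:
    """Split a URI into (scheme, scheme-specific-part) at the first colon."""
    scheme, _, rest = uri.partition(":")
    return scheme, rest


for uri in (
    "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect",
    "https://vibro.azurewebsites.net",
    "http://localhost:34534",
):
    scheme, rest = split_uri(uri)
    print(f"{scheme!r:8} {rest!r}")
```

For the “urn” scheme, the scheme-specific part is itself structured: a namespace identifier (“oasis” in the example) followed by a namespace-specific string.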

Thank you for having read all the way to here, hats off for your stamina!
If you have questions about this URI!=URL thing, or you think parts of this post are confusing, please leave a comment: I really really want to make sure that this is as crisp as it can get.
