Atom feeds add one burden for the publisher: if you intend to follow the specification, your entries require a lasting, well-formed unique identifier[1]. Other formats are less strict: RSS and JSON Feed only require the ID to be unique, and JSON Feed relaxes this further to only locally unique. Both usually advocate for the post’s URL (or permalink), but this isn’t necessarily the best option: although cool URIs don’t change, in practice there’s very little perma in permalinks.
There’s not really anything new in this post that wasn’t already covered 20 years ago in Mark Pilgrim’s post “How to make a good ID in Atom”[2].
It’s easy to generate a URL as an ID at run-time, but its quality is based on the assumption that your URL structure[3] and domain never change. And this “never” part is hard, because URLs are not really persistent even if you handle everything on your end. Domains are rented, after all.
As a quick refresher, URLs are web addresses that you can use in your browser to find a page. URIs are a superset of URLs that also includes other resource identifiers that do not (directly) dereference to a web location like a URL does. These were previously called URNs. However, in practice URL means a thing you can put into a web browser, and URI a thing that does point to a thing but can’t necessarily be put into a web browser[4].
So, what other options do we have for good Atom IDs, if URLs are not going to cut it?
The first option that comes to mind for easily creating something globally unique is to roll some dice and generate a UUID. These are entirely opaque. The good thing is that there is an existing namespace[5] for UUID URIs (urn:uuid:fbffecb5-bb5c-4c4b-95e3-7e03eb807b18), so they pass Atom ID requirements. However, you need to store this somewhere.
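As a minimal sketch of the dice rolling, Python's standard library can produce the urn:uuid: form directly. The function name new_entry_id is made up for this example; the important part is that the result is generated once and then stored, not recomputed on every build.

```python
import uuid


def new_entry_id() -> str:
    """Return a fresh random (version 4) UUID as a urn:uuid: URI."""
    return uuid.uuid4().urn


# Generate once when the post is created, then persist the result
# (for example in the post's front matter) and reuse it from then on.
entry_id = new_entry_id()
print(entry_id)
```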
And if you can store your ID, you can also use the post’s permalink at the time of creation as the ID for the post. Just remember to always use this stored URL instead of one generated at run-time, in case any metadata that affects the URL of that post has changed.
In his post, Pilgrim advocated for tag URIs like tag:kalifi.org,2024-01-01:/2019/11/link-rot.html. They solve the problem of a domain’s (or other identification’s) validity over time by adding a date component. Essentially, the date adds a claim that this object’s authority was valid at that time. You can then go and identify the object, as Pilgrim suggests, by creation timestamp[6], or, to defeat the point a bit, with a relative permalink as in the example above. However, you probably again need to store this ID somewhere, or at least have the authority metadata available somewhere to regenerate it on the fly.
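To make the three parts of a tag URI concrete, here is a small sketch that assembles one from an authority, a date on which that authority was yours, and a specific part. The helper name make_tag_uri and its parameter names are assumptions of this example, and it does no validation of the inputs.

```python
from datetime import date


def make_tag_uri(authority: str, valid_on: date, specific: str) -> str:
    """Build a tag: URI of the form tag:<authority>,<YYYY-MM-DD>:<specific>.

    authority: a domain name (or email address) you controlled on valid_on.
    specific:  anything that identifies the post under that authority.
    """
    return f"tag:{authority},{valid_on.isoformat()}:{specific}"


print(make_tag_uri("kalifi.org", date(2024, 1, 1), "/2019/11/link-rot.html"))
# tag:kalifi.org,2024-01-01:/2019/11/link-rot.html
```

The authority and date are the metadata you would need to keep around if you want to regenerate the ID instead of storing it.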
Since Pilgrim’s post, there has been a new addition, the Named Information (NI) URI scheme. Like the Magnet links of BitTorrent and other p2p networks, it identifies the resource by its hash. Content hashing is a bit infeasible for blogs, though, because any change to the content, like a typo fix, will change the hash[7]. However, the RFC for Named Information doesn’t require the use of the full content[8] for hashes:
Other than in the aforementioned special case where public keys are used, we do not specify the hash function input here. Other specifications are expected to define this.
For blogging purposes, one could just hash some metadata, like creation date and title, and use that as the NI URI. However, the only difference to tag URIs would be that the authority part of the URI is optional here. Otherwise you’d essentially just be hashing a tag URI.
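A rough sketch of what hashing metadata into an NI URI could look like, assuming SHA-256 with the unpadded base64url encoding the scheme uses. Which fields go into the hash and how they are joined is entirely my own choice here, as are the function name make_ni_uri and the sample timestamp and title.

```python
import base64
import hashlib


def make_ni_uri(created: str, title: str, authority: str = "") -> str:
    """Hash post metadata into an ni: URI (sha-256, base64url, no padding)."""
    # The choice and ordering of fields is an assumption of this sketch.
    # A newline separates them so ("ab", "c") and ("a", "bc") hash differently.
    material = f"{created}\n{title}".encode("utf-8")
    digest = hashlib.sha256(material).digest()
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    # An empty authority gives the ni:///... form.
    return f"ni://{authority}/sha-256;{value}"


# Hypothetical metadata for illustration only.
print(make_ni_uri("2019-11-01T12:00:00Z", "Link rot"))
```

Note that fixing a typo in the title would change the ID just like a content change would, so the hashed fields need to be ones you never edit.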
Anyway, we are bikeshedding a bit here. Ultimately, a URL is a natural and also globally unique identifier for the resource. It’s probably persistent for some number of years. Many feed processors assume that Atom’s ID is a permalink, just like in other feed formats. And ultimately, when your posts’ permalinks change, the effect is a minor nuisance: some posts are duplicated and wrongly marked as unread in your readers’ RSS readers[9].
If you can easily store the IDs of your posts somewhere, be it in a database or front matter, your choice of ID scheme hardly matters, as long as you generate the IDs on creation and let them be. If you need to regenerate them, tag is probably your best bet, because it lets you identify the object any way you want, from relative permalink to post title or creation timestamp.
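Put together, the decision could be sketched roughly like this: use the stored ID when there is one, and fall back to a regenerated tag URI otherwise. The post dictionaries, their "id" and "created" keys, and the function name are hypothetical stand-ins for whatever your front matter or database looks like.

```python
from datetime import date


def entry_id(post: dict, authority: str, valid_on: date) -> str:
    """Prefer the ID stored with the post; otherwise rebuild a tag: URI."""
    stored = post.get("id")
    if stored:
        return stored
    # Fallback: regenerate from metadata that must never change,
    # here the creation timestamp of the post.
    return f"tag:{authority},{valid_on.isoformat()}:{post['created']}"


with_id = {"id": "urn:uuid:fbffecb5-bb5c-4c4b-95e3-7e03eb807b18",
           "created": "2019-11-01T12:00:00Z"}
without_id = {"created": "2019-11-01T12:00:00Z"}

print(entry_id(with_id, "kalifi.org", date(2024, 1, 1)))
# urn:uuid:fbffecb5-bb5c-4c4b-95e3-7e03eb807b18
print(entry_id(without_id, "kalifi.org", date(2024, 1, 1)))
# tag:kalifi.org,2024-01-01:2019-11-01T12:00:00Z
```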
You’re screwed anyway if/when your permalink structure changes, because then you need logic to regenerate old-style IDs for older posts and new-style IDs for newer posts. Some ingenious use of the NI scheme might work around this problem, but what that would look like is beyond the scope of this blog post.
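For a sense of what that old-style versus new-style logic means in practice, here is a sketch that picks a permalink pattern by publication date. The cutover date, both URL patterns, the example.org domain, and the names PERMALINK_ERAS and original_permalink are all invented for illustration.

```python
from datetime import date

# Hypothetical history of permalink patterns: (first day in use, pattern),
# listed newest first so the first match is the pattern in use at that time.
PERMALINK_ERAS = [
    (date(2022, 1, 1), "https://example.org/posts/{slug}/"),
    (date.min, "https://example.org/{year}/{month:02d}/{slug}.html"),
]


def original_permalink(slug: str, published: date) -> str:
    """Rebuild the permalink a post had on the day it was published."""
    for start, pattern in PERMALINK_ERAS:
        if published >= start:
            return pattern.format(
                slug=slug, year=published.year, month=published.month
            )
    raise ValueError("no permalink pattern covers this date")


print(original_permalink("link-rot", date(2019, 11, 5)))
# https://example.org/2019/11/link-rot.html
print(original_permalink("hello", date(2023, 3, 1)))
# https://example.org/posts/hello/
```

Every future restructuring adds another entry to the list, which is a good argument for storing the ID at creation time instead.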