Revisiting the Million Ways to Markup Your Content
Ten years ago, I wrote about The Million Ways to Markup Your Content, where I covered the main ways of the time (HTML5, Microdata, Microformats, OGP, JSON-LD) to mark up one's content for the semantic web, or, more accurately, for the search engines. When I moved my blog to Eleventy last year, I wrote my layouts from scratch and left most of the semantic markup on the floor, for a few reasons.
The first, and main, reason is that this blog is meant for humans to read, not for machines to consume. The second is that the machines clearly do not need meticulously marked-up content to make sense of it; LLMs seem to manage just fine with the mess that is the actual web. The third, following from the second, is the question of whether it is really worth pleasing the tech giants, or whether I should focus more on smaller web initiatives like IndieWeb.
A decade ago I wrote that Microformats were on the way out, because search engines didn't support them and they overlapped heavily with Microdata, which search engines did support. Google's Structured Data documentation still strongly recommends JSON-LD but claims to also support Microdata and RDFa. As a standard, however, Microdata seems to have been abandoned (again) around 2019, which probably says more about how much Google cares about structured data in general, especially when the Microdata link in its list of supported formats is broken[1]. It turns out I was wrong, probably because the web changed.
Still, just as it did when I wrote the original post, it would feel silly to duplicate this site's content in JSON-LD just for the machines. It would make sense if this blog were built on JSON instead of HTML, which I guess is true for many of today's JavaScript- and API-driven web apps. However, for individual blogs where the HTML is handcrafted and the pipeline revolves around Markdown files, it makes little sense. There the focus is on the documents to be read, not on the data to be exchanged. The shift from HTML documents to web apps[2] that use HTML mostly as a presentation layer makes JSON-LD the more natural choice: instead of duplicating the content, it is just a rawer, serialized version of the data the app already holds.
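To make the "serialized version" point a bit more concrete, here is a minimal Python sketch. It assumes an app that already holds a post as a plain record, in which case emitting JSON-LD is little more than mapping fields onto schema.org's `BlogPosting` type and dumping the result. The record's field names and values are hypothetical. `headline`, `author`, `datePublished` and `url` are real schema.org properties, but which of them a given search engine actually uses is for its own documentation to say.

```python
import json

# Hypothetical post record, as an API-driven app might already hold it.
post = {
    "title": "Revisiting the Million Ways to Markup Your Content",
    "author": "Example Author",                # hypothetical
    "published": "2024-01-01",                 # hypothetical
    "url": "https://example.com/posts/markup/",  # hypothetical
}


def to_json_ld(post: dict) -> str:
    """Map an existing post record onto a schema.org BlogPosting object."""
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": post["title"],
        "author": {"@type": "Person", "name": post["author"]},
        "datePublished": post["published"],
        "url": post["url"],
    }
    return json.dumps(data, indent=2)


def script_tag(json_ld: str) -> str:
    """Wrap the JSON-LD in a script element for embedding in a page."""
    # "</" can only occur inside JSON strings, where "<\/" is an equivalent
    # escape; this keeps a stray "</script>" from closing the element early.
    safe = json_ld.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{safe}\n</script>'


print(script_tag(to_json_ld(post)))
```

For a site that already renders the same title, author and date as HTML, this block would simply be a second copy of them, which is exactly the duplication I'd rather avoid here.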
However, this is a static site, albeit one built with a JavaScript-based generator. So, in this latest iteration of the blog, the content is marked up with (hopefully) valid HTML5 with some Microformats2 sprinkled in. This is not for the search engines or data scrapers, but to enable non-siloed human interactions through standards like Webmention. SEO be damned.
[1] Probably still trying to link to a spec that was already abandoned in 2013.
[2] There is probably a blog post in the story of this evolution(?)[3] from documents to applications, but that topic is a bit outside the scope of this post.
[3] The hypertext people would probably not call it an evolution, but a misstep.