+++
date = "2022-11-30T19:57:00+00:00"
publishdate = "2023-12-29T07:08:55+00:00"
title = "Extensible Stylesheets"
slug = "extensible-stylesheets"
author = "Thedro"
tags = ["xml"]
type = "posts"
summary = "The browser has a peculiar dimensionality that natively supports moving around complexity between different data and language domains."
draft = ""
syntax = "1"
toc = ""
updated = ""
+++

![XML Logo](/images/extensible-stylesheets.png " `XML` Logo" )

The browser has a peculiar dimensionality that natively supports moving around complexity between different data and language domains. One of those domains is `XML` (Extensible Markup Language), which shows up in practice as `RSS` (Really Simple Syndication) feeds, `Atom` feeds, sitemaps, `OPML` (Outline Processor Markup Language) outlines, and various other `XML` representations.

[`XSLT`](https://developer.mozilla.org/en-US/docs/Web/XSLT) (Extensible Stylesheet Language Transformations) is a {{< sidenote mark="language" set="left" >}} See the [XSL Transformations Version 3.0](https://www.w3.org/TR/xslt-30/) specification. {{< /sidenote >}} that transforms `XML` into different output formats. You can expose big or tiny `atom.xml`, `rss.xml`, `sitemap.xml`, and `opml.xml` files and transform them into {{< sidenote mark="different" set="right" >}} Think plain text or `PDF` (Portable Document Format), but that's outside the scope of this article. {{< /sidenote >}} presentation formats.

In some respects, [`XSLT`](https://www.w3.org/TR/xslt-30/) is considered ["dead"](https://lists.w3.org/Archives/Public/public-forms/2013Oct/0013.html) technology, but take the word _dead_ with a {{< sidenote mark="grain" set="left" >}} "`X` is _dead_, and `Y` killed it" is a common trope on the Internet. You can find articles and comments for any `X` of your choosing. {{< /sidenote >}} of salt.
It's not uncommon to right click a site and, lo and behold, see a `DTD` (Document Type Definition) for `XHTML` (EXtensible HyperText Markup Language) in its source, generated from `XML` --- you'd be surprised. `XSLT` operates as an `XML` templating language, and a rather verbose one at that. If you go deep enough the verbosity gets unwieldy, and like all programming shenanigans it's a perpetual rabbit hole. Here are my practical notes for working with `XML` and `XSLT` in a web context, adding style and presentation while maintaining a bit of sanity.

## Formats, File Extensions, and MIME types

The `XSLT` transformations discussed here are limited to [`XHTML`](https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/XHTML) output. Raw `XML` in the browser has no styles associated with it ([example](/post/rss.xml)), so styles are added with `XSLT` ([example](https://micro.thedroneely.com/m/tdro/rss.xml)).

The `MIME` (Multipurpose Internet Mail Extensions) [type definition](https://www.w3.org/TR/xslt20/#xslt-mime-definition) for `XSLT` is `application/xslt+xml`, and the file extensions `.xsl` and `.xslt` are the commonly accepted forms. The `MIME` type for `XHTML` is `application/xhtml+xml`, but `XHTML` is usually served with the `text/html` content type so that browsers parse it as `HTML` instead of `XML`. `XHTML` has {{< sidenote mark="differences" set="left" >}} `HTML` vs `XHTML` is an epic and historic flame war. Think tabs vs. spaces, self--closing tags vs. non self--closing tags, or any other versus trope you can imagine. {{< /sidenote >}} from `HTML`, which you can take a look at in this [`XHTML` in a nutshell article](https://blog.whatwg.org/xhtml5-in-a-nutshell).

## XML Validation and Formatting

You can _validate_ an `XML` document and check it for well-formedness using [`xmllint`](https://man.archlinux.org/man/xmllint.1) from the [`libxml2`](https://repology.org/project/libxml2/versions) {{< sidenote mark="package." set="right" >}} Check your [Linux distribution repositories](https://repology.org/repositories/statistics). {{< /sidenote >}} `W3C` (The World Wide Web Consortium) offers an online [feed validation service](https://validator.w3.org/feed/), but an offline validator sets up a better feedback loop and is a lot more robust and {{< sidenote mark="efficient." set="left" >}} If you're behind a [CGNAT](https://en.wikipedia.org/wiki/Carrier-grade_NAT) like me, the Internet is effectively a captcha game. CaptchaNET™. {{< /sidenote >}}

`XML` has multiple validation grammars in the form of schemas. [`RELAX NG`](https://relaxng.org/) (REgular LAnguage for `XML` Next Generation) is one of those schema languages. Schema examples can be found in [`RFCs`](https://datatracker.ietf.org/doc/html/rfc4287#appendix-B) (Requests for Comments) or in niche places around the web --- for example, here's an [`RSS` rng file](https://www.w3.org/2002/09/rss-rng/rss.rng), an [`ATOM` rnc file](https://gist.github.com/tommorris/3725394#file-atom-rnc), and an [`ATOM` rng file](https://gist.github.com/tommorris/3725394#file-atom-rng). The catch is that these schema files may target different use cases or may have drifted out of spec over time, but they're still worth looking at.

`RELAX NG` has both a standard `XML` syntax (`.rng` files) and a [compact](https://relaxng.org/compact-tutorial-20030326.html) syntax (`.rnc` files). Offline validation with `xmllint` does not {{< sidenote mark="support" set="right" >}} According to the `xmllint` manual, it supports [`RELAX NG`](https://relaxng.org/), [`WXS`](https://www.w3.org/XML/Schema) (`W3C` `XML` Schema), and [Schematron](https://www.schematron.com/). {{< /sidenote >}} the compact `rnc` syntax, but `rng` works. Schema {{< sidenote mark="conversion" set="left" >}} As seen in this [blog post](https://cweiske.de/tagebuch/atom-validation.htm) on validating `ATOM` feeds locally. {{< /sidenote >}} between `rnc` and `rng` can be done with the Java program [`trang`](https://relaxng.org/jclark/trang.html) (usually packaged as [`jing-trang`](https://repology.org/project/jing-trang/versions) in repositories).

**Trang**
: Trang converts between different schema languages for `XML`: `RELAX NG` (`XML` syntax), `RELAX NG` compact syntax, `XML` `1.0` `DTDs`, and `W3C` `XML` Schema (`WXS`).

In my case, and maybe yours, it's easier to run `trang` on an already well specified, well-formed `XML` document. This infers a basic `rng` schema file that you can validate against and extend with more rules.

```shell
trang rss.xml rss.rng
trang atom.xml atom.rng
trang opml.xml opml.rng
```

Validate the `XML` against the `rng` file using `xmllint` with the `--relaxng` flag. The `--noout` flag stops `xmllint` from printing the parsed document to the command line.

```shell
$ xmllint --noout --relaxng rss.rng rss.xml
rss.xml validates
```

If the document fails to validate, `xmllint` reports the line and the element that broke the schema's grammar.

```shell
$ xmllint --noout --relaxng rss.rng rss.xml
rss.xml:25: element description: Relax-NG validity error : Did not expect element description there
rss.xml fails to validate
```

Pretty print `XML` with `--pretty 1` for basic formatting or `--pretty 2` for "one attribute per line" whitespace formatting.

```shell
xmllint --pretty 1 rss.xml
xmllint --pretty 2 rss.xml
```

## Stylesheet Processing and Validation

The command line `XSLT` processor [`xsltproc`](https://man.archlinux.org/man/xsltproc.1.en) can process stylesheets offline, but it only works on stylesheets up to version `1.1`. If you use `xsltproc` as a validation tool for `xsl` files, you'll have to downgrade the version declaration from [version `3.0`](https://www.w3.org/TR/xslt-30/) to version `1.1` and {{< sidenote mark="sacrifice" set="right" >}} Not that it matters much --- you'll find that [version `1.0`](https://www.w3.org/TR/xslt-10/) is the version that most browsers support. {{< /sidenote >}} a few features.

```shell {caption="If nothing is printed, the xsl file is valid"}
xsltproc rss.xsl
```

```shell {caption="Transform rss.xml using rss.xsl. The data transforms from XML → XHTML"}
xsltproc rss.xsl rss.xml
```

Other processors like [Xalan--Java](https://xml.apache.org/xalan-j/) support `XSLT` up to version `1.0`, and [Saxon](https://www.saxonica.com/documentation11/index.html#!using-xsl/xslt30) supports up to version `3.0`.

## Stylesheet Boilerplate and Transformations

Below is one variation of a stylesheet that transforms `XML` to `XHTML`. A typical `XHTML` document skeleton is embedded within it, along with `XSLT` elements for processing and transformation.

```xsl {options="hl_lines=5 7",caption="A basic template for transforming RSS, OPML, ATOM → XHTML"}
XHTML Document
```

In the above, [namespace attributes](https://en.wikipedia.org/wiki/XML_namespace) of the form `xmlns:itunes` extend the document. You _could_ think of them as imports that add features and avoid naming conflicts. The `URL` points to the "allowed" {{< sidenote mark="vocabulary" set="right" >}} Specs are meant to be broken after all. {{< /sidenote >}} specified by the namespace. For example, the [Atom Activity Streams](https://activitystrea.ms/specs/atom/1.0/) namespace could be added under `xmlns:activity` to extend the stylesheet with an understanding of [Activity Streams](https://www.w3.org/TR/activitystreams-core/) vocabulary. Namespaces can also introduce instructions, the way `xmlns:xsl` does for the `XSLT` processors that support them.

```xsl {options="hl_lines=2",caption="Namespace in the XSLT stylesheet"}
```

```xml {caption="Somewhere in an XML document"}
post
```

Reference the `xsl` stylesheet from an `XML` document with the `xml-stylesheet` declaration and the browser handles the rest.
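To see the stylesheet, a namespace, and the `xml-stylesheet` declaration working together without a browser, here's a minimal sketch, assuming `xsltproc` is installed. It writes a hypothetical two-item `rss.xml` and a hypothetical `XSLT` `1.0` `rss.xsl` into a temporary directory, then prints the transformed `XHTML`. The feed contents, file names, and the `render_feed` helper are made up for illustration, not taken from a real site.

```shell
#!/bin/sh
# Minimal sketch: render a tiny RSS feed as XHTML offline with xsltproc.
# Assumes xsltproc (libxslt) is installed. The feed, the stylesheet, and
# the render_feed helper are hypothetical examples.
set -eu

# Scratch directory, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A two-item RSS feed. Line 2 is the xml-stylesheet declaration a browser
# would follow; the atom prefix adds a namespaced <atom:link> element.
cat > "$workdir/rss.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="rss.xsl" type="text/xsl"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example Feed</title>
    <atom:link href="https://example.com/page/2/rss.xml" rel="next"/>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
  </channel>
</rss>
EOF

# An XSLT 1.0 stylesheet: xmlns:atom lets XPath reach the atom element,
# and exclude-result-prefixes keeps that namespace out of the XHTML output.
cat > "$workdir/rss.xsl" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="atom">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <html>
      <head><title><xsl:value-of select="rss/channel/title"/></title></head>
      <body>
        <ul>
          <xsl:for-each select="rss/channel/item">
            <li><xsl:value-of select="title"/></li>
          </xsl:for-each>
        </ul>
        <a href="{rss/channel/atom:link/@href}">Next page</a>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
EOF

# Apply the stylesheet to the feed and print the XHTML to stdout.
render_feed() {
  xsltproc "$workdir/rss.xsl" "$workdir/rss.xml"
}

render_feed
```

Here `xsltproc` applies the stylesheet named on the command line; the `xml-stylesheet` declaration is what a browser would follow when the feed is served next to `rss.xsl`.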
```xml {options="hl_lines=2"}
```

`XSLT` works in conjunction with [`XPath`](https://www.w3.org/TR/1999/REC-xpath-19991116/) (the `XML` Path Language), which is somewhat similar to `CSS` (Cascading Style Sheets) selectors. Command line programs like [`xmlstarlet`](https://man.archlinux.org/man/xmlstarlet.1.en) also use `XPath` expressions to select data from parts of an `XML` document.

```xsl
```

The `XPath` expression in the `select` attribute above gets the `href` value of the matching element in the `atom` namespace, which is `https://example.com/page/2/rss.xml`.

```xml