+++
date = "2022-06-29T22:29:50+00:00"
publishdate = "2023-12-29T07:08:55+00:00"
title = "Literate Programming"
slug = "literate-programming"
author = "Thedro"
tags = ["literate"]
type = "posts"
summary = "Literate Programming is a documentation first programming style pioneered by Donald Knuth."
draft = ""
syntax = ""
toc = ""
updated = "2022-07-01"
+++

![Donald Knuth's Literate Programming](/images/literate-programming.png " Donald Knuth's Literate Programming" )

[Literate Programming](https://en.wikipedia.org/wiki/Literate_programming) is a documentation-first programming style pioneered by [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth). It's {{< sidenote mark="like" set="left" >}} That comparison doesn't do it justice as there's [much more involved](http://www.literateprogramming.com/index.html) when it comes to literate programming. {{< /sidenote >}} writing a blog post as a light specification but with the added bonus of producing a runnable program at the end.

But this isn't a post about my hacked-together attempts at literate programming, which wouldn't be very interesting. This is more of a small research exercise on the different literate programming tools and workflows.

## Ideal

In my opinion, literate programming is still an incomplete paradigm. The biggest drawback comes in the form of programming resistance. This resistance comes mostly from the fact that the problem space is unknown ahead of time and so, as the story goes, the implementation becomes the specification.

That's the norm, but ideally anyone should be able to jump into a source tree, make changes, and reconcile those changes later on correctly with some sort of generated {{< sidenote mark="human" set="right" >}} The `RFC` (Request For Comments) document style is a nice example of "for the human" language. Here's the `RFC` for the [Atom Syndication Format](https://datatracker.ietf.org/doc/html/rfc4287) as an example.
{{< /sidenote >}} readable documentation or specification. The goal is not perfection but accurate agility, and the tooling should be super simple.

What would that look like? Who knows, so here's what I think a more complete or "agile" literate programming workflow might entail.

1. The workflow should facilitate calculating and visualizing sources of documentation drift. Differences between the specification (documentation) and the implementation (source code) should be easy to reconcile through varied means. This could be done by the seat of your pants, through automated [diffing](https://git-scm.com/docs/git-diff) and [fuzzing](https://en.wikipedia.org/wiki/Fuzzing) algorithms run between the source and the docs, or from signals that inform specificity and constraints. [Test-driven development](https://en.wikipedia.org/wiki/Test-driven_development) and [type systems](https://en.wikipedia.org/wiki/Type_system) are useful signals but are exposed as low-level source code rather than high-level, human-readable constraints.
2. The initial documentation entry point should not matter. Edit source files directly or enter top-down through an overarching literate system. If documentation drift is easy to identify, then there is flexibility to experiment "freely" with the implementation. Primitives could sit at the level of a single source file, and ideally any source file can act as an entry point for jump-starting a code repository's documentation, in part or in whole.
3. Literate workflows should facilitate producing documents in whole or in part as discrete outputs based on section, topic, idea, or source file. The output can be in any format, such as `PDF` (Portable Document Format), `HTML` (HyperText Markup Language), or [`Markdown`](https://en.wikipedia.org/wiki/Markdown). This is important because sometimes only select portions of a program are worth documenting or reading.
4. Ideally multiple formats should be available as inputs and outputs.
Markdown is popular but there is a range of other markup languages that offer different advantages. [reStructuredText](https://docutils.sourceforge.io/rst.html), [AsciiDoc](https://docs.asciidoctor.org/asciidoc/latest/), and [Troff/Groff](https://www.gnu.org/software/groff/manual/groff.html) are a few. The shortcut approach is to leverage [Pandoc](https://pandoc.org/), a universal document converter.

There are nigh infinite literate programming tools and workflows. I've tried some of them, but not in any meaningful way to write about at great length.

## Literate Programming Tools and Programs

The literate programming workflow is based on a simple tangle and weave process. The literate source file contains code chunks interleaved with their accompanying explanations. The code chunks are tangled to construct a complete source file or executable. The explanations around the chunks are woven to create a document that fully explains the program.

[**Web**](http://www.literateprogramming.com/knuthweb.pdf) [`pdf`]: Web is Donald Knuth's system for literate programming. Tangling and weaving macros process different parts of a `web` file to produce multiple outputs. Descendants include [cweb](http://www.literateprogramming.com/cweb_download.html), [nuweb](http://nuweb.sourceforge.net/), and [noweb](https://www.cs.tufts.edu/~nr/noweb/).

[**Documented LaTeX Files**](https://tug.org/TUGboat/tb29-2/tb92pakin.pdf) [`pdf`]: [LaTeX](https://www.latex-project.org/about/) packages from [`CTAN`](https://ctan.org/?lang=en) (Comprehensive TeX Archive Network) achieve a literate style by weaving and tangling documentation and code with the `doc` and `docstrip` packages. Package code is mixed in with commented typesetting inside a documented LaTeX file (`dtx`).
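
The tangle and weave process described at the top of this section can be sketched in a few lines of Python. This is only a toy under stated assumptions: the chunk notation (`<<name>>=` to open a chunk, a lone `@` to close it) is loosely modeled on noweb's markers, and the `SOURCE` text and the `parse`, `tangle`, and `weave` helpers are hypothetical names made up for illustration, not any tool's actual implementation.

```python
import re

# Hypothetical literate source in a toy, noweb-like notation:
# "<<name>>=" opens a code chunk and a lone "@" closes it.
SOURCE = """\
Greet the reader.
<<main>>=
<<greeting>>
print(message)
@
The message is defined separately.
<<greeting>>=
message = "hello"
@
"""

CHUNK_DEF = re.compile(r"^<<(.+)>>=$")      # start of a chunk definition
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>$")  # reference to another chunk


def parse(text):
    """Split the source into ("prose", line) and ("code", name, lines) blocks."""
    blocks, name, body = [], None, []
    for line in text.splitlines():
        opened = CHUNK_DEF.match(line)
        if name is None and opened:
            name, body = opened.group(1), []
        elif name is not None and line == "@":
            blocks.append(("code", name, body))
            name = None
        elif name is not None:
            body.append(line)
        else:
            blocks.append(("prose", line))
    return blocks


def tangle(blocks, root="main"):
    """Expand chunk references, starting at the root chunk, into plain code."""
    chunks = {b[1]: b[2] for b in blocks if b[0] == "code"}

    def expand(chunk_name, indent=""):
        out = []
        for line in chunks[chunk_name]:
            ref = CHUNK_REF.match(line)
            if ref:
                # Keep the indentation of the line holding the reference.
                out.extend(expand(ref.group(2), indent + ref.group(1)))
            else:
                out.append(indent + line)
        return out

    return "\n".join(expand(root)) + "\n"


def weave(blocks):
    """Keep prose as-is and render each chunk as a labelled, indented block."""
    out = []
    for block in blocks:
        if block[0] == "prose":
            out.append(block[1])
        else:
            out += ["", f"Chunk: {block[1]}", ""]
            out += ["    " + line for line in block[2]]
            out.append("")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    parsed = parse(SOURCE)
    print(tangle(parsed))  # the runnable program
    print(weave(parsed))   # the readable document
```

The sketch leaves out everything that makes real tools useful, such as cross-references, multiple output files, and error handling: a reference to an undefined chunk raises `KeyError`, and a chunk that refers to itself recurses until Python gives up.
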
[**Babel**](https://orgmode.org/worg/org-contrib/babel/intro.html): Babel turns [Emacs'](https://www.gnu.org/software/emacs/) [Org Mode](https://orgmode.org/org.html#Summary) into a powerful workflow suited to literate programming. Babel is popular in some circles and has numerous use cases including [developer operations](http://howardism.org/Technical/Emacs/literate-devops.html) and general [system crafting and architecture](https://www.youtube.com/watch?v=kkqVTDbfYp4).

[**Tsoding's lit**](https://github.com/tsoding/lit#readme): A simple literate implementation based on the [Literate Haskell](https://wiki.haskell.org/Literate_programming) approach. The `lit` program reads a document written in LaTeX and comments out every line not contained in a `\code{}` section block. The output is an executable source file, and the literate document takes advantage of LaTeX's output formats.

[**Zyedidia's Literate**](https://github.com/zyedidia/Literate#literate): A literate system based on Markdown. The literate document is written inside `.lit` files and outputs to `HTML` and `CSS`. Since it outputs directly to `HTML`, there's a lot of leverage in refining and adapting the output.

[**Jupyter Notebook**](https://jupyter-notebook.readthedocs.io/en/latest/): A more popular kind of simple literate programming in the form of literate "computation". These are interactive `GUI` (Graphical User Interface) documents for explaining and running code from a single view.

[**lit.sh**](https://github.com/vijithassar/lit#readme): A literate programming preprocessor in pure shell because why not?

## Conclusion

Literate programming is difficult and hasn't caught on because, in the majority of cases, programs are built with the specification made up along the way. Put a customer in the mix and your specification changes randomly. This is the status quo because the range of inputs is often unknown, along with the full scope of the problem space.
Quantifying documentation drift and adding flexibility for limited but accurate documentation could make literate programming much easier. Testing and type systems are becoming much more robust; perhaps Donald Knuth was just a tad too early for literate programming to fully catch on?

My hunch is that if a person or a company figures this out, the result will be super simple to use and will look like [`git`](https://git-scm.com/) but for documentation.

In terms of popularity, the most used literate programming tools seem to be [Babel](https://orgmode.org/worg/org-contrib/babel/intro.html) and [Noweb](https://www.cs.tufts.edu/~nr/noweb/).
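
As a closing illustration, the drift measurement suggested in this conclusion could start out as nothing more than a line diff between the code a document claims and the code that actually exists. The sketch below is a guess at a starting point rather than a worked-out tool: it uses Python's standard `difflib`, and the `drift` and `drift_report` helpers, along with the sample strings, are hypothetical.

```python
import difflib


def drift(documented: str, implemented: str) -> float:
    """Return 0.0 when the two texts match and grow toward 1.0 as they diverge."""
    matcher = difflib.SequenceMatcher(
        None, documented.splitlines(), implemented.splitlines()
    )
    return 1.0 - matcher.ratio()


def drift_report(documented: str, implemented: str) -> str:
    """Return a unified diff showing where the docs and the code disagree."""
    return "".join(
        difflib.unified_diff(
            documented.splitlines(keepends=True),
            implemented.splitlines(keepends=True),
            fromfile="documented",
            tofile="implemented",
        )
    )


if __name__ == "__main__":
    # Hypothetical example: the docs still show the old function body.
    documented = "def area(w, h):\n    return w * h\n"
    implemented = "def area(w, h):\n    check(w, h)\n    return abs(w * h)\n"
    print(f"drift score: {drift(documented, implemented):.2f}")
    print(drift_report(documented, implemented))
```

A line-based score like this only notices textual change. It says nothing about whether the prose around the code still describes the behavior, which is the harder half of the problem.
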