+++
date = "2022-06-29T22:29:50+00:00"
publishdate = "2023-12-29T07:08:55+00:00"
title = "Literate Programming"
slug = "literate-programming"
author = "Thedro"
tags = ["literate"]
type = "posts"
summary =  "Literate Programming is a documentation-first programming style pioneered by Donald Knuth."
draft =  ""
syntax =  ""
toc =  ""
updated =  "2022-07-01"
+++

![Donald Knuth's Literate Programming](/images/literate-programming.png "
  Donald Knuth's Literate Programming"
)

[Literate Programming](https://en.wikipedia.org/wiki/Literate_programming) is a
documentation-first programming style pioneered by
[Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth). It's
{{< sidenote mark="like" set="left" >}} That comparison doesn't do it justice as
there's [much more involved](http://www.literateprogramming.com/index.html) when
it comes to literate programming. {{< /sidenote >}} writing a blog post as a
light specification but with the added bonus of producing a runnable program at
the end.

But this isn't a post about my hacked-together attempts at literate programming,
which wouldn't be very interesting. It is a small research exercise that gathers
the different literate programming tools and workflows.

## Ideal

In my opinion, literate programming is still an incomplete paradigm. The biggest
drawback is the friction it adds to programming. That friction comes mostly from
the fact that the problem space is rarely known ahead of time, so, as the story
goes, the implementation becomes the specification.

That's the norm, but ideally anyone should be able to jump into a source tree,
make changes, and reconcile those changes later on correctly with some sort of
generated {{< sidenote mark="human" set="right" >}} The `RFC` (Request For
Comments) document style is a nice example of "for the human" language. Here's
the `RFC` for the
[Atom Syndication Format](https://datatracker.ietf.org/doc/html/rfc4287) as an
example. {{< /sidenote >}} readable documentation or specification. The goal is
not perfection but accurate agility, and the tooling should be super simple. What would
that look like? Who knows, so here's what I think a more complete or "agile"
literate programming workflow might entail.

1. The workflow should facilitate calculating and visualizing sources of
   documentation drift. Differences between the specification (documentation)
   and the implementation (source code) should be easy to reconcile through
   varied means. This could be by the seat of your pants, through automated
   [diffing](https://git-scm.com/docs/git-diff) and
   [fuzzing](https://en.wikipedia.org/wiki/Fuzzing) algorithms between the
   source and doc, or from signals that inform specificity and constraints.
   [Test-driven development](https://en.wikipedia.org/wiki/Test-driven_development)
   and [type systems](https://en.wikipedia.org/wiki/Type_system) are useful
   signals, but they are exposed as low-level source code rather than
   high-level, human-readable constraints.

2. The initial documentation entry point should not matter. Edit source files
   directly or enter top down through an overarching literate system. If
   documentation drift is easy to identify then there is flexibility to
   experiment "freely" with the implementation. Primitives could live at the
   level of a single source file, and ideally any source file can act as an
   entry point for jump-starting a code repository's documentation, in part or
   in whole.

3. Literate workflows should facilitate producing documents in whole or in part
   as discrete outputs based on section, topic, idea, or source file. This can
   be in any output format such as `PDF` (Portable Document Format), `HTML`
   (HyperText Markup Language), or
   [`Markdown`](https://en.wikipedia.org/wiki/Markdown). This is important
   because sometimes only select portions of a program are worth documenting or
   reading.

4. Ideally, multiple formats should be available as inputs and outputs. Markdown
   is popular, but there is a range of other document markup languages that
   offer different advantages.
   [reStructuredText](https://docutils.sourceforge.io/rst.html),
   [AsciiDoc](https://docs.asciidoctor.org/asciidoc/latest/), and
   [Troff/Groff](https://www.gnu.org/software/groff/manual/groff.html) are a
   few. The shortcut approach is to leverage [Pandoc](https://pandoc.org/), a
   universal document converter.
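
The documentation drift from the first point can be approximated with nothing
more than a line diff. What follows is a minimal sketch, not an established
tool: it assumes the documentation is a Markdown file whose fenced code blocks
are meant to mirror a single source file. The names `extract_code`, `drift`, and
`drift_report` are hypothetical, and the score is simply one minus the
similarity ratio from Python's `difflib`.

```python
import difflib

# Built at runtime so this listing can itself live inside a fenced block.
FENCE = "`" * 3


def extract_code(doc_text: str) -> list[str]:
    """Collect the lines found inside fenced code blocks of a Markdown document."""
    lines, inside = [], False
    for line in doc_text.splitlines():
        if line.startswith(FENCE):
            inside = not inside  # an opening or closing fence toggles the state
            continue
        if inside:
            lines.append(line)
    return lines


def drift(doc_text: str, source_text: str) -> float:
    """Score drift from 0.0 (documented code equals source) to 1.0 (nothing matches)."""
    documented = extract_code(doc_text)
    actual = source_text.splitlines()
    similarity = difflib.SequenceMatcher(None, documented, actual).ratio()
    return 1.0 - similarity


def drift_report(doc_text: str, source_text: str) -> list[str]:
    """Return a unified diff showing where the doc and the source disagree."""
    return list(
        difflib.unified_diff(
            extract_code(doc_text),
            source_text.splitlines(),
            fromfile="doc",
            tofile="source",
            lineterm="",
        )
    )
```

A score like this only says that the two sides differ, not which one is right,
so it is a signal for a human to reconcile rather than an automatic fix.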

There are nigh infinite literate programming tools and workflows. I've tried
some of them, but not deeply enough to write about them at length.

## Literate Programming Tools and Programs

The literate programming workflow is based on a simple tangle and weave process.
The literate source file contains code chunks interleaved with their
accompanying explanations. The code chunks are tangled to construct a complete
source file or executable. The explanations around the chunks are woven to
create a document that fully explains the program.
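
To make the two passes concrete, here is a minimal sketch of tangle and weave.
It assumes a Markdown-like literate file where code chunks are fenced and
everything else is prose. The function names are made up for illustration, and
real systems such as Web go further with named chunks that can be defined in any
order, which this sketch leaves out.

```python
# Built at runtime so this listing can itself live inside a fenced block.
FENCE = "`" * 3


def split_chunks(literate: str) -> list[tuple[str, str]]:
    """Split a literate file into ("prose" | "code", text) pairs in document order."""
    chunks, kind, buffer = [], "prose", []
    for line in literate.splitlines():
        if line.startswith(FENCE):
            if buffer:
                chunks.append((kind, "\n".join(buffer)))
            # A fence line switches between prose and code.
            kind = "code" if kind == "prose" else "prose"
            buffer = []
        else:
            buffer.append(line)
    if buffer:
        chunks.append((kind, "\n".join(buffer)))
    return chunks


def tangle(literate: str) -> str:
    """Concatenate the code chunks into program source, dropping the prose."""
    code = [text for kind, text in split_chunks(literate) if kind == "code"]
    return "\n".join(code) + "\n"


def weave(literate: str) -> str:
    """Produce the reader-facing document: prose as-is, code indented four spaces."""
    out = []
    for kind, text in split_chunks(literate):
        if kind == "code":
            out.append("\n".join("    " + line for line in text.splitlines()))
        else:
            out.append(text)
    return "\n".join(out) + "\n"
```

Both passes read the same file, which is the whole point: the program and its
explanation cannot be edited separately.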

[**Web**](http://www.literateprogramming.com/knuthweb.pdf) [`pdf`]: Web is
Donald Knuth's system for literate programming. Tangling and weaving macros
process different parts of a `web` file to produce multiple outputs. Variants
and successors include
[cweb](http://www.literateprogramming.com/cweb_download.html),
[nuweb](http://nuweb.sourceforge.net/), and
[noweb](https://www.cs.tufts.edu/~nr/noweb/).

[**Documented LaTeX Files**](https://tug.org/TUGboat/tb29-2/tb92pakin.pdf)
[`pdf`]: [LaTeX](https://www.latex-project.org/about/) packages from
[`CTAN`](https://ctan.org/?lang=en) (Comprehensive TeX Archive Network) achieve
a literate style by weaving and tangling documentation and code with the `doc`
and `docstrip` packages. Package code is mixed in with commented typesetting
inside a documented LaTeX file (`dtx`).

[**Babel**](https://orgmode.org/worg/org-contrib/babel/intro.html): Babel
converts [Emacs'](https://www.gnu.org/software/emacs/)
[Org Mode](https://orgmode.org/org.html#Summary) into a powerful workflow more
suitable for literate programming. Babel is popular in some circles and has
numerous use cases including
[developer operations](http://howardism.org/Technical/Emacs/literate-devops.html)
and general
[system crafting and architecture](https://www.youtube.com/watch?v=kkqVTDbfYp4).

[**Tsoding's lit**](https://github.com/tsoding/lit#readme): A simple literate
implementation based on the
[Literate Haskell](https://wiki.haskell.org/Literate_programming) approach. The
literate program reads a document written in LaTeX and comments out every line
not contained in a `\code{}` section block. The output is an executable source
file and the literate document takes advantage of LaTeX's output formats.
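
The comment-out idea is small enough to sketch in a few lines. This is not the
tool's actual implementation: it assumes the Literate Haskell convention of
`\begin{code}` and `\end{code}` delimiters on their own lines, and the function
name and the `comment` parameter are hypothetical.

```python
def comment_out_prose(document: str, comment: str = "-- ") -> str:
    """Prefix every line outside a code environment with a line comment."""
    out, in_code = [], False
    for line in document.splitlines():
        marker = line.strip()
        if marker == r"\begin{code}":
            in_code = True
            out.append(comment + line)  # the delimiter itself is not code
        elif marker == r"\end{code}":
            in_code = False
            out.append(comment + line)
        elif in_code:
            out.append(line)  # code passes through untouched
        else:
            out.append(comment + line)
    return "\n".join(out) + "\n"
```

Because prose is commented out rather than deleted, the output has the same
number of lines as the input, so a compiler error on a given line points at the
same line of the literate document.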

[**Zyedidia's Literate**](https://github.com/zyedidia/Literate#literate): A
literate system based on Markdown. The literate document is written inside
`.lit` files and outputs to `HTML` and `CSS`. Since it outputs directly to
`HTML`, there's a lot of leverage in refining and adapting the output.

[**Jupyter Notebook**](https://jupyter-notebook.readthedocs.io/en/latest/): A
more popular kind of simple literate programming in the form of literate
"computation". These are interactive `GUI` (Graphical User Interface) documents
for explaining and running code from a single view.

[**lit.sh**](https://github.com/vijithassar/lit#readme): A literate programming
preprocessor in pure shell because why not?

## Conclusion

Literate programming is difficult and hasn't caught on because, in the majority
of cases, programs are built with the specification made up along the way. Put a
customer in the mix and the specification changes unpredictably. This is the
status quo because the range of inputs is often unknown, along with the full
scope of the problem space.

Quantifying documentation drift and allowing limited but accurate documentation
could make literate programming much easier. Testing and type systems are
becoming much more robust, so perhaps Donald Knuth was just a tad too early for
literate programming to fully catch on? My hunch is that if a person or a
company figures this out, the result will be super simple to use and will look
like [`git`](https://git-scm.com/) but for documentation.

In terms of popularity, the most used literate programming tools seem to be
[Babel](https://orgmode.org/worg/org-contrib/babel/intro.html) and
[Noweb](https://www.cs.tufts.edu/~nr/noweb/).