
Adding Syntax Highlighting to Markdown

Now that I’ve gotten some of the more basic blogging features out of the way (comments, editing, etc.), I want to start posting more often. Earlier today I saw an article on Content-Security-Policy that I found somewhat lacking, so I considered writing my own. Then I realized.. if I’m going to be writing tech articles, I need to have syntax highlighting for my code!

Reading code without syntax highlighting is a dreary experience, as any developer will tell you. In the Extended Syntax for Markdown you can create Fenced Code Blocks, which are blocks of text delimited by triple backticks and rendered as code using the HTML <code> element. This is what sites like GitHub and Stack Overflow use for rendering code blocks.

However, the markdown parser I use on the backend (commonmark-java) doesn’t have any built-in support for syntax highlighting. You can add in a custom renderer for Fenced Code Block nodes, but doing that from scratch would be really complicated. I’d either have to find a Java or Scala library for syntax highlighting and use that to power a custom HTML renderer, or I’d have to create a syntax highlighter from scratch.. a fun project no doubt, but quite the commitment.

In the end, I decided to do the syntax highlighting on the frontend using JavaScript, and I found an incredibly easy-to-use library called highlight.js. All you need to do is include the JavaScript and CSS like so:

<link rel="stylesheet" href="/path/to/styles/default.min.css">
<script src="/path/to/highlight.min.js"></script>
<script>hljs.highlightAll();</script>

And that’s all that’s needed! Highlight.js will find all code within <pre><code> blocks and apply syntax highlighting to it. It will attempt to detect the language automatically, but you can specify the language in markdown by putting it right after the opening backticks of your fenced code block. See here for an example.
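To make the "language right after the opening backticks" idea concrete, here's a rough sketch in plain Scala of how an opening fence line maps to the css class that highlight.js looks for. FenceInfo, language, and cssClass are names I made up for illustration; they aren't part of highlight.js or commonmark-java, and a real markdown parser handles more edge cases than this does.

```scala
object FenceInfo {
  // Three backticks, built programmatically so this snippet can sit inside a fence itself.
  val Fence: String = "`" * 3

  /** The language named on an opening fence line, if any (hypothetical helper). */
  def language(openingLine: String): Option[String] = {
    val trimmed = openingLine.trim
    if (!trimmed.startsWith(Fence)) None
    else
      trimmed
        .dropWhile(_ == '`')   // strip the fence itself
        .trim
        .split("\\s+")         // the first word names the language
        .headOption
        .filter(_.nonEmpty)
  }

  /** The class name highlight.js looks for on a <code> element. */
  def cssClass(openingLine: String): Option[String] =
    language(openingLine).map(lang => s"language-$lang")
}
```

An opening line of three backticks followed by scala gives Some("language-scala"). A bare fence gives None, in which case highlight.js falls back to auto-detection.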

As I mentioned before, the backend markdown parser isn’t aware of syntax highlighting at all, so I had to add some custom logic to allow me to specify the language in fenced code blocks. Commonmark-java allows really fine-grained control over rendering, and I was already using some custom logic to add css classes to markdown quotes and images. It was very easy to add in some logic to extract the language specifier from the fenced code block node and add a css class. For example, if I use the language scala at the start of my code block, it will automatically add the css class “language-scala” to my <code> element. Here’s the logic (and the first real example of syntax highlighting!):

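  import org.commonmark.node.{BlockQuote, FencedCodeBlock, Image, Node}
  import org.commonmark.renderer.html.{AttributeProviderContext, AttributeProviderFactory, HtmlRenderer}

  // adds css classes to block quotes, images, and fenced code blocks
  val attributeProviderFactory: AttributeProviderFactory =
    (_: AttributeProviderContext) => {

      (node: Node, tagName: String, attributes: java.util.Map[String, String]) => {

        node match {
          case _: BlockQuote => attributes.put("class", "w3-light-grey w3-panel w3-leftbar w3-border-dark-grey")
          case _: Image => attributes.put("class", "w3-image")
          // a fenced code block renders as <pre><code>; only the <code> tag gets the class
          case code: FencedCodeBlock if tagName == "code" =>
            // the info string is whatever follows the opening fence, e.g. "scala"
            if (!code.getInfo.isEmpty) attributes.put("class", s"language-${code.getInfo}")
          case _ => ()
        }
      }
    }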

I then use that factory when constructing the html renderer:

  lazy val escapedHtmlRenderer: HtmlRenderer =
    HtmlRenderer.builder().attributeProviderFactory(attributeProviderFactory)
      .extensions(extensions)
      .escapeHtml(true).build()

The code may look a bit daunting with its factory-building, but basically, when the backend is converting raw markdown -> HTML, it now knows how to add on the css classes that I want for particular markdown constructs (like fenced code blocks).
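Here's a small end-to-end sketch of that markdown -> HTML step, in case it helps to see the pieces wired together. It assumes the commonmark-java library (org.commonmark:commonmark) is on the classpath. RenderDemo and its render method are names I made up for this example, and the factory is a trimmed-down version of mine that only handles fenced code blocks.

```scala
// Assumes commonmark-java (org.commonmark:commonmark) is on the classpath.
import org.commonmark.node.{FencedCodeBlock, Node}
import org.commonmark.parser.Parser
import org.commonmark.renderer.html.{AttributeProviderContext, AttributeProviderFactory, HtmlRenderer}

object RenderDemo {
  // Same idea as the factory above, reduced to the fenced code block case.
  val factory: AttributeProviderFactory =
    (_: AttributeProviderContext) =>
      (node: Node, tagName: String, attributes: java.util.Map[String, String]) =>
        node match {
          case code: FencedCodeBlock if tagName == "code" && !code.getInfo.isEmpty =>
            attributes.put("class", s"language-${code.getInfo}")
            ()
          case _ => ()
        }

  val parser: Parser = Parser.builder().build()

  val renderer: HtmlRenderer =
    HtmlRenderer.builder().attributeProviderFactory(factory).escapeHtml(true).build()

  // Raw markdown in, HTML string out.
  def render(markdown: String): String = renderer.render(parser.parse(markdown))

  def main(args: Array[String]): Unit = {
    val fence = "`" * 3 // built this way so the snippet can sit inside a fence itself
    println(render(s"${fence}scala\nval x = 1\n$fence\n"))
  }
}
```

Running main should print a <pre><code> block whose <code> tag carries class="language-scala", which is the hook highlight.js picks up in the browser.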

If you inspect the HTML of this page right now, you’ll see the css class language-scala was added to the above code blocks. Highlight.js runs when this page loads, knows to highlight the language as Scala, and it all works seamlessly. And it uses a dark theme!
