Bringing Visual Studio Code Syntax Highlighting to Jekyll

Day 2

December 03, 2018 at 03:53 (UTC)

Goals

I feel like I made good investigative progress yesterday, but not so good code progress. That’s okay. I was burned out from my Friday night activities (not what you think) and needed recovery time.

I feel better today, and although I have some housework I should do, I’m feeling a bit ambitious. I’d like to complete the primary investigative work today, which means I need to accomplish the following two things:

Understand how Visual Studio Code uses the grammar and theme files to process files (and text)
Determine what I need to do to integrate with kramdown

Investigation

Kramdown

Being the smaller project, I think I’ll start with kramdown. It’s hosted on GitHub.

I know kramdown supports Rogue, and Rogue seems like a specific enough keyword for a good search
My mistake, it’s Rouge, not Rogue. (I got this wrong yesterday, as well)
Searching GitHub and limiting the results to Ruby yields two test files, one Rakefile, and one library file. The library file is probably what I want
Right at the top of the file: module Kramdown::Converter::SyntaxHighlighter. Bingo!
Let’s see how it gets invoked. Going up one directory, I find one file for each of three different syntax highlighters. Frequently that means they all support a common interface that is defined further up
Up one more level and we find syntax_highlighter.rb. That looks useful
A mostly empty module file with instructions on how to implement a syntax highlighter. Looks simple enough, too. That’s goal #1 complete
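
To make that concrete, here is a minimal sketch of what such a highlighter might look like. It assumes the interface described in syntax_highlighter.rb is an object that responds to call(converter, text, lang, type, opts) and returns the highlighted string, and that registration goes through Kramdown::Converter.add_syntax_highlighter. The module name VSCodeHighlighter and the :vscode key are made up for illustration, and the body only escapes the text rather than highlighting it.

```ruby
require "cgi"

# Hypothetical highlighter; the name is invented for this sketch.
module VSCodeHighlighter
  # Assumed kramdown interface: receives the converter, the source text,
  # the language name (may be nil), the element type (:block or :span),
  # and an options hash. Returns the converted text as a string.
  def self.call(_converter, text, lang, type, _opts)
    escaped = CGI.escapeHTML(text)
    css_class = lang ? " class=\"language-#{CGI.escapeHTML(lang.to_s)}\"" : ""

    if type == :block
      "<pre><code#{css_class}>#{escaped}</code></pre>\n"
    else
      "<code#{css_class}>#{escaped}</code>"
    end
  end
end

# Register only when kramdown is already loaded, so this file also runs
# on its own. The :vscode key is an assumption, not an existing option.
if defined?(Kramdown::Converter) &&
   Kramdown::Converter.respond_to?(:add_syntax_highlighter)
  Kramdown::Converter.add_syntax_highlighter(:vscode, VSCodeHighlighter)
end
```

With kramdown loaded, the highlighter would presumably be selected through the syntax_highlighter option. The real work, turning text into themed spans, would replace the escaping step.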

Visual Studio Code

I’ve decided I’m not happy with using where I was yesterday as a starting point for today. While I’m certain that work will still be useful, it was done when my brain wasn’t fully engaged. So I’m going to take a slightly different path today.

From yesterday, we know where the built-in Rust extension lives, and that the built-in Rust extension is basically just for syntax highlighting
It’s unlikely that the extension has code specifically to load it, so presumably its structure is common. (Verified by looking at the Go extension)
syntaxes is the folder that holds the grammar files. And the grammar files end with .tmLanguage.json. Both of those would make good search terms, but let’s start with the folder. (syntax is a common term in the code, but syntaxes does not appear to be)
5 results, but only the first (TMGrammar.ts) seems to be relevant. So we’re back to what we found yesterday, but now with a line number!
This line is part of a function call to ExtensionsRegistry.registerExtensionPoint. I don’t really understand the arguments, but one seems to be called items and another defaultSnippets, and both reference the syntaxes folder. I think it’s worth looking at that function in more detail to better understand it
Okay, this function seems to be registering an extension point, which is a place extensions can be connected to, or something like that. I’m sure a trip to the VSCode documentation could explain more, but it creates a thing called an ExtensionPoint, so I’m going to look at that class
Didn’t pan out. The Rust package.json has a contributes section that includes “grammars”. Let’s see what searching for that does
Found the vscode-textmate declaration file, and it seems to have a loadGrammar function that I’m sure is going to be very useful
Also found a getGrammarLocation function
parseRawGrammar is probably the function that parses the grammar file contents into whatever structure is used for processing text
And that function led me to the vscode-textmate project
parseRawGrammar is in grammarReader.ts
From that file, I found information that leads me to assume that JSON parsing is in json.ts and TextMate property list parsing is in plist.ts
These look like custom parsers for speed
IGrammar and tokenizeLine seem to be the primary interface (the second one is a guess)
Searching for tokenizeLine pulled up the README file. Should have started there
Yep, looks like I was correct in what the workhorses are
Found the definition for tokenizeLine
The actual work is done by _tokenize
I kept seeing references to “onig” (various capitalizations). Finally traced it through the code to https://github.com/Microsoft/vscode-textmate/blob/e8f439d613afa00674e83b3b3ae382fc774665e1/src/onigLibs.ts which led me to onigasm. It’s a regex library.
Even more of the actual work is done by _tokenizeString
This is starting to get complicated. While I’m certain it is either a) necessary complication or b) for performance reasons, TypeScript is not my forte. I’m going to take a whack at building something simple that works, and use this for reference if I get stuck, now that I know where most of the pieces are. I guess goal #2 complete?
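
As a starting point for that "something simple", here is a rough Ruby sketch of the grammar-plus-theme idea. It is not how vscode-textmate does it: the real tokenizeLine also takes and returns a rule stack so that begin/end rules can span lines, and it runs its patterns through Oniguruma. This sketch only handles single-line match rules with Ruby's own regex engine. The toy grammar, the theme colors, and the names tokenize_line, load_rules, and color_for are all made up for illustration.

```ruby
require "json"

# A made-up grammar in the shape of a .tmLanguage.json file,
# reduced to "match" rules only.
GRAMMAR_JSON = <<~'JSON'
  {
    "scopeName": "source.toy",
    "patterns": [
      { "name": "keyword.control.toy", "match": "\\b(fn|let)\\b" },
      { "name": "constant.numeric.toy", "match": "\\b\\d+\\b" }
    ]
  }
JSON

# A made-up theme: scope selector => color.
THEME = { "keyword" => "#569CD6", "constant.numeric" => "#B5CEA8" }.freeze

Token = Struct.new(:start_index, :end_index, :scopes)

# Compile each rule's pattern once and pair it with its scope name.
def load_rules(grammar)
  grammar.fetch("patterns").map do |rule|
    [Regexp.new(rule.fetch("match")), rule.fetch("name")]
  end
end

# Split one line into tokens. At each position, the rule whose match
# starts earliest wins; ties go to the rule listed first.
def tokenize_line(line, grammar)
  root = grammar.fetch("scopeName")
  rules = load_rules(grammar)
  tokens = []
  pos = 0

  while pos < line.length
    best = nil
    rules.each do |regex, name|
      m = regex.match(line, pos)
      next if m.nil? || m[0].empty?
      best = [m, name] if best.nil? || m.begin(0) < best[0].begin(0)
    end

    if best.nil?
      # Nothing else matches: the rest of the line gets the root scope.
      tokens << Token.new(pos, line.length, [root])
      break
    end

    m, name = best
    tokens << Token.new(pos, m.begin(0), [root]) if m.begin(0) > pos
    tokens << Token.new(m.begin(0), m.end(0), [root, name])
    pos = m.end(0)
  end

  tokens
end

# Pick a color: innermost scope first, longest matching selector wins.
def color_for(scopes, theme, default = "#D4D4D4")
  scopes.reverse_each do |scope|
    selector = theme.keys
                    .select { |s| scope == s || scope.start_with?("#{s}.") }
                    .max_by(&:length)
    return theme[selector] if selector
  end
  default
end

grammar = JSON.parse(GRAMMAR_JSON)
tokenize_line("let x = 42", grammar).each do |t|
  puts "#{t.start_index}-#{t.end_index} #{t.scopes.join(' ')} #{color_for(t.scopes, THEME)}"
end
```

The token shape (start index, end index, list of scopes) mirrors what the vscode-textmate README shows for tokenizeLine. Supporting begin/end rules would mean carrying state from one line to the next, which is the part that gets complicated.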

Coding

I got tired after about three seconds of coding. It’s been a while since I’ve used Ruby, and I’m not familiar with its idioms anymore. Good thing I bought a physical copy of Why’s (Poignant) Guide to Ruby to read at bedtime
I decided it really doesn’t make much sense for me to redo all of the work the Visual Studio Code team has already done. Node.js is popular enough that someone must have made a way for Ruby to use it. As much as I dislike mixing language runtimes, let’s take a look
Found nothing, oh well. Guess I’ll have to do some real Ruby coding either later tonight (after dinner) or tomorrow