Bringing Visual Studio Code Syntax Highlighting to Jekyll
Day 2
Goals
I feel like I made good investigative progress yesterday, but not much progress on the code. That’s okay. I was burned out from my Friday night activities (not what you think) and needed recovery time.
I feel better today, and although I have some housework I should do, I’m feeling a bit ambitious. I’d like to complete the primary investigative work today, which means I need to accomplish the following two things:
- Understand how Visual Studio Code uses the grammar and theme files to process files (and text)
- Determine what I need to do to integrate with kramdown
Investigation
Kramdown
Since kramdown is the smaller project, I think I’ll start there. It’s hosted on GitHub.
- I know kramdown supports Rogue, and Rogue seems like a specific enough keyword for a good search
- My mistake, it’s Rouge, not Rogue. (I got this wrong yesterday, as well)
- Searching GitHub and limiting the results to Ruby yields two test files, one Rakefile, and one library file. The library file is probably what I want
- Right at the top of the file: `module Kramdown::Converter::SyntaxHighlighter`. Bingo!
- Let’s see how it gets invoked. Going up one directory, I find a file for each of three different syntax highlighters. Frequently that means there’s a common interface they all support, defined further up
- Up one more level, we find `syntax_highlighter.rb`. That looks useful
- It’s a mostly empty module file with instructions on how to implement a syntax highlighter. Looks simple enough, too. That’s goal #2 complete
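For my own reference, here’s roughly what I think a minimal highlighter looks like, based on my reading of those instructions. It is an object that responds to `call` with the converter, the raw text, the language, the type (`:span` or `:block`) and an options hash, and it returns the highlighted text. This is a sketch, not tested against kramdown itself. `VscodeHighlighter` is a hypothetical name, and it doesn’t highlight anything yet (it only escapes HTML). The registration call at the bottom is my understanding of how the built-in highlighters get registered, so treat it as an assumption.

```ruby
require "cgi"

# Hypothetical highlighter that follows the contract described in
# kramdown's syntax_highlighter.rb: any object that responds to #call.
module VscodeHighlighter
  # converter - the converter that invoked us (unused for now)
  # text      - the raw code to highlight
  # lang      - the requested language, e.g. "rust" (may be nil)
  # type      - :span for inline code, :block for a code block
  # opts      - options hash passed along by the converter
  def self.call(converter, text, lang, type, opts)
    # Placeholder: no tokenizing yet, just escape the text for HTML.
    escaped = CGI.escapeHTML(text)
    if type == :block
      # Block code is returned already wrapped for the HTML converter.
      "<pre><code>#{escaped}</code></pre>\n"
    else
      # Span code is returned without a surrounding <code> tag.
      escaped
    end
  end
end

# Assumption: kramdown registers highlighters by name through
# Kramdown::Converter.add_syntax_highlighter. Only tried if kramdown is loaded.
if defined?(Kramdown::Converter) && Kramdown::Converter.respond_to?(:add_syntax_highlighter)
  Kramdown::Converter.add_syntax_highlighter(:vscode, VscodeHighlighter)
end
```

If that works, I’d expect to select it in Jekyll the same way Rouge is selected, through the kramdown `syntax_highlighter` option in `_config.yml`. I haven’t tried that yet.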
Visual Studio Code
I’ve decided I’m not happy using yesterday’s stopping point as today’s starting point. While I’m certain that work will still be useful, it was done when my brain wasn’t fully engaged, so I’m going to take a slightly different path today.
- From yesterday, we know where the built-in Rust extension lives, and that it’s basically just for syntax highlighting
- It’s unlikely that VS Code has code specifically to load that one extension, so presumably its structure is common to other extensions. (Verified by looking at the Go extension)
- `syntaxes` is the folder that holds the grammar files, and the grammar files end with `.tmLanguage.json`. Both of those would make good search terms, but let’s start with the folder. (`syntax` is a common term in the code, but `syntaxes` does not appear to be)
- 5 results, but only the first (`TMGrammar.ts`) seems to be relevant. So we’re back to what we found yesterday, but now with a line number!
- This line is part of a function call to `ExtensionsRegistry.registerExtensionPoint`. I don’t really understand the arguments, but one of them seems to be called `items` and one called `defaultSnippets`, and both reference the `syntaxes` folder, so I think it’s worth looking at that function in more detail
- Okay, this function seems to be registering an extension point, which is a place extensions can be connected to, or something like that. I’m sure a trip to the VS Code documentation could explain more, but it creates a thing called an `ExtensionPoint`, so I’m going to look at that class
- Didn’t pan out. The Rust `package.json` has a `contributes` section that includes `"grammars"`. Let’s see what searching for that does
- Found the vscode-textmate declaration file, and it seems to have a `loadGrammar` function that I’m sure is going to be very useful
- Also found a `getGrammarLocation` function
- `parseRawGrammar` is probably the function that parses the grammar file contents into whatever structure is used for processing text
- And that function led me to this project
- `parseRawGrammar` is in `grammarReader.ts`
- From that file, I found information that leads me to assume that JSON parsing is in `json.ts` and TextMate property list parsing is in `plist.ts`
- These look like custom parsers for speed
- `IGrammar` and `tokenizeLine` seem to be the primary interface (the second one is a guess)
- Searching for `tokenizeLine` pulled up the README file. Should have started there
- Yep, looks like I was correct about what the workhorses are
- Found the definition for `tokenizeLine`
- The actual work is done by `_tokenize`
- I kept seeing references to “onig” (various capitalizations). Finally traced it through the code to https://github.com/Microsoft/vscode-textmate/blob/e8f439d613afa00674e83b3b3ae382fc774665e1/src/onigLibs.ts which led me to `onigasm`. It’s a regex library (“onig” is short for Oniguruma, the regex engine it wraps)
- Even more of the actual work is done by `_tokenizeString`
- This is starting to get complicated. While I’m certain the complexity is either a) necessary or b) there for performance reasons, TypeScript is not my forte. Now that I know where most of the pieces are, I’m going to take a whack at building something simple that works, and use this code for reference if I get stuck. I guess goal #1 complete?
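Before writing anything real, here’s a rough Ruby sketch of the loading half as I currently understand it. It reads the `grammars` entries from an extension’s `package.json` `contributes` section, then parses the grammar file, choosing a parser by file extension the way `parseRawGrammar` appears to. The method names are mine. The sample manifest and grammar are made up for illustration; they only mirror the shape of a grammar contribution (`language`, `scopeName`, `path`). Plist grammars are deliberately left unimplemented.

```ruby
require "json"

# Returns one hash per grammar an extension contributes, with the grammar
# path resolved relative to the extension's directory.
def contributed_grammars(manifest_json, extension_dir)
  manifest = JSON.parse(manifest_json)
  (manifest.dig("contributes", "grammars") || []).map do |entry|
    {
      language: entry["language"],
      scope_name: entry["scopeName"],
      path: File.expand_path(entry["path"], extension_dir)
    }
  end
end

# Parses raw grammar text into a Hash. JSON grammars (*.tmLanguage.json)
# go through the stdlib JSON parser; anything else is assumed to be a
# TextMate property list, which this sketch does not handle.
def parse_raw_grammar(content, file_path)
  if file_path.end_with?(".json")
    JSON.parse(content)
  else
    raise NotImplementedError, "plist grammars not supported: #{file_path}"
  end
end

# Hypothetical manifest with the same shape as a built-in grammar extension.
SAMPLE_MANIFEST = <<~'JSON'
  {
    "name": "rust",
    "contributes": {
      "grammars": [
        {
          "language": "rust",
          "scopeName": "source.rust",
          "path": "./syntaxes/rust.tmLanguage.json"
        }
      ]
    }
  }
JSON

# A tiny stand-in grammar, not the real Rust grammar.
SAMPLE_GRAMMAR = <<~'JSON'
  {
    "scopeName": "source.rust",
    "patterns": [
      { "match": "\\b(fn|let)\\b", "name": "keyword.other.rust" }
    ]
  }
JSON
```

Ruby’s `Regexp` engine is Onigmo, a fork of Oniguruma, so I’m hopeful most of a grammar’s `match` strings can be handed straight to `Regexp.new` without needing something like onigasm. I haven’t verified that against a real grammar yet.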
Coding
- I got tired after about three seconds of coding. It’s been a while since I’ve used Ruby, and I’m not familiar with its idioms anymore. Good thing I bought a physical copy of Why’s (Poignant) Guide to Ruby to read at bedtime
- I decided it really doesn’t make much sense for me to redo all of the work the Visual Studio Code team has already done. NodeJS is popular enough that someone must have made a way for Ruby to use it. As much as I dislike mixing language runtimes, let’s take a look
- Found nothing, oh well. Guess I’ll have to do some real Ruby coding either later tonight (after dinner) or tomorrow
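As a starting point for that Ruby coding, here’s a toy tokenizer in the spirit of `tokenizeLine`. Each call takes one line plus the state left over from the previous line, and returns scoped tokens along with the new state. It is heavily simplified and is not how vscode-textmate works internally. It handles only two rule shapes: a single `match` regex, and a `begin`/`end` pair that can span lines. At each position it picks whichever rule matches earliest in the line. The `Rule`, `Token`, `ToyTokenizer` and `tokens_to_html` names, the sample rules, and the “scope name becomes CSS classes” convention are all hypothetical.

```ruby
require "cgi"

# Hypothetical rule shape: either a single-line `match_re` rule or a
# `begin_re`/`end_re` pair whose contents may continue onto later lines.
Rule = Struct.new(:name, :match_re, :begin_re, :end_re)
Token = Struct.new(:text, :scopes)

class ToyTokenizer
  def initialize(scope_name, rules)
    @scope_name = scope_name
    @rules = rules
  end

  # Tokenizes one line. `state` is the begin/end rule still open from the
  # previous line (or nil). Returns [tokens, new_state].
  def tokenize_line(line, state = nil)
    tokens = []
    pos = 0
    while pos < line.length
      if state
        # Inside a multi-line rule: everything up to and including `end`
        # belongs to it; if `end` never appears, the rest of the line does.
        m = state.end_re.match(line, pos)
        stop = m ? m.end(0) : line.length
        tokens << Token.new(line[pos...stop], [@scope_name, state.name]) if stop > pos
        pos = stop
        state = nil if m
        next
      end
      rule, m = earliest_match(line, pos)
      unless m
        # No rule matches the rest of the line: emit it with the base scope.
        tokens << Token.new(line[pos..-1], [@scope_name])
        break
      end
      # Unmatched text before the match keeps only the base scope.
      tokens << Token.new(line[pos...m.begin(0)], [@scope_name]) if m.begin(0) > pos
      tokens << Token.new(m[0], [@scope_name, rule.name])
      pos = m.end(0)
      state = rule if rule.begin_re # a `begin` opens a multi-line region
    end
    [tokens, state]
  end

  private

  # Tries every rule from `pos` and keeps the one that matches earliest;
  # ties go to the rule listed first.
  def earliest_match(line, pos)
    best_rule = nil
    best_m = nil
    @rules.each do |rule|
      m = (rule.match_re || rule.begin_re).match(line, pos)
      # Skip misses and zero-length matches (which would never advance pos).
      next if m.nil? || m[0].empty?
      if best_m.nil? || m.begin(0) < best_m.begin(0)
        best_rule, best_m = rule, m
      end
    end
    [best_rule, best_m]
  end
end

# Turns tokens into HTML, using the innermost scope's parts as CSS classes.
def tokens_to_html(tokens)
  tokens.map do |t|
    text = CGI.escapeHTML(t.text)
    t.scopes.length > 1 ? %(<span class="#{t.scopes.last.tr('.', ' ')}">#{text}</span>) : text
  end.join
end

RUST_ISH = ToyTokenizer.new("source.rust", [
  Rule.new("keyword.other.rust", /\b(?:fn|let)\b/, nil, nil),
  Rule.new("comment.block.rust", nil, %r{/\*}, %r{\*/})
])
```

Feeding each line’s returned state into the next call is the same pattern the vscode-textmate README uses with `ruleStack`. A real grammar also needs nested `patterns`, capture scopes and `include`s, which is presumably much of what makes `_tokenizeString` complicated.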