Lem

Tree-sitter Integration

Lem integrates tree-sitter for advanced syntax highlighting and indentation. Tree-sitter provides incremental parsing that can efficiently update syntax trees as you edit, enabling features that understand code structure rather than just patterns.

Overview

The tree-sitter integration in Lem provides:

  • Syntax Highlighting: Accurate, language-aware highlighting based on AST nodes
  • Indentation: Structure-based indentation using tree-sitter queries
  • Incremental Parsing: Efficient updates without re-parsing entire files
  • Graceful Fallback: Falls back to regex-based highlighting when tree-sitter is unavailable

Architecture

                              ┌─────────────────┐
                              │  tree-sitter-cl │
                              │  (FFI bindings) │
                              └────────┬────────┘
                                       │
┌─────────────────┐           ┌────────▼────────┐
│   Language Mode │──────────▶│ lem-tree-sitter │
│  (json, nix...) │           │   (extension)   │
└─────────────────┘           └────────┬────────┘
                                       │
                              ┌────────▼────────┐
                              │   Lem Buffer    │
                              │ (syntax-parser) │
                              └─────────────────┘

The integration consists of:

Component         Location                         Purpose
tree-sitter-cl    External                         FFI bindings to native tree-sitter
lem-tree-sitter   extensions/tree-sitter/          Core integration module
Query files       extensions/<mode>/tree-sitter/   Per-language highlight/indent rules

Supported Languages

The following modes have tree-sitter support:

Language                 Syntax Highlighting   Indentation
JSON                     Yes                   -
YAML                     Yes                   -
Nix                      Yes                   Yes
Markdown                 Yes                   -
WAT (WebAssembly Text)   Yes                   -

Adding Tree-sitter Support to a Mode

Step 1: Create Query Files

Create a tree-sitter/ directory in your mode’s extension folder:

extensions/your-mode/
├── your-mode.lisp
├── your-mode.asd
└── tree-sitter/
    ├── highlights.scm    # Syntax highlighting rules
    └── indents.scm       # Indentation rules (optional)

Step 2: Write Highlight Queries

Highlight queries use tree-sitter’s S-expression query syntax. Each query captures AST nodes and assigns them to highlight groups.

Example highlights.scm for a simple language:

;; Comments
(comment) @comment

;; Keywords
[
  "if"
  "else"
  "let"
  "in"
  "function"
] @keyword

;; Literals
(string) @string
(number) @number
(boolean) @constant.builtin

;; Functions
(function_definition
  name: (identifier) @function)

(function_call
  function: (identifier) @function.call)

;; Variables
(variable_declaration
  name: (identifier) @variable)

;; Operators
["+" "-" "*" "/" "==" "!="] @operator

;; Punctuation
["(" ")" "{" "}" "[" "]"] @punctuation.bracket
["," ";" ":"] @punctuation.delimiter

Step 3: Enable Tree-sitter in Mode Definition

In your mode’s Lisp file:

(defpackage :lem-your-mode
  (:use :cl :lem :lem/language-mode))
(in-package :lem-your-mode)

(defvar *your-syntax-table*
  (let ((table (make-syntax-table ...)))
    ;; Set up base syntax table
    table))

(defun tree-sitter-query-path ()
  "Return the path to the tree-sitter highlight query."
  (asdf:system-relative-pathname :lem-your-mode "tree-sitter/highlights.scm"))

(defun tree-sitter-indent-query-path ()
  "Return the path to the tree-sitter indent query."
  (asdf:system-relative-pathname :lem-your-mode "tree-sitter/indents.scm"))

(define-major-mode your-mode language-mode
    (:name "Your Mode"
     :syntax-table *your-syntax-table*
     :mode-hook *your-mode-hook*)
  ;; Enable tree-sitter (with optional indent query)
  (lem-tree-sitter:enable-tree-sitter-for-mode
   *your-syntax-table*
   "your-language"  ; tree-sitter language name
   (tree-sitter-query-path)
   :indent-query-path (tree-sitter-indent-query-path))
  (setf (variable-value 'enable-syntax-highlight) t))

Step 4: Handle Fallback (Optional)

To keep the mode usable when tree-sitter isn’t available, guard the call and fall back to another parser:

(defun try-enable-tree-sitter ()
  "Try to enable tree-sitter, returning T on success, NIL on failure."
  (ignore-errors
    (when (and (find-package :lem-tree-sitter)
               (funcall (find-symbol "TREE-SITTER-AVAILABLE-P" :lem-tree-sitter)))
      (funcall (find-symbol "ENABLE-TREE-SITTER-FOR-MODE" :lem-tree-sitter)
               *your-syntax-table* "your-language" (tree-sitter-query-path))
      t)))

(define-major-mode your-mode language-mode
    (:syntax-table *your-syntax-table*)
  (unless (try-enable-tree-sitter)
    ;; Fall back to tmlanguage-based highlighting
    (set-syntax-parser *your-syntax-table* (make-tmlanguage-your-lang)))
  (setf (variable-value 'enable-syntax-highlight) t))

Capture Names Reference

Lem maps tree-sitter capture names to syntax attributes. The following captures are supported:

Code Elements

Capture             Lem Attribute                    Usage
@keyword            syntax-keyword-attribute         Language keywords
@keyword.control    syntax-keyword-attribute         Control flow keywords
@keyword.function   syntax-keyword-attribute         Function-related keywords
@string             syntax-string-attribute          String literals
@string.escape      syntax-constant-attribute        Escape sequences
@number             syntax-constant-attribute        Numeric literals
@comment            syntax-comment-attribute         Comments
@function           syntax-function-name-attribute   Function names
@function.call      syntax-function-name-attribute   Function calls
@function.builtin   syntax-builtin-attribute         Built-in functions
@type               syntax-type-attribute            Type names
@variable           syntax-variable-attribute        Variable names
@variable.builtin   syntax-builtin-attribute         Built-in variables
@constant           syntax-constant-attribute        Constants
@constant.builtin   syntax-constant-attribute        Built-in constants
@operator           syntax-builtin-attribute         Operators
@property           syntax-variable-attribute        Object properties

Markdown/Document Elements

Capture                                 Lem Attribute                                             Usage
@markup.heading.1 - @markup.heading.6   document-header1-attribute - document-header6-attribute   Headers
@markup.bold                            document-bold-attribute                                   Bold text
@markup.italic                          document-italic-attribute                                 Italic text
@markup.raw                             document-code-block-attribute                             Code blocks
@markup.link                            document-link-attribute                                   Links
@markup.quote                           document-blockquote-attribute                             Block quotes

Hierarchical Fallback

Capture names support hierarchical fallback. For example, @keyword.control will fall back to @keyword if no specific mapping exists.
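
The fallback rule can be sketched in plain Common Lisp. This is a minimal illustration rather than Lem's actual lookup code: the table *capture-attributes* is a hypothetical subset of the mappings listed above (with @keyword.control deliberately left out to show the fallback), and resolve-capture is an illustrative name, not part of lem-tree-sitter.

```lisp
;; Hypothetical subset of the capture table, keyed by capture name
;; without the leading "@". Not Lem's real data structure.
(defparameter *capture-attributes*
  '(("keyword"          . syntax-keyword-attribute)
    ("string"           . syntax-string-attribute)
    ("string.escape"    . syntax-constant-attribute)
    ("function"         . syntax-function-name-attribute)
    ("function.builtin" . syntax-builtin-attribute)))

(defun resolve-capture (name &optional (table *capture-attributes*))
  "Return the attribute mapped to NAME, dropping trailing \".segment\"
parts until a mapping is found. Returns NIL if nothing matches."
  (loop for candidate = name then (subseq candidate 0 dot)
        for entry = (assoc candidate table :test #'string=)
        for dot = (position #\. candidate :from-end t)
        when entry return (cdr entry)
        while dot))

(resolve-capture "keyword.control")   ; => SYNTAX-KEYWORD-ATTRIBUTE (via "keyword")
(resolve-capture "function.builtin")  ; => SYNTAX-BUILTIN-ATTRIBUTE (exact match)
(resolve-capture "markup.heading.1")  ; => NIL (no ancestor in this subset)
```

An exact match always wins, so a more specific capture can be given its own attribute without affecting its parent.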

Writing Indent Queries

Indent queries follow the Helix editor’s format using @indent and @outdent captures.

Basic Syntax

;; Nodes that increase indentation
[
  (block)
  (object)
  (array)
] @indent

;; Tokens that decrease indentation
[
  "}"
  "]"
  ")"
] @outdent

Scope Rules

The indentation system uses scope rules:

  • @indent (scope: tail): Only applies if the node starts on a previous line
  • @outdent (scope: all): Applies when the token appears at the start of a line
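
As a rough model of how these two rules combine, the sketch below computes an indent level per line from a list of captures. It is a simplified illustration and not Lem's implementation: the capture struct, its slots, and indent-level are all hypothetical names, and a real implementation would obtain the captures by running the indent query against the syntax tree.

```lisp
;; Hypothetical record for one query capture (not a Lem type).
(defstruct capture
  kind             ; :indent or :outdent
  start-line       ; line on which the captured node begins
  end-line         ; line on which the captured node ends
  at-line-start-p) ; true if the node is the first token on its line

(defun indent-level (line captures)
  "Return the indent level for LINE under the two scope rules."
  (let ((level 0))
    (dolist (c captures)
      (case (capture-kind c)
        ;; tail scope: count only nodes that began on an earlier line
        (:indent
         (when (and (< (capture-start-line c) line)
                    (<= line (capture-end-line c)))
           (incf level)))
        ;; all scope: applies when the token starts this line
        (:outdent
         (when (and (= (capture-start-line c) line)
                    (capture-at-line-start-p c))
           (decf level)))))
    (max 0 level)))

;; Captures for a five-line JSON snippet:
;;   0: {
;;   1:   "a": [
;;   2:     1
;;   3:   ]
;;   4: }
(defparameter *json-captures*
  (list (make-capture :kind :indent  :start-line 0 :end-line 4)   ; object
        (make-capture :kind :indent  :start-line 1 :end-line 3)   ; array
        (make-capture :kind :outdent :start-line 3 :end-line 3 :at-line-start-p t)
        (make-capture :kind :outdent :start-line 4 :end-line 4 :at-line-start-p t)))

(loop for line from 0 to 4
      collect (indent-level line *json-captures*))  ; => (0 1 2 1 0)
```

The tail rule is what keeps the opening line of the object at level 0, and the outdent rule is what pulls each closing bracket back to the level of its opener.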

Example: Nix Indentation

;; Indent-contributing nodes
[
  (attrset_expression)
  (rec_attrset_expression)
  (let_expression)
  (list_expression)
  (function_expression)
  (if_expression)
] @indent

;; Closing tokens
[
  "}"
  "]"
  ")"
] @outdent

;; Special: "in" aligns with "let"
(let_expression "in" @outdent)

;; Special: "then" and "else" align with "if"
(if_expression "then" @outdent)
(if_expression "else" @outdent)

API Reference

Main Functions

enable-tree-sitter-for-mode

(enable-tree-sitter-for-mode syntax-table language query-path &key indent-query-path)

Enable tree-sitter for a mode’s syntax table.

Parameter           Description
syntax-table        The mode’s syntax table
language            Tree-sitter language name (e.g., "json")
query-path          Path to highlights.scm
indent-query-path   Optional path to indents.scm

Returns T on success, NIL on failure.

tree-sitter-available-p

(tree-sitter-available-p) => boolean

Check if tree-sitter is available on the system.

get-buffer-treesitter-parser

(get-buffer-treesitter-parser buffer) => treesitter-parser or NIL

Get the tree-sitter parser for a buffer, if any.

The treesitter-parser Class

The core class for tree-sitter integration:

Slot              Accessor                            Description
language-name     treesitter-parser-language-name     Language name (e.g., "json")
tree              treesitter-parser-tree              Current syntax tree
highlight-query   treesitter-parser-highlight-query   Compiled highlight query
indent-query      treesitter-parser-indent-query      Compiled indent query
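
A small helper can combine get-buffer-treesitter-parser with one of these accessors to report which grammar a buffer is using. The sketch below is hedged in two ways: buffer-tree-sitter-language is a hypothetical name, and it assumes both functions are exported from the lem-tree-sitter package. It resolves the symbols at run time, in the same style as the fallback code in Step 4, so the file still loads when the extension is absent.

```lisp
(defun buffer-tree-sitter-language (buffer)
  "Return the tree-sitter language name used by BUFFER, or NIL when the
buffer has no parser or lem-tree-sitter is not loaded.
Assumes both symbols are exported from the LEM-TREE-SITTER package."
  (let* ((pkg (find-package :lem-tree-sitter))
         (getter (and pkg (find-symbol "GET-BUFFER-TREESITTER-PARSER" pkg)))
         (namer (and pkg (find-symbol "TREESITTER-PARSER-LANGUAGE-NAME" pkg))))
    (when (and getter namer (fboundp getter) (fboundp namer))
      (let ((parser (funcall getter buffer)))
        ;; GET-BUFFER-TREESITTER-PARSER returns NIL for buffers without a parser
        (when parser
          (funcall namer parser))))))
```

Inside Lem this would typically be called with (current-buffer). Outside Lem it simply returns NIL.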

Incremental Parsing

Lem automatically handles incremental parsing. When you edit a buffer:

  1. Edit events are recorded via the after-change-functions hook
  2. Pending edits are applied to the existing tree
  3. Tree-sitter reuses unchanged AST nodes
  4. Only modified portions are re-parsed

Because only the edited region is re-parsed, updates stay fast even in large files.
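
The first two steps above can be modelled as a queue of edit records. The sketch below is an assumption about the general shape, not Lem's actual code: input-edit, *pending-edits*, record-insertion, record-deletion, and flush-pending-edits are hypothetical names. The three byte fields mirror the start_byte, old_end_byte, and new_end_byte fields of tree-sitter's TSInputEdit struct. The row and column points that TSInputEdit also carries are left out for brevity.

```lisp
;; Hypothetical edit record mirroring the byte fields of TSInputEdit.
(defstruct input-edit start-byte old-end-byte new-end-byte)

(defvar *pending-edits* '()
  "Edits recorded since the last re-parse, newest first.")

(defun record-insertion (offset length)
  "Record LENGTH bytes inserted at OFFSET."
  (push (make-input-edit :start-byte offset
                         :old-end-byte offset
                         :new-end-byte (+ offset length))
        *pending-edits*))

(defun record-deletion (offset length)
  "Record LENGTH bytes deleted starting at OFFSET."
  (push (make-input-edit :start-byte offset
                         :old-end-byte (+ offset length)
                         :new-end-byte offset)
        *pending-edits*))

(defun flush-pending-edits (apply-fn)
  "Call APPLY-FN on each pending edit, oldest first, then clear the queue.
Returns the number of edits applied."
  (let ((edits (reverse *pending-edits*)))
    (setf *pending-edits* '())
    (mapc apply-fn edits)
    (length edits)))
```

In a real integration, the function passed to flush-pending-edits would forward each record to the existing tree before the re-parse, which is what lets tree-sitter reuse the unchanged nodes.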

Installing Tree-sitter Grammars

Tree-sitter grammars must be installed on your system. There are two typical approaches:

Using tree-sitter CLI

# Install tree-sitter CLI
npm install -g tree-sitter-cli

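# Clone a grammar and generate its parser sources
git clone https://github.com/tree-sitter/tree-sitter-json
cd tree-sitter-json
tree-sitter generate

# Compile the shared library (requires tree-sitter CLI 0.22 or later)
tree-sitter build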

Using System Package Manager

Many distributions provide pre-built grammars:

# Arch Linux
pacman -S tree-sitter-grammars

# macOS (Homebrew)
brew install tree-sitter

# Nix
nix-env -iA nixpkgs.tree-sitter-grammars.tree-sitter-json

Troubleshooting

Tree-sitter not working

  1. Check if tree-sitter is available:

    (lem-tree-sitter:tree-sitter-available-p)
    
  2. Check if the language grammar is installed:

    (tree-sitter:get-language "json")
    
  3. Verify that the query file exists:

    (probe-file (tree-sitter-query-path))
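
The three checks can be bundled into one function that is safe to evaluate even when the extension failed to load. This is a sketch under assumptions: call-if-available and diagnose-tree-sitter are hypothetical helpers, and the package and function names are taken from the expressions shown in the steps above.

```lisp
(defun call-if-available (package-name symbol-name &rest args)
  "Call the function named SYMBOL-NAME in PACKAGE-NAME with ARGS.
Returns NIL if the package or function is missing, or if the call signals an error."
  (let* ((pkg (find-package package-name))
         (sym (and pkg (find-symbol symbol-name pkg))))
    (when (and sym (fboundp sym))
      (ignore-errors (apply sym args)))))

(defun diagnose-tree-sitter (language query-path)
  "Return a plist with one boolean per troubleshooting step."
  (list :available  (and (call-if-available :lem-tree-sitter
                                            "TREE-SITTER-AVAILABLE-P")
                         t)
        :grammar    (and (call-if-available :tree-sitter
                                            "GET-LANGUAGE" language)
                         t)
        :query-file (and (probe-file query-path) t)))
```

For example, (diagnose-tree-sitter "json" (tree-sitter-query-path)) returns a plist in which the first NIL entry points at the step that needs attention.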
    

Incorrect highlighting

  1. Test your query with tree-sitter CLI:

    tree-sitter query highlights.scm example.json
    
  2. Check for syntax errors in the query file

  3. Verify capture names match supported names (see Capture Names Reference)

Performance issues

  • Tree-sitter uses incremental parsing, but very large files may still be slow
  • Consider limiting highlight range for extremely large files
  • Check whether the grammar has known performance issues

Resources