---
title: "Building autocomplete with antlr and codemirror"
page_name: "Building autocomplete with ANTLR and CodeMirror"
type: "blog"
slug: "building-autocomplete-antlr-codemirror"
published_at: "2021-01-27"
modified_at: "2026-05-16"
url: "https://www.sumologic.com/blog/building-autocomplete-antlr-codemirror"
canonical: "https://www.sumologic.com/blog/building-autocomplete-antlr-codemirror"
markdown_url: "https://www.sumologic.com/blog/building-autocomplete-antlr-codemirror.md"
lang: "en"
excerpt: "One of the challenges we dealt with recently was improving the query building experience in our new, revamped Metrics UI."
taxonomy_blog_category:
  - "DevOps &amp; IT Operations"
---


# Building autocomplete with ANTLR and CodeMirror

[Sebastian Slepowronski](#blog-author-block-287)

January 27, 2021


## What’s the problem?

At Sumo Logic, we’re dealing with a large amount of data. To help our customers explore the data quickly and effectively, our product lets them write Logs, Metrics, and Tracing queries. One of the challenges we dealt with recently was improving the query building experience in our new, revamped Metrics UI.

We’ve created a basic query builder where users can build queries using a structured UI. However, more advanced users are used to the quicker, keyboard-first experience of writing free-text *Metrics Query Language* queries. Since it’s practically impossible to remember all the names and feasible combinations of metric dimensions and their values, we wanted to offer our customers an autocomplete mechanism that aids the query building process.

## How to suggest relevant pieces of data to the users?

So we had to come up with a way to display autocomplete suggestions to the users that is reliable, fast, and maintainable.

We weighed several options against these three conditions:

**Suggestions computed on the backend side**

This is a good option because the backend has comprehensive knowledge of the query language. Its biggest disadvantage is relatively slow perceived performance: every change of the query string or caret position would trigger a backend request. Therefore, it doesn’t meet the requirement of a fast user experience.

**Simple and naive query string parsing on the frontend side**

Working directly with the query string at a low level is the most straightforward solution to implement, but potentially very hard to maintain in the long run. What if the query language changes? The custom parsing algorithm has to be adjusted. The same applies when the language is extended. This approach is easy to develop right now, but it comes with additional development cost later and leaves tech debt behind (remember, every debt needs to be paid off, sooner or later).

**Lexing and parsing query string on the frontend side**

In the end, the best option turned out to be having a complete definition of the query language on the frontend side, combined with speed improvements that are only possible there (like caching or computing the suggestions without a network request).

Then, the question arises: how to parse the string to have the full context and knowledge about its building parts?

[ANTLR](https://www.antlr.org/) is an answer. It’s a powerful parser generator. With its help, we will be able to parse the raw incomprehensible string into an understandable data structure, the so-called *Abstract Syntax Tree (AST)*. We’ll achieve that by writing the language grammar file (you can see a simple example [here](https://gist.github.com/mattmcd/5425206) and a set of more complex ones [there](https://github.com/antlr/grammars-v4)) and generating the runtime code with [TypeScript target for ANTLR 4](https://github.com/tunnelvisionlabs/antlr4ts).

We’ll do exactly that right now. At the end of this blog post, we’ll have a working autocomplete based on the simple grammar accepting the *key=value* syntax. The UI part will be handled by the [*React*](https://github.com/facebook/react) framework and [*CodeMirror*](https://github.com/codemirror/CodeMirror) library together with its [*show-hint*](https://codemirror.net/doc/manual.html#addon_show-hint) plugin responsible for controlling the autocomplete dropdown.

For the sake of clarity, we’ll focus only on the crucial pieces of code below but you can find the whole project on [GitHub](https://github.com/slepowronski/autocomplete). I encourage you to clone the repository and play with it.

## Let’s define the project and dependencies

Before digging into the details of parsing the query string, shall we start with creating the base *React TypeScript* application? Of course! We can do it very quickly using the [create-react-app](https://github.com/facebook/create-react-app) CLI tool:

```
> npx create-react-app autocomplete --template typescript
```

Then, go to the *autocomplete* folder and install all the required dependencies:

- [antlr4ts](https://github.com/tunnelvisionlabs/antlr4ts) runtime *ANTLR TypeScript* library to import the classes required during the lexing and parsing process. For instance, *CharStreams* (being a simple stream of characters) or *CommonTokenStream* (a stream of tokens coming from the lexing – don’t worry, we’ll follow this process in the next section)
- [CodeMirror](https://github.com/codemirror/CodeMirror) together with its [React wrapper](https://github.com/scniro/react-codemirror2) for rendering the code editor input together with the *show-hint* plugin to display the suggestions

```
> yarn add antlr4ts codemirror react-codemirror2
```

Then, proceed with dev dependencies:

- [antlr4ts-cli](https://www.npmjs.com/package/antlr4ts-cli) tool to generate code out of our language grammar file
- [CodeMirror’s types](https://www.npmjs.com/package/@types/codemirror) because we don’t want to give up the perks of having the *TypeScript* typings

```
> yarn add -D antlr4ts-cli @types/codemirror
```

We’ve got the base application at last so we can start building upon that.

## How to understand the custom query language?

The backend can return a list of key or value suggestions for a given term, but first we need a good understanding of the keys and values present in the input string typed by the user. Moreover, it’s crucial to know whether the caret is currently at a key or a value. We’ll combine this information to construct the backend request.

To retrieve such data, the query string needs to be parsed to the *Abstract Syntax Tree (AST)*.

This process contains two steps:

- Lexing – generating a set of tokens from the input query string
- Parsing – using the tokens from the previous step to generate the final *AST*

*Figure: lexing and parsing the ‘customKey=customValue’ query string.*

The *AST* is a data structure representing the input string in such a way that tokens and parser rules are nodes of the tree. If you traverse the leaves of the tree in order, you should be able to assemble the query string again.
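
To illustrate the point about leaves, here is a minimal sketch of such a tree for `customKey=customValue`. The `TreeNode` type, `leafText`, and `sampleTree` are hypothetical names made up for this example; the tree is built by hand and only imitates the shape that the rules defined later in this post would produce (with EOF left out).

```typescript
// Hand-built stand-in for a parse tree; not the classes generated by ANTLR.
type TreeNode =
    | { kind: 'rule'; name: string; children: TreeNode[] }
    | { kind: 'token'; name: string; text: string };

// Concatenate the text of all leaves (tokens), left to right.
export const leafText = (node: TreeNode): string =>
    node.kind === 'token'
        ? node.text
        : node.children.map((child) => leafText(child)).join('');

// Tree for 'customKey=customValue' (EOF omitted for brevity).
export const sampleTree: TreeNode = {
    kind: 'rule',
    name: 'expression',
    children: [{
        kind: 'rule',
        name: 'keyValueExpression',
        children: [
            { kind: 'rule', name: 'key', children: [{ kind: 'token', name: 'ALPHANUMERIC', text: 'customKey' }] },
            { kind: 'token', name: 'EQ', text: '=' },
            { kind: 'rule', name: 'value', children: [{ kind: 'token', name: 'ALPHANUMERIC', text: 'customValue' }] },
        ],
    }],
};

console.log(leafText(sampleTree)); // customKey=customValue
```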

Technically speaking, we need to define our *.g4* language grammar file containing lexer rules defining the matchers for tokens in the string and parser rules allowing us to generate the tree by specifying the relationships between rules and tokens.

*ANTLR* will help us (thank you!) to generate the runtime parser out of the grammar file that we’ll create in a moment.

You could ask: what is grammar after all?

I’m eager to answer this question.

**The grammar** is a set of lexer and parser rules describing our whole language. In other words, it specifies the language and sets the boundaries to clearly define what’s allowed and what’s not.

**The lexer** is a set of the basic low-level rules where each one has the matching expression defined. The lexer will return tokens matched by the specified rules out of the given input string. As a result, we’ll receive the set of tokens used later in the parsing process.

We can define two lexer rules:

```
ALPHANUMERIC: [a-zA-Z] [0-9a-zA-Z]*;
EQ: '=';
```

The first one matches an alphanumeric word starting with a letter, while the second one matches only a literal equal sign.

These rules define two possible tokens (*ALPHANUMERIC* and *EQ*) that can be retrieved from the string.
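
To get a feeling for what the lexing step produces, here is a rough hand-written approximation of these two rules. It is only a sketch: the real lexer is generated by ANTLR, and `tokenize` and `LexToken` are hypothetical names. One deliberate difference is that this sketch throws on an unexpected character, whereas the generated lexer reports an error and carries on.

```typescript
type LexToken = {
    type: 'ALPHANUMERIC' | 'EQ';
    text: string;
    startIndex: number; // index of the first character
    stopIndex: number;  // index of the last character (inclusive)
};

// One anchored pattern per lexer rule from the grammar above.
const LEXER_RULES: Array<[LexToken['type'], RegExp]> = [
    ['ALPHANUMERIC', /^[a-zA-Z][0-9a-zA-Z]*/],
    ['EQ', /^=/],
];

export const tokenize = (input: string): LexToken[] => {
    const tokens: LexToken[] = [];
    let pos = 0;
    while (pos < input.length) {
        let matched = false;
        for (const [type, pattern] of LEXER_RULES) {
            const m = input.slice(pos).match(pattern);
            if (m !== null) {
                tokens.push({ type, text: m[0], startIndex: pos, stopIndex: pos + m[0].length - 1 });
                pos += m[0].length;
                matched = true;
                break;
            }
        }
        if (!matched) {
            throw new Error(`Unexpected character '${input[pos]}' at index ${pos}`);
        }
    }
    return tokens;
};

console.log(tokenize('customKey=customValue').map((t) => `${t.type}(${t.text})`).join(' '));
// ALPHANUMERIC(customKey) EQ(=) ALPHANUMERIC(customValue)
```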

**The parser** rules define relationships between the parser and lexer rules. They allow generating the final *AST* out of the tokens coming from the preceding lexing step.

Let’s define the set of rules making up our language:

```
expression: (key | keyValueExpression) EOF;
keyValueExpression: key EQ value;
key: ALPHANUMERIC;
value: ALPHANUMERIC | /* empty */;
```

In essence, our grammar accepts either the key alone (an alphanumeric word) or the combination of key and value (possibly empty) connected with the equal sign. EOF, an explicit marker for the end of the expression, isn’t strictly required, but without it some edge cases (such as trailing input that the parser would otherwise ignore) remain unhandled, so… better safe than sorry.

Based on the lexer and parser rules, the **Abstract Syntax Tree** will be created at runtime, and the mentioned rules will form the nodes of this tree.

Congratulations! We’ve just created a simple grammar allowing for either *key=value* expression or *key* alone.

Example inputs accepted by the grammar: `customKey`, `customKey=`, and `customKey=customValue`.
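
As a quick sanity check of which inputs are strictly valid, the toy grammar is simple enough to be expressed as a single regular expression. This is only a sketch under that assumption: the pattern below is derived by hand from the rules, `isValidExpression` is a hypothetical helper, and none of it is generated by ANTLR. Also keep in mind that the generated parser recovers from errors and still returns a tree for invalid input, which is handy for autocomplete on half-typed queries.

```typescript
// Hand-derived equivalent of: expression: (key | keyValueExpression) EOF;
const KEY_VALUE_PATTERN = /^[a-zA-Z][0-9a-zA-Z]*(=([a-zA-Z][0-9a-zA-Z]*)?)?$/;

export const isValidExpression = (query: string): boolean => KEY_VALUE_PATTERN.test(query);

const sampleQueries = ['customKey', 'customKey=', 'customKey=customValue', '=customValue', '1key', 'a=b=c'];
for (const query of sampleQueries) {
    console.log(`${JSON.stringify(query)} -> ${isValidExpression(query)}`);
}
// The first three print true, the last three print false.
```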

Note: There is a very useful [ANTLR v4 plugin](https://plugins.jetbrains.com/plugin/7358-antlr-v4) for IntelliJ-based IDEs allowing you to easily explore *AST* and improve your grammar. Simply create the *.g4* file and play with that to see all the possible outcomes.

## ANTLR, please generate TypeScript files for us

As soon as we’ve got our *.g4* grammar file, we need to generate the runtime code that we will be able to use in our application. Even though there is an official ANTLR [JavaScript](https://github.com/antlr/antlr4/blob/master/doc/javascript-target.md) target, we will use the [TypeScript](https://github.com/tunnelvisionlabs/antlr4ts) one to not lose all the benefits coming from the static typing.

The most convenient way is adding the script to package.json:

```
"scripts": {
    …
    "grammar": "antlr4ts -visitor ./src/grammar/KeyValue.g4"
}
```

and running it afterwards:

```
> yarn grammar
```

As a result, a bunch of *TS* files are created in the same folder as the *.g4* file. Among them are the foundations of our grammar: *KeyValueLexer.ts* and *KeyValueParser.ts*.

Additionally, we should pay special attention to *KeyValueVisitor.ts*, as we’ll implement this interface in our custom visitor. It defines a *visit* function for every parser rule node of the *AST*. This file has been generated because, as you may have noticed, we’ve used the *-visitor* option. The alternative is a listener, which ANTLR generates by default. The difference between the two is quite subtle: it comes down to the fact that the listener is always notified about all nodes of the tree, while in the case of the visitor we decide whether child nodes should be visited or not. More control is always a good thing, so let’s proceed with the visitor approach.
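
The difference can be sketched without ANTLR at all. In the miniature example below, `MiniNode`, `walkWithListener`, and `visitUntil` are hypothetical names and the tree is built by hand; the point is only to show who controls the traversal in each style.

```typescript
// Hypothetical miniature tree; the real node classes are generated by antlr4ts.
interface MiniNode {
    rule: string;
    children: MiniNode[];
}

// Listener style: a walker drives the traversal and notifies us about every node.
export const walkWithListener = (node: MiniNode, onEnter: (n: MiniNode) => void): void => {
    onEnter(node);
    node.children.forEach((child) => walkWithListener(child, onEnter));
};

// Visitor style: each visit decides whether to descend and can return a value.
export const visitUntil = (node: MiniNode, stopAt: string, seen: string[]): string[] => {
    seen.push(node.rule);
    if (node.rule !== stopAt) {
        node.children.forEach((child) => visitUntil(child, stopAt, seen));
    }
    return seen;
};

const miniTree: MiniNode = {
    rule: 'expression',
    children: [{
        rule: 'keyValueExpression',
        children: [{ rule: 'key', children: [] }, { rule: 'value', children: [] }],
    }],
};

const listened: string[] = [];
walkWithListener(miniTree, (n) => listened.push(n.rule));
console.log(listened.join(' > ')); // expression > keyValueExpression > key > value
console.log(visitUntil(miniTree, 'keyValueExpression', []).join(' > ')); // expression > keyValueExpression
```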

## It’s high time to write some real visitor’s code

We’ve got the project and all the files ready so we can finally start writing our code.

Let’s take a look at the data model produced by walking through the tree. It directly mirrors the options from our grammar: the result is either a key or a value within a given key. Additionally, the result contains a range, which is the start and end index of the detected token (key or value, respectively).

```
// Token range - start and end included
export interface Range {
    start: number;
    end: number;
}

// Search key
interface KeyResult {
    type: 'KeyResult';
    key: string;
    range: Range;
}

// Search value within given key
interface ValueResult {
    type: 'ValueResult';
    key: string;
    value: string;
    range: Range;
}

export type KeyValueResult = KeyResult | ValueResult | undefined;
```

The visitor *KeyValueResultVisitor* extending the *AbstractParseTreeVisitor* coming from *antlr4ts* is the heart of our solution.

The final result depends on two factors: the query string and the current caret position:

| **Query string** | **Caret position (index)** | **Result** |
|---|---|---|
| abc | 0 – 3 | { type: ‘KeyResult’, key: ‘abc’, range: { start: 0, end: 2 } } |
| abc= | 0 – 3 | { type: ‘KeyResult’, key: ‘abc’, range: { start: 0, end: 2 } } |
| abc= | 4 | { type: ‘ValueResult’, key: ‘abc’, value: ‘’, range: { start: 4, end: 4 } } |
| abc=xyz | 0 – 3 | { type: ‘KeyResult’, key: ‘abc’, range: { start: 0, end: 2 } } |
| abc=xyz | 4 – 7 | { type: ‘ValueResult’, key: ‘abc’, value: ‘xyz’, range: { start: 4, end: 6 } } |
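
To make the table concrete, here is a minimal sketch that reproduces its rows with plain string operations instead of the ANTLR visitor. It only approximates the behaviour described in the table; `resolveAtCaret`, `CaretRange`, and `CaretResult` are names made up for this illustration and are not part of the project.

```typescript
interface CaretRange { start: number; end: number }

type CaretResult =
    | { type: 'KeyResult'; key: string; range: CaretRange }
    | { type: 'ValueResult'; key: string; value: string; range: CaretRange };

// Plain string-splitting stand-in for the visitor, only meant to mirror the table.
export const resolveAtCaret = (query: string, caret: number): CaretResult => {
    const eqIndex = query.indexOf('=');
    const key = eqIndex === -1 ? query : query.slice(0, eqIndex);
    // The caret belongs to the key as long as it sits before the '=' sign (or there is none).
    if (eqIndex === -1 || caret <= eqIndex) {
        return { type: 'KeyResult', key, range: { start: 0, end: Math.max(key.length - 1, 0) } };
    }
    const value = query.slice(eqIndex + 1);
    const start = eqIndex + 1;
    // An empty value gets a one-position range right after '=', as in the table.
    const end = value.length === 0 ? start : start + value.length - 1;
    return { type: 'ValueResult', key, value, range: { start, end } };
};

console.log(JSON.stringify(resolveAtCaret('abc=xyz', 5)));
// {"type":"ValueResult","key":"abc","value":"xyz","range":{"start":4,"end":6}}
```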

We’ll implement several *visit* functions to define the logic performed when the corresponding nodes of the tree are visited.

We should start by visiting the root node of the tree (*expression*). If the original query string is empty, we want to search all the keys. Otherwise, proceed with visiting the child nodes.

```
visitExpression(node: ExpressionContext): KeyValueResult {
    if (node.text === '') {
        return {
            type: 'KeyResult',
            key: '',
            range: {
                start: 0,
                end: 0,
            }
        };
    }
    return this.visitChildren(node);
}
```

Next, let’s visit the *‘key=value’* node where we want to distinguish cases of being at the key or value, depending on the caret position.

```
visitKeyValueExpression(node: KeyValueExpressionContext): KeyValueResult {
    const key = node.children?.find((child) => child instanceof KeyContext);
    const value = node.children?.find((child) => child instanceof ValueContext);

    // Both key and value are defined, caret is positioned at the value -> search value within the given key
    if (key !== undefined && value !== undefined && this.isWithinCaretPosition(value)) {
        return {
            type: 'ValueResult',
            key: key.text,
            value: value.text,
            range: KeyValueResultVisitor.getRange(value),
        };
    }
    // Caret is positioned at the key -> search for keys filtered by the given key text
    else if (key !== undefined && this.isWithinCaretPosition(key)) {
        return {
            type: 'KeyResult',
            key: key.text,
            range: KeyValueResultVisitor.getRange(key),
        };
    }
    return this.defaultResult();
}
```

Finally, we can handle the case of visiting the *key* alone (when *‘=value’* part is not typed yet). We should search for keys filtered by the given key text then.

```
visitKey(node: KeyContext): KeyValueResult {
    return {
        type: 'KeyResult',
        key: node.text,
        range: KeyValueResultVisitor.getRange(node),
    };
}
```
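
The snippets above rely on the helpers `isWithinCaretPosition()` and `getRange()`, which are not shown in this post (`defaultResult()` presumably just returns `undefined`). Below is one possible sketch of the two range helpers, not necessarily what the repository does. It assumes that a rule context exposes `start` and `stop` tokens carrying `startIndex` and `stopIndex`; `TokenLike`, `ContextLike`, `IndexRange`, and `CaretHelpers` are hypothetical stand-ins so that the example runs without *antlr4ts*. The empty-value case (`abc=`) is deliberately not covered here.

```typescript
// Structural stand-ins for the token and rule context types (assumption, see above).
interface TokenLike { startIndex: number; stopIndex: number }
interface ContextLike { start: TokenLike; stop?: TokenLike }
interface IndexRange { start: number; end: number }

export class CaretHelpers {
    private readonly caretPosition: number;

    constructor(caretPosition: number) {
        this.caretPosition = caretPosition;
    }

    // Range of the characters covered by the node, both ends included.
    static getRange(node: ContextLike): IndexRange {
        return { start: node.start.startIndex, end: (node.stop ?? node.start).stopIndex };
    }

    // The caret may also sit right after the last character, hence `end + 1`.
    isWithinCaretPosition(node: ContextLike): boolean {
        const { start, end } = CaretHelpers.getRange(node);
        return this.caretPosition >= start && this.caretPosition <= end + 1;
    }
}

// Fake contexts for 'abc=xyz': the key spans 0..2, the value spans 4..6.
const keyCtx: ContextLike = { start: { startIndex: 0, stopIndex: 2 }, stop: { startIndex: 0, stopIndex: 2 } };
const valueCtx: ContextLike = { start: { startIndex: 4, stopIndex: 6 }, stop: { startIndex: 4, stopIndex: 6 } };

console.log(new CaretHelpers(3).isWithinCaretPosition(keyCtx));   // true
console.log(new CaretHelpers(3).isWithinCaretPosition(valueCtx)); // false
console.log(new CaretHelpers(7).isWithinCaretPosition(valueCtx)); // true
```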

## Create util to see the visitor in action

Now that we’ve got the visitor, nothing stops us from using it in a util function.

```
export const parseQuery = (query: string, caretPosition: number): KeyValueResult => {
    // Create input stream from the given query string
    const inputStream = CharStreams.fromString(prepareQuery(query));
    // Create lexer
    const lexer = new KeyValueLexer(inputStream);
    const tokenStream = new CommonTokenStream(lexer);
    // Create parser based on the tokens from lexer
    const parser = new KeyValueParser(tokenStream);

    // Create Abstract Syntax Tree based on the root 'expression' from the parser
    const tree = parser.expression();

    // Visit the tree to gather the result
    const visitor = new KeyValueResultVisitor(caretPosition);
    return visitor.visit(tree);
};

const prepareQuery = (query: string): string => {
    // Remove whitespaces at the end of query string
    return query.trimEnd();
};
```

## The Big Moment – use the util in our component

Without further ado, import the required dependencies. In particular, pay attention to the *CodeMirror* dependencies and its *show-hint* plugin:

```
// react-codemirror2
import {UnControlled as CodeMirrorEditor} from 'react-codemirror2'
// codemirror
import CodeMirror, {Editor} from 'codemirror';
import 'codemirror/lib/codemirror.css';
import 'codemirror/theme/material-ocean.css';
// codemirror - show-hint
import 'codemirror/addon/hint/show-hint';
import 'codemirror/addon/hint/show-hint.css';
```

Define types that will be used in the state of our function component:

```
// CodeMirror's base and show-hint type
type CodeMirrorEditorType = CodeMirror.Editor & Editor;

// Query info encapsulating the query string and current caret position
interface QueryInfo {
    query: string;
    caretPosition: number;
}

// Store for the fetched suggestions together with the result of parsing
interface SuggestionsInfo {
    result: KeyValueResult;
    suggestions: string[];
}
```

Finally, let’s define the AutoComplete function component:

```
export const AutoComplete: FunctionComponent = () => {
...
```

This callback runs whenever either the value or the caret position in *CodeMirror* changes. It updates the query info in the state:

```
const onChange = useCallback((editor: CodeMirror.Editor) => {
    setQueryInfo({
        query: editor.getValue(),
        caretPosition: editor.getCursor().ch,
    });
}, []);
```

As soon as the query info is changed, parse the query using the util function that we’ve written previously.

Take the result of visiting the *AST* and pass it to *fetchSuggestions()*. This function sends a request to the backend and resolves the *Promise* with the fetched suggestions. In our case, the function should distinguish between the “keys” and “values within key” scenarios to send the correct request.

Additionally, visiting the *AST* gives us the range of the token the caret is currently at. In the case of “keys”, it is the start and end index of the key token. For “values within key”, it is the range of the value. We’ll use this information later to tell *CodeMirror*’s *show-hint* plugin where to put the picked suggestion.

```
const prevQueryInfo = usePrevious(queryInfo);
useEffect(() => {
    ...
    const result = parseQuery(queryInfo.query, queryInfo.caretPosition);

    fetchSuggestions(result).then((fetchedSuggestions) => setSuggestionsInfo({
        result,
        suggestions: fetchedSuggestions,
    }));
}, [codeMirrorEditor, prevQueryInfo, queryInfo]);
```
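
The body of `fetchSuggestions()` depends on your backend, so it isn’t shown here. To try things out locally, a stand-in along these lines could be used. It is only a sketch: the in-memory `KNOWN_VALUES` data, the `SuggestRequest` type, and the `lookupSuggestions` helper are all made up for this example, and a real implementation would send a network request instead. A `KeyValueResult` can be passed in as-is because it has the same fields plus the range.

```typescript
type SuggestRequest =
    | { type: 'KeyResult'; key: string }
    | { type: 'ValueResult'; key: string; value: string }
    | undefined;

// Hypothetical in-memory data standing in for the backend.
const KNOWN_VALUES: Record<string, string[]> = {
    cluster: ['prod', 'staging'],
    container: ['api', 'web'],
    host: ['alpha', 'beta'],
};

// Pure lookup, kept separate from the Promise wrapper so it is easy to test.
export const lookupSuggestions = (request: SuggestRequest): string[] => {
    if (request === undefined) {
        return [];
    }
    if (request.type === 'KeyResult') {
        // "keys" scenario: filter the known keys by the typed prefix.
        const { key } = request;
        return Object.keys(KNOWN_VALUES).filter((candidate) => candidate.startsWith(key));
    }
    // "values within key" scenario: filter the values of the given key.
    const { key, value } = request;
    return (KNOWN_VALUES[key] ?? []).filter((candidate) => candidate.startsWith(value));
};

// Resolves with the suggestions, like the fetchSuggestions() used above.
export const fetchSuggestions = (request: SuggestRequest): Promise<string[]> =>
    Promise.resolve(lookupSuggestions(request));

fetchSuggestions({ type: 'KeyResult', key: 'c' }).then((s) => console.log(s.join(', ')));
// cluster, container
```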

Finally, get the fetched suggestions and display them using the *CodeMirror*’s *show-hint*.

```
useEffect(() => {
    ...
    const isKeyResult = suggestionsInfo.result?.type === 'KeyResult';
    const options = {
        // Don't complete automatically in case of only one suggestion
        completeSingle: false,
        hint: () => ({
            from: { line: 0, ch: suggestionsInfo.result?.range.start ?? 0 },
            to: {
                line: 0,
                ch: (suggestionsInfo.result?.range.end ?? 0) +
                      (isKeyResult ? 1 : 0) + // for key result, extend index by 1 to cover the '=' in query
                      1, // end index is excluded so let's add 1
            },
            list: suggestionsInfo.suggestions.map((text) => ({
                // Append '=' to the key and ' ' to the value
                text: isKeyResult ? `${text}=` : `${text} `,
            })),
        }),
    };
    codeMirrorEditor.showHint(options);
}, [codeMirrorEditor, suggestionsInfo]);
```
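
The `from`/`to` arithmetic in the hint callback is easy to get wrong by one, so here is a small sketch that isolates it and simulates the replacement on a plain string. `hintColumns`, `applySuggestion`, and `HintInput` are hypothetical helpers for illustration only; they assume a single-line query and are not part of *CodeMirror* or the project.

```typescript
interface HintInput {
    type: 'KeyResult' | 'ValueResult';
    range: { start: number; end: number };
}

// Mirrors the from/to arithmetic of the hint callback above.
export const hintColumns = (result: HintInput): { from: number; to: number } => ({
    from: result.range.start,
    // `end` is inclusive while `to` is exclusive (+1); key results also cover the '=' (+1).
    to: result.range.end + (result.type === 'KeyResult' ? 1 : 0) + 1,
});

// Simulates what picking a suggestion does to the query text.
export const applySuggestion = (query: string, result: HintInput, text: string): string => {
    const { from, to } = hintColumns(result);
    return query.slice(0, from) + text + query.slice(to);
};

console.log(applySuggestion('clu=prod', { type: 'KeyResult', range: { start: 0, end: 2 } }, 'cluster='));
// cluster=prod
console.log(JSON.stringify(applySuggestion('cluster=pr', { type: 'ValueResult', range: { start: 8, end: 9 } }, 'prod ')));
// "cluster=prod "
```

Covering the `=` for key results is what prevents a doubled equal sign when a key is replaced in a query that already contains a value.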

Our function component simply renders the uncontrolled *CodeMirrorEditor* coming from *react-codemirror2.*

```
return (
    <CodeMirrorEditor
        options={{
            theme: 'material-ocean',
            lineNumbers: true
        }}
        editorDidMount={setCodeMirrorEditor}
        onCursorActivity={onChange}
    />
);
```

This component is relatively maintainable, but as new functionality is added and the codebase grows, particular steps of the parsing and fetching process should be extracted into independent hooks.

## Final words

We’ve been able to write our own simple *ANTLR* grammar, generate *TypeScript* classes out of it, and display the suggestions using *CodeMirror*’s *show-hint* plugin. Such an implementation gives clear and tangible benefits to the users. They can use all the power of *Metrics Query Language* without having to remember all the keywords and variable names, which greatly speeds up writing queries. It’s a perfect combination of two worlds: the simplicity of the basic UI query builder and the speed of writing queries using the keyboard in the advanced mode.

## That sounds great but what’s next?

A language is a living thing, so it will surely change over time. When it does, you’ll update the *.g4* grammar file, re-generate the classes using *antlr4ts*, and probably add new visit functions depending on your needs. You’ll likely also extend your language by adding new syntax to the existing foundation. I encourage you to explore the various existing [*ANTLR* grammars](https://github.com/antlr/grammars-v4) for inspiration, as the possibilities are nearly limitless. Good luck.

Sebastian Slepowronski

Senior Software Engineer
