mirror of https://github.com/alecthomas/chroma.git synced 2025-02-05 13:05:18 +02:00

11 Commits

Author SHA1 Message Date
Mikhail Sorochan
3044bf5f32
Go lexer: single line comment without consuming endline, disable EnsureNL (#984)
This PR changes `CommentSingle` so that it no longer consumes the trailing
newline as part of the comment.
That fixes single-line comments not being parsed at the end of a line or
at the end of the file, which was reported earlier as the reason
single-line comments were not highlighted properly.

Disabling `EnsureNL: true` stops an unnecessary newline element being
added for `Text` and `CommentSymbol` tokens. This made using chroma in a
console with syntax highlighting unusable, because typing e.g. `b := `
added a newline every time a space was at the end when the host app asked
`quick` for highlighted text.

Tokens behavior:
<table>
<tr>
<td> Before </td> <td> After </td>
</tr>
<tr>
<td>

``` go
t.Run("Single space", func(t *testing.T) {
        tokens, _ := chroma.Tokenise(Go, nil, " ")
        expected := []chroma.Token{
                {chroma.Text, " \n"},
        }
        assert.Equal(t, expected, tokens)
})
t.Run("Assignment unfinished", func(t *testing.T) {
        tokens, _ := chroma.Tokenise(Go, nil, "i = ")
        expected := []chroma.Token{
                { chroma.NameOther, "i" },
                { chroma.Text, " " },
                { chroma.Punctuation, "=" },
                { chroma.Text, " \n" },
        }
        assert.Equal(t, expected, tokens)
})
t.Run("Single comment", func(t *testing.T) {
        tokens, _ := chroma.Tokenise(Go, nil, "// W")
        expected := []chroma.Token{
                { chroma.CommentSingle, "// W\n" },
        }
        assert.Equal(t, expected, tokens)
})
```

</td>
<td>
    
``` go
t.Run("Single space", func(t *testing.T) {
        tokens, _ := chroma.Tokenise(Go, nil, " ")
        expected := []chroma.Token{
                {chroma.Text, " "},
        }
        assert.Equal(t, expected, tokens)
})
t.Run("Assignment unfinished", func(t *testing.T) {
        tokens, _ := chroma.Tokenise(Go, nil, "i = ")
        expected := []chroma.Token{
                { chroma.NameOther, "i" },
                { chroma.Text, " " },
                { chroma.Punctuation, "=" },
                { chroma.Text, " " },
        }
        assert.Equal(t, expected, tokens)
})
t.Run("Single comment", func(t *testing.T) {
        tokens, _ := chroma.Tokenise(Go, nil, "// W")
        expected := []chroma.Token{
                { chroma.CommentSingle, "// W" },
        }
        assert.Equal(t, expected, tokens)
})
```
</td>
</tr>
</table>
2024-07-23 02:19:08 +10:00
Abhinav Gupta
506e36f9e0
fix(lexers/go): "~" is a valid token (#926)
With the introduction of generics,
tilde is a valid punctuation token in Go programs.
https://go.dev/ref/spec#Operators_and_punctuation

This updates the punctuation regex for the Go lexer,
and adds a test to ensure that it's treated as such.
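For context, this is where the tilde appears in Go source: since generics (Go 1.18), `~T` in a type constraint matches any type whose underlying type is `T`. A minimal illustration (names like `Integer` and `Celsius` are made up for the example):

``` go
package main

import "fmt"

// Integer uses the tilde token introduced with generics:
// ~int matches int and every type whose underlying type is int.
type Integer interface {
	~int | ~int64
}

// Sum works for any type satisfying the Integer constraint.
func Sum[T Integer](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

// Celsius has underlying type int, so it satisfies ~int.
type Celsius int

func main() {
	fmt.Println(Sum([]Celsius{10, 20, 30})) // 60
}
```

Without the `~`, the constraint would only admit `int` and `int64` themselves, not named types such as `Celsius`.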
2024-02-12 14:51:38 +11:00
Eli Bendersky
c11725d832
Add new Go 1.21 builtins to the Go lexer: clear, min, max (#829) 2023-08-23 06:35:37 +10:00
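The three builtins added to the lexer above are part of the language as of Go 1.21. A quick sketch of what they do:

``` go
package main

import "fmt"

func main() {
	// min and max are built-ins since Go 1.21; they accept any ordered type.
	fmt.Println(min(3, 1, 2)) // 1
	fmt.Println(max(3, 1, 2)) // 3

	// clear removes all entries from a map.
	m := map[string]int{"a": 1, "b": 2}
	clear(m)
	fmt.Println(len(m)) // 0

	// On a slice, clear zeroes the elements but keeps the length.
	s := []int{7, 8}
	clear(s)
	fmt.Println(s) // [0 0]
}
```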
Alec Thomas
cc2dd5b8ad Version 2 of Chroma
This cleans up the API in general, removing a bunch of deprecated stuff,
cleaning up circular imports, etc.

But the biggest change is switching to an optional XML format for the
regex lexer.

Having lexers defined only in Go is not ideal for a couple of reasons.
Firstly, it impedes a significant portion of contributors who use Chroma
in Hugo, but don't know Go. Secondly, it bloats the binary size of any
project that imports Chroma.

Why XML? YAML is an abomination and JSON is not human editable. XML
also compresses very well (e.g. the Go template lexer XML compresses from
3239 bytes to 718).

Why a new syntax format? All major existing formats rely on the
Oniguruma regex engine, which is extremely complex and for which there
is no Go port.

Why not earlier? Prior to the existence of fs.FS this was not a viable
option.
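For a feel of the format, here is a rough, illustrative sketch of an XML lexer definition; the element names are reconstructed from memory and are not authoritative, so consult the shipped `*.xml` lexer files in the repository for the exact schema:

``` xml
<lexer>
  <config>
    <name>Example</name>
    <alias>example</alias>
    <filename>*.ex</filename>
  </config>
  <rules>
    <state name="root">
      <rule pattern="//[^\n]*"><token type="CommentSingle"/></rule>
      <rule pattern="\s+"><token type="Text"/></rule>
    </state>
  </rules>
</lexer>
```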

Benchmarks:

    $ hyperfine --warmup 3 \
        './chroma.master --version' \
        './chroma.xml-pre-opt --version' \
        './chroma.xml --version'
    Benchmark 1: ./chroma.master --version
      Time (mean ± σ):       5.3 ms ±   0.5 ms    [User: 3.6 ms, System: 1.4 ms]
      Range (min … max):     4.2 ms …   6.6 ms    233 runs

    Benchmark 2: ./chroma.xml-pre-opt --version
      Time (mean ± σ):      50.6 ms ±   0.5 ms    [User: 52.4 ms, System: 3.6 ms]
      Range (min … max):    49.2 ms …  51.5 ms    51 runs

    Benchmark 3: ./chroma.xml --version
      Time (mean ± σ):       6.9 ms ±   1.1 ms    [User: 5.1 ms, System: 1.5 ms]
      Range (min … max):     5.7 ms …  19.9 ms    196 runs

    Summary
      './chroma.master --version' ran
        1.30 ± 0.23 times faster than './chroma.xml --version'
        9.56 ± 0.83 times faster than './chroma.xml-pre-opt --version'

A slight increase in init time, but I think this is okay given the
increase in flexibility.

And binary size difference:

    $ du -sh chroma*
    8.8M	chroma.master
    7.8M	chroma.xml
    7.8M	chroma.xml-pre-opt

Incompatible changes:

- (*RegexLexer).SetAnalyser: changed from func(func(text string) float32) *RegexLexer to func(func(text string) float32) Lexer
- (*TokenType).UnmarshalJSON: removed
- Lexer.AnalyseText: added
- Lexer.SetAnalyser: added
- Lexer.SetRegistry: added
- MustNewLazyLexer: removed
- MustNewLexer: changed from func(*Config, Rules) *RegexLexer to func(*Config, func() Rules) *RegexLexer
- Mutators: changed from func(...Mutator) MutatorFunc to func(...Mutator) Mutator
- NewLazyLexer: removed
- NewLexer: changed from func(*Config, Rules) (*RegexLexer, error) to func(*Config, func() Rules) (*RegexLexer, error)
- Pop: changed from func(int) MutatorFunc to func(int) Mutator
- Push: changed from func(...string) MutatorFunc to func(...string) Mutator
- TokenType.MarshalJSON: removed
- Using: changed from func(Lexer) Emitter to func(string) Emitter
- UsingByGroup: changed from func(func(string) Lexer, int, int, ...Emitter) Emitter to func(int, int, ...Emitter) Emitter
2022-01-27 15:22:00 +11:00
Alec Thomas
563aadc53c Moved lexers into alphabetical sub-packages.
This was done to speed up incremental compilation when working on
lexers. That is, modifying a single lexer will no longer require
recompiling all lexers.

This is a (slightly) backwards breaking change in that lexers are no
longer exported directly in the lexers package. The registry API is
"aliased" at the old location.
2018-02-15 21:09:02 +11:00
Alec Thomas
431e913333 Update documentation. Include "quick" package. 2017-09-18 13:15:07 +10:00
Alec Thomas
a10fd0a23d Switch to github.com/dlclark/regexp2.
This makes translating Pygments lexers much much simpler (and possible).
2017-09-18 11:16:44 +10:00
Alec Thomas
d12529ae61 HTML formatter + import all Pygments styles. 2017-07-20 00:01:29 -07:00
Alec Thomas
7ae55eb265 Wire up content sniffing. 2017-06-07 19:47:59 +10:00
Alec Thomas
6dd81b044b Add Markdown processor. A bunch of performance improvements. 2017-06-07 19:47:59 +10:00
Alec Thomas
b2fb8edf77 Initial commit! Working! 2017-06-07 19:47:59 +10:00