mirror of https://github.com/alecthomas/chroma.git synced 2025-02-09 13:23:51 +02:00
Alec Thomas cc2dd5b8ad Version 2 of Chroma
This cleans up the API in general, removing a bunch of deprecated stuff,
cleaning up circular imports, etc.

But the biggest change is switching to an optional XML format for the
regex lexer.

Having lexers defined only in Go is not ideal for a couple of reasons.
Firstly, it impedes a significant portion of contributors who use Chroma
in Hugo, but don't know Go. Secondly, it bloats the binary size of any
project that imports Chroma.

Why XML? YAML is an abomination and JSON is not human editable. XML
also compresses very well (e.g. the Go template lexer XML compresses
from 3239 bytes to 718).

Why a new syntax format? All major existing formats rely on the
Oniguruma regex engine, which is extremely complex and for which there
is no Go port.

Why not earlier? Prior to the existence of fs.FS this was not a viable
option.
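
As a rough illustration of the fs.FS point: a loader written against fs.FS can read lexer definitions from an embedded filesystem (go:embed yields an embed.FS, which implements fs.FS) or from an in-memory one in tests. The sketch below is not Chroma's loader. The lexerDef schema, the loadLexer and demoFS helpers, and the file path are all hypothetical and deliberately simplified.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// lexerDef is a simplified, hypothetical schema used only for this sketch;
// it is NOT Chroma's actual XML lexer format.
type lexerDef struct {
	Name  string `xml:"config>name"`
	Rules []struct {
		Pattern string `xml:"pattern,attr"`
		Token   string `xml:"token,attr"`
	} `xml:"rules>rule"`
}

// loadLexer reads one XML definition from any fs.FS and decodes it.
func loadLexer(fsys fs.FS, path string) (*lexerDef, error) {
	data, err := fs.ReadFile(fsys, path)
	if err != nil {
		return nil, err
	}
	var def lexerDef
	if err := xml.Unmarshal(data, &def); err != nil {
		return nil, err
	}
	return &def, nil
}

// demoFS returns an in-memory filesystem standing in for embedded files.
func demoFS() fs.FS {
	return fstest.MapFS{
		"embedded/demo.xml": &fstest.MapFile{Data: []byte(`<lexer>
  <config><name>Demo</name></config>
  <rules>
    <rule pattern="\d+" token="Number"/>
    <rule pattern="\s+" token="Text"/>
  </rules>
</lexer>`)},
	}
}

func main() {
	def, err := loadLexer(demoFS(), "embedded/demo.xml")
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println(def.Name, len(def.Rules))
}
```

Because loadLexer only depends on the fs.FS interface, the same code path would serve embedded files, a directory on disk (os.DirFS), or test fixtures.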

Benchmarks:

    $ hyperfine --warmup 3 \
        './chroma.master --version' \
        './chroma.xml-pre-opt --version' \
        './chroma.xml --version'
    Benchmark 1: ./chroma.master --version
      Time (mean ± σ):       5.3 ms ±   0.5 ms    [User: 3.6 ms, System: 1.4 ms]
      Range (min … max):     4.2 ms …   6.6 ms    233 runs

    Benchmark 2: ./chroma.xml-pre-opt --version
      Time (mean ± σ):      50.6 ms ±   0.5 ms    [User: 52.4 ms, System: 3.6 ms]
      Range (min … max):    49.2 ms …  51.5 ms    51 runs

    Benchmark 3: ./chroma.xml --version
      Time (mean ± σ):       6.9 ms ±   1.1 ms    [User: 5.1 ms, System: 1.5 ms]
      Range (min … max):     5.7 ms …  19.9 ms    196 runs

    Summary
      './chroma.master --version' ran
        1.30 ± 0.23 times faster than './chroma.xml --version'
        9.56 ± 0.83 times faster than './chroma.xml-pre-opt --version'

A slight increase in init time, but I think this is okay given the
increase in flexibility.

And binary size difference:

    $ du -sh chroma*
    8.8M	chroma.master
    7.8M	chroma.xml
    7.8M	chroma.xml-pre-opt

Incompatible changes:

- (*RegexLexer).SetAnalyser: changed from func(func(text string) float32) *RegexLexer to func(func(text string) float32) Lexer
- (*TokenType).UnmarshalJSON: removed
- Lexer.AnalyseText: added
- Lexer.SetAnalyser: added
- Lexer.SetRegistry: added
- MustNewLazyLexer: removed
- MustNewLexer: changed from func(*Config, Rules) *RegexLexer to func(*Config, func() Rules) *RegexLexer
- Mutators: changed from func(...Mutator) MutatorFunc to func(...Mutator) Mutator
- NewLazyLexer: removed
- NewLexer: changed from func(*Config, Rules) (*RegexLexer, error) to func(*Config, func() Rules) (*RegexLexer, error)
- Pop: changed from func(int) MutatorFunc to func(int) Mutator
- Push: changed from func(...string) MutatorFunc to func(...string) Mutator
- TokenType.MarshalJSON: removed
- Using: changed from func(Lexer) Emitter to func(string) Emitter
- UsingByGroup: changed from func(func(string) Lexer, int, int, ...Emitter) Emitter to func(int, int, ...Emitter) Emitter
2022-01-27 15:22:00 +11:00

Lexer tests

The tests in this directory feed a known input testdata/<name>.actual into the lexer for <name> and check that its output matches testdata/<name>.expected.

It is also possible to run several tests against the same lexer <name> by placing known inputs *.actual in a directory testdata/<name>/.
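
The naming convention above can be sketched in a few lines of Go. This is an illustration only, not the test harness itself: expectedPath and lexerName are hypothetical helpers, and the sketch assumes that inputs inside testdata/<name>/ pair with an .expected file of the same stem.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// expectedPath returns the golden file that pairs with an input file,
// e.g. testdata/go.actual -> testdata/go.expected.
func expectedPath(actual string) string {
	return strings.TrimSuffix(actual, ".actual") + ".expected"
}

// lexerName returns the lexer name implied by an input path: the file stem
// for testdata/<name>.actual, or the directory name for
// testdata/<name>/<case>.actual.
func lexerName(root, actual string) string {
	rel, err := filepath.Rel(root, actual)
	if err != nil {
		return ""
	}
	parts := strings.Split(filepath.ToSlash(rel), "/")
	if len(parts) > 1 {
		return parts[0] // directory layout: several inputs for one lexer
	}
	return strings.TrimSuffix(parts[0], ".actual")
}

func main() {
	inputs := []string{
		"testdata/go.actual",
		"testdata/python/strings.actual", // hypothetical file name
	}
	for _, p := range inputs {
		fmt.Println(lexerName("testdata", p), p, expectedPath(p))
	}
}
```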

Running the tests

Run the tests as normal:

go test ./lexers

Update existing tests

When you add a new test data file (*.actual), you need to regenerate all tests. That's how Chroma creates the *.expected test file based on the corresponding lexer.

To regenerate all tests, type in your terminal:

RECORD=true go test ./lexers

This first sets the RECORD environment variable to true. Then it runs go test on the ./lexers directory of the Chroma project.

(That environment variable tells Chroma it needs to output test data. After running go test ./lexers you can remove or reset that variable.)

Windows users

Windows users will find that the RECORD=true go test ./lexers command fails in both the standard command prompt terminal and in PowerShell.

Instead we have to perform both steps separately:

  • Set the RECORD environment variable to true.
    • In the regular command prompt window, the set command sets an environment variable for the current session: set RECORD=true. See this page for more.
    • In PowerShell, you can use the $env:RECORD = 'true' command for that. See this article for more.
    • You can also make a persistent environment variable by hand in the Windows computer settings. See this article for how.
  • When the environment variable is set, run go test ./lexers.

Chroma will now regenerate the test files and print its results to the console window.
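
If you would rather avoid shell-specific syntax altogether, a small Go program can set the variable for the child process only. The sketch below is not part of Chroma; recordCommand and the "run" argument are hypothetical, and it assumes the program is started from the repository root with the go tool on PATH.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// recordCommand builds a `go test ./lexers` invocation with RECORD=true added
// to the child's environment, so the same code works on any platform.
func recordCommand() *exec.Cmd {
	cmd := exec.Command("go", "test", "./lexers")
	cmd.Env = append(os.Environ(), "RECORD=true")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	cmd := recordCommand()
	// Only execute when explicitly asked to; otherwise just show the command.
	if len(os.Args) < 2 || os.Args[1] != "run" {
		fmt.Println("would run:", strings.Join(cmd.Args, " "), "with RECORD=true")
		return
	}
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "go test failed:", err)
		os.Exit(1)
	}
}
```

Saved under a name of your choice, it could be started with go run <file> run. Because the variable is set only for the child process, nothing needs to be reset afterwards.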