mirror of https://github.com/alecthomas/chroma.git synced 2025-07-05 00:38:52 +02:00

33 Commits

Author SHA1 Message Date
40542a6255 refactor: migrate a bunch more Go-based lexers to XML
Also rename some existing XML lexers to their canonical XML name.
2023-09-09 12:29:23 +10:00
708662a581 feat: improve regex analysers in XML (#831) 2023-08-23 11:51:13 +10:00
a20cd7e8df feat: support basic regex analysers in XML (#828)
The `<analyse>` element contains a regex to match against the input, and
a score if the pattern matches.

The scores of all matching patterns for a lexer are summed.

Replaces #815, #813 and #826.
2023-08-22 05:32:23 +10:00
cc2dd5b8ad Version 2 of Chroma
This cleans up the API in general, removing a bunch of deprecated stuff,
cleaning up circular imports, etc.

But the biggest change is switching to an optional XML format for the
regex lexer.

Having lexers defined only in Go is not ideal for a couple of reasons.
Firstly, it impedes a significant portion of contributors who use Chroma
in Hugo, but don't know Go. Secondly, it bloats the binary size of any
project that imports Chroma.

Why XML? YAML is an abomination and JSON is not human editable. XML
also compresses very well (e.g. the Go template lexer XML compresses from
3239 bytes to 718).

Why a new syntax format? All major existing formats rely on the
Oniguruma regex engine, which is extremely complex and for which there
is no Go port.

Why not earlier? Prior to the existence of fs.FS this was not a viable
option.

Benchmarks:

    $ hyperfine --warmup 3 \
        './chroma.master --version' \
        './chroma.xml-pre-opt --version' \
        './chroma.xml --version'
    Benchmark 1: ./chroma.master --version
      Time (mean ± σ):       5.3 ms ±   0.5 ms    [User: 3.6 ms, System: 1.4 ms]
      Range (min … max):     4.2 ms …   6.6 ms    233 runs

    Benchmark 2: ./chroma.xml-pre-opt --version
      Time (mean ± σ):      50.6 ms ±   0.5 ms    [User: 52.4 ms, System: 3.6 ms]
      Range (min … max):    49.2 ms …  51.5 ms    51 runs

    Benchmark 3: ./chroma.xml --version
      Time (mean ± σ):       6.9 ms ±   1.1 ms    [User: 5.1 ms, System: 1.5 ms]
      Range (min … max):     5.7 ms …  19.9 ms    196 runs

    Summary
      './chroma.master --version' ran
        1.30 ± 0.23 times faster than './chroma.xml --version'
        9.56 ± 0.83 times faster than './chroma.xml-pre-opt --version'

A slight increase in init time, but I think this is okay given the
increase in flexibility.

And binary size difference:

    $ du -h lexers.test*
    $ du -sh chroma*
    8.8M	chroma.master
    7.8M	chroma.xml
    7.8M	chroma.xml-pre-opt

Incompatible changes:

- (*RegexLexer).SetAnalyser: changed from func(func(text string) float32) *RegexLexer to func(func(text string) float32) Lexer
- (*TokenType).UnmarshalJSON: removed
- Lexer.AnalyseText: added
- Lexer.SetAnalyser: added
- Lexer.SetRegistry: added
- MustNewLazyLexer: removed
- MustNewLexer: changed from func(*Config, Rules) *RegexLexer to func(*Config, func() Rules) *RegexLexer
- Mutators: changed from func(...Mutator) MutatorFunc to func(...Mutator) Mutator
- NewLazyLexer: removed
- NewLexer: changed from func(*Config, Rules) (*RegexLexer, error) to func(*Config, func() Rules) (*RegexLexer, error)
- Pop: changed from func(int) MutatorFunc to func(int) Mutator
- Push: changed from func(...string) MutatorFunc to func(...string) Mutator
- TokenType.MarshalJSON: removed
- Using: changed from func(Lexer) Emitter to func(string) Emitter
- UsingByGroup: changed from func(func(string) Lexer, int, int, ...Emitter) Emitter to func(int, int, ...Emitter) Emitter
2022-01-27 15:22:00 +11:00
d6bdd14670 sort lexers with lower case 2021-05-02 17:23:18 +10:00
34d9c7143b Add new TokeniseOption EnsureLF (#336)
* Add new TokeniseOption EnsureLF

ref #329

* Use efficient process suggested by @chmike
2020-03-04 18:56:47 +11:00
da5ac60d8c Add golangci-lint and fix all lint issues. 2018-12-31 22:46:59 +11:00
9c3abeae1d Tokens by value (#187)
This results in about an 8% improvement in speed.
2018-11-04 10:22:51 +11:00
15a009f0fc Add DelegatingLexer. 2018-03-17 13:44:03 +11:00
3020e2ea8c Fix bug with nested newlines.
Fixes #124.

Also reinstitute lexer tests that disappeared during package split.
2018-03-03 10:16:21 +11:00
e56590a815 Add data-driven test framework for lexers.
See #68.
2018-01-02 14:53:25 +11:00
93868c5a99 Add Objective-C and support lexer priorities.
Fixes #66.
2017-10-23 11:21:37 +11:00
ce3d6bf527 Invert default "ensure newline" behaviour so that it is opt-in.
See #47.
2017-09-30 14:41:05 +10:00
573c1d157d Ensure a newline exists at the end of files.
Fixes #42.
2017-09-29 21:59:52 +10:00
cc0e4a59ab Switch to an Iterator interface.
This is to solve an issue where writers returned by the Formatter
were often stateful, but this fact was not obvious to the API consumer,
and failed in interesting ways.
2017-09-20 22:19:36 +10:00
44b23f97b4 Split Regexp lexer into its own file. 2017-09-20 20:19:33 +10:00
3f230ec717 Add support for line numbers. 2017-09-20 13:33:44 +10:00
a72960340e Add test to pre-compile all regexes. 2017-09-19 14:15:33 +10:00
87183b3633 Add HTML formatter option for setting the tab width. 2017-09-19 13:14:29 +10:00
631fc87d6e Fix lua lexer, and actually check error value from compiling regexes :( 2017-09-19 12:05:53 +10:00
3df4c80190 Rename S -> R + sort list of lexers. 2017-09-19 10:47:22 +10:00
431e913333 Update documentation. Include "quick" package. 2017-09-18 13:15:07 +10:00
a10fd0a23d Switch to github.com/dlclark/regexp2.
This makes translating Pygments lexers much much simpler (and possible).
2017-09-18 11:16:44 +10:00
86bda70acd Switch to github.com/dlclark/regexp2 2017-09-15 22:18:20 +10:00
d12529ae61 HTML formatter + import all Pygments styles. 2017-07-20 00:01:29 -07:00
7ae55eb265 Wire up content sniffing. 2017-06-07 19:47:59 +10:00
5749aebe42 Generalise and support 8, 256 and 16m colour terminals. 2017-06-07 19:47:59 +10:00
1f47bd705c Use pointers to tokens + support regex flags in importer. 2017-06-07 19:47:59 +10:00
c64e5829b5 Add JavaScript. 2017-06-07 19:47:59 +10:00
5dedc6e45b Add a bunch of automatically translated lexers. 2017-06-07 19:47:59 +10:00
b30de35ff1 Use a callback to emit tokens.
This is a) faster and b) supports streaming output.
2017-06-07 19:47:59 +10:00
6dd81b044b Add Markdown processor. A bunch of performance improvements. 2017-06-07 19:47:59 +10:00
b2fb8edf77 Initial commit! Working! 2017-06-07 19:47:59 +10:00