mirror of https://github.com/alecthomas/chroma.git synced 2025-03-19 21:10:15 +02:00

40 Commits

Author SHA1 Message Date
Alec Thomas
cc2dd5b8ad Version 2 of Chroma
This cleans up the API in general, removing a bunch of deprecated stuff,
cleaning up circular imports, etc.

But the biggest change is switching to an optional XML format for the
regex lexer.

Having lexers defined only in Go is not ideal for a couple of reasons.
Firstly, it impedes a significant portion of contributors who use Chroma
in Hugo, but don't know Go. Secondly, it bloats the binary size of any
project that imports Chroma.

Why XML? YAML is an abomination and JSON is not human editable. XML
also compresses very well (e.g. the Go template lexer XML compresses
from 3239 bytes to 718).

Why a new syntax format? All major existing formats rely on the
Oniguruma regex engine, which is extremely complex and for which there
is no Go port.

Why not earlier? Prior to the existence of fs.FS this was not a viable
option.

Benchmarks:

    $ hyperfine --warmup 3 \
        './chroma.master --version' \
        './chroma.xml-pre-opt --version' \
        './chroma.xml --version'
    Benchmark 1: ./chroma.master --version
      Time (mean ± σ):       5.3 ms ±   0.5 ms    [User: 3.6 ms, System: 1.4 ms]
      Range (min … max):     4.2 ms …   6.6 ms    233 runs

    Benchmark 2: ./chroma.xml-pre-opt --version
      Time (mean ± σ):      50.6 ms ±   0.5 ms    [User: 52.4 ms, System: 3.6 ms]
      Range (min … max):    49.2 ms …  51.5 ms    51 runs

    Benchmark 3: ./chroma.xml --version
      Time (mean ± σ):       6.9 ms ±   1.1 ms    [User: 5.1 ms, System: 1.5 ms]
      Range (min … max):     5.7 ms …  19.9 ms    196 runs

    Summary
      './chroma.master --version' ran
        1.30 ± 0.23 times faster than './chroma.xml --version'
        9.56 ± 0.83 times faster than './chroma.xml-pre-opt --version'

A slight increase in init time, but I think this is okay given the
increase in flexibility.

And binary size difference:

    $ du -sh chroma*
    8.8M	chroma.master
    7.8M	chroma.xml
    7.8M	chroma.xml-pre-opt

Incompatible changes:

- (*RegexLexer).SetAnalyser: changed from func(func(text string) float32) *RegexLexer to func(func(text string) float32) Lexer
- (*TokenType).UnmarshalJSON: removed
- Lexer.AnalyseText: added
- Lexer.SetAnalyser: added
- Lexer.SetRegistry: added
- MustNewLazyLexer: removed
- MustNewLexer: changed from func(*Config, Rules) *RegexLexer to func(*Config, func() Rules) *RegexLexer
- Mutators: changed from func(...Mutator) MutatorFunc to func(...Mutator) Mutator
- NewLazyLexer: removed
- NewLexer: changed from func(*Config, Rules) (*RegexLexer, error) to func(*Config, func() Rules) (*RegexLexer, error)
- Pop: changed from func(int) MutatorFunc to func(int) Mutator
- Push: changed from func(...string) MutatorFunc to func(...string) Mutator
- TokenType.MarshalJSON: removed
- Using: changed from func(Lexer) Emitter to func(string) Emitter
- UsingByGroup: changed from func(func(string) Lexer, int, int, ...Emitter) Emitter to func(int, int, ...Emitter) Emitter
2022-01-27 15:22:00 +11:00
Alec Thomas
9a8a647afb Report file pattern errors when a lexer is initialised.
See #555
2021-09-27 14:27:46 +10:00
Siavash Askari Nasr
10329f849e
Add ByGroupNames function, same as ByGroups but use named groups (#519)
For named groups that are not given an emitter, an Error token is
emitted anyway.

This also handles group `0`, whether or not an Emitter is provided for
it, since numbers can also be used as names. That may be overdoing it,
though: why would anyone use ByGroupNames to assign a token to the
whole match?
2021-06-08 22:26:59 +10:00
Siavash Askari Nasr
22cbca546a Allow skipping a group's emitter by passing nil as the emitter 2021-06-07 22:45:39 +10:00
Ville Skyttä
b5d03c0079 feat(regexlexer): compile in RE2 compatibility mode
To better match vanilla Go regexps and support some additional
constructs that might be present in Pygments rules.

https://github.com/dlclark/regexp2#re2-compatibility-mode
2021-05-17 14:09:19 +10:00
Siavash Askari Nasr
2e23e7f215 regexp2 uses a group's number as its name, so the name check isn't needed 2021-05-08 18:48:49 +10:00
mlpo
ff6eedba72
Fix: sort words in descending order of length before regex generation (#496)
* Fix: sort words in descending order of length before regex generation

* Avoid code duplication in Raku lexer
2021-05-08 09:10:18 +10:00
Siavash Askari Nasr
225e1862d3 Pass *LexerState as context to emitters
Useful for accessing named capture groups and context set by
`mutators`, as well as the other fields and methods LexerState provides.
2021-05-07 22:55:54 +10:00
Siavash Askari Nasr
dcfd826b25 Add support for named capture groups 2021-05-06 21:34:28 +10:00
Alec Thomas
7e282be495 Update golangci-lint so we can force use of LazyLexer. 2021-04-29 12:08:28 +10:00
Cameron Moore
59126c5b32
Add NewLazyLexer to defer rules definitions and reduce init costs (#449)
Add NewLazyLexer and MustNewLazyLexer which accept a function that
returns the rules for the lexer.  This allows us to defer the rules
definitions until they're needed.

Lexers in a, g, s, and x packages have been updated to use the new lazy
lexer.
2021-02-08 12:16:49 +11:00
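The deferral idea in the commit above can be sketched as follows. `Rules`, `lazyLexer` and `newLazyLexer` are hypothetical stand-ins, and the use of `sync.Once` is an assumption about one reasonable way to build the rules at most once, not a description of Chroma's code.

```go
package main

import (
	"fmt"
	"sync"
)

// Rules is a hypothetical, simplified rule table: state name -> patterns.
type Rules map[string][]string

// lazyLexer holds a function that produces the rules rather than the rules
// themselves, so nothing is built until the lexer is first used.
type lazyLexer struct {
	once  sync.Once
	fetch func() Rules
	rules Rules
}

func newLazyLexer(fetch func() Rules) *lazyLexer {
	return &lazyLexer{fetch: fetch}
}

// Rules builds the rule table on first call and reuses it afterwards.
func (l *lazyLexer) Rules() Rules {
	l.once.Do(func() { l.rules = l.fetch() })
	return l.rules
}

func main() {
	built := 0
	l := newLazyLexer(func() Rules {
		built++
		return Rules{"root": {`\s+`, `\w+`}}
	})
	fmt.Println("after construction:", built)
	l.Rules()
	l.Rules()
	fmt.Println("after two uses:", built)
}
```

Constructing many such lexers at package init then costs only a closure allocation each, which is the init-time saving the commit is after.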
Alec Thomas
5da831672d Fix a few bugs including sub-lexers adding additional newlines when
EnsureNL is true.
2021-02-06 20:13:50 +11:00
Alec Thomas
e62d93f4aa Add a timeout to regexes.
This avoids pathologically bad match times. Fixes #378.
2020-07-08 20:23:13 +10:00
Alec Thomas
2b9ea60d89 Split PHP into two lexers - PHP and PHTML.
The former is pure PHP code while the latter is PHP code in <? ?> tags,
within HTML.

Fixes #210.
2020-06-30 21:00:09 +10:00
Alec Thomas
ee4284bb40 Add a Rules.Merge() helper function.
Might be useful for #363.
2020-05-16 16:04:21 +10:00
satotake
34d9c7143b
Add new TokeniseOption EnsureLF (#336)
* Add new TokeniseOption EnsureLF

ref #329

* Use efficient process suggested by @chmike
2020-03-04 18:56:47 +11:00
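The commit above does not spell out what EnsureLF does. Assuming it normalises CRLF and lone CR line endings to LF, a single-pass version could look like the sketch below; `normalizeLF` is a hypothetical name.

```go
package main

import "fmt"

// normalizeLF rewrites "\r\n" and lone "\r" to "\n" in one pass over the
// bytes. The output is never longer than the input, so one buffer suffices.
func normalizeLF(text string) string {
	buf := make([]byte, 0, len(text))
	for i := 0; i < len(text); i++ {
		c := text[i]
		if c == '\r' {
			if i+1 < len(text) && text[i+1] == '\n' {
				continue // drop the CR; the following LF is kept
			}
			c = '\n' // lone CR becomes LF
		}
		buf = append(buf, c)
	}
	return string(buf)
}

func main() {
	fmt.Printf("%q\n", normalizeLF("a\r\nb\rc\n"))
}
```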
Alec Thomas
28dcb8565c Fixes #305. 2019-11-24 12:49:34 +11:00
Alec Thomas
bbc59ac372 Emit error tokens when there's a group mismatch.
Also don't panic/recover, as we no longer use panic to report "real"
errors.

Fixes #295.
2019-10-24 17:03:35 +11:00
Alec Thomas
73d11b3c45 Clear background colour for TTY formatters. 2019-10-15 21:08:17 +11:00
Alec Thomas
ea14dd8660 Fixed a fundamental bug where ^ would always match.
The engine was always passing a string sliced to the current position,
resulting in ^ always matching. Switched to use
FindRunesMatchStartingAt.

Fixes #242.
2019-06-12 12:32:20 +10:00
Alec Thomas
2105c68ed2 Implemented a weird little Pygments rule that I missed.
> If the RegexLexer encounters a newline that is flagged as an error
> token, the stack is emptied and the lexer continues scanning in the
> 'root' state. This can help producing error-tolerant highlighting for
> erroneous input, e.g. when a single-line string is not closed.

Fixes #246.
2019-04-22 18:22:58 +10:00
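The quoted Pygments rule amounts to a small transition on the lexer's state stack. A minimal sketch, with state names as plain strings and `nextStack` as a hypothetical helper:

```go
package main

import "fmt"

// nextStack applies the rule: a newline flagged as an error token empties
// the state stack and resumes in "root"; any other token leaves it as is.
func nextStack(stack []string, tokenType, value string) []string {
	if tokenType == "Error" && value == "\n" {
		return []string{"root"}
	}
	return stack
}

func main() {
	// An unterminated single-line string leaves the lexer in "string".
	stack := []string{"root", "string"}
	stack = nextStack(stack, "Error", "\n")
	fmt.Println(stack)
}
```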
Alec Thomas
da5ac60d8c Add golangci-lint and fix all lint issues. 2018-12-31 22:46:59 +11:00
Daniel Eloff
9c3abeae1d Tokens by value (#187)
This results in about an 8% improvement in speed.
2018-11-04 10:22:51 +11:00
Kenneth Shaw
95d0a9381b Fix Dollar-Quoted Strings (postgres + cql)
This commit refactors code from the markdown lexer into the chroma
package, and alters the PostgreSQL and CQL lexers to make use of it.

Additionally, an example markdown with the various sublexers is added.
2018-06-12 09:16:18 +07:00
Alec Thomas
f315512f5c Add support for Go templates.
These are exposed as go-text-template and go-html-template.

Fixes #105.
2018-03-18 21:57:34 +11:00
Alec Thomas
3020e2ea8c Fix bug with nested newlines.
Fixes #124.

Also reinstitute lexer tests that disappeared during package split.
2018-03-03 10:16:21 +11:00
Alec Thomas
35126f9a94 Implement rudimentary JSX lexer based on https://github.com/fcurella/jsx-lexer/blob/master/jsx/lexer.py
Fixes #111.
2018-02-07 22:11:40 +11:00
Alec Thomas
ce3d6bf527 Invert default "ensure newline" behaviour so that it is opt-in.
See #47.
2017-09-30 14:41:05 +10:00
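The opt-in behaviour from the commit above is small enough to sketch directly. `ensureNL` and its `enabled` flag are hypothetical names for illustration, not the option Chroma exposes.

```go
package main

import (
	"fmt"
	"strings"
)

// ensureNL appends a trailing newline only when the caller has opted in and
// the text does not already end with one.
func ensureNL(text string, enabled bool) string {
	if enabled && !strings.HasSuffix(text, "\n") {
		return text + "\n"
	}
	return text
}

func main() {
	fmt.Printf("%q\n", ensureNL("x := 1", false)) // left untouched by default
	fmt.Printf("%q\n", ensureNL("x := 1", true))  // newline appended on request
}
```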
Alec Thomas
573c1d157d Ensure a newline exists at the end of files.
Fixes #42.
2017-09-29 21:59:52 +10:00
Alec Thomas
d5083b3f7c Big changes to the style and colour APIs.
- Styles now use a builder system, to enforce immutability of styles.
- Corrected and cleaned up how style inheritance works.
- Added a brightening function to colours
- HTML formatter will now automatically pick line and highlight colours
  if they are not provided in the style. This is done by slightly
  darkening or lightening.

Fixes #21.
2017-09-23 22:09:46 +10:00
Alec Thomas
9d7539a4cd Fix bug in Turtle lexer. 2017-09-22 23:27:40 +10:00
Alec Thomas
a5a3b67010 Reprocess all rules after a LexerMutator is applied. 2017-09-22 23:14:32 +10:00
Alec Thomas
2ce2ec7f65 Fix bug with empty states. 2017-09-22 22:40:00 +10:00
Alec Thomas
0bb853fb4f Convert Include to a LexerMutator.
Fixes #18.
2017-09-22 22:29:17 +10:00
Alec Thomas
1724aab879 Implement compile-time lexer mutators.
This should fix #15.
2017-09-21 20:02:53 +10:00
Alec Thomas
60797cc03f Add tracing + better error recovery. 2017-09-21 17:52:28 +10:00
Alec Thomas
e2d6abaa64 Document and add iterator panic recovery. 2017-09-20 23:06:23 +10:00
Alec Thomas
cc0e4a59ab Switch to an Iterator interface.
This is to solve an issue where writers returned by the Formatter
were often stateful, but this fact was not obvious to the API consumer,
and failed in interesting ways.
2017-09-20 22:19:36 +10:00
Alec Thomas
36ead7258a Use utf8.RuneCountInString() rather than len() :(
Fixes #10. Thanks @curio77.
2017-09-20 20:36:25 +10:00
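The difference behind the commit above: `len()` on a Go string counts bytes, while `utf8.RuneCountInString()` counts characters. The sketch assumes the lexer tracks its position in runes; `advance` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// advance returns the next rune offset after consuming matched. Using
// len(matched) here would overshoot on any non-ASCII input.
func advance(pos int, matched string) int {
	return pos + utf8.RuneCountInString(matched)
}

func main() {
	s := "h\u00e9llo" // "héllo": the é takes two bytes in UTF-8
	fmt.Println(len(s))                    // 6 bytes
	fmt.Println(utf8.RuneCountInString(s)) // 5 runes
	fmt.Println(advance(0, s))             // 5
}
```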
Alec Thomas
44b23f97b4 Split Regexp lexer into its own file. 2017-09-20 20:19:33 +10:00