chroma

go/chroma

mirror of https://github.com/alecthomas/chroma.git synced 2025-02-05 13:05:18 +02:00

Author	SHA1	Message	Date
Mikhail Sorochan	3044bf5f32	Go lexer: single line comment without consuming endline, disable EnsureNL (#984)

This PR changes `CommentSingle` so that it no longer consumes the trailing newline as part of the comment. That fixes single-line comments not being parsed at the end of a line or at the end of the file, which was reported earlier as the reason single-line comments were not highlighted properly.

Disabling `EnsureNL: true` stops the lexer from adding an unnecessary newline element for `Text` and `CommentSymbol` symbols. Using chroma in a console with syntax highlighting was unusable because of this: typing e.g. `b := ` added a newline every time the input ended in a space and the host app asked `quick` for highlighted text.

Tokens behavior:

<table>
<tr>
<td> Before </td> <td> After </td>
</tr>
<tr>
<td>

``` go
t.Run("Single space", func(t *testing.T) {
	tokens, _ := chroma.Tokenise(Go, nil, " ")
	expected := []chroma.Token{
		{chroma.Text, " \n"},
	}
	assert.Equal(t, expected, tokens)
})
t.Run("Assignment unfinished", func(t *testing.T) {
	tokens, _ := chroma.Tokenise(Go, nil, "i = ")
	expected := []chroma.Token{
		{chroma.NameOther, "i"},
		{chroma.Text, " "},
		{chroma.Punctuation, "="},
		{chroma.Text, " \n"},
	}
	assert.Equal(t, expected, tokens)
})
t.Run("Single comment", func(t *testing.T) {
	tokens, _ := chroma.Tokenise(Go, nil, "// W")
	expected := []chroma.Token{
		{chroma.CommentSingle, "// W\n"},
	}
	assert.Equal(t, expected, tokens)
})
```

</td>
<td>

``` go
t.Run("Single space", func(t *testing.T) {
	tokens, _ := chroma.Tokenise(Go, nil, " ")
	expected := []chroma.Token{
		{chroma.Text, " "},
	}
	assert.Equal(t, expected, tokens)
})
t.Run("Assignment unfinished", func(t *testing.T) {
	tokens, _ := chroma.Tokenise(Go, nil, "i = ")
	expected := []chroma.Token{
		{chroma.NameOther, "i"},
		{chroma.Text, " "},
		{chroma.Punctuation, "="},
		{chroma.Text, " "},
	}
	assert.Equal(t, expected, tokens)
})
t.Run("Single comment", func(t *testing.T) {
	tokens, _ := chroma.Tokenise(Go, nil, "// W")
	expected := []chroma.Token{
		{chroma.CommentSingle, "// W"},
	}
	assert.Equal(t, expected, tokens)
})
```

</td>
</tr>
</table>	2024-07-23 02:19:08 +10:00
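The end-of-input problem described in this commit can be reproduced with Go's standard `regexp` package alone. The sketch below is illustrative only: the two patterns and the `matchComment` helper are assumptions made for the example, not the rules shipped in Chroma's Go lexer.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative patterns only; they are assumptions, not Chroma's actual rules.
var (
	// commentWithNewline needs a newline to terminate the comment.
	commentWithNewline = regexp.MustCompile(`^//[^\n]*\n`)
	// commentWithoutNewline stops before the newline, so it also matches
	// when the comment is the last thing in the input.
	commentWithoutNewline = regexp.MustCompile(`^//[^\n]*`)
)

// matchComment returns the text matched at the start of src and whether
// the pattern matched at all.
func matchComment(re *regexp.Regexp, src string) (string, bool) {
	m := re.FindString(src)
	return m, m != ""
}

func main() {
	for _, src := range []string{"// W", "// W\nx := 1"} {
		a, okA := matchComment(commentWithNewline, src)
		b, okB := matchComment(commentWithoutNewline, src)
		fmt.Printf("%q: consuming newline %q (%v), not consuming %q (%v)\n",
			src, a, okA, b, okB)
	}
}
```

With a pattern that must consume the newline, a comment at the very end of the input only matches if something appends a newline first; the pattern that stops before the newline needs no such padding.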
Abhinav Gupta	506e36f9e0	fix(lexers/go): "~" is a valid token (#926) With the introduction of generics, tilde is a valid punctuation token in Go programs. https://go.dev/ref/spec#Operators_and_punctuation This updates the punctuation regex for the Go lexer, and adds a test to ensure that it's treated as such.	2024-02-12 14:51:38 +11:00
Eli Bendersky	c11725d832	Add new Go 1.21 builtins to the Go lexer: clear, min, max (#829)	2023-08-23 06:35:37 +10:00
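For context, here is a small example of where `~` shows up in Go source since generics arrived (Go 1.18 and later). The `Integer` constraint, the `Celsius` type, and the `Sum` function are hypothetical names chosen for illustration; they are not taken from the commit.

```go
package main

import "fmt"

// Integer uses "~" to admit any type whose underlying type is int or int64.
type Integer interface {
	~int | ~int64
}

// Celsius is a hypothetical named type whose underlying type is int.
type Celsius int

// Sum adds up the values in xs.
func Sum[T Integer](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	// Celsius satisfies Integer only because of the "~" in the constraint.
	fmt.Println(Sum([]Celsius{20, 22, 19}))
}
```

A lexer that does not treat `~` as punctuation would have no suitable token type for the constraint lines above.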
Alec Thomas	cc2dd5b8ad	Version 2 of Chroma

This cleans up the API in general, removing a bunch of deprecated stuff, cleaning up circular imports, etc.

But the biggest change is switching to an optional XML format for the regex lexer.

Having lexers defined only in Go is not ideal for a couple of reasons. Firstly, it impedes a significant portion of contributors who use Chroma in Hugo, but don't know Go. Secondly, it bloats the binary size of any project that imports Chroma.

Why XML? YAML is an abomination and JSON is not human editable. XML also compresses very well (e.g. Go template lexer XML compresses from 3239 bytes to 718).

Why a new syntax format? All major existing formats rely on the Oniguruma regex engine, which is extremely complex and for which there is no Go port.

Why not earlier? Prior to the existence of fs.FS this was not a viable option.

Benchmarks:

    $ hyperfine --warmup 3 \
        './chroma.master --version' \
        './chroma.xml-pre-opt --version' \
        './chroma.xml --version'
    Benchmark 1: ./chroma.master --version
      Time (mean ± σ):     5.3 ms ± 0.5 ms    [User: 3.6 ms, System: 1.4 ms]
      Range (min … max):   4.2 ms … 6.6 ms    233 runs

    Benchmark 2: ./chroma.xml-pre-opt --version
      Time (mean ± σ):     50.6 ms ± 0.5 ms    [User: 52.4 ms, System: 3.6 ms]
      Range (min … max):   49.2 ms … 51.5 ms    51 runs

    Benchmark 3: ./chroma.xml --version
      Time (mean ± σ):     6.9 ms ± 1.1 ms    [User: 5.1 ms, System: 1.5 ms]
      Range (min … max):   5.7 ms … 19.9 ms    196 runs

    Summary
      './chroma.master --version' ran
        1.30 ± 0.23 times faster than './chroma.xml --version'
        9.56 ± 0.83 times faster than './chroma.xml-pre-opt --version'

A slight increase in init time, but I think this is okay given the increase in flexibility.

And binary size difference:

    $ du -h lexers.test*
    $ du -sh chroma*
    8.8M chroma.master
    7.8M chroma.xml
    7.8M chroma.xml-pre-opt

Incompatible changes:

- (*RegexLexer).SetAnalyser: changed from func(func(text string) float32) *RegexLexer to func(func(text string) float32) Lexer
- (*TokenType).UnmarshalJSON: removed
- Lexer.AnalyseText: added
- Lexer.SetAnalyser: added
- Lexer.SetRegistry: added
- MustNewLazyLexer: removed
- MustNewLexer: changed from func(Config, Rules) *RegexLexer to func(Config, func() Rules) *RegexLexer
- Mutators: changed from func(...Mutator) MutatorFunc to func(...Mutator) Mutator
- NewLazyLexer: removed
- NewLexer: changed from func(Config, Rules) (*RegexLexer, error) to func(Config, func() Rules) (*RegexLexer, error)
- Pop: changed from func(int) MutatorFunc to func(int) Mutator
- Push: changed from func(...string) MutatorFunc to func(...string) Mutator
- TokenType.MarshalJSON: removed
- Using: changed from func(Lexer) Emitter to func(string) Emitter
- UsingByGroup: changed from func(func(string) Lexer, int, int, ...Emitter) Emitter to func(int, int, ...Emitter) Emitter	2022-01-27 15:22:00 +11:00
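Several constructors in the list above now take `func() Rules` instead of `Rules`. One thing such a signature makes possible is deferring rule construction until first use; the commit message does not say whether Chroma does exactly this. The sketch below shows the general pattern using only the standard library. `Rules`, `deferredLexer`, and `newDeferredLexer` are hypothetical stand-ins and are not Chroma's types or functions.

```go
package main

import (
	"fmt"
	"sync"
)

// Rules is a hypothetical stand-in for a rule table (state name -> patterns).
type Rules map[string][]string

// deferredLexer is a hypothetical type; it is not Chroma's RegexLexer.
type deferredLexer struct {
	name  string
	build func() Rules
	once  sync.Once
	rules Rules
}

// newDeferredLexer stores the rule constructor without calling it.
func newDeferredLexer(name string, build func() Rules) *deferredLexer {
	return &deferredLexer{name: name, build: build}
}

// Rules builds the rule table on first use and returns the cached table
// on every later call.
func (l *deferredLexer) Rules() Rules {
	l.once.Do(func() { l.rules = l.build() })
	return l.rules
}

func main() {
	builds := 0
	lexer := newDeferredLexer("go", func() Rules {
		builds++
		return Rules{"root": {`//[^\n]*`, `\s+`}}
	})
	fmt.Println("builds after construction:", builds)
	lexer.Rules()
	lexer.Rules()
	fmt.Println("builds after two uses:", builds)
}
```

Passing a function rather than a value means a program that registers many lexers pays for a rule table only when that lexer is actually used.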
Alec Thomas	563aadc53c	Moved lexers into alphabetical sub-packages. This was done to speed up incremental compilation when working on lexers. That is, modifying a single lexer will no longer require recompiling all lexers. This is a (slightly) backwards breaking change in that lexers are no longer exported directly in the lexers package. The registry API is "aliased" at the old location.	2018-02-15 21:09:02 +11:00
Alec Thomas	431e913333	Update documentation. Include "quick" package.	2017-09-18 13:15:07 +10:00
Alec Thomas	a10fd0a23d	Switch to github.com/dlclark/regexp2. This makes translating Pygments lexers much much simpler (and possible).	2017-09-18 11:16:44 +10:00
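The commit message does not spell out why regexp2 helps. One likely factor is that Go's standard `regexp` package is RE2-based and rejects lookaround and backreferences, constructs that Python regular expressions (and therefore Pygments lexers) can use. The sketch below only demonstrates the standard library's behavior; the `compiles` helper is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's standard regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`\w+`,        // plain pattern
		`foo(?=bar)`, // lookahead
		`(\w)\1`,     // backreference
	}
	for _, p := range patterns {
		fmt.Printf("%q accepted by regexp: %v\n", p, compiles(p))
	}
}
```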
Alec Thomas	d12529ae61	HTML formatter + import all Pygments styles.	2017-07-20 00:01:29 -07:00
Alec Thomas	7ae55eb265	Wire up content sniffing.	2017-06-07 19:47:59 +10:00
Alec Thomas	6dd81b044b	Add Markdown processor. A bunch of performance improvements.	2017-06-07 19:47:59 +10:00
Alec Thomas	b2fb8edf77	Initial commit! Working!	2017-06-07 19:47:59 +10:00

11 Commits