chroma

go/chroma

mirror of https://github.com/alecthomas/chroma.git synced 2025-03-19 21:10:15 +02:00

Author	SHA1	Message	Date
Alec Thomas	4dd2cbef84	fix: bump to latest dlclark/regexp2 Fixes #805	2023-09-09 11:46:52 +10:00
Carlos Henrique Guardão Gandarez	708662a581	feat: improve regex analysers in XML (#831)	2023-08-23 11:51:13 +10:00
Alec Thomas	47ce9a21b1	fix: vim lexer was marking `\n` as errors Fixes #827	2023-08-22 08:39:01 +10:00
Alec Thomas	cc2dd5b8ad	Version 2 of Chroma

This cleans up the API in general, removing a bunch of deprecated stuff, cleaning up circular imports, etc. But the biggest change is switching to an optional XML format for the regex lexer.

Having lexers defined only in Go is not ideal for a couple of reasons. Firstly, it impedes a significant portion of contributors who use Chroma in Hugo, but don't know Go. Secondly, it bloats the binary size of any project that imports Chroma.

Why XML? YAML is an abomination and JSON is not human editable. XML also compresses very well (eg. Go template lexer XML compresses from 3239 bytes to 718).

Why a new syntax format? All major existing formats rely on the Oniguruma regex engine, which is extremely complex and for which there is no Go port.

Why not earlier? Prior to the existence of fs.FS this was not a viable option.

Benchmarks:

    $ hyperfine --warmup 3 \
        './chroma.master --version' \
        './chroma.xml-pre-opt --version' \
        './chroma.xml --version'
    Benchmark 1: ./chroma.master --version
      Time (mean ± σ): 5.3 ms ± 0.5 ms [User: 3.6 ms, System: 1.4 ms]
      Range (min … max): 4.2 ms … 6.6 ms 233 runs
    Benchmark 2: ./chroma.xml-pre-opt --version
      Time (mean ± σ): 50.6 ms ± 0.5 ms [User: 52.4 ms, System: 3.6 ms]
      Range (min … max): 49.2 ms … 51.5 ms 51 runs
    Benchmark 3: ./chroma.xml --version
      Time (mean ± σ): 6.9 ms ± 1.1 ms [User: 5.1 ms, System: 1.5 ms]
      Range (min … max): 5.7 ms … 19.9 ms 196 runs
    Summary
      './chroma.master --version' ran
        1.30 ± 0.23 times faster than './chroma.xml --version'
        9.56 ± 0.83 times faster than './chroma.xml-pre-opt --version'

A slight increase in init time, but I think this is okay given the increase in flexibility.

And binary size difference:

    $ du -h lexers.test*
    $ du -sh chroma*
    8.8M chroma.master
    7.8M chroma.xml
    7.8M chroma.xml-pre-opt

Incompatible changes:

- (RegexLexer).SetAnalyser: changed from func(func(text string) float32) RegexLexer to func(func(text string) float32) Lexer
- (TokenType).UnmarshalJSON: removed
- Lexer.AnalyseText: added
- Lexer.SetAnalyser: added
- Lexer.SetRegistry: added
- MustNewLazyLexer: removed
- MustNewLexer: changed from func(Config, Rules) RegexLexer to func(Config, func() Rules) RegexLexer
- Mutators: changed from func(...Mutator) MutatorFunc to func(...Mutator) Mutator
- NewLazyLexer: removed
- NewLexer: changed from func(Config, Rules) (RegexLexer, error) to func(Config, func() Rules) (*RegexLexer, error)
- Pop: changed from func(int) MutatorFunc to func(int) Mutator
- Push: changed from func(...string) MutatorFunc to func(...string) Mutator
- TokenType.MarshalJSON: removed
- Using: changed from func(Lexer) Emitter to func(string) Emitter
- UsingByGroup: changed from func(func(string) Lexer, int, int, ...Emitter) Emitter to func(int, int, ...Emitter) Emitter	2022-01-27 15:22:00 +11:00
Alec Thomas	9a8a647afb	Report file pattern errors when a lexer is initialised. See #555	2021-09-27 14:27:46 +10:00
Siavash Askari Nasr	10329f849e	Add ByGroupNames function, same as ByGroups but using named groups (#519). For named groups that are not given, an Error will be emitted anyway. This also handles the case when an Emitter for group `0` is provided or not, since numbers can also be used as names. This may be overdoing it, though: why would anyone use ByGroupNames if they wanted to assign a token to the whole match?	2021-06-08 22:26:59 +10:00
Siavash Askari Nasr	22cbca546a	Allow skipping group's emitter, via passing nil as emitter	2021-06-07 22:45:39 +10:00
Ville Skyttä	b5d03c0079	feat(regexlexer): compile in RE2 compatibility mode To better match vanilla Go regexps and support some additional constructs that might be present in Pygments rules. https://github.com/dlclark/regexp2#re2-compatibility-mode	2021-05-17 14:09:19 +10:00
Siavash Askari Nasr	2e23e7f215	regexp2 uses number of group as its name so name check isn't needed	2021-05-08 18:48:49 +10:00
mlpo	ff6eedba72	Fix: sort words in descending order of length before regex generation (#496) * Fix: sort words in descending order of length before regex generation * Avoid code duplication in Raku lexer	2021-05-08 09:10:18 +10:00
Siavash Askari Nasr	225e1862d3	Pass `*LexerState` as context to emitters. Useful for accessing named capture groups, context set by `mutators`, and other fields and methods that LexerState provides.	2021-05-07 22:55:54 +10:00
Siavash Askari Nasr	dcfd826b25	Add support for named capture groups	2021-05-06 21:34:28 +10:00
Alec Thomas	7e282be495	Update golangci-lint so we can force use of LazyLexer.	2021-04-29 12:08:28 +10:00
Cameron Moore	59126c5b32	Add NewLazyLexer to defer rules definitions and reduce init costs (#449) Add NewLazyLexer and MustNewLazyLexer which accept a function that returns the rules for the lexer. This allows us to defer the rules definitions until they're needed. Lexers in a, g, s, and x packages have been updated to use the new lazy lexer.	2021-02-08 12:16:49 +11:00
Alec Thomas	5da831672d	Fix a few bugs including sub-lexers adding additional newlines when EnsureNL is true.	2021-02-06 20:13:50 +11:00
Alec Thomas	e62d93f4aa	Add a timeout to regexes. This avoids pathologically bad match times. Fixes #378.	2020-07-08 20:23:13 +10:00
Alec Thomas	2b9ea60d89	Split PHP into two lexers - PHP and PHTML. The former is pure PHP code while the latter is PHP code in <? ?> tags, within HTML. Fixes #210.	2020-06-30 21:00:09 +10:00
Alec Thomas	ee4284bb40	Add a Rules.Merge() helper function. Might be useful for #363.	2020-05-16 16:04:21 +10:00
satotake	34d9c7143b	Add new TokeniseOption EnsureLF (#336) * Add new TokeniseOption EnsureLF ref #329 * Use efficient process suggested by @chmike	2020-03-04 18:56:47 +11:00
Alec Thomas	28dcb8565c	Fixes #305.	2019-11-24 12:49:34 +11:00
Alec Thomas	bbc59ac372	Emit error tokens when there's a group mismatch. Also don't panic/recover, as we no longer use panic to report "real" errors. Fixes #295.	2019-10-24 17:03:35 +11:00
Alec Thomas	73d11b3c45	Clear background colour for TTY formatters.	2019-10-15 21:08:17 +11:00
Alec Thomas	ea14dd8660	Fixed a fundamental bug where ^ would always match. The engine was always passing a string sliced to the current position, resulting in ^ always matching. Switched to use FindRunesMatchStartingAt. Fixes #242.	2019-06-12 12:32:20 +10:00
Alec Thomas	2105c68ed2	Implemented a weird little Pygments rule that I missed: "If the RegexLexer encounters a newline that is flagged as an error token, the stack is emptied and the lexer continues scanning in the 'root' state. This can help producing error-tolerant highlighting for erroneous input, e.g. when a single-line string is not closed." Fixes #246.	2019-04-22 18:22:58 +10:00
Alec Thomas	da5ac60d8c	Add golangci-lint and fix all lint issues.	2018-12-31 22:46:59 +11:00
Daniel Eloff	9c3abeae1d	Tokens by value (#187). This results in about an 8% improvement in speed.	2018-11-04 10:22:51 +11:00
Kenneth Shaw	95d0a9381b	Fix Dollar-Quoted Strings (postgres + cql) This commit refactors code from the markdown lexer into the chroma package, and alters the PostgreSQL and CQL lexers to make use of it. Additionally, an example markdown with the various sublexers is added.	2018-06-12 09:16:18 +07:00
Alec Thomas	f315512f5c	Add support for Go templates. These are exposed as go-text-template and go-html-template. Fixes #105.	2018-03-18 21:57:34 +11:00
Alec Thomas	3020e2ea8c	Fix bug with nested newlines. Fixes #124. Also reinstitute lexer tests that disappeared during package split.	2018-03-03 10:16:21 +11:00
Alec Thomas	35126f9a94	Implement rudimentary JSX lexer based on https://github.com/fcurella/jsx-lexer/blob/master/jsx/lexer.py Fixes #111.	2018-02-07 22:11:40 +11:00
Alec Thomas	ce3d6bf527	Invert default "ensure newline" behaviour so that it is opt-in. See #47.	2017-09-30 14:41:05 +10:00
Alec Thomas	573c1d157d	Ensure a newline exists at the end of files. Fixes #42.	2017-09-29 21:59:52 +10:00
Alec Thomas	d5083b3f7c	Big changes to the style and colour APIs. - Styles now use a builder system, to enforce immutability of styles. - Corrected and cleaned up how style inheritance works. - Added a brightening function to colours. - HTML formatter will now automatically pick line and highlight colours if they are not provided in the style. This is done by slightly darkening or lightening. Fixes #21.	2017-09-23 22:09:46 +10:00
Alec Thomas	9d7539a4cd	Fix bug in Turtle lexer.	2017-09-22 23:27:40 +10:00
Alec Thomas	a5a3b67010	Reprocess all rules after a LexerMutator is applied.	2017-09-22 23:14:32 +10:00
Alec Thomas	2ce2ec7f65	Fix bug with empty states.	2017-09-22 22:40:00 +10:00
Alec Thomas	0bb853fb4f	Convert Include to a LexerMutator. Fixes #18.	2017-09-22 22:29:17 +10:00
Alec Thomas	1724aab879	Implement compile-time lexer mutators. This should fix #15.	2017-09-21 20:02:53 +10:00
Alec Thomas	60797cc03f	Add tracing + better error recovery.	2017-09-21 17:52:28 +10:00
Alec Thomas	e2d6abaa64	Document and add iterator panic recovery.	2017-09-20 23:06:23 +10:00
Alec Thomas	cc0e4a59ab	Switch to an Iterator interface. This is to solve an issue where writers returned by the Formatter were often stateful, but this fact was not obvious to the API consumer, and failed in interesting ways.	2017-09-20 22:19:36 +10:00
Alec Thomas	36ead7258a	Use utf8.RuneCountInString() rather than len() :( Fixes #10. Thanks @curio77.	2017-09-20 20:36:25 +10:00
Alec Thomas	44b23f97b4	Split Regexp lexer into its own file.	2017-09-20 20:19:33 +10:00

43 Commits
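
Commit ff6eedba72 sorts words by descending length before generating a regex. A minimal sketch of why that matters, assuming a leftmost-first alternation as in Go's standard `regexp` package (used here as a stand-in for regexp2): when a shorter word is a prefix of a longer one and appears first, the alternation stops at the shorter word. The helper `wordsPattern` is hypothetical and is not Chroma's own function.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// wordsPattern (hypothetical helper) builds an alternation from a word list.
// When sortByLength is true, longer words are placed first so that a shorter
// word cannot shadow a longer word that shares its prefix.
func wordsPattern(words []string, sortByLength bool) *regexp.Regexp {
	ws := append([]string(nil), words...) // copy so the caller's slice is untouched
	if sortByLength {
		sort.SliceStable(ws, func(i, j int) bool { return len(ws[i]) > len(ws[j]) })
	}
	for i, w := range ws {
		ws[i] = regexp.QuoteMeta(w)
	}
	return regexp.MustCompile("(?:" + strings.Join(ws, "|") + ")")
}

func main() {
	words := []string{"in", "int", "interface"}
	fmt.Println(wordsPattern(words, false).FindString("interface")) // in
	fmt.Println(wordsPattern(words, true).FindString("interface"))  // interface
}
```

Sorting is a simple fix because it needs no change to the regex engine or to the word lists themselves.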