chroma

go/chroma

mirror of https://github.com/alecthomas/chroma.git synced 2025-01-12 01:22:30 +02:00

Author	SHA1	Message	Date
Alec Thomas	40542a6255	refactor: migrate a bunch more Go-based lexers to XML. Also rename some existing XML lexers to their canonical XML name.	2023-09-09 12:29:23 +10:00
Carlos Henrique Guardão Gandarez	708662a581	feat: improve regex analysers in XML (#831)	2023-08-23 11:51:13 +10:00
Alec Thomas	a20cd7e8df	feat: support basic regex analysers in XML (#828). The `<analyse>` element contains a regex to match against the input and a score that applies if the pattern matches. The scores of all matching patterns for a lexer are summed. Replaces #815, #813 and #826.	2023-08-22 05:32:23 +10:00
Alec Thomas	cc2dd5b8ad	Version 2 of Chroma	2022-01-27 15:22:00 +11:00
    This cleans up the API in general, removing a bunch of deprecated stuff, cleaning up circular imports, etc.

    But the biggest change is switching to an optional XML format for the regex lexer. Having lexers defined only in Go is not ideal for a couple of reasons. Firstly, it impedes a significant portion of contributors who use Chroma in Hugo, but don't know Go. Secondly, it bloats the binary size of any project that imports Chroma.

    Why XML? YAML is an abomination and JSON is not human editable. XML also compresses very well (e.g. the Go template lexer XML compresses from 3239 bytes to 718).

    Why a new syntax format? All major existing formats rely on the Oniguruma regex engine, which is extremely complex and for which there is no Go port.

    Why not earlier? Prior to the existence of fs.FS this was not a viable option.

    Benchmarks:

        $ hyperfine --warmup 3 \
            './chroma.master --version' \
            './chroma.xml-pre-opt --version' \
            './chroma.xml --version'
        Benchmark 1: ./chroma.master --version
          Time (mean ± σ):      5.3 ms ± 0.5 ms    [User: 3.6 ms, System: 1.4 ms]
          Range (min … max):    4.2 ms … 6.6 ms    233 runs
        Benchmark 2: ./chroma.xml-pre-opt --version
          Time (mean ± σ):     50.6 ms ± 0.5 ms    [User: 52.4 ms, System: 3.6 ms]
          Range (min … max):   49.2 ms … 51.5 ms    51 runs
        Benchmark 3: ./chroma.xml --version
          Time (mean ± σ):      6.9 ms ± 1.1 ms    [User: 5.1 ms, System: 1.5 ms]
          Range (min … max):    5.7 ms … 19.9 ms    196 runs
        Summary
          './chroma.master --version' ran
            1.30 ± 0.23 times faster than './chroma.xml --version'
            9.56 ± 0.83 times faster than './chroma.xml-pre-opt --version'

    A slight increase in init time, but I think this is okay given the increase in flexibility.

    And binary size difference:

        $ du -h lexers.test*
        $ du -sh chroma*
        8.8M chroma.master
        7.8M chroma.xml
        7.8M chroma.xml-pre-opt

    Incompatible changes:

    - (RegexLexer).SetAnalyser: changed from func(func(text string) float32) RegexLexer to func(func(text string) float32) Lexer
    - (TokenType).UnmarshalJSON: removed
    - Lexer.AnalyseText: added
    - Lexer.SetAnalyser: added
    - Lexer.SetRegistry: added
    - MustNewLazyLexer: removed
    - MustNewLexer: changed from func(Config, Rules) RegexLexer to func(Config, func() Rules) RegexLexer
    - Mutators: changed from func(...Mutator) MutatorFunc to func(...Mutator) Mutator
    - NewLazyLexer: removed
    - NewLexer: changed from func(Config, Rules) (RegexLexer, error) to func(Config, func() Rules) (*RegexLexer, error)
    - Pop: changed from func(int) MutatorFunc to func(int) Mutator
    - Push: changed from func(...string) MutatorFunc to func(...string) Mutator
    - TokenType.MarshalJSON: removed
    - Using: changed from func(Lexer) Emitter to func(string) Emitter
    - UsingByGroup: changed from func(func(string) Lexer, int, int, ...Emitter) Emitter to func(int, int, ...Emitter) Emitter
Stephen Afam-Osemene	d6bdd14670	sort lexers with lower case	2021-05-02 17:23:18 +10:00
satotake	34d9c7143b	Add new TokeniseOption EnsureLF (#336). Ref #329. Use efficient process suggested by @chmike.	2020-03-04 18:56:47 +11:00
Alec Thomas	da5ac60d8c	Add golangci-lint and fix all lint issues.	2018-12-31 22:46:59 +11:00
Daniel Eloff	9c3abeae1d	Tokens by value (#187). This results in about an 8% improvement in speed.	2018-11-04 10:22:51 +11:00
Alec Thomas	15a009f0fc	Add DelegatingLexer.	2018-03-17 13:44:03 +11:00
Alec Thomas	3020e2ea8c	Fix bug with nested newlines. Fixes #124. Also reinstitute lexer tests that disappeared during package split.	2018-03-03 10:16:21 +11:00
Alec Thomas	e56590a815	Add data-driven test framework for lexers. See #68.	2018-01-02 14:53:25 +11:00
Alec Thomas	93868c5a99	Add Objective-C and support lexer priorities. Fixes #66.	2017-10-23 11:21:37 +11:00
Alec Thomas	ce3d6bf527	Invert default "ensure newline" behaviour so that it is opt-in. See #47.	2017-09-30 14:41:05 +10:00
Alec Thomas	573c1d157d	Ensure a newline exists at the end of files. Fixes #42.	2017-09-29 21:59:52 +10:00
Alec Thomas	cc0e4a59ab	Switch to an Iterator interface. This is to solve an issue where writers returned by the Formatter were often stateful, but this fact was not obvious to the API consumer, and failed in interesting ways.	2017-09-20 22:19:36 +10:00
Alec Thomas	44b23f97b4	Split Regexp lexer into its own file.	2017-09-20 20:19:33 +10:00
Alec Thomas	3f230ec717	Add support for line numbers.	2017-09-20 13:33:44 +10:00
Alec Thomas	a72960340e	Add test to pre-compile all regexes.	2017-09-19 14:15:33 +10:00
Alec Thomas	87183b3633	Add HTML formatter option for setting the tab width.	2017-09-19 13:14:29 +10:00
Alec Thomas	631fc87d6e	Fix lua lexer, and actually check error value from compiling regexes :(	2017-09-19 12:05:53 +10:00
Alec Thomas	3df4c80190	Rename S -> R + sort list of lexers.	2017-09-19 10:47:22 +10:00
Alec Thomas	431e913333	Update documentation. Include "quick" package.	2017-09-18 13:15:07 +10:00
Alec Thomas	a10fd0a23d	Switch to github.com/dlclark/regexp2. This makes translating Pygments lexers much much simpler (and possible).	2017-09-18 11:16:44 +10:00
Alec Thomas	86bda70acd	Switch to github.com/dlclark/regexp2	2017-09-15 22:18:20 +10:00
Alec Thomas	d12529ae61	HTML formatter + import all Pygments styles.	2017-07-20 00:01:29 -07:00
Alec Thomas	7ae55eb265	Wire up content sniffing.	2017-06-07 19:47:59 +10:00
Alec Thomas	5749aebe42	Generalise and support 8, 256 and 16m colour terminals.	2017-06-07 19:47:59 +10:00
Alec Thomas	1f47bd705c	Use pointers to tokens + support regex flags in importer.	2017-06-07 19:47:59 +10:00
Alec Thomas	c64e5829b5	Add JavaScript.	2017-06-07 19:47:59 +10:00
Alec Thomas	5dedc6e45b	Add a bunch of automatically translated lexers.	2017-06-07 19:47:59 +10:00
Alec Thomas	b30de35ff1	Use a callback to emit tokens. This is a) faster and b) supports streaming output.	2017-06-07 19:47:59 +10:00
Alec Thomas	6dd81b044b	Add Markdown processor. A bunch of performance improvements.	2017-06-07 19:47:59 +10:00
Alec Thomas	b2fb8edf77	Initial commit! Working!	2017-06-07 19:47:59 +10:00

33 Commits