Version 2 of Chroma
This cleans up the API in general: removing deprecated symbols, breaking
circular imports, and so on.
But the biggest change is switching to an optional XML format for the
regex lexer.
Having lexers defined only in Go is not ideal for a couple of reasons.
Firstly, it impedes a significant portion of contributors who use Chroma
in Hugo, but don't know Go. Secondly, it bloats the binary size of any
project that imports Chroma.
Why XML? YAML is an abomination and JSON is not human editable. XML
also compresses very well (e.g. the Go template lexer XML compresses
from 3239 bytes to 718).
Why a new syntax format? All major existing formats rely on the
Oniguruma regex engine, which is extremely complex and for which there
is no Go port.
Why not earlier? Prior to the existence of fs.FS this was not a viable
option.
Benchmarks:
$ hyperfine --warmup 3 \
'./chroma.master --version' \
'./chroma.xml-pre-opt --version' \
'./chroma.xml --version'
Benchmark 1: ./chroma.master --version
Time (mean ± σ): 5.3 ms ± 0.5 ms [User: 3.6 ms, System: 1.4 ms]
Range (min … max): 4.2 ms … 6.6 ms 233 runs
Benchmark 2: ./chroma.xml-pre-opt --version
Time (mean ± σ): 50.6 ms ± 0.5 ms [User: 52.4 ms, System: 3.6 ms]
Range (min … max): 49.2 ms … 51.5 ms 51 runs
Benchmark 3: ./chroma.xml --version
Time (mean ± σ): 6.9 ms ± 1.1 ms [User: 5.1 ms, System: 1.5 ms]
Range (min … max): 5.7 ms … 19.9 ms 196 runs
Summary
'./chroma.master --version' ran
1.30 ± 0.23 times faster than './chroma.xml --version'
9.56 ± 0.83 times faster than './chroma.xml-pre-opt --version'
A slight increase in init time, but I think this is okay given the
increase in flexibility.
And binary size difference:
$ du -sh chroma*
8.8M chroma.master
7.8M chroma.xml
7.8M chroma.xml-pre-opt
Incompatible changes:
- (*RegexLexer).SetAnalyser: changed from func(func(text string) float32) *RegexLexer to func(func(text string) float32) Lexer
- (*TokenType).UnmarshalJSON: removed
- Lexer.AnalyseText: added
- Lexer.SetAnalyser: added
- Lexer.SetRegistry: added
- MustNewLazyLexer: removed
- MustNewLexer: changed from func(*Config, Rules) *RegexLexer to func(*Config, func() Rules) *RegexLexer
- Mutators: changed from func(...Mutator) MutatorFunc to func(...Mutator) Mutator
- NewLazyLexer: removed
- NewLexer: changed from func(*Config, Rules) (*RegexLexer, error) to func(*Config, func() Rules) (*RegexLexer, error)
- Pop: changed from func(int) MutatorFunc to func(int) Mutator
- Push: changed from func(...string) MutatorFunc to func(...string) Mutator
- TokenType.MarshalJSON: removed
- Using: changed from func(Lexer) Emitter to func(string) Emitter
- UsingByGroup: changed from func(func(string) Lexer, int, int, ...Emitter) Emitter to func(int, int, ...Emitter) Emitter
2022-01-03 23:51:17 +11:00
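Several of the breaking changes above (MustNewLexer and NewLexer now taking `func() Rules` instead of `Rules`) exist so that rule tables are built lazily, on first use, rather than at program start. A stand-alone sketch of that lazy-construction pattern using sync.Once (`Rules` and `lazyLexer` here are illustrative stand-ins, not Chroma's actual types):

```go
package main

import (
	"fmt"
	"sync"
)

// Rules is a stand-in for chroma.Rules: state name -> patterns.
type Rules map[string][]string

// lazyLexer holds a constructor and only invokes it on first use,
// mirroring the v2 change from NewLexer(cfg, rules) to
// NewLexer(cfg, func() Rules).
type lazyLexer struct {
	build func() Rules
	once  sync.Once
	rules Rules
}

// Rules builds the rule table exactly once, then returns the cached copy.
func (l *lazyLexer) Rules() Rules {
	l.once.Do(func() { l.rules = l.build() })
	return l.rules
}

func main() {
	built := 0
	lex := &lazyLexer{build: func() Rules {
		built++ // expensive regex compilation would happen here
		return Rules{"root": {`\w+`, `\s+`}}
	}}
	// Construction is free; rules are built once, on demand.
	fmt.Println(built) // 0
	_ = lex.Rules()
	_ = lex.Rules()
	fmt.Println(built, len(lex.Rules()["root"])) // 1 2
}
```

Deferring construction this way is what keeps the `./chroma.xml --version` startup close to master's in the benchmark above, despite the registry holding far more lexers than any one run uses.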
package lexers

import (
	"regexp"
	"strings"
	"unicode/utf8"

	"github.com/dlclark/regexp2"
	. "github.com/alecthomas/chroma/v2" // nolint
)

// Raku lexer.
var Raku Lexer = Register(MustNewLexer(
	&Config{
		Name:    "Raku",
		Aliases: []string{"perl6", "pl6", "raku"},
		Filenames: []string{
			"*.pl", "*.pm", "*.nqp", "*.p6", "*.6pl", "*.p6l", "*.pl6", "*.6pm",
			"*.p6m", "*.pm6", "*.t", "*.raku", "*.rakumod", "*.rakutest", "*.rakudoc",
		},
		MimeTypes: []string{
			"text/x-perl6", "application/x-perl6",
			"text/x-raku", "application/x-raku",
		},
		DotAll: true,
	},
	rakuRules,
))
func rakuRules() Rules {
	type RakuToken int

	const (
		rakuQuote RakuToken = iota
		rakuNameAttribute
		rakuPod
		rakuPodFormatter
		rakuPodDeclaration
		rakuMultilineComment
		rakuMatchRegex
		rakuSubstitutionRegex
	)
	const (
		colonPairOpeningBrackets = `(?:<<|<|«|\(|\[|\{)`
		colonPairClosingBrackets = `(?:>>|>|»|\)|\]|\})`
		colonPairPattern         = `(?<!:)(?<colon>:)(?<key>\w[\w'-]*)(?<opening_delimiters>` + colonPairOpeningBrackets + `)`
		colonPairLookahead       = `(?=(:['\w-]+` +
			colonPairOpeningBrackets + `.+?` + colonPairClosingBrackets + `)?`
		namePattern           = `(?:(?!` + colonPairPattern + `)(?:::|[\w':-]))+`
		variablePattern       = `[$@%&]+[.^:?=!~]?` + namePattern
		globalVariablePattern = `[$@%&]+\*` + namePattern
	)
	keywords := []string{
		`BEGIN`, `CATCH`, `CHECK`, `CLOSE`, `CONTROL`, `DOC`, `END`, `ENTER`, `FIRST`, `INIT`,
		`KEEP`, `LAST`, `LEAVE`, `NEXT`, `POST`, `PRE`, `QUIT`, `UNDO`, `anon`, `augment`, `but`,
		`class`, `constant`, `default`, `does`, `else`, `elsif`, `enum`, `for`, `gather`, `given`,
		`grammar`, `has`, `if`, `import`, `is`, `of`, `let`, `loop`, `made`, `make`, `method`,
		`module`, `multi`, `my`, `need`, `orwith`, `our`, `proceed`, `proto`, `repeat`, `require`,
		`where`, `return`, `return-rw`, `returns`, `->`, `-->`, `role`, `state`, `sub`, `no`,
		`submethod`, `subset`, `succeed`, `supersede`, `try`, `unit`, `unless`, `until`,
		`use`, `when`, `while`, `with`, `without`, `export`, `native`, `repr`, `required`, `rw`,
		`symbol`, `default`, `cached`, `DEPRECATED`, `dynamic`, `hidden-from-backtrace`, `nodal`,
		`pure`, `raw`, `start`, `react`, `supply`, `whenever`, `also`, `rule`, `token`, `regex`,
		`dynamic-scope`, `built`, `temp`,
	}

	keywordsPattern := Words(`(?<!['\w:-])`, `(?!['\w:-])`, keywords...)
	wordOperators := []string{
		`X`, `Z`, `R`, `after`, `and`, `andthen`, `before`, `cmp`, `div`, `eq`, `eqv`, `extra`, `ge`,
		`gt`, `le`, `leg`, `lt`, `mod`, `ne`, `or`, `orelse`, `x`, `xor`, `xx`, `gcd`, `lcm`,
		`but`, `min`, `max`, `^fff`, `fff^`, `fff`, `^ff`, `ff^`, `ff`, `so`, `not`, `unicmp`,
		`TR`, `o`, `(&)`, `(.)`, `(|)`, `(+)`, `(-)`, `(^)`, `coll`, `(elem)`, `(==)`,
		`(cont)`, `(<)`, `(<=)`, `(>)`, `(>=)`, `minmax`, `notandthen`, `S`,
	}

	wordOperatorsPattern := Words(`(?<=^|\b|\s)`, `(?=$|\b|\s)`, wordOperators...)
	operators := []string{
		`++`, `--`, `-`, `**`, `!`, `+`, `~`, `?`, `+^`, `~^`, `?^`, `^`, `*`, `/`, `%`, `%%`, `+&`,
		`+<`, `+>`, `~&`, `~<`, `~>`, `?&`, `+|`, `+^`, `~|`, `~^`, `?`, `?|`, `?^`, `&`, `^`,
		`<=>`, `^…^`, `^…`, `…^`, `…`, `...`, `...^`, `^...`, `^...^`, `..`, `..^`, `^..`, `^..^`,
		`::=`, `:=`, `!=`, `==`, `<=`, `<`, `>=`, `>`, `~~`, `===`, `&&`, `||`, `|`, `^^`, `//`,
		`??`, `!!`, `^fff^`, `^ff^`, `<==`, `==>`, `<<==`, `==>>`, `=>`, `=`, `<<`, `«`, `>>`, `»`,
		`,`, `>>.`, `».`, `.&`, `.=`, `.^`, `.?`, `.+`, `.*`, `.`, `∘`, `∩`, `⊍`, `∪`, `⊎`, `∖`,
		`⊖`, `≠`, `≤`, `≥`, `=:=`, `=~=`, `≅`, `∈`, `∉`, `≡`, `≢`, `∋`, `∌`, `⊂`, `⊄`, `⊆`, `⊈`,
		`⊃`, `⊅`, `⊇`, `⊉`, `:`, `!!!`, `???`, `¯`, `×`, `÷`, `−`, `⁺`, `⁻`,
	}

	operatorsPattern := Words(``, ``, operators...)
	builtinTypes := []string{
		`False`, `True`, `Order`, `More`, `Less`, `Same`, `Any`, `Array`, `Associative`, `AST`,
		`atomicint`, `Attribute`, `Backtrace`, `Backtrace::Frame`, `Bag`, `Baggy`, `BagHash`,
		`Blob`, `Block`, `Bool`, `Buf`, `Callable`, `CallFrame`, `Cancellation`, `Capture`,
		`CArray`, `Channel`, `Code`, `compiler`, `Complex`, `ComplexStr`, `CompUnit`,
		`CompUnit::PrecompilationRepository`, `CompUnit::Repository`, `Empty`,
		`CompUnit::Repository::FileSystem`, `CompUnit::Repository::Installation`, `Cool`,
		`CurrentThreadScheduler`, `CX::Warn`, `CX::Take`, `CX::Succeed`, `CX::Return`, `CX::Redo`,
		`CX::Proceed`, `CX::Next`, `CX::Last`, `CX::Emit`, `CX::Done`, `Cursor`, `Date`, `Dateish`,
		`DateTime`, `Distribution`, `Distribution::Hash`, `Distribution::Locally`,
		`Distribution::Path`, `Distribution::Resource`, `Distro`, `Duration`, `Encoding`,
		`Encoding::Registry`, `Endian`, `Enumeration`, `Exception`, `Failure`, `FatRat`, `Grammar`,
		`Hash`, `HyperWhatever`, `Instant`, `Int`, `int`, `int16`, `int32`, `int64`, `int8`, `str`,
		`IntStr`, `IO`, `IO::ArgFiles`, `IO::CatHandle`, `IO::Handle`, `IO::Notification`,
		`IO::Notification::Change`, `IO::Path`, `IO::Path::Cygwin`, `IO::Path::Parts`,
		`IO::Path::QNX`, `IO::Path::Unix`, `IO::Path::Win32`, `IO::Pipe`, `IO::Socket`,
		`IO::Socket::Async`, `IO::Socket::Async::ListenSocket`, `IO::Socket::INET`, `IO::Spec`,
		`IO::Spec::Cygwin`, `IO::Spec::QNX`, `IO::Spec::Unix`, `IO::Spec::Win32`, `IO::Special`,
		`Iterable`, `Iterator`, `Junction`, `Kernel`, `Label`, `List`, `Lock`, `Lock::Async`,
		`Lock::ConditionVariable`, `long`, `longlong`, `Macro`, `Map`, `Match`,
		`Metamodel::AttributeContainer`, `Metamodel::C3MRO`, `Metamodel::ClassHOW`,
		`Metamodel::ConcreteRoleHOW`, `Metamodel::CurriedRoleHOW`, `Metamodel::DefiniteHOW`,
		`Metamodel::Documenting`, `Metamodel::EnumHOW`, `Metamodel::Finalization`,
		`Metamodel::MethodContainer`, `Metamodel::Mixins`, `Metamodel::MROBasedMethodDispatch`,
		`Metamodel::MultipleInheritance`, `Metamodel::Naming`, `Metamodel::Primitives`,
		`Metamodel::PrivateMethodContainer`, `Metamodel::RoleContainer`, `Metamodel::RolePunning`,
		`Metamodel::Stashing`, `Metamodel::Trusting`, `Metamodel::Versioning`, `Method`, `Mix`,
		`MixHash`, `Mixy`, `Mu`, `NFC`, `NFD`, `NFKC`, `NFKD`, `Nil`, `Num`, `num32`, `num64`,
		`Numeric`, `NumStr`, `ObjAt`, `Order`, `Pair`, `Parameter`, `Perl`, `Pod::Block`,
		`Pod::Block::Code`, `Pod::Block::Comment`, `Pod::Block::Declarator`, `Pod::Block::Named`,
		`Pod::Block::Para`, `Pod::Block::Table`, `Pod::Heading`, `Pod::Item`, `Pointer`,
		`Positional`, `PositionalBindFailover`, `Proc`, `Proc::Async`, `Promise`, `Proxy`,
		`PseudoStash`, `QuantHash`, `RaceSeq`, `Raku`, `Range`, `Rat`, `Rational`, `RatStr`,
		`Real`, `Regex`, `Routine`, `Routine::WrapHandle`, `Scalar`, `Scheduler`, `Semaphore`,
		`Seq`, `Sequence`, `Set`, `SetHash`, `Setty`, `Signature`, `size_t`, `Slip`, `Stash`,
		`Str`, `StrDistance`, `Stringy`, `Sub`, `Submethod`, `Supplier`, `Supplier::Preserving`,
		`Supply`, `Systemic`, `Tap`, `Telemetry`, `Telemetry::Instrument::Thread`,
		`Telemetry::Instrument::ThreadPool`, `Telemetry::Instrument::Usage`, `Telemetry::Period`,
		`Telemetry::Sampler`, `Thread`, `Test`, `ThreadPoolScheduler`, `UInt`, `uint16`, `uint32`,
		`uint64`, `uint8`, `Uni`, `utf8`, `ValueObjAt`, `Variable`, `Version`, `VM`, `Whatever`,
		`WhateverCode`, `WrapHandle`, `NativeCall`,
		// Pragmas
		`precompilation`, `experimental`, `worries`, `MONKEY-TYPING`, `MONKEY-SEE-NO-EVAL`,
		`MONKEY-GUTS`, `fatal`, `lib`, `isms`, `newline`, `nqp`, `soft`,
		`strict`, `trace`, `variables`,
	}

	builtinTypesPattern := Words(`(?<!['\w:-])`, `(?::[_UD])?(?!['\w:-])`, builtinTypes...)
builtinRoutines := []string{
|
|
|
|
`ACCEPTS`, `abs`, `abs2rel`, `absolute`, `accept`, `accepts_type`, `accessed`, `acos`,
|
|
|
|
`acosec`, `acosech`, `acosh`, `acotan`, `acotanh`, `acquire`, `act`, `action`, `actions`,
|
|
|
|
`add`, `add_attribute`, `add_enum_value`, `add_fallback`, `add_method`, `add_parent`,
|
|
|
|
`add_private_method`, `add_role`, `add_stash`, `add_trustee`, `addendum`, `adverb`, `after`,
|
|
|
|
`all`, `allocate`, `allof`, `allowed`, `alternative-names`, `annotations`, `antipair`,
|
|
|
|
`antipairs`, `any`, `anyof`, `api`, `app_lifetime`, `append`, `arch`, `archetypes`,
|
|
|
|
`archname`, `args`, `ARGS-TO-CAPTURE`, `arity`, `Array`, `asec`, `asech`, `asin`, `asinh`,
|
|
|
|
`ASSIGN-KEY`, `ASSIGN-POS`, `assuming`, `ast`, `at`, `atan`, `atan2`, `atanh`, `AT-KEY`,
|
|
|
|
`atomic-assign`, `atomic-dec-fetch`, `atomic-fetch`, `atomic-fetch-add`, `atomic-fetch-dec`,
|
|
|
|
`atomic-fetch-inc`, `atomic-fetch-sub`, `atomic-inc-fetch`, `AT-POS`, `attributes`, `auth`,
|
|
|
|
`await`, `backend`, `backtrace`, `Bag`, `bag`, `Baggy`, `BagHash`, `bail-out`, `base`,
|
|
|
|
`basename`, `base-repeating`, `base_type`, `batch`, `BIND-KEY`, `BIND-POS`, `bind-stderr`,
|
|
|
|
`bind-stdin`, `bind-stdout`, `bind-udp`, `bits`, `bless`, `block`, `Bool`, `bool-only`,
|
|
|
|
`bounds`, `break`, `Bridge`, `broken`, `BUILD`, `TWEAK`, `build-date`, `bytes`, `cache`,
|
|
|
|
`callframe`, `calling-package`, `CALL-ME`, `callsame`, `callwith`, `can`, `cancel`,
|
|
|
|
`candidates`, `cando`, `can-ok`, `canonpath`, `caps`, `caption`, `Capture`, `capture`,
|
|
|
|
`cas`, `catdir`, `categorize`, `categorize-list`, `catfile`, `catpath`, `cause`, `ceiling`,
|
|
|
|
`cglobal`, `changed`, `Channel`, `channel`, `chars`, `chdir`, `child`, `child-name`,
|
|
|
|
`child-typename`, `chmod`, `chomp`, `chop`, `chr`, `chrs`, `chunks`, `cis`, `classify`,
|
|
|
|
`classify-list`, `cleanup`, `clone`, `close`, `closed`, `close-stdin`, `cmp-ok`, `code`,
|
|
|
|
`codename`, `codes`, `coerce_type`, `coll`, `collate`, `column`, `comb`, `combinations`,
|
|
|
|
`command`, `comment`, `compiler`, `Complex`, `compose`, `composalizer`, `compose_type`,
|
|
|
|
`compose_values`, `composer`, `compute_mro`, `condition`, `config`, `configure_destroy`,
|
|
|
|
`configure_type_checking`, `conj`, `connect`, `constraints`, `construct`, `contains`,
|
|
|
|
`content`, `contents`, `copy`, `cos`, `cosec`, `cosech`, `cosh`, `cotan`, `cotanh`, `count`,
|
|
|
|
`count-only`, `cpu-cores`, `cpu-usage`, `CREATE`, `create_type`, `cross`, `cue`, `curdir`,
|
|
|
|
`curupdir`, `d`, `Date`, `DateTime`, `day`, `daycount`, `day-of-month`, `day-of-week`,
|
|
|
|
`day-of-year`, `days-in-month`, `dd-mm-yyyy`, `declaration`, `decode`, `decoder`, `deepmap`,
|
|
|
|
`default`, `defined`, `DEFINITE`, `definite`, `delayed`, `delete`, `delete-by-compiler`,
|
|
|
|
`DELETE-KEY`, `DELETE-POS`, `denominator`, `desc`, `DESTROY`, `destroyers`, `devnull`,
|
|
|
|
`diag`, `did-you-mean`, `die`, `dies-ok`, `dir`, `dirname`, `distribution`, `dir-sep`,
|
|
|
|
`DISTROnames`, `do`, `does`, `does-ok`, `done`, `done-testing`, `duckmap`, `dynamic`, `e`,
|
|
|
|
`eager`, `earlier`, `elems`, `emit`, `enclosing`, `encode`, `encoder`, `encoding`, `end`,
|
|
|
|
`endian`, `ends-with`, `enum_from_value`, `enum_value_list`, `enum_values`, `enums`, `EOF`,
|
|
|
|
`eof`, `EVAL`, `eval-dies-ok`, `EVALFILE`, `eval-lives-ok`, `event`, `exception`,
|
|
|
|
`excludes-max`, `excludes-min`, `EXISTS-KEY`, `EXISTS-POS`, `exit`, `exitcode`, `exp`,
|
|
|
|
`expected`, `explicitly-manage`, `expmod`, `export_callback`, `extension`, `f`, `fail`,
|
|
|
|
`FALLBACK`, `fails-like`, `fc`, `feature`, `file`, `filename`, `files`, `find`,
|
|
|
|
`find_method`, `find_method_qualified`, `finish`, `first`, `flat`, `first-date-in-month`,
|
|
|
|
`flatmap`, `flip`, `floor`, `flunk`, `flush`, `flush_cache`, `fmt`, `format`, `formatter`,
|
|
|
|
`free-memory`, `freeze`, `from`, `from-list`, `from-loop`, `from-posix`, `from-slurpy`,
|
|
|
|
`full`, `full-barrier`, `GENERATE-USAGE`, `generate_mixin`, `get`, `get_value`, `getc`,
|
|
|
|
`gist`, `got`, `grab`, `grabpairs`, `grep`, `handle`, `handled`, `handles`, `hardware`,
|
|
|
|
`has_accessor`, `Hash`, `hash`, `head`, `headers`, `hh-mm-ss`, `hidden`, `hides`, `hostname`,
|
|
|
|
`hour`, `how`, `hyper`, `id`, `illegal`, `im`, `in`, `in-timezone`, `indent`, `index`,
|
|
|
|
`indices`, `indir`, `infinite`, `infix`, `postcirumfix`, `cicumfix`, `install`,
|
|
|
|
`install_method_cache`, `Instant`, `instead`, `Int`, `int-bounds`, `interval`, `in-timezone`,
|
|
|
|
`invalid-str`, `invert`, `invocant`, `IO`, `IO::Notification.watch-path`, `is_trusted`,
|
|
|
|
`is_type`, `isa`, `is-absolute`, `isa-ok`, `is-approx`, `is-deeply`, `is-hidden`,
|
|
|
|
`is-initial-thread`, `is-int`, `is-lazy`, `is-leap-year`, `isNaN`, `isnt`, `is-prime`,
|
|
|
|
`is-relative`, `is-routine`, `is-setting`, `is-win`, `item`, `iterator`, `join`, `keep`,
|
|
|
|
`kept`, `KERNELnames`, `key`, `keyof`, `keys`, `kill`, `kv`, `kxxv`, `l`, `lang`, `last`,
|
|
|
|
`lastcall`, `later`, `lazy`, `lc`, `leading`, `level`, `like`, `line`, `lines`, `link`,
|
|
|
|
`List`, `list`, `listen`, `live`, `lives-ok`, `load`, `load-repo-id`, `load-unit`, `loaded`,
|
|
|
|
`loads`, `local`, `lock`, `log`, `log10`, `lookup`, `lsb`, `made`, `MAIN`, `make`, `Map`,
|
|
|
|
`map`, `match`, `max`, `maxpairs`, `merge`, `message`, `method`, `meta`, `method_table`,
|
|
|
|
`methods`, `migrate`, `min`, `minmax`, `minpairs`, `minute`, `misplaced`, `Mix`, `mix`,
|
|
|
|
`MixHash`, `mixin`, `mixin_attribute`, `Mixy`, `mkdir`, `mode`, `modified`, `month`, `move`,
|
|
|
|
`mro`, `msb`, `multi`, `multiness`, `name`, `named`, `named_names`, `narrow`,
|
|
|
|
`nativecast`, `native-descriptor`, `nativesizeof`, `need`, `new`, `new_type`,
|
|
|
|
`new-from-daycount`, `new-from-pairs`, `next`, `nextcallee`, `next-handle`, `nextsame`,
|
|
|
|
`nextwith`, `next-interesting-index`, `NFC`, `NFD`, `NFKC`, `NFKD`, `nice`, `nl-in`,
|
|
|
|
`nl-out`, `nodemap`, `nok`, `normalize`, `none`, `norm`, `not`, `note`, `now`, `nude`,
|
|
|
|
`Num`, `numerator`, `Numeric`, `of`, `offset`, `offset-in-hours`, `offset-in-minutes`,
|
|
|
|
`ok`, `old`, `on-close`, `one`, `on-switch`, `open`, `opened`, `operation`, `optional`,
|
|
|
|
`ord`, `ords`, `orig`, `os-error`, `osname`, `out-buffer`, `pack`, `package`, `package-kind`,
|
|
|
|
`package-name`, `packages`, `Pair`, `pair`, `pairs`, `pairup`, `parameter`, `params`,
|
|
|
|
`parent`, `parent-name`, `parents`, `parse`, `parse-base`, `parsefile`, `parse-names`,
|
|
|
|
`parts`, `pass`, `path`, `path-sep`, `payload`, `peer-host`, `peer-port`, `periods`, `perl`,
|
|
|
|
`permutations`, `phaser`, `pick`, `pickpairs`, `pid`, `placeholder`, `plan`, `plus`,
|
|
|
|
`polar`, `poll`, `polymod`, `pop`, `pos`, `positional`, `posix`, `postfix`, `postmatch`,
|
|
|
|
`precomp-ext`, `precomp-target`, `precompiled`, `pred`, `prefix`, `prematch`, `prepend`,
|
|
|
|
`primary`, `print`, `printf`, `print-nl`, `print-to`, `private`, `private_method_names`,
		`private_method_table`, `proc`, `produce`, `Promise`, `promise`, `prompt`, `protect`,
		`protect-or-queue-on-recursion`, `publish_method_cache`, `pull-one`, `push`, `push-all`,
		`push-at-least`, `push-exactly`, `push-until-lazy`, `put`, `qualifier-type`, `quaternary`,
		`quit`, `r`, `race`, `radix`, `raku`, `rand`, `Range`, `range`, `Rat`, `raw`, `re`, `read`,
		`read-bits`, `read-int128`, `read-int16`, `read-int32`, `read-int64`, `read-int8`,
		`read-num32`, `read-num64`, `read-ubits`, `read-uint128`, `read-uint16`, `read-uint32`,
		`read-uint64`, `read-uint8`, `readchars`, `readonly`, `ready`, `Real`, `reallocate`,
		`reals`, `reason`, `rebless`, `receive`, `recv`, `redispatcher`, `redo`, `reduce`,
		`rel2abs`, `relative`, `release`, `remove`, `rename`, `repeated`, `replacement`,
		`replace-with`, `repo`, `repo-id`, `report`, `required`, `reserved`, `resolve`, `restore`,
		`result`, `resume`, `rethrow`, `return`, `return-rw`, `returns`, `reverse`, `right`,
		`rindex`, `rmdir`, `role`, `roles_to_compose`, `rolish`, `roll`, `rootdir`, `roots`,
		`rotate`, `rotor`, `round`, `roundrobin`, `routine-type`, `run`, `RUN-MAIN`, `rw`, `rwx`,
		`samecase`, `samemark`, `samewith`, `say`, `schedule-on`, `scheduler`, `scope`, `sec`,
		`sech`, `second`, `secondary`, `seek`, `self`, `send`, `Seq`, `Set`, `set`, `serial`,
		`set_hidden`, `set_name`, `set_package`, `set_rw`, `set_value`, `set_api`, `set_auth`,
		`set_composalizer`, `set_export_callback`, `set_is_mixin`, `set_mixin_attribute`,
		`set_package`, `set_ver`, `set_why`, `SetHash`, `Setty`, `set-instruments`,
		`setup_finalization`, `setup_mixin_cache`, `shape`, `share`, `shell`, `short-id`,
		`short-name`, `shortname`, `shift`, `sibling`, `sigil`, `sign`, `signal`, `signals`,
		`signature`, `sin`, `sinh`, `sink`, `sink-all`, `skip`, `skip-at-least`,
		`skip-at-least-pull-one`, `skip-one`, `skip-rest`, `sleep`, `sleep-timer`, `sleep-until`,
		`Slip`, `slip`, `slurp`, `slurp-rest`, `slurpy`, `snap`, `snapper`, `so`, `socket-host`,
		`socket-port`, `sort`, `source`, `source-package`, `spawn`, `SPEC`, `splice`, `split`,
		`splitdir`, `splitpath`, `sprintf`, `spurt`, `sqrt`, `squish`, `srand`, `stable`, `start`,
		`started`, `starts-with`, `status`, `stderr`, `stdout`, `STORE`, `store-file`,
		`store-repo-id`, `store-unit`, `Str`, `Stringy`, `sub_signature`, `subbuf`, `subbuf-rw`,
		`subname`, `subparse`, `subst`, `subst-mutate`, `substr`, `substr-eq`, `substr-rw`,
		`subtest`, `succ`, `sum`, `suffix`, `summary`, `Supply`, `symlink`, `T`, `t`, `tail`,
		`take`, `take-rw`, `tan`, `tanh`, `tap`, `target`, `target-name`, `tc`, `tclc`, `tell`,
		`term`, `tertiary`, `then`, `throttle`, `throw`, `throws-like`, `time`, `timezone`,
		`tmpdir`, `to`, `today`, `todo`, `toggle`, `to-posix`, `total`, `total-memory`, `trailing`,
		`trans`, `tree`, `trim`, `trim-leading`, `trim-trailing`, `truncate`, `truncated-to`,
		`trusts`, `try_acquire`, `trying`, `twigil`, `type`, `type_captures`, `type_check`,
		`typename`, `uc`, `udp`, `uncaught_handler`, `undefine`, `unimatch`, `unicmp`, `uniname`,
		`uninames`, `uninstall`, `uniparse`, `uniprop`, `uniprops`, `unique`, `unival`, `univals`,
		`unlike`, `unlink`, `unlock`, `unpack`, `unpolar`, `unset`, `unshift`, `unwrap`, `updir`,
		`USAGE`, `usage-name`, `use-ok`, `utc`, `val`, `value`, `values`, `VAR`, `variable`, `ver`,
		`verbose-config`, `Version`, `version`, `VMnames`, `volume`, `vow`, `w`, `wait`, `warn`,
		`watch`, `watch-path`, `week`, `weekday-of-month`, `week-number`, `week-year`, `WHAT`,
		`what`, `when`, `WHERE`, `WHEREFORE`, `WHICH`, `WHO`, `whole-second`, `WHY`, `why`,
		`with-lock-hidden-from-recursion-check`, `wordcase`, `words`, `workaround`, `wrap`,
		`write`, `write-bits`, `write-int128`, `write-int16`, `write-int32`, `write-int64`,
		`write-int8`, `write-num32`, `write-num64`, `write-ubits`, `write-uint128`, `write-uint16`,
		`write-uint32`, `write-uint64`, `write-uint8`, `write-to`, `x`, `yada`, `year`, `yield`,
		`yyyy-mm-dd`, `z`, `zip`, `zip-latest`, `HOW`, `s`, `DEPRECATED`, `trait_mod`,
	}

	builtinRoutinesPattern := Words(`(?<!['\w:-])`, `(?!['\w-])`, builtinRoutines...)

	// A map of opening and closing brackets
	brackets := map[rune]rune{
		'\u0028': '\u0029', '\u003c': '\u003e', '\u005b': '\u005d',
		'\u007b': '\u007d', '\u00ab': '\u00bb', '\u0f3a': '\u0f3b',
		'\u0f3c': '\u0f3d', '\u169b': '\u169c', '\u2018': '\u2019',
		'\u201a': '\u2019', '\u201b': '\u2019', '\u201c': '\u201d',
		'\u201e': '\u201d', '\u201f': '\u201d', '\u2039': '\u203a',
		'\u2045': '\u2046', '\u207d': '\u207e', '\u208d': '\u208e',
		'\u2208': '\u220b', '\u2209': '\u220c', '\u220a': '\u220d',
		'\u2215': '\u29f5', '\u223c': '\u223d', '\u2243': '\u22cd',
		'\u2252': '\u2253', '\u2254': '\u2255', '\u2264': '\u2265',
		'\u2266': '\u2267', '\u2268': '\u2269', '\u226a': '\u226b',
		'\u226e': '\u226f', '\u2270': '\u2271', '\u2272': '\u2273',
		'\u2274': '\u2275', '\u2276': '\u2277', '\u2278': '\u2279',
		'\u227a': '\u227b', '\u227c': '\u227d', '\u227e': '\u227f',
		'\u2280': '\u2281', '\u2282': '\u2283', '\u2284': '\u2285',
		'\u2286': '\u2287', '\u2288': '\u2289', '\u228a': '\u228b',
		'\u228f': '\u2290', '\u2291': '\u2292', '\u2298': '\u29b8',
		'\u22a2': '\u22a3', '\u22a6': '\u2ade', '\u22a8': '\u2ae4',
		'\u22a9': '\u2ae3', '\u22ab': '\u2ae5', '\u22b0': '\u22b1',
		'\u22b2': '\u22b3', '\u22b4': '\u22b5', '\u22b6': '\u22b7',
		'\u22c9': '\u22ca', '\u22cb': '\u22cc', '\u22d0': '\u22d1',
		'\u22d6': '\u22d7', '\u22d8': '\u22d9', '\u22da': '\u22db',
		'\u22dc': '\u22dd', '\u22de': '\u22df', '\u22e0': '\u22e1',
		'\u22e2': '\u22e3', '\u22e4': '\u22e5', '\u22e6': '\u22e7',
		'\u22e8': '\u22e9', '\u22ea': '\u22eb', '\u22ec': '\u22ed',
		'\u22f0': '\u22f1', '\u22f2': '\u22fa', '\u22f3': '\u22fb',
		'\u22f4': '\u22fc', '\u22f6': '\u22fd', '\u22f7': '\u22fe',
		'\u2308': '\u2309', '\u230a': '\u230b', '\u2329': '\u232a',
		'\u23b4': '\u23b5', '\u2768': '\u2769', '\u276a': '\u276b',
		'\u276c': '\u276d', '\u276e': '\u276f', '\u2770': '\u2771',
		'\u2772': '\u2773', '\u2774': '\u2775', '\u27c3': '\u27c4',
		'\u27c5': '\u27c6', '\u27d5': '\u27d6', '\u27dd': '\u27de',
		'\u27e2': '\u27e3', '\u27e4': '\u27e5', '\u27e6': '\u27e7',
		'\u27e8': '\u27e9', '\u27ea': '\u27eb', '\u2983': '\u2984',
		'\u2985': '\u2986', '\u2987': '\u2988', '\u2989': '\u298a',
		'\u298b': '\u298c', '\u298d': '\u298e', '\u298f': '\u2990',
		'\u2991': '\u2992', '\u2993': '\u2994', '\u2995': '\u2996',
		'\u2997': '\u2998', '\u29c0': '\u29c1', '\u29c4': '\u29c5',
		'\u29cf': '\u29d0', '\u29d1': '\u29d2', '\u29d4': '\u29d5',
		'\u29d8': '\u29d9', '\u29da': '\u29db', '\u29f8': '\u29f9',
		'\u29fc': '\u29fd', '\u2a2b': '\u2a2c', '\u2a2d': '\u2a2e',
		'\u2a34': '\u2a35', '\u2a3c': '\u2a3d', '\u2a64': '\u2a65',
		'\u2a79': '\u2a7a', '\u2a7d': '\u2a7e', '\u2a7f': '\u2a80',
		'\u2a81': '\u2a82', '\u2a83': '\u2a84', '\u2a8b': '\u2a8c',
		'\u2a91': '\u2a92', '\u2a93': '\u2a94', '\u2a95': '\u2a96',
		'\u2a97': '\u2a98', '\u2a99': '\u2a9a', '\u2a9b': '\u2a9c',
		'\u2aa1': '\u2aa2', '\u2aa6': '\u2aa7', '\u2aa8': '\u2aa9',
		'\u2aaa': '\u2aab', '\u2aac': '\u2aad', '\u2aaf': '\u2ab0',
		'\u2ab3': '\u2ab4', '\u2abb': '\u2abc', '\u2abd': '\u2abe',
		'\u2abf': '\u2ac0', '\u2ac1': '\u2ac2', '\u2ac3': '\u2ac4',
		'\u2ac5': '\u2ac6', '\u2acd': '\u2ace', '\u2acf': '\u2ad0',
		'\u2ad1': '\u2ad2', '\u2ad3': '\u2ad4', '\u2ad5': '\u2ad6',
		'\u2aec': '\u2aed', '\u2af7': '\u2af8', '\u2af9': '\u2afa',
		'\u2e02': '\u2e03', '\u2e04': '\u2e05', '\u2e09': '\u2e0a',
		'\u2e0c': '\u2e0d', '\u2e1c': '\u2e1d', '\u2e20': '\u2e21',
		'\u3008': '\u3009', '\u300a': '\u300b', '\u300c': '\u300d',
		'\u300e': '\u300f', '\u3010': '\u3011', '\u3014': '\u3015',
		'\u3016': '\u3017', '\u3018': '\u3019', '\u301a': '\u301b',
		'\u301d': '\u301e', '\ufd3e': '\ufd3f', '\ufe17': '\ufe18',
		'\ufe35': '\ufe36', '\ufe37': '\ufe38', '\ufe39': '\ufe3a',
		'\ufe3b': '\ufe3c', '\ufe3d': '\ufe3e', '\ufe3f': '\ufe40',
		'\ufe41': '\ufe42', '\ufe43': '\ufe44', '\ufe47': '\ufe48',
		'\ufe59': '\ufe5a', '\ufe5b': '\ufe5c', '\ufe5d': '\ufe5e',
		'\uff08': '\uff09', '\uff1c': '\uff1e', '\uff3b': '\uff3d',
		'\uff5b': '\uff5d', '\uff5f': '\uff60', '\uff62': '\uff63',
	}

	bracketsPattern := `[` + regexp.QuoteMeta(joinRuneMap(brackets)) + `]`

	// Finds opening brackets and their closing counterparts (including pod and heredoc)
	// and modifies state groups and position accordingly
	findBrackets := func(tokenClass RakuToken) MutatorFunc {
		return func(state *LexerState) error {
			var openingChars []rune
			var adverbs []rune

			switch tokenClass {
			case rakuPod:
				openingChars = []rune(strings.Join(state.Groups[1:5], ``))
			default:
				adverbs = []rune(state.NamedGroups[`adverbs`])
				openingChars = []rune(state.NamedGroups[`opening_delimiters`])
			}

			openingChar := openingChars[0]

			nChars := len(openingChars)

			var closingChar rune
			var closingCharExists bool
			var closingChars []rune

			switch tokenClass {
			case rakuPod:
				closingCharExists = true
			default:
				closingChar, closingCharExists = brackets[openingChar]
			}

			switch tokenClass {
			case rakuPodFormatter:
				formatter := StringOther

				switch state.NamedGroups[`keyword`] {
				case "B":
					formatter = GenericStrong
				case "I":
					formatter = GenericEmph
				case "U":
					formatter = GenericUnderline
				}

				formatterRule := ruleReplacingConfig{
					pattern:      `.+?`,
					tokenType:    formatter,
					mutator:      nil,
					stateName:    `pod-formatter`,
					rulePosition: bottomRule,
				}

				err := replaceRule(formatterRule)(state)
				if err != nil {
					panic(err)
				}

				err = replaceRule(ruleReplacingConfig{
					delimiter:              []rune{closingChar},
					tokenType:              Punctuation,
					stateName:              `pod-formatter`,
					pushState:              true,
					numberOfDelimiterChars: nChars,
					appendMutator:          popRule(formatterRule),
				})(state)
				if err != nil {
					panic(err)
				}

				return nil
			case rakuMatchRegex:
				var delimiter []rune
				if closingCharExists {
					delimiter = []rune{closingChar}
				} else {
					delimiter = openingChars
				}

				err := replaceRule(ruleReplacingConfig{
					delimiter: delimiter,
					tokenType: Punctuation,
					stateName: `regex`,
					popState:  true,
					pushState: true,
				})(state)
				if err != nil {
					panic(err)
				}

				return nil
			case rakuSubstitutionRegex:
				delimiter := regexp2.Escape(string(openingChars))

				err := replaceRule(ruleReplacingConfig{
					pattern:      `(` + delimiter + `)` + `((?:\\\\|\\/|.)*?)` + `(` + delimiter + `)`,
					tokenType:    ByGroups(Punctuation, UsingSelf(`qq`), Punctuation),
					rulePosition: topRule,
					stateName:    `regex`,
					popState:     true,
					pushState:    true,
				})(state)
				if err != nil {
					panic(err)
				}

				return nil
			}

			text := state.Text

			var endPos int

			var nonMirroredOpeningCharPosition int

			if !closingCharExists {
				// It's not a mirrored character, which means we
				// just need to look for the next occurrence
				closingChars = openingChars
				nonMirroredOpeningCharPosition = indexAt(text, closingChars, state.Pos)
				endPos = nonMirroredOpeningCharPosition
			} else {
				var podRegex *regexp2.Regexp
				if tokenClass == rakuPod {
					podRegex = regexp2.MustCompile(
						state.NamedGroups[`ws`]+`=end`+`\s+`+regexp2.Escape(state.NamedGroups[`name`]),
						0,
					)
				} else {
					closingChars = []rune(strings.Repeat(string(closingChar), nChars))
				}

				// We need to look for the corresponding closing character,
				// keeping nesting in mind
				nestingLevel := 1

				searchPos := state.Pos - nChars

				var nextClosePos int

				for nestingLevel > 0 {
					if tokenClass == rakuPod {
						match, err := podRegex.FindRunesMatchStartingAt(text, searchPos+nChars)
						if err == nil {
							closingChars = match.Runes()
							nextClosePos = match.Index
						} else {
							nextClosePos = -1
						}
					} else {
						nextClosePos = indexAt(text, closingChars, searchPos+nChars)
					}

					nextOpenPos := indexAt(text, openingChars, searchPos+nChars)

					switch {
					case nextClosePos == -1:
						nextClosePos = len(text)
						nestingLevel = 0
					case nextOpenPos != -1 && nextOpenPos < nextClosePos:
						nestingLevel++
						nChars = len(openingChars)
						searchPos = nextOpenPos
					default: // nextClosePos < nextOpenPos
						nestingLevel--
						nChars = len(closingChars)
						searchPos = nextClosePos
					}
				}

				endPos = nextClosePos
			}

			if endPos < 0 {
				// If we didn't find a closer, just highlight the
				// rest of the text in this class
				endPos = len(text)
			}

			adverbre := regexp.MustCompile(`:to\b|:heredoc\b`)

			var heredocTerminator []rune
			var endHeredocPos int

			if adverbre.MatchString(string(adverbs)) {
				if endPos != len(text) {
					heredocTerminator = text[state.Pos:endPos]
					nChars = len(heredocTerminator)
				} else {
					endPos = state.Pos + 1
					heredocTerminator = []rune{}
					nChars = 0
				}

				if nChars > 0 {
					endHeredocPos = indexAt(text[endPos:], heredocTerminator, 0)
					if endHeredocPos > -1 {
						endPos += endHeredocPos
					} else {
						endPos = len(text)
					}
				}
			}

			textBetweenBrackets := string(text[state.Pos:endPos])

			switch tokenClass {
			case rakuPod, rakuPodDeclaration, rakuNameAttribute:
				state.NamedGroups[`value`] = textBetweenBrackets
				state.NamedGroups[`closing_delimiters`] = string(closingChars)
			case rakuQuote:
				if len(heredocTerminator) > 0 {
					// Length of heredoc terminator + closing chars + `;`
					heredocFirstPunctuationLen := nChars + len(openingChars) + 1

					state.NamedGroups[`opening_delimiters`] = string(openingChars) +
						string(text[state.Pos:state.Pos+heredocFirstPunctuationLen])

					state.NamedGroups[`value`] =
						string(text[state.Pos+heredocFirstPunctuationLen : endPos])

					if endHeredocPos > -1 {
						state.NamedGroups[`closing_delimiters`] = string(heredocTerminator)
					}
				} else {
					state.NamedGroups[`value`] = textBetweenBrackets

					if nChars > 0 {
						state.NamedGroups[`closing_delimiters`] = string(closingChars)
					}
				}
			default:
				state.Groups = []string{state.Groups[0] + string(text[state.Pos:endPos+nChars])}
			}

			state.Pos = endPos + nChars

			return nil
		}
	}

	// Raku rules
	// Empty capture groups are placeholders and will be replaced by mutators
	// DO NOT REMOVE THEM!
	return Rules{
		"root": {
			// Placeholder, will be overwritten by mutators, DO NOT REMOVE!
			{`\A\z`, nil, nil},
			Include("common"),
			{`{`, Punctuation, Push(`root`)},
			{`\(`, Punctuation, Push(`root`)},
			{`[)}]`, Punctuation, Pop(1)},
			{`;`, Punctuation, nil},
			{`\[|\]`, Operator, nil},
			{`.+?`, Text, nil},
		},
		"common": {
			{`^#![^\n]*$`, CommentHashbang, nil},
			Include("pod"),
			// Multi-line, embedded comment
			{
				"#`(?<opening_delimiters>(?<delimiter>" + bracketsPattern + `)\k<delimiter>*)`,
				CommentMultiline,
				findBrackets(rakuMultilineComment),
			},
			{`#[^\n]*$`, CommentSingle, nil},
			// /regex/
			{
				`(?<=(?:^|\(|=|:|~~|\[|{|,|=>)\s*)(/)(?!\]|\))((?:\\\\|\\/|.)*?)((?<!(?<!\\)\\)/(?!'|"))`,
				ByGroups(Punctuation, UsingSelf("regex"), Punctuation),
				nil,
			},
			Include("variable"),
			// ::?VARIABLE
			{`::\?\w+(?::[_UD])?`, NameVariableGlobal, nil},
			// Version
			{
				`\b(v)(\d+)((?:\.(?:\*|[\d\w]+))*)(\+)?`,
				ByGroups(Keyword, NumberInteger, NameEntity, Operator),
				nil,
			},
			Include("number"),
			// Hyperoperator | »*«
			{`(>>)(\S+?)(<<)`, ByGroups(Operator, UsingSelf("root"), Operator), nil},
			{`(»)(\S+?)(«)`, ByGroups(Operator, UsingSelf("root"), Operator), nil},
			// Hyperoperator | «*«
			{`(<<)(\S+?)(<<)`, ByGroups(Operator, UsingSelf("root"), Operator), nil},
			{`(«)(\S+?)(«)`, ByGroups(Operator, UsingSelf("root"), Operator), nil},
			// Hyperoperator | »*»
			{`(>>)(\S+?)(>>)`, ByGroups(Operator, UsingSelf("root"), Operator), nil},
			{`(»)(\S+?)(»)`, ByGroups(Operator, UsingSelf("root"), Operator), nil},
			// <<quoted words>>
			{`(?<!(?:\d+|\.(?:Int|Numeric)|[$@%]\*?[\w':-]+\s+|[\])}]\s+)\s*)(<<)(?!(?:(?!>>)[^\n])+?[},;] *\n)(?!(?:(?!>>).)+?>>\S+?>>)`, Punctuation, Push("<<")},
			// «quoted words»
			{`(?<!(?:\d+|\.(?:Int|Numeric)|[$@%]\*?[\w':-]+\s+|[\])}]\s+)\s*)(«)(?![^»]+?[},;] *\n)(?![^»]+?»\S+?»)`, Punctuation, Push("«")},
			// [<]
			{`(?<=\[\\?)<(?=\])`, Operator, nil},
			// < and > operators | something < onething > something
			{
				`(?<=[$@%&]?\w[\w':-]* +)(<=?)( *[^ ]+? *)(>=?)(?= *[$@%&]?\w[\w':-]*)`,
				ByGroups(Operator, UsingSelf("root"), Operator),
				nil,
			},
			// <quoted words>
			{
				`(?<!(?:\d+|\.(?:Int|Numeric)|[$@%]\*?[\w':-]+\s+|[\])}]\s+)\s*)(<)((?:(?![,;)}] *(?:#[^\n]+)?\n)[^<>])+?)(>)(?!\s*(?:\d+|\.(?:Int|Numeric)|[$@%]\*?\w[\w':-]*[^(]|\s+\[))`,
				ByGroups(Punctuation, String, Punctuation),
				nil,
			},
			{`C?X::['\w:-]+`, NameException, nil},
			Include("metaoperator"),
			// Pair | key => value
			{
				`(\w[\w'-]*)(\s*)(=>)`,
				ByGroups(String, Text, Operator),
				nil,
			},
			Include("colon-pair"),
			// Token
			{
				`(?<=(?:^|\s)(?:regex|token|rule)(\s+))` + namePattern + colonPairLookahead + `\s*[({])`,
				NameFunction,
				Push("token", "name-adverb"),
			},
			// Substitution
			{`(?<=^|\b|\s)(?<!\.)(ss|S|s|TR|tr)\b(\s*)`, ByGroups(Keyword, Text), Push("substitution")},
			{keywordsPattern, Keyword, nil},
			{builtinTypesPattern, KeywordType, nil},
			{builtinRoutinesPattern, NameBuiltin, nil},
			// Class name
			{
				`(?<=(?:^|\s)(?:class|grammar|role|does|but|is|subset|of)\s+)` + namePattern,
				NameClass,
				Push("name-adverb"),
			},
			// Routine
			{
				`(?<=(?:^|\s)(?:sub|method|multi sub|multi)\s+)!?` + namePattern + colonPairLookahead + `\s*[({])`,
				NameFunction,
				Push("name-adverb"),
			},
			// Constant
			{`(?<=\bconstant\s+)` + namePattern, NameConstant, Push("name-adverb")},
			// Namespace
			{`(?<=\b(?:use|module|package)\s+)` + namePattern, NameNamespace, Push("name-adverb")},
			Include("operator"),
			Include("single-quote"),
			{`(?<!(?<!\\)\\)"`, Punctuation, Push("double-quotes")},
			// m,rx regex
			{`(?<=^|\b|\s)(ms|m|rx)\b(\s*)`, ByGroups(Keyword, Text), Push("rx")},
			// Quote constructs
			{
				`(?<=^|\b|\s)(?<keyword>(?:qq|q|Q))(?<adverbs>(?::?(?:heredoc|to|qq|ww|q|w|s|a|h|f|c|b|to|v|x))*)(?<ws>\s*)(?<opening_delimiters>(?<delimiter>[^0-9a-zA-Z:\s])\k<delimiter>*)`,
				EmitterFunc(quote),
				findBrackets(rakuQuote),
			},
			// Function
			{
				`\b` + namePattern + colonPairLookahead + `\()`,
				NameFunction,
				Push("name-adverb"),
			},
			// Method
			{
				`(?<!\.\.[?^*+]?)(?<=(?:\.[?^*+&]?)|self!)` + namePattern + colonPairLookahead + `\b)`,
				NameFunction,
				Push("name-adverb"),
			},
			// Indirect invocant
			{namePattern + `(?=\s+\W?['\w:-]+:\W)`, NameFunction, Push("name-adverb")},
			{`(?<=\W)(?:∅|i|e|𝑒|tau|τ|pi|π|Inf|∞)(?=\W)`, NameConstant, nil},
			{`(「)([^」]*)(」)`, ByGroups(Punctuation, String, Punctuation), nil},
			{`(?<=^ *)\b` + namePattern + `(?=:\s*(?:for|while|loop))`, NameLabel, nil},
			// Sigilless variable
			{
				`(?<=\b(?:my|our|constant|let|temp)\s+)\\` + namePattern,
				NameVariable,
				Push("name-adverb"),
			},
			{namePattern, Name, Push("name-adverb")},
		},
		"rx": {
			Include("colon-pair-attribute"),
			{
				`(?<opening_delimiters>(?<delimiter>[^\w:\s])\k<delimiter>*)`,
				ByGroupNames(
					map[string]Emitter{
						`opening_delimiters`: Punctuation,
						`delimiter`:          nil,
					},
				),
				findBrackets(rakuMatchRegex),
			},
		},
		"substitution": {
			Include("colon-pair-attribute"),
			// Substitution | s{regex} = value
			{
				`(?<opening_delimiters>(?<delimiter>` + bracketsPattern + `)\k<delimiter>*)`,
				ByGroupNames(map[string]Emitter{
					`opening_delimiters`: Punctuation,
					`delimiter`:          nil,
				}),
				findBrackets(rakuMatchRegex),
			},
			// Substitution | s/regex/string/
			{
				`(?<opening_delimiters>[^\w:\s])`,
				Punctuation,
				findBrackets(rakuSubstitutionRegex),
			},
		},
		"number": {
			{`0_?[0-7]+(_[0-7]+)*`, LiteralNumberOct, nil},
			{`0x[0-9A-Fa-f]+(_[0-9A-Fa-f]+)*`, LiteralNumberHex, nil},
			{`0b[01]+(_[01]+)*`, LiteralNumberBin, nil},
			{
				`(?i)(\d*(_\d*)*\.\d+(_\d*)*|\d+(_\d*)*\.\d+(_\d*)*)(e[+-]?\d+)?`,
				LiteralNumberFloat,
				nil,
			},
			{`(?i)\d+(_\d*)*e[+-]?\d+(_\d*)*`, LiteralNumberFloat, nil},
			{`(?<=\d+)i`, NameConstant, nil},
			{`\d+(_\d+)*`, LiteralNumberInteger, nil},
		},
		"name-adverb": {
			Include("colon-pair-attribute-keyvalue"),
			Default(Pop(1)),
		},
		"colon-pair": {
			// :key(value)
			{colonPairPattern, colonPair(String), findBrackets(rakuNameAttribute)},
			// :123abc
			{
				`(:)(\d+)(\w[\w'-]*)`,
				ByGroups(Punctuation, UsingSelf("number"), String),
				nil,
			},
			// :key
			{`(:)(!?)(\w[\w'-]*)`, ByGroups(Punctuation, Operator, String), nil},
			{`\s+`, Text, nil},
		},
		"colon-pair-attribute": {
			// :key(value)
			{colonPairPattern, colonPair(NameAttribute), findBrackets(rakuNameAttribute)},
			// :123abc
			{
				`(:)(\d+)(\w[\w'-]*)`,
				ByGroups(Punctuation, UsingSelf("number"), NameAttribute),
				nil,
			},
			// :key
			{`(:)(!?)(\w[\w'-]*)`, ByGroups(Punctuation, Operator, NameAttribute), nil},
			{`\s+`, Text, nil},
		},
		"colon-pair-attribute-keyvalue": {
			// :key(value)
			{colonPairPattern, colonPair(NameAttribute), findBrackets(rakuNameAttribute)},
		},
		"escape-qq": {
			{
				`(?<!(?<!\\)\\)(\\qq)(\[)(.+?)(\])`,
				ByGroups(StringEscape, Punctuation, UsingSelf("qq"), Punctuation),
				nil,
			},
		},
		`escape-char`: {
			{`(?<!(?<!\\)\\)(\\[abfnrt])`, StringEscape, nil},
		},
		`escape-single-quote`: {
			{`(?<!(?<!\\)\\)(\\)(['\\])`, ByGroups(StringEscape, StringSingle), nil},
		},
		"escape-c-name": {
			{
				`(?<!(?<!\\)\\)(\\[cC])(\[)(.+?)(\])`,
				ByGroups(StringEscape, Punctuation, String, Punctuation),
				nil,
			},
		},
		"escape-hexadecimal": {
			{
				`(?<!(?<!\\)\\)(\\[xX])(\[)([0-9a-fA-F]+)(\])`,
				ByGroups(StringEscape, Punctuation, NumberHex, Punctuation),
				nil,
			},
			{`(\\[xX])([0-9a-fA-F]+)`, ByGroups(StringEscape, NumberHex), nil},
		},
		"regex": {
			// Placeholder, will be overwritten by mutators, DO NOT REMOVE!
			{`\A\z`, nil, nil},
			Include("regex-escape-class"),
			Include(`regex-character-escape`),
			// $(code)
			{
				`([$@])((?<!(?<!\\)\\)\()`,
				ByGroups(Keyword, Punctuation),
				replaceRule(ruleReplacingConfig{
					delimiter: []rune(`)`),
					tokenType: Punctuation,
					stateName: `root`,
					pushState: true,
				}),
			},
			// Exclude $/ from variables, because we can't get out of the end of the slash regex: $/;
			{`\$(?=/)`, NameEntity, nil},
			// Exclude $ from variables
			{`\$(?=\z|\s|[^<(\w*!.])`, NameEntity, nil},
			Include("variable"),
			Include("escape-c-name"),
			Include("escape-hexadecimal"),
			Include("number"),
			Include("single-quote"),
			// :my variable code ...
			{
				`(?<!(?<!\\)\\)(:)(my|our|state|constant|temp|let)`,
				ByGroups(Operator, KeywordDeclaration),
				replaceRule(ruleReplacingConfig{
					delimiter: []rune(`;`),
					tokenType: Punctuation,
					stateName: `root`,
					pushState: true,
				}),
			},
			// <{code}>
			{
				`(?<!(?<!\\)\\)(<)([?!.]*)((?<!(?<!\\)\\){)`,
				ByGroups(Punctuation, Operator, Punctuation),
				replaceRule(ruleReplacingConfig{
					delimiter: []rune(`}>`),
					tokenType: Punctuation,
					stateName: `root`,
					pushState: true,
				}),
			},
			// {code}
			Include(`closure`),
			// Properties
			{`(:)(\w+)`, ByGroups(Punctuation, NameAttribute), nil},
			// Operator
			{`\|\||\||&&|&|\.\.|\*\*|%%|%|:|!|<<|«|>>|»|\+|\*\*|\*|\?|=|~|<~~>`, Operator, nil},
			// Anchors
			{`\^\^|\^|\$\$|\$`, NameEntity, nil},
			{`\.`, NameEntity, nil},
			{`#[^\n]*\n`, CommentSingle, nil},
			// Lookaround
			{
				`(?<!(?<!\\)\\)(<)(\s*)([?!.]+)(\s*)(after|before)`,
				ByGroups(Punctuation, Text, Operator, Text, OperatorWord),
				replaceRule(ruleReplacingConfig{
					delimiter: []rune(`>`),
					tokenType: Punctuation,
					stateName: `regex`,
					pushState: true,
				}),
			},
			{
				`(?<!(?<!\\)\\)(<)([|!?.]*)(wb|ww|ws|w)(>)`,
				ByGroups(Punctuation, Operator, OperatorWord, Punctuation),
				nil,
			},
			// <$variable>
			{
				`(?<!(?<!\\)\\)(<)([?!.]*)([$@]\w[\w:-]*)(>)`,
				ByGroups(Punctuation, Operator, NameVariable, Punctuation),
				nil,
			},
			// Capture markers
			{`(?<!(?<!\\)\\)<\(|\)>`, Operator, nil},
			{
				`(?<!(?<!\\)\\)(<)(\w[\w:-]*)(=\.?)`,
				ByGroups(Punctuation, NameVariable, Operator),
				Push(`regex-variable`),
			},
			{
				`(?<!(?<!\\)\\)(<)([|!?.&]*)(\w(?:(?!:\s)[\w':-])*)`,
				ByGroups(Punctuation, Operator, NameFunction),
				Push(`regex-function`),
			},
			{`(?<!(?<!\\)\\)<`, Punctuation, Push("regex-property")},
			{`(?<!(?<!\\)\\)"`, Punctuation, Push("double-quotes")},
			{`(?<!(?<!\\)\\)(?:\]|\))`, Punctuation, Pop(1)},
			{`(?<!(?<!\\)\\)(?:\[|\()`, Punctuation, Push("regex")},
			{`.+?`, StringRegex, nil},
		},
		"regex-class-builtin": {
			{
				`\b(?:alnum|alpha|blank|cntrl|digit|graph|lower|print|punct|space|upper|xdigit|same|ident)\b`,
				NameBuiltin,
				nil,
			},
		},
		"regex-function": {
			// <function>
			{`(?<!(?<!\\)\\)>`, Punctuation, Pop(1)},
			// <function(parameter)>
			{
				`\(`,
				Punctuation,
				replaceRule(ruleReplacingConfig{
					delimiter: []rune(`)>`),
					tokenType: Punctuation,
					stateName: `root`,
					popState:  true,
					pushState: true,
				}),
			},
			// <function value>
			{
				`\s+`,
				StringRegex,
				replaceRule(ruleReplacingConfig{
					delimiter: []rune(`>`),
					tokenType: Punctuation,
					stateName: `regex`,
					popState:  true,
					pushState: true,
				}),
			},
			// <function: value>
			{
				`:`,
				Punctuation,
				replaceRule(ruleReplacingConfig{
					delimiter: []rune(`>`),
					tokenType: Punctuation,
					stateName: `root`,
					popState:  true,
					pushState: true,
				}),
			},
		},
|
|
|
|
		"regex-variable": {
			Include(`regex-starting-operators`),
			// <var=function(
			{
				`(&)?(\w(?:(?!:\s)[\w':-])*)(?=\()`,
				ByGroups(Operator, NameFunction),
				Mutators(Pop(1), Push(`regex-function`)),
			},
			// <var=function>
			{`(&)?(\w[\w':-]*)(>)`, ByGroups(Operator, NameFunction, Punctuation), Pop(1)},
			// <var=
			Default(Pop(1), Push(`regex-property`)),
		},
		"regex-property": {
			{`(?<!(?<!\\)\\)>`, Punctuation, Pop(1)},
			Include("regex-class-builtin"),
			Include("variable"),
			Include(`regex-starting-operators`),
			Include("colon-pair-attribute"),
			{`(?<!(?<!\\)\\)\[`, Punctuation, Push("regex-character-class")},
			{`\+|\-`, Operator, nil},
			{`@[\w':-]+`, NameVariable, nil},
			{`.+?`, StringRegex, nil},
		},
		`regex-starting-operators`: {
			{`(?<=<)[|!?.]+`, Operator, nil},
		},
		"regex-escape-class": {
			{`(?i)\\n|\\t|\\h|\\v|\\s|\\d|\\w`, StringEscape, nil},
		},
		`regex-character-escape`: {
			{`(?<!(?<!\\)\\)(\\)(.)`, ByGroups(StringEscape, StringRegex), nil},
		},
		"regex-character-class": {
			{`(?<!(?<!\\)\\)\]`, Punctuation, Pop(1)},
			Include("regex-escape-class"),
			Include("escape-c-name"),
			Include("escape-hexadecimal"),
			Include(`regex-character-escape`),
			Include("number"),
			{`\.\.`, Operator, nil},
			{`.+?`, StringRegex, nil},
		},
		"metaoperator": {
			// Z[=>]
			{
				`\b([RZX]+)\b(\[)([^\s\]]+?)(\])`,
				ByGroups(OperatorWord, Punctuation, UsingSelf("root"), Punctuation),
				nil,
			},
			// Z=>
			{`\b([RZX]+)\b([^\s\]]+)`, ByGroups(OperatorWord, UsingSelf("operator")), nil},
		},
		"operator": {
			// Word Operator
			{wordOperatorsPattern, OperatorWord, nil},
			// Operator
			{operatorsPattern, Operator, nil},
		},
		"pod": {
			// Single-line pod declaration
			{`(#[|=])\s`, Keyword, Push("pod-single")},
			// Multi-line pod declaration
			{
				"(?<keyword>#[|=])(?<opening_delimiters>(?<delimiter>" + bracketsPattern + `)\k<delimiter>*)(?<value>)(?<closing_delimiters>)`,
				ByGroupNames(
					map[string]Emitter{
						`keyword`:            Keyword,
						`opening_delimiters`: Punctuation,
						`delimiter`:          nil,
						`value`:              UsingSelf("pod-declaration"),
						`closing_delimiters`: Punctuation,
					}),
				findBrackets(rakuPodDeclaration),
			},
			Include("pod-blocks"),
		},
		"pod-blocks": {
			// =begin code
			{
				`(?<=^ *)(?<ws> *)(?<keyword>=begin)(?<ws2> +)(?<name>code)(?<config>[^\n]*)(?<value>.*?)(?<ws3>^\k<ws>)(?<end_keyword>=end)(?<ws4> +)\k<name>`,
				EmitterFunc(podCode),
				nil,
			},
			// =begin
			{
				`(?<=^ *)(?<ws> *)(?<keyword>=begin)(?<ws2> +)(?!code)(?<name>\w[\w'-]*)(?<config>[^\n]*)(?<value>)(?<closing_delimiters>)`,
				ByGroupNames(
					map[string]Emitter{
						`ws`:                 Comment,
						`keyword`:            Keyword,
						`ws2`:                StringDoc,
						`name`:               Keyword,
						`config`:             EmitterFunc(podConfig),
						`value`:              UsingSelf("pod-begin"),
						`closing_delimiters`: Keyword,
					}),
				findBrackets(rakuPod),
			},
			// =for ...
			{
				`(?<=^ *)(?<ws> *)(?<keyword>=(?:for|defn))(?<ws2> +)(?<name>\w[\w'-]*)(?<config>[^\n]*\n)`,
				ByGroups(Comment, Keyword, StringDoc, Keyword, EmitterFunc(podConfig)),
				Push("pod-paragraph"),
			},
			// =config
			{
				`(?<=^ *)(?<ws> *)(?<keyword>=config)(?<ws2> +)(?<name>\w[\w'-]*)(?<config>[^\n]*\n)`,
				ByGroups(Comment, Keyword, StringDoc, Keyword, EmitterFunc(podConfig)),
				nil,
			},
			// =alias
			{
				`(?<=^ *)(?<ws> *)(?<keyword>=alias)(?<ws2> +)(?<name>\w[\w'-]*)(?<value>[^\n]*\n)`,
				ByGroups(Comment, Keyword, StringDoc, Keyword, StringDoc),
				nil,
			},
			// =encoding
			{
				`(?<=^ *)(?<ws> *)(?<keyword>=encoding)(?<ws2> +)(?<name>[^\n]+)`,
				ByGroups(Comment, Keyword, StringDoc, Name),
				nil,
			},
			// =para ...
			{
				`(?<=^ *)(?<ws> *)(?<keyword>=(?:para|table|pod))(?<config>(?<!\n\s*)[^\n]*\n)`,
				ByGroups(Comment, Keyword, EmitterFunc(podConfig)),
				Push("pod-paragraph"),
			},
			// =head1 ...
			{
				`(?<=^ *)(?<ws> *)(?<keyword>=head\d+)(?<ws2> *)(?<config>#?)`,
				ByGroups(Comment, Keyword, GenericHeading, Keyword),
				Push("pod-heading"),
			},
			// =item ...
			{
				`(?<=^ *)(?<ws> *)(?<keyword>=(?:item\d*|comment|data|[A-Z]+))(?<ws2> *)(?<config>#?)`,
				ByGroups(Comment, Keyword, StringDoc, Keyword),
				Push("pod-paragraph"),
			},
			{
				`(?<=^ *)(?<ws> *)(?<keyword>=finish)(?<config>[^\n]*)`,
				ByGroups(Comment, Keyword, EmitterFunc(podConfig)),
				Push("pod-finish"),
			},
			// ={custom} ...
			{
				`(?<=^ *)(?<ws> *)(?<name>=\w[\w'-]*)(?<ws2> *)(?<config>#?)`,
				ByGroups(Comment, Name, StringDoc, Keyword),
				Push("pod-paragraph"),
			},
			// = podconfig
			{
				`(?<=^ *)(?<keyword> *=)(?<ws> *)(?<config>(?::\w[\w'-]*(?:` + colonPairOpeningBrackets + `.+?` +
					colonPairClosingBrackets + `) *)*\n)`,
				ByGroups(Keyword, StringDoc, EmitterFunc(podConfig)),
				nil,
			},
		},
		"pod-begin": {
			Include("pod-blocks"),
			Include("pre-pod-formatter"),
			{`.+?`, StringDoc, nil},
		},
		"pod-declaration": {
			Include("pre-pod-formatter"),
			{`.+?`, StringDoc, nil},
		},
		"pod-paragraph": {
			{`\n *\n|\n(?=^ *=)`, StringDoc, Pop(1)},
			Include("pre-pod-formatter"),
			{`.+?`, StringDoc, nil},
		},
		"pod-single": {
			{`\n`, StringDoc, Pop(1)},
			Include("pre-pod-formatter"),
			{`.+?`, StringDoc, nil},
		},
		"pod-heading": {
			{`\n *\n|\n(?=^ *=)`, GenericHeading, Pop(1)},
			Include("pre-pod-formatter"),
			{`.+?`, GenericHeading, nil},
		},
		"pod-finish": {
			{`\z`, nil, Pop(1)},
			Include("pre-pod-formatter"),
			{`.+?`, StringDoc, nil},
		},
		"pre-pod-formatter": {
			// C<code>, B<bold>, ...
			{
				`(?<keyword>[CBIUDTKRPAELZVMSXN])(?<opening_delimiters><+|«)`,
				ByGroups(Keyword, Punctuation),
				findBrackets(rakuPodFormatter),
			},
		},
		"pod-formatter": {
			// Placeholder rule, will be replaced by mutators. DO NOT REMOVE!
			{`>`, Punctuation, Pop(1)},
			Include("pre-pod-formatter"),
			// Placeholder rule, will be replaced by mutators. DO NOT REMOVE!
			{`.+?`, StringOther, nil},
		},
		"variable": {
			{variablePattern, NameVariable, Push("name-adverb")},
			{globalVariablePattern, NameVariableGlobal, Push("name-adverb")},
			{`[$@]<[^>]+>`, NameVariable, nil},
			{`\$[/!¢]`, NameVariable, nil},
			{`[$@%]`, NameVariable, nil},
		},
		"single-quote": {
			{`(?<!(?<!\\)\\)'`, Punctuation, Push("single-quote-inner")},
		},
		"single-quote-inner": {
			{`(?<!(?<!(?<!\\)\\)\\)'`, Punctuation, Pop(1)},
			Include("escape-single-quote"),
			Include("escape-qq"),
			{`(?:\\\\|\\[^\\]|[^'\\])+?`, StringSingle, nil},
		},
		"double-quotes": {
			{`(?<!(?<!\\)\\)"`, Punctuation, Pop(1)},
			Include("qq"),
		},
		"<<": {
			{`>>(?!\s*(?:\d+|\.(?:Int|Numeric)|[$@%]\*?[\w':-]+|\s+\[))`, Punctuation, Pop(1)},
			Include("ww"),
		},
		"«": {
			{`»(?!\s*(?:\d+|\.(?:Int|Numeric)|[$@%]\*?[\w':-]+|\s+\[))`, Punctuation, Pop(1)},
			Include("ww"),
		},
		"ww": {
			Include("single-quote"),
			Include("qq"),
		},
		"qq": {
			Include("qq-variable"),
			Include("closure"),
			Include(`escape-char`),
			Include("escape-hexadecimal"),
			Include("escape-c-name"),
			Include("escape-qq"),
			{`.+?`, StringDouble, nil},
		},
		"qq-variable": {
			{
				`(?<!(?<!\\)\\)(?:` + variablePattern + `|` + globalVariablePattern + `)` + colonPairLookahead + `)`,
				NameVariable,
				Push("qq-variable-extras", "name-adverb"),
			},
		},
		"qq-variable-extras": {
			// Method
			{
				`(?<operator>\.)(?<method_name>` + namePattern + `)` + colonPairLookahead + `\()`,
				ByGroupNames(map[string]Emitter{
					`operator`:    Operator,
					`method_name`: NameFunction,
				}),
				Push(`name-adverb`),
			},
			// Function/Signature
			{
				`\(`, Punctuation, replaceRule(
					ruleReplacingConfig{
						delimiter: []rune(`)`),
						tokenType: Punctuation,
						stateName: `root`,
						pushState: true,
					}),
			},
			Default(Pop(1)),
		},
		"Q": {
			Include("escape-qq"),
			{`.+?`, String, nil},
		},
		"Q-closure": {
			Include("escape-qq"),
			Include("closure"),
			{`.+?`, String, nil},
		},
		"Q-variable": {
			Include("escape-qq"),
			Include("qq-variable"),
			{`.+?`, String, nil},
		},
		"closure": {
			{`(?<!(?<!\\)\\){`, Punctuation, replaceRule(
				ruleReplacingConfig{
					delimiter: []rune(`}`),
					tokenType: Punctuation,
					stateName: `root`,
					pushState: true,
				}),
			},
		},
		"token": {
			// Token signature
			{`\(`, Punctuation, replaceRule(
				ruleReplacingConfig{
					delimiter: []rune(`)`),
					tokenType: Punctuation,
					stateName: `root`,
					pushState: true,
				}),
			},
			{`{`, Punctuation, replaceRule(
				ruleReplacingConfig{
					delimiter: []rune(`}`),
					tokenType: Punctuation,
					stateName: `regex`,
					popState:  true,
					pushState: true,
				}),
			},
			{`\s*`, Text, nil},
			Default(Pop(1)),
		},
	}
}
// Joins keys of rune map
func joinRuneMap(m map[rune]rune) string {
	runes := make([]rune, 0, len(m))
	for k := range m {
		runes = append(runes, k)
	}

	return string(runes)
}
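As a quick sanity check of the helper above, this standalone sketch mirrors `joinRuneMap` (the bracket pairs used here are illustrative; map iteration order is unspecified in Go, so the demo sorts the result before printing):

```go
package main

import (
	"fmt"
	"sort"
)

// joinRuneMap concatenates the keys of a rune map into a string,
// mirroring the helper defined above.
func joinRuneMap(m map[rune]rune) string {
	runes := make([]rune, 0, len(m))
	for k := range m {
		runes = append(runes, k)
	}
	return string(runes)
}

func main() {
	// Hypothetical bracket pairs, in the spirit of Raku's paired delimiters.
	pairs := map[rune]rune{'(': ')', '[': ']', '{': '}'}
	keys := []rune(joinRuneMap(pairs))
	// Map iteration order is random; sort for a stable result.
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	fmt.Println(string(keys))
}
```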
// Finds the index of substring in the string starting at position n
func indexAt(str []rune, substr []rune, pos int) int {
	strFromPos := str[pos:]
	text := string(strFromPos)

	idx := strings.Index(text, string(substr))
	if idx > -1 {
		idx = utf8.RuneCountInString(text[:idx])

		// Search again if the substr is escaped with backslash
		if (idx > 1 && strFromPos[idx-1] == '\\' && strFromPos[idx-2] != '\\') ||
			(idx == 1 && strFromPos[idx-1] == '\\') {
			idx = indexAt(str[pos:], substr, idx+1)

			if idx < 0 {
				return idx
			}
		}
		idx += pos
	}

	return idx
}
// Tells if an array of string contains a string
func contains(s []string, e string) bool {
	for _, value := range s {
		if value == e {
			return true
		}
	}
	return false
}
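The `contains` helper above is used by the quote emitter to check which token states the adverbs selected. A minimal standalone sketch of that usage (the `tokenStates` values are illustrative):

```go
package main

import "fmt"

// contains reports whether the string slice s includes e,
// mirroring the helper defined above.
func contains(s []string, e string) bool {
	for _, value := range s {
		if value == e {
			return true
		}
	}
	return false
}

func main() {
	// Token states collected from quote adverbs, as in the quote emitter.
	tokenStates := []string{"qq", "Q-variable"}
	fmt.Println(contains(tokenStates, "qq"))
	fmt.Println(contains(tokenStates, "ww"))
}
```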
type rulePosition int

const (
	topRule    rulePosition = 0
	bottomRule              = -1
)

type ruleMakingConfig struct {
	delimiter              []rune
	pattern                string
	tokenType              Emitter
	mutator                Mutator
	numberOfDelimiterChars int
}

type ruleReplacingConfig struct {
	delimiter              []rune
	pattern                string
	tokenType              Emitter
	numberOfDelimiterChars int
	mutator                Mutator
	appendMutator          Mutator
	rulePosition           rulePosition
	stateName              string
	pop                    bool
	popState               bool
	pushState              bool
}
// Pops rule from state-stack and replaces the rule with the previous rule
func popRule(rule ruleReplacingConfig) MutatorFunc {
	return func(state *LexerState) error {
		stackName := genStackName(rule.stateName, rule.rulePosition)

		stack, ok := state.Get(stackName).([]ruleReplacingConfig)

		if ok && len(stack) > 0 {
			// Pop from stack
			stack = stack[:len(stack)-1]
			lastRule := stack[len(stack)-1]
			lastRule.pushState = false
			lastRule.popState = false
			lastRule.pop = true
			state.Set(stackName, stack)

			// Call replaceRule to use the last rule
			err := replaceRule(lastRule)(state)
			if err != nil {
				panic(err)
			}
		}

		return nil
	}
}
// Replaces a state's rule based on the rule config and position
func replaceRule(rule ruleReplacingConfig) MutatorFunc {
	return func(state *LexerState) error {
		stateName := rule.stateName
		stackName := genStackName(rule.stateName, rule.rulePosition)

		stack, ok := state.Get(stackName).([]ruleReplacingConfig)
		if !ok {
			stack = []ruleReplacingConfig{}
		}

		// If state-stack is empty fill it with the placeholder rule
		if len(stack) == 0 {
			stack = []ruleReplacingConfig{
				{
					// Placeholder, will be overwritten by mutators, DO NOT REMOVE!
					pattern:      `\A\z`,
					tokenType:    nil,
					mutator:      nil,
					stateName:    stateName,
					rulePosition: rule.rulePosition,
				},
			}
			state.Set(stackName, stack)
		}

		var mutator Mutator
		mutators := []Mutator{}

		switch {
		case rule.rulePosition == topRule && rule.mutator == nil:
			// Default mutator for top rule
			mutators = []Mutator{Pop(1), popRule(rule)}
		case rule.rulePosition == topRule && rule.mutator != nil:
			// Default mutator for top rule, when rule.mutator is set
			mutators = []Mutator{rule.mutator, popRule(rule)}
		case rule.mutator != nil:
			mutators = []Mutator{rule.mutator}
		}

		if rule.appendMutator != nil {
			mutators = append(mutators, rule.appendMutator)
		}

		if len(mutators) > 0 {
			mutator = Mutators(mutators...)
		} else {
			mutator = nil
		}

		ruleConfig := ruleMakingConfig{
			pattern:                rule.pattern,
			delimiter:              rule.delimiter,
			numberOfDelimiterChars: rule.numberOfDelimiterChars,
			tokenType:              rule.tokenType,
			mutator:                mutator,
		}

		cRule := makeRule(ruleConfig)

		switch rule.rulePosition {
		case topRule:
			state.Rules[stateName][0] = cRule
		case bottomRule:
			state.Rules[stateName][len(state.Rules[stateName])-1] = cRule
		}

		// Pop state name from stack if asked. State should be popped first before pushing
		if rule.popState {
			err := Pop(1).Mutate(state)
			if err != nil {
				panic(err)
			}
		}

		// Push state name to stack if asked
		if rule.pushState {
			err := Push(stateName).Mutate(state)
			if err != nil {
				panic(err)
			}
		}

		if !rule.pop {
			state.Set(stackName, append(stack, rule))
		}

		return nil
	}
}
// Generates rule replacing stack using state name and rule position
func genStackName(stateName string, rulePosition rulePosition) (stackName string) {
	switch rulePosition {
	case topRule:
		stackName = stateName + `-top-stack`
	case bottomRule:
		stackName = stateName + `-bottom-stack`
	}
	return
}
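The naming scheme above keys one stack per state and rule position. A self-contained sketch of the same helper, showing the two stack names it produces for a given state:

```go
package main

import "fmt"

type rulePosition int

const (
	topRule    rulePosition = 0
	bottomRule rulePosition = -1
)

// genStackName derives the per-state stack key used to track
// replaced rules, mirroring the helper defined above.
func genStackName(stateName string, pos rulePosition) (stackName string) {
	switch pos {
	case topRule:
		stackName = stateName + `-top-stack`
	case bottomRule:
		stackName = stateName + `-bottom-stack`
	}
	return
}

func main() {
	fmt.Println(genStackName("pod-formatter", topRule))
	fmt.Println(genStackName("pod-formatter", bottomRule))
}
```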
// Makes a compiled rule and returns it
func makeRule(config ruleMakingConfig) *CompiledRule {
	var rePattern string

	if len(config.delimiter) > 0 {
		delimiter := string(config.delimiter)

		if config.numberOfDelimiterChars > 1 {
			delimiter = strings.Repeat(delimiter, config.numberOfDelimiterChars)
		}

		rePattern = `(?<!(?<!\\)\\)` + regexp2.Escape(delimiter)
	} else {
		rePattern = config.pattern
	}

	regex := regexp2.MustCompile(rePattern, regexp2.None)

	cRule := &CompiledRule{
		Rule:   Rule{rePattern, config.tokenType, config.mutator},
		Regexp: regex,
	}

	return cRule
}
// Emitter for colon pairs, changes token state based on key and brackets
func colonPair(tokenClass TokenType) Emitter {
	return EmitterFunc(func(groups []string, state *LexerState) Iterator {
		iterators := []Iterator{}
		tokens := []Token{
			{Punctuation, state.NamedGroups[`colon`]},
			{Punctuation, state.NamedGroups[`opening_delimiters`]},
			{Punctuation, state.NamedGroups[`closing_delimiters`]},
		}

		// Append colon
		iterators = append(iterators, Literator(tokens[0]))

		if tokenClass == NameAttribute {
			iterators = append(iterators, Literator(Token{NameAttribute, state.NamedGroups[`key`]}))
		} else {
			var keyTokenState string
			keyre := regexp.MustCompile(`^\d+$`)
			if keyre.MatchString(state.NamedGroups[`key`]) {
				keyTokenState = "common"
			} else {
				keyTokenState = "Q"
			}

			// Use token state to Tokenise key
			if keyTokenState != "" {
				iterator, err := state.Lexer.Tokenise(
					&TokeniseOptions{
						State:  keyTokenState,
						Nested: true,
					}, state.NamedGroups[`key`])

				if err != nil {
					panic(err)
				} else {
					// Append key
					iterators = append(iterators, iterator)
				}
			}
		}

		// Append punctuation
		iterators = append(iterators, Literator(tokens[1]))

		var valueTokenState string

		switch state.NamedGroups[`opening_delimiters`] {
		case "(", "{", "[":
			valueTokenState = "root"
		case "<<", "«":
			valueTokenState = "ww"
		case "<":
			valueTokenState = "Q"
		}

		// Use token state to Tokenise value
		if valueTokenState != "" {
			iterator, err := state.Lexer.Tokenise(
				&TokeniseOptions{
					State:  valueTokenState,
					Nested: true,
				}, state.NamedGroups[`value`])

			if err != nil {
				panic(err)
			} else {
				// Append value
				iterators = append(iterators, iterator)
			}
		}

		// Append last punctuation
		iterators = append(iterators, Literator(tokens[2]))

		return Concaterator(iterators...)
	})
}
|
|
|
|
|
|
|
|

// Emitter for quoting constructs, changes token state based on quote name and adverbs
func quote(groups []string, state *LexerState) Iterator {
	keyword := state.NamedGroups[`keyword`]
	adverbsStr := state.NamedGroups[`adverbs`]
	iterators := []Iterator{}
	tokens := []Token{
		{Keyword, keyword},
		{StringAffix, adverbsStr},
		{Text, state.NamedGroups[`ws`]},
		{Punctuation, state.NamedGroups[`opening_delimiters`]},
		{Punctuation, state.NamedGroups[`closing_delimiters`]},
	}

	// Append all tokens before dealing with the main string
	iterators = append(iterators, Literator(tokens[:4]...))

	var tokenStates []string

	// Set tokenStates based on adverbs
	adverbs := strings.Split(adverbsStr, ":")
	for _, adverb := range adverbs {
		switch adverb {
		case "c", "closure":
			tokenStates = append(tokenStates, "Q-closure")
		case "qq":
			tokenStates = append(tokenStates, "qq")
		case "ww":
			tokenStates = append(tokenStates, "ww")
		case "s", "scalar", "a", "array", "h", "hash", "f", "function":
			tokenStates = append(tokenStates, "Q-variable")
		}
	}

	var tokenState string

	switch {
	case keyword == "qq" || contains(tokenStates, "qq"):
		tokenState = "qq"
	case adverbsStr == "ww" || contains(tokenStates, "ww"):
		tokenState = "ww"
	case contains(tokenStates, "Q-closure") && contains(tokenStates, "Q-variable"):
		tokenState = "qq"
	case contains(tokenStates, "Q-closure"):
		tokenState = "Q-closure"
	case contains(tokenStates, "Q-variable"):
		tokenState = "Q-variable"
	default:
		tokenState = "Q"
	}

	iterator, err := state.Lexer.Tokenise(
		&TokeniseOptions{
			State:  tokenState,
			Nested: true,
		}, state.NamedGroups[`value`])
	if err != nil {
		panic(err)
	} else {
		iterators = append(iterators, iterator)
	}

	// Append the last punctuation
	iterators = append(iterators, Literator(tokens[4]))

	return Concaterator(iterators...)
}
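`quote` relies on a small `contains` helper for string slices that is defined elsewhere in this file. A minimal standalone sketch of the adverb-to-state selection, with the helper filled in and the state logic extracted into a hypothetical `tokenStateFor` function for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// contains reports whether needle occurs in haystack.
func contains(haystack []string, needle string) bool {
	for _, s := range haystack {
		if s == needle {
			return true
		}
	}
	return false
}

// tokenStateFor mirrors the adverb handling in quote(): split the
// adverbs on ":" and map the result to a lexer state name.
func tokenStateFor(keyword, adverbsStr string) string {
	var tokenStates []string
	for _, adverb := range strings.Split(adverbsStr, ":") {
		switch adverb {
		case "c", "closure":
			tokenStates = append(tokenStates, "Q-closure")
		case "qq":
			tokenStates = append(tokenStates, "qq")
		case "ww":
			tokenStates = append(tokenStates, "ww")
		case "s", "scalar", "a", "array", "h", "hash", "f", "function":
			tokenStates = append(tokenStates, "Q-variable")
		}
	}
	switch {
	case keyword == "qq" || contains(tokenStates, "qq"):
		return "qq"
	case adverbsStr == "ww" || contains(tokenStates, "ww"):
		return "ww"
	case contains(tokenStates, "Q-closure") && contains(tokenStates, "Q-variable"):
		return "qq"
	case contains(tokenStates, "Q-closure"):
		return "Q-closure"
	case contains(tokenStates, "Q-variable"):
		return "Q-variable"
	default:
		return "Q"
	}
}

func main() {
	fmt.Println(tokenStateFor("q", "c:s")) // closure + variable adverbs together -> "qq"
	fmt.Println(tokenStateFor("q", ""))    // no adverbs -> plain "Q"
}
```

Note the closure-plus-variable case: interpolating both closures and variables is equivalent to full `qq` interpolation, which is why that combination collapses to the `qq` state.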

// Emitter for pod config, tokenises the properties with "colon-pair-attribute" state
func podConfig(groups []string, state *LexerState) Iterator {
	// Tokenise pod config
	iterator, err := state.Lexer.Tokenise(
		&TokeniseOptions{
			State:  "colon-pair-attribute",
			Nested: true,
		}, groups[0])
	if err != nil {
		panic(err)
	} else {
		return iterator
	}
}

// Emitter for pod code, tokenises the code based on the lang specified
func podCode(groups []string, state *LexerState) Iterator {
	iterators := []Iterator{}
	tokens := []Token{
		{Comment, state.NamedGroups[`ws`]},
		{Keyword, state.NamedGroups[`keyword`]},
		{Keyword, state.NamedGroups[`ws2`]},
		{Keyword, state.NamedGroups[`name`]},
		{StringDoc, state.NamedGroups[`value`]},
		{Comment, state.NamedGroups[`ws3`]},
		{Keyword, state.NamedGroups[`end_keyword`]},
		{Keyword, state.NamedGroups[`ws4`]},
		{Keyword, state.NamedGroups[`name`]},
	}

	// Append all tokens before dealing with the pod config
	iterators = append(iterators, Literator(tokens[:4]...))

	// Tokenise pod config
	iterators = append(iterators, podConfig([]string{state.NamedGroups[`config`]}, state))

	langMatch := regexp.MustCompile(`:lang\W+(\w+)`).FindStringSubmatch(state.NamedGroups[`config`])
	var lang string
	if len(langMatch) > 1 {
		lang = langMatch[1]
	}

	// Tokenise code based on lang property
	sublexer := Get(lang)
	if sublexer != nil {
		iterator, err := sublexer.Tokenise(nil, state.NamedGroups[`value`])
		if err != nil {
			panic(err)
		} else {
			iterators = append(iterators, iterator)
		}
	} else {
		iterators = append(iterators, Literator(tokens[4]))
	}

	// Append the rest of the tokens
	iterators = append(iterators, Literator(tokens[5:]...))

	return Concaterator(iterators...)
}
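The `:lang` lookup in `podCode` can be exercised on its own. A small sketch of how the regex pulls the language name out of a pod block's config string, wrapped in a hypothetical `podLang` helper for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// langRe matches the :lang adverb in a pod block's config, capturing the
// language name: `\W+` skips the delimiter (e.g. "<" or "('") and `(\w+)`
// grabs the word that follows.
var langRe = regexp.MustCompile(`:lang\W+(\w+)`)

// podLang extracts the language from a config string, or "" if absent.
func podLang(config string) string {
	if m := langRe.FindStringSubmatch(config); len(m) > 1 {
		return m[1]
	}
	return ""
}

func main() {
	fmt.Println(podLang(":lang<go> :allow<B>")) // prints "go"
	fmt.Println(podLang(":numbered") == "")     // prints "true"
}
```

When no `:lang` adverb is present, `lang` stays empty, the sublexer lookup fails, and `podCode` falls back to emitting the block body as a plain `StringDoc` token.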