mirror of
https://github.com/alecthomas/chroma.git
synced 2025-04-11 11:32:05 +02:00
Huge hardcoded character lists have a cost. In some current lexers, such as Haskell and JavaScript, they bloat lexer init time to 60-80ms on some current systems, as opposed to sub-ms for many others.

Replacing them with character classes such as `\p{L}` seems to eliminate this cost, bringing lexer init time down to the norm (around 1ms). It also shortens and simplifies the code significantly.

The current tests pass, but there may be inaccuracies that the tests do not cover, so this needs a review. The change is likely to cause edge-case regressions, because the set of characters considered "letters" varies between languages. However, Chroma lexers don't aim to be perfectly accurate, and performance should be just as much a goal as accuracy, so I believe the tradeoff is justified.

This commit leaves at least two lexers unfixed: Julia and Kotlin. Judging by the code, they might have the same issue and should also be addressed.