mirror of https://github.com/alecthomas/chroma.git synced 2025-04-04 22:24:23 +02:00

10 Commits

Author SHA1 Message Date
Nelo Mitranim
eafea0d771 replace expensive char lists with char classes
Huge hardcoded character lists have a cost. In some current lexers, such
as Haskell and JavaScript, this bloats lexer init time to 60-80ms on
some current systems, as opposed to sub-ms for many others.

Replacing them with character classes such as `\p{L}` seems to
eliminate this cost, reducing lexer init time to the norm (around 1ms).
In addition, this significantly reduces and simplifies the code.

The current tests pass, but there may be inaccuracies not covered by
tests. This requires a review.

This change is likely to cause edge case regressions, as the sets of
characters considered "letters" vary between languages. However, Chroma
lexers don't aim to be perfectly accurate. Performance should be just as
much a goal as accuracy. I believe this tradeoff to be justified.

This commit leaves at least two lexers unfixed: Julia and Kotlin.
Judging by the code, they might have the same issue, and should also be
addressed.
2021-07-27 22:04:33 +10:00
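
The trade-off described in the commit above can be sketched as follows. This is a minimal illustration, not code from the commit: the names `identList` and `identClass` are hypothetical, the hardcoded list is deliberately tiny (the real ones enumerate far more code points), and the standard library `regexp` package is used only to keep the example self-contained. The sketch shows the behavioural difference, not the init-time difference.

```go
package main

import (
	"fmt"
	"regexp"
)

// identList is a hypothetical, deliberately tiny stand-in for a
// hardcoded letter list. It only knows the characters spelled out here.
var identList = regexp.MustCompile(`^[a-zA-Zäöü]+`)

// identClass (also hypothetical) uses the Unicode "Letter" general
// category, so it accepts letters from any script.
var identClass = regexp.MustCompile(`^\p{L}+`)

func main() {
	for _, s := range []string{"abc1", "данные1"} {
		fmt.Printf("%q list=%q class=%q\n",
			s, identList.FindString(s), identClass.FindString(s))
	}
}
```

The character class is shorter and covers scripts the list misses, but it may also accept letters that a given language's grammar does not, which is the edge-case risk the commit message mentions.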
mlpo
1b7d2dd620 Update Python lexers and add tests for them 2021-05-08 18:18:27 +10:00
mlpo
8bba42c1ff Update mimetypes in Python lexers 2021-05-05 16:59:58 +10:00
mlpo
515a389ccc gofmt 2021-05-05 14:44:48 +10:00
mlpo
ba03a8b276 Use Python 3 by default 2021-05-05 14:44:48 +10:00
Ville Skyttä
b3d969cafc python: add *.pyi 2021-05-05 08:18:23 +10:00
Steven Penny
7a68f3e25d LiteralNumberHex: underscore support
Most languages allow underscores in number literals. Fix support for a few
languages. References:

- https://docs.python.org/reference/lexical_analysis.html#integer-literals
- https://golang.org/ref/spec#Integer_literals
- https://php.net/language.types.integer
2021-03-03 12:57:43 +11:00
Cameron Moore
59126c5b32 Add NewLazyLexer to defer rules definitions and reduce init costs (#449)
Add NewLazyLexer and MustNewLazyLexer which accept a function that
returns the rules for the lexer.  This allows us to defer the rules
definitions until they're needed.

Lexers in the a, g, s, and x packages have been updated to use the new
lazy lexer.
2021-02-08 12:16:49 +11:00
Steven Penny
881d54f096 Improve number literals for several languages
Using these examples:

~~~
0x21
1_000
1e3
~~~

Several lexers failed on these forms even though their languages support them. Here are the ones I found:

Lexer      | 0x21 | 1_000 | 1e3
-----------|------|-------|-----
C#         | fail | fail  | fail
Go         |      | fail  |
JavaScript |      | fail  | fail
PHP        |      | fail  |
Python     |      | fail  |
Ruby       |      |       | fail

I fixed these issues, and added tests.
2020-11-22 08:30:29 +11:00
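
The three sample literals from the commit above can be checked with a small classifier. This is a simplified sketch under stated assumptions: the patterns and the `classify` helper are hypothetical, they are not the rules added to any Chroma lexer, and real language grammars differ in details such as where underscores may appear.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical, simplified patterns: underscores are accepted only
// between digits, which is stricter than some languages require.
var (
	hexLit   = regexp.MustCompile(`^0[xX][0-9a-fA-F]+(_[0-9a-fA-F]+)*$`)
	intLit   = regexp.MustCompile(`^[0-9]+(_[0-9]+)*$`)
	floatLit = regexp.MustCompile(`^[0-9]+(_[0-9]+)*(\.[0-9]+)?[eE][+-]?[0-9]+$`)
)

// classify reports which of the patterns above matches s, or "none".
func classify(s string) string {
	switch {
	case hexLit.MatchString(s):
		return "hex"
	case floatLit.MatchString(s):
		return "float"
	case intLit.MatchString(s):
		return "int"
	}
	return "none"
}

func main() {
	for _, s := range []string{"0x21", "1_000", "1e3"} {
		fmt.Printf("%-6s %s\n", s, classify(s))
	}
}
```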
Alec Thomas
563aadc53c Moved lexers into alphabetical sub-packages.
This was done to speed up incremental compilation when working on
lexers. That is, modifying a single lexer will no longer require
recompiling all lexers.

This is a (slightly) backwards breaking change in that lexers are no
longer exported directly in the lexers package. The registry API is
"aliased" at the old location.
2018-02-15 21:09:02 +11:00