mirror of https://github.com/alecthomas/chroma.git synced 2025-11-23 22:24:39 +02:00

Don't output extra whitespace in YAML multiline (#993)

This resolves a particular issue with parsing YAML multiline, for
example:
```yaml
a: |
  multiline literal
  line 2
```

The regex captures the amount of indentation in the third capture
group and then uses it as a kind of "status" to know which lines are
part of the indented multiline. However, because it is a capture group,
it has to be assigned a token, which was `TextWhitespace`. This meant
that the indentation was output after the multiline. Technically it
should be a non-capturing group, but then it is no longer possible to
refer to it in the regex. Therefore I've gone with the solution of
adding a new token, `Ignore`, which is not emitted by the iterator. It
can safely be assigned to capture groups that the regex needs but that
should not show up in the output.
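
The idea can be sketched in isolation. The snippet below is a minimal,
self-contained illustration and not chroma's actual code: the `Token`,
`Iterator`, `fromSlice`, `skipIgnored`, and `collect` names are all
hypothetical stand-ins, and the token stream is made up. It only shows
how an iterator can drop tokens typed `Ignore` so that text matched
purely for bookkeeping never reaches the output.

```go
package main

import "fmt"

// TokenType is a hypothetical stand-in for a lexer's token type enum.
type TokenType int

const (
	Ignore  TokenType = iota // matched by the lexer, never shown to callers
	Text                     // ordinary output text
	EOFType                  // end-of-stream marker
)

// Token is a simplified token: a type plus the matched text.
type Token struct {
	Type  TokenType
	Value string
}

// EOF signals that an iterator is exhausted.
var EOF = Token{Type: EOFType}

// Iterator returns the next token, or EOF when there are no more.
type Iterator func() Token

// fromSlice builds an Iterator over a fixed list of tokens.
func fromSlice(tokens []Token) Iterator {
	i := 0
	return func() Token {
		if i >= len(tokens) {
			return EOF
		}
		t := tokens[i]
		i++
		return t
	}
}

// skipIgnored wraps an Iterator and silently drops Ignore tokens.
func skipIgnored(it Iterator) Iterator {
	return func() Token {
		for {
			t := it()
			if t.Type != Ignore {
				return t
			}
		}
	}
}

// collect concatenates the values of all tokens up to EOF.
func collect(it Iterator) string {
	out := ""
	for t := it(); t != EOF; t = it() {
		out += t.Value
	}
	return out
}

func main() {
	// Assumed stream: the "  " token stands for indentation that was
	// captured only so the regex could refer back to it.
	raw := []Token{
		{Text, "line 2\n"},
		{Ignore, "  "},
		{Text, "b: 1\n"},
	}
	fmt.Printf("unfiltered: %q\n", collect(fromSlice(raw)))
	fmt.Printf("filtered:   %q\n", collect(skipIgnored(fromSlice(raw))))
}
```

Filtering in the iterator rather than in each lexer rule keeps the
rules free to use as many capture groups as they need.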

## Before

![image](https://github.com/user-attachments/assets/c29353c5-9e15-4f14-a733-57a60fb51910)

## After

![image](https://github.com/user-attachments/assets/57b5d129-a9d3-4b84-ae1f-dc05182b9ad3)
Gusted
2024-08-22 22:58:31 +02:00
committed by GitHub
parent 895a0488b5
commit 4d11870090
6 changed files with 448 additions and 424 deletions

@@ -194,6 +194,9 @@ func (l *LexerState) Iterator() Token { // nolint: gocognit
for len(l.iteratorStack) > 0 {
n := len(l.iteratorStack) - 1
t := l.iteratorStack[n]()
if t.Type == Ignore {
continue
}
if t == EOF {
l.iteratorStack = l.iteratorStack[:n]
continue
@@ -243,6 +246,9 @@ func (l *LexerState) Iterator() Token { // nolint: gocognit
for len(l.iteratorStack) > 0 {
n := len(l.iteratorStack) - 1
t := l.iteratorStack[n]()
if t.Type == Ignore {
continue
}
if t == EOF {
l.iteratorStack = l.iteratorStack[:n]
continue