Bump tcell dependency to v3

2026-05-22 10:15:43 +02:00 · 2026-04-01 16:12:16 +02:00
parent 64996d12d9
commit 5d3715f96b
142 changed files with 15398 additions and 8297 deletions
@@ -0,0 +1,3 @@
+.DS_Store
+*.out
+*.test
@@ -0,0 +1,51 @@
+The goals and overview of this package can be found in the README.md file;
+start by reading that.
+
+The goal of this package is to determine the display (column) width of a
+string, UTF-8 bytes, or runes, as would happen in a monospace font, especially
+in a terminal.
+
+When troubleshooting, write Go unit tests instead of executing debug scripts.
+The tests can return whatever logs or output you need. If those tests are
+only for temporary troubleshooting, clean up the tests after the debugging is
+done.
+
+(Separate executable debugging scripts are messy, tend to have conflicting
+dependencies and are hard to clean up.)
+
+If you make changes to the trie generation in internal/gen, it can be invoked
+by running `go generate` from the top package directory.
+
+## Pull Requests and branches
+
+For PRs (pull requests), you can use the gh CLI tool. Compare the current branch with main. Reviewing a PR and reviewing a branch are about the same, but the PR may add context.
+
+Understand the goals of the PR. Note any API changes, especially breaking changes.
+
+Look for thoroughness of tests, as well as GoDoc comments.
+
+Retrieve and consider the comments on the PR, which may have come from GitHub Copilot or Cursor BugBot. Think like GitHub Copilot or Cursor BugBot.
+
+Offer to optionally post a brief summary of the review to the PR, via the gh CLI tool.
+
+## Tagged Go releases
+
+If I ask you whether we are ready to release, this means a tagged Go release on the main branch. Go releases are git tagged with a version number.
+
+Review the changes since the last release, i.e. the previous git tag. Ensure that the changes are complete and correct. Identify new features, bug fixes, and performance improvements.
+
+Identify breaking changes, especially API changes.
+
+Ensure good test coverage. Look for performance changes, especially performance regressions, by running benchmarks against the previous release.
+
+Ensure that the documentation in READMEs and GoDocs is complete, correct and consistent.
+
+## Comparisons to go-runewidth
+
+We originally attempted to make this package compatible with go-runewidth.
+However, we found that there were too many differences in the handling of
+certain characters and properties.
+
+We believe, preliminarily, that our choices are more correct and complete,
+by using more complete categories such as Unicode Cf (format) for zero-width
+and Mn (Nonspacing_Mark) for combining marks.
@@ -0,0 +1,129 @@
+# Changelog
+
+## [0.11.0]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.10.0...v0.11.0)
+
+### Added
+- New `ControlSequences8Bit` option to treat 8-bit ECMA-48 (C1) escape sequences as zero-width. (#22)
+
+### Changed
+- Upgraded uax29 dependency to v2.7.0 for 8-bit escape sequence support in the grapheme iterator.
+- Truncation now validates that preserved trailing escape sequences are zero-width, preventing edge cases where non-zero-width sequences could leak into output.
+
+### Note
+- `ControlSequences8Bit` is deliberately ignored by `TruncateString` and `TruncateBytes`, because C1 byte values (0x80–0x9F) overlap with UTF-8 multi-byte encoding.
+
+## [0.10.0]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.9.0...v0.10.0)
+
+### Added
+- New `ControlSequences` option to treat ECMA-48/ANSI escape sequences as zero-width. (#20)
+- `TruncateString` and `TruncateBytes` now preserve trailing ANSI escape sequences (such as SGR resets) when `ControlSequences` is true, preventing color bleed in terminal output.
+
+### Changed
+- Removed `stringish` dependency; generic type constraints are now inline `~string | []byte`.
+- Upgraded uax29 dependency to v2.6.0 for ANSI escape sequence support in the grapheme iterator.
+
+## [0.9.0]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.8.0...v0.9.0)
+
+### Changed
+- Unicode 17 support: East Asian Width and emoji data updated to Unicode 17.0.0. (#18)
+- Upgraded uax29 dependency to v2.5.0 (Unicode 17 grapheme segmentation).
+
+## [0.8.0]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.7.0...v0.8.0)
+
+### Changed
+- Performance: ASCII fast path that applies to any run of printable
+  ASCII. 2x-10x faster for ASCII text vs v0.7.0. (#16)
+- Upgraded uax29 dependency to v2.4.0 for Unicode 16 support. Text that includes
+  Indic_Conjunct_Break may segment differently (and more correctly). (#15)
+
+## [0.7.0]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.6.2...v0.7.0)
+
+### Added
+- New `TruncateString` and `TruncateBytes` methods to truncate strings to a
+  maximum display width, with optional tail (like an ellipsis). (#13)
+
+## [0.6.2]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.6.1...v0.6.2)
+
+### Changed
+- Internal: reduced property categories for simpler trie.
+
+## [0.6.1]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.6.0...v0.6.1)
+
+### Changed
+- Perf improvements: replaced the ASCII lookup table with a simple
+  function. A bit more cache-friendly. More inlining.
+- Bug fix: single regional indicators are now treated as width 2, since that
+  is what actual terminals do.
+
+## [0.6.0]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.5.0...v0.6.0)
+
+### Added
+- New `StringGraphemes` and `BytesGraphemes` methods, for iterating over the
+  widths of grapheme clusters.
+
+### Changed
+- Fast ASCII lookups
+
+## [0.5.0]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.4.1...v0.5.0)
+
+### Added
+- Unicode 16 support
+- Improved emoji presentation handling per Unicode TR51
+
+### Changed
+- Corrected VS15 (U+FE0E) handling: now preserves base character width (no-op) per Unicode TR51
+- Performance optimizations: reduced property lookups
+
+### Fixed
+- VS15 variation selector now correctly preserves base character width instead of forcing width 1
+
+## [0.4.1]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.4.0...v0.4.1)
+
+### Changed
+- Updated uax29 dependency
+- Improved flag handling
+
+## [0.4.0]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.3.1...v0.4.0)
+
+### Added
+- Support for variation selectors (VS15, VS16) and regional indicator pairs (flags)
+
+## [0.3.1]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.3.0...v0.3.1)
+
+### Added
+- Fuzz testing support
+
+### Changed
+- Updated stringish dependency
+
+## [0.3.0]
+
+[Compare](https://github.com/clipperhouse/displaywidth/compare/v0.2.0...v0.3.0)
+
+### Changed
+- Dropped compatibility with go-runewidth
+- Trie implementation cleanup
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 Matt Sherman
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,190 @@
+# displaywidth
+
+A high-performance Go package for measuring the monospace display width of strings, UTF-8 bytes, and runes.
+
+[![Documentation](https://pkg.go.dev/badge/github.com/clipperhouse/displaywidth.svg)](https://pkg.go.dev/github.com/clipperhouse/displaywidth)
+[![Test](https://github.com/clipperhouse/displaywidth/actions/workflows/gotest.yml/badge.svg)](https://github.com/clipperhouse/displaywidth/actions/workflows/gotest.yml)
+[![Fuzz](https://github.com/clipperhouse/displaywidth/actions/workflows/gofuzz.yml/badge.svg)](https://github.com/clipperhouse/displaywidth/actions/workflows/gofuzz.yml)
+
+## Install
+```bash
+go get github.com/clipperhouse/displaywidth
+```
+
+## Usage
+
+```go
+package main
+
+import (
+    "fmt"
+    "github.com/clipperhouse/displaywidth"
+)
+
+func main() {
+    width := displaywidth.String("Hello, 世界!")
+    fmt.Println(width)
+
+    width = displaywidth.Bytes([]byte("🌍"))
+    fmt.Println(width)
+
+    width = displaywidth.Rune('🌍')
+    fmt.Println(width)
+}
+```
+
+For most purposes, you should use the `String` or `Bytes` methods. They sum
+the widths of grapheme clusters in the string or byte slice.
+
+> Note: in your application, iterating over runes to measure width is likely incorrect;
+the smallest unit of display is a grapheme, not a rune.
+
+### Iterating over graphemes
+
+If you need the individual graphemes:
+
+```go
+import (
+    "fmt"
+    "github.com/clipperhouse/displaywidth"
+)
+
+func main() {
+    g := displaywidth.StringGraphemes("Hello, 世界!")
+    for g.Next() {
+        width := g.Width()
+        value := g.Value()
+        fmt.Println(value, width) // do something with the width or value
+    }
+}
+```
+
+### Options
+
+Create the options you need, and then use methods on the options struct.
+
+```go
+var myOptions = displaywidth.Options{
+    EastAsianWidth: true,
+    ControlSequences: true,
+}
+
+width := myOptions.String("Hello, 世界!")
+```
+
+#### ControlSequences
+
+`ControlSequences` specifies whether to ignore ECMA-48 escape sequences
+when calculating the display width. When `false` (default), ANSI escape
+sequences are treated as just a series of characters. When `true`, they are
+treated as a single zero-width unit.
+
+#### ControlSequences8Bit
+
+`ControlSequences8Bit` specifies whether to ignore 8-bit ECMA-48 escape sequences
+when calculating the display width. When `false` (default), these are treated
+as just a series of characters. When `true`, they are treated as a single
+zero-width unit.
+
+Note: this option is ignored by the `Truncate` methods, as the concatenation
+can lead to unintended UTF-8 semantics.
+
+#### EastAsianWidth
+
+`EastAsianWidth` defines how
+[East Asian Ambiguous characters](https://www.unicode.org/reports/tr11/#Ambiguous)
+are treated.
+
+When `false` (default), East Asian Ambiguous characters are treated as width 1.
+When `true`, they are treated as width 2.
+
+You may wish to configure this based on environment variables or locale.
+ `go-runewidth`, for example, does so
+ [during package initialization](https://github.com/mattn/go-runewidth/blob/master/runewidth.go#L26C1-L45C2). `displaywidth` does not do this automatically; we prefer to leave it to you.
+
+
+## Technical standards and compatibility
+
+This package implements the Unicode East Asian Width standard
+([UAX #11](https://www.unicode.org/reports/tr11/tr11-43.html)), and handles
+[variation selectors](https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)),
+and [regional indicator pairs](https://en.wikipedia.org/wiki/Regional_indicator_symbol)
+(flags). We implement [Unicode TR51](https://www.unicode.org/reports/tr51/tr51-27.html)
+for emojis. We are keeping an eye on
+[emerging standards](https://www.jeffquast.com/post/state-of-terminal-emulation-2025/).
+
+For control sequences, we implement the [ECMA-48](https://ecma-international.org/publications-and-standards/standards/ecma-48/) standard for 7-bit and 8-bit control sequences.
+
+`clipperhouse/displaywidth`, `mattn/go-runewidth`, and `rivo/uniseg` will
+give the same outputs for most real-world text. Extensive details are in the
+[compatibility analysis](comparison/COMPATIBILITY_ANALYSIS.md).
+
+## Invalid UTF-8
+
+This package does not validate UTF-8. If you pass invalid UTF-8, the results
+are undefined. We fuzz against invalid UTF-8 to ensure we don't panic or
+loop indefinitely.
+
+The `ControlSequences8Bit` option means that we will segment valid 8-bit
+control sequences, which are typically _not_ valid UTF-8. 8-bit control bytes
+happen to also be UTF-8 continuation bytes. Use with caution.
+
+## Prior Art
+
+[mattn/go-runewidth](https://github.com/mattn/go-runewidth)
+
+[rivo/uniseg](https://github.com/rivo/uniseg)
+
+[x/text/width](https://pkg.go.dev/golang.org/x/text/width)
+
+[x/text/internal/triegen](https://pkg.go.dev/golang.org/x/text/internal/triegen)
+
+## Benchmarks
+
+```bash
+cd comparison
+go test -bench=. -benchmem
+```
+
+```
+goos: darwin
+goarch: arm64
+pkg: github.com/clipperhouse/displaywidth/comparison
+cpu: Apple M2
+
+BenchmarkString_Mixed/clipperhouse/displaywidth-8             5784 ns/op	      291.69 MB/s	      0 B/op	   0 allocs/op
+BenchmarkString_Mixed/mattn/go-runewidth-8                   14751 ns/op	      114.36 MB/s	      0 B/op	   0 allocs/op
+BenchmarkString_Mixed/rivo/uniseg-8                          19360 ns/op	       87.14 MB/s	      0 B/op	   0 allocs/op
+
+BenchmarkString_ASCII/clipperhouse/displaywidth-8               54.60 ns/op	     2344.32 MB/s	      0 B/op	   0 allocs/op
+BenchmarkString_ASCII/mattn/go-runewidth-8                    1195 ns/op	      107.08 MB/s	      0 B/op	   0 allocs/op
+BenchmarkString_ASCII/rivo/uniseg-8                           1578 ns/op	       81.13 MB/s	      0 B/op	   0 allocs/op
+
+BenchmarkString_EastAsian/clipperhouse/displaywidth-8         5837 ns/op	      289.01 MB/s	      0 B/op	   0 allocs/op
+BenchmarkString_EastAsian/mattn/go-runewidth-8               24418 ns/op	       69.09 MB/s	      0 B/op	   0 allocs/op
+BenchmarkString_EastAsian/rivo/uniseg-8                      19339 ns/op	       87.23 MB/s	      0 B/op	   0 allocs/op
+
+BenchmarkString_Emoji/clipperhouse/displaywidth-8             3225 ns/op	      224.51 MB/s	      0 B/op	   0 allocs/op
+BenchmarkString_Emoji/mattn/go-runewidth-8                    4851 ns/op	      149.25 MB/s	      0 B/op	   0 allocs/op
+BenchmarkString_Emoji/rivo/uniseg-8                           6591 ns/op	      109.85 MB/s	      0 B/op	   0 allocs/op
+
+BenchmarkRune_Mixed/clipperhouse/displaywidth-8               3385 ns/op	      498.34 MB/s	      0 B/op	   0 allocs/op
+BenchmarkRune_Mixed/mattn/go-runewidth-8                      5354 ns/op	      315.07 MB/s	      0 B/op	   0 allocs/op
+
+BenchmarkRune_EastAsian/clipperhouse/displaywidth-8           3397 ns/op	      496.56 MB/s	      0 B/op	   0 allocs/op
+BenchmarkRune_EastAsian/mattn/go-runewidth-8                 15673 ns/op	      107.64 MB/s	      0 B/op	   0 allocs/op
+
+BenchmarkRune_ASCII/clipperhouse/displaywidth-8                255.7 ns/op	      500.53 MB/s	      0 B/op	   0 allocs/op
+BenchmarkRune_ASCII/mattn/go-runewidth-8                       261.5 ns/op	      489.55 MB/s	      0 B/op	   0 allocs/op
+
+BenchmarkRune_Emoji/clipperhouse/displaywidth-8               1371 ns/op	      528.22 MB/s	      0 B/op	   0 allocs/op
+BenchmarkRune_Emoji/mattn/go-runewidth-8                      2267 ns/op	      319.43 MB/s	      0 B/op	   0 allocs/op
+
+BenchmarkTruncateWithTail/clipperhouse/displaywidth-8         3229 ns/op	       54.82 MB/s	    192 B/op	  14 allocs/op
+BenchmarkTruncateWithTail/mattn/go-runewidth-8                8408 ns/op	       21.05 MB/s	    192 B/op	  14 allocs/op
+
+BenchmarkTruncateWithoutTail/clipperhouse/displaywidth-8      3554 ns/op	       64.43 MB/s	      0 B/op	   0 allocs/op
+BenchmarkTruncateWithoutTail/mattn/go-runewidth-8            11189 ns/op	       20.47 MB/s	      0 B/op	   0 allocs/op
+```
+
+Here are some notes on [how to make Unicode things fast](https://clipperhouse.com/go-unicode/).
@@ -0,0 +1,3 @@
+package displaywidth
+
+//go:generate go run -C internal/gen .
@@ -0,0 +1,73 @@
+package displaywidth
+
+import (
+	"github.com/clipperhouse/uax29/v2/graphemes"
+)
+
+// Graphemes is an iterator over grapheme clusters.
+//
+// Iterate using the Next method, and get the width of the current grapheme
+// using the Width method.
+type Graphemes[T ~string | []byte] struct {
+	iter    *graphemes.Iterator[T]
+	options Options
+}
+
+// Next advances the iterator to the next grapheme cluster.
+func (g *Graphemes[T]) Next() bool {
+	return g.iter.Next()
+}
+
+// Value returns the current grapheme cluster.
+func (g *Graphemes[T]) Value() T {
+	return g.iter.Value()
+}
+
+// Width returns the display width of the current grapheme cluster.
+func (g *Graphemes[T]) Width() int {
+	return graphemeWidth(g.Value(), g.options)
+}
+
+// StringGraphemes returns an iterator over grapheme clusters for the given
+// string.
+//
+// Iterate using the Next method, and get the width of the current grapheme
+// using the Width method.
+func StringGraphemes(s string) Graphemes[string] {
+	return DefaultOptions.StringGraphemes(s)
+}
+
+// StringGraphemes returns an iterator over grapheme clusters for the given
+// string, with the given options.
+//
+// Iterate using the Next method, and get the width of the current grapheme
+// using the Width method.
+func (options Options) StringGraphemes(s string) Graphemes[string] {
+	g := graphemes.FromString(s)
+	g.AnsiEscapeSequences = options.ControlSequences
+	g.AnsiEscapeSequences8Bit = options.ControlSequences8Bit
+
+	return Graphemes[string]{iter: g, options: options}
+}
+
+// BytesGraphemes returns an iterator over grapheme clusters for the given
+// []byte.
+//
+// Iterate using the Next method, and get the width of the current grapheme
+// using the Width method.
+func BytesGraphemes(s []byte) Graphemes[[]byte] {
+	return DefaultOptions.BytesGraphemes(s)
+}
+
+// BytesGraphemes returns an iterator over grapheme clusters for the given
+// []byte, with the given options.
+//
+// Iterate using the Next method, and get the width of the current grapheme
+// using the Width method.
+func (options Options) BytesGraphemes(s []byte) Graphemes[[]byte] {
+	g := graphemes.FromBytes(s)
+	g.AnsiEscapeSequences = options.ControlSequences
+	g.AnsiEscapeSequences8Bit = options.ControlSequences8Bit
+
+	return Graphemes[[]byte]{iter: g, options: options}
+}
@@ -0,0 +1,30 @@
+package displaywidth
+
+// Options allows you to specify the treatment of ambiguous East Asian
+// characters and ANSI escape sequences.
+type Options struct {
+	// EastAsianWidth specifies whether to treat ambiguous East Asian characters
+	// as width 1 or 2. When false (default), ambiguous East Asian characters
+	// are treated as width 1. When true, they are width 2.
+	EastAsianWidth bool
+
+	// ControlSequences specifies whether to ignore 7-bit ECMA-48 escape sequences
+	// when calculating the display width. When false (default), ANSI escape
+	// sequences are treated as just a series of characters. When true, they are
+	// treated as a single zero-width unit.
+	ControlSequences bool
+	// ControlSequences8Bit specifies whether to ignore 8-bit ECMA-48 escape sequences
+	// when calculating the display width. When false (default), these are treated
+	// as just a series of characters. When true, they are treated as a single
+	// zero-width unit.
+	ControlSequences8Bit bool
+}
+
+// DefaultOptions is the default options for the display width
+// calculation, which is EastAsianWidth false, ControlSequences false, and
+// ControlSequences8Bit false.
+var DefaultOptions = Options{
+	EastAsianWidth:       false,
+	ControlSequences:     false,
+	ControlSequences8Bit: false,
+}
@@ -0,0 +1,149 @@
+package displaywidth
+
+import (
+	"strings"
+
+	"github.com/clipperhouse/uax29/v2/graphemes"
+)
+
+// TruncateString truncates a string to the given maxWidth, and appends the
+// given tail if the string is truncated.
+//
+// It ensures the visible width, including the width of the tail, is less than or
+// equal to maxWidth.
+//
+// When [Options.ControlSequences] is true, 7-bit ANSI escape sequences that
+// appear after the truncation point are preserved in the output. This ensures
+// that escape sequences such as SGR resets are not lost, preventing color
+// bleed in terminal output.
+//
+// [Options.ControlSequences8Bit] is ignored by truncation. 8-bit C1 byte values
+// (0x80-0x9F) overlap with UTF-8 multi-byte encoding, so manipulating them
+// during truncation can shift byte boundaries and form unintended visible
+// characters. Use [Options.String] or [Options.Bytes] for 8-bit-aware width
+// measurement.
+func (options Options) TruncateString(s string, maxWidth int, tail string) string {
+	// We deliberately ignore ControlSequences8Bit for truncation, see above.
+	options.ControlSequences8Bit = false
+
+	maxWidthWithoutTail := maxWidth - options.String(tail)
+
+	var pos, total int
+	g := graphemes.FromString(s)
+	g.AnsiEscapeSequences = options.ControlSequences
+
+	for g.Next() {
+		gw := graphemeWidth(g.Value(), options)
+		if total+gw <= maxWidthWithoutTail {
+			pos = g.End()
+		}
+		total += gw
+		if total > maxWidth {
+			if options.ControlSequences {
+				// Build result with trailing 7-bit ANSI escape sequences preserved
+				var b strings.Builder
+				b.Grow(len(s) + len(tail)) // at most original + tail
+				b.WriteString(s[:pos])
+				b.WriteString(tail)
+
+				rem := graphemes.FromString(s[pos:])
+				rem.AnsiEscapeSequences = options.ControlSequences
+
+				for rem.Next() {
+					v := rem.Value()
+					// Only preserve 7-bit escapes (ESC = 0x1B) that measure
+					// as zero-width on their own; some sequences (e.g. SOS)
+					// are only valid in their original context.
+					if len(v) > 0 && v[0] == 0x1B && options.String(v) == 0 {
+						b.WriteString(v)
+					}
+				}
+				return b.String()
+			}
+			return s[:pos] + tail
+		}
+	}
+	// No truncation
+	return s
+}
+
+// TruncateString truncates a string to the given maxWidth, and appends the
+// given tail if the string is truncated.
+//
+// It ensures the total width, including the width of the tail, is less than or
+// equal to maxWidth.
+func TruncateString(s string, maxWidth int, tail string) string {
+	return DefaultOptions.TruncateString(s, maxWidth, tail)
+}
+
+// TruncateBytes truncates a []byte to the given maxWidth, and appends the
+// given tail if the []byte is truncated.
+//
+// It ensures the visible width, including the width of the tail, is less than or
+// equal to maxWidth.
+//
+// When [Options.ControlSequences] is true, 7-bit ANSI escape sequences that
+// appear after the truncation point are preserved in the output. This ensures
+// that escape sequences such as SGR resets are not lost, preventing color
+// bleed in terminal output.
+//
+// [Options.ControlSequences8Bit] is ignored by truncation. 8-bit C1 byte values
+// (0x80-0x9F) overlap with UTF-8 multi-byte encoding, so manipulating them
+// during truncation can shift byte boundaries and form unintended visible
+// characters. Use [Options.String] or [Options.Bytes] for 8-bit-aware width
+// measurement.
+func (options Options) TruncateBytes(s []byte, maxWidth int, tail []byte) []byte {
+	// We deliberately ignore ControlSequences8Bit for truncation, see above.
+	options.ControlSequences8Bit = false
+
+	maxWidthWithoutTail := maxWidth - options.Bytes(tail)
+
+	var pos, total int
+	g := graphemes.FromBytes(s)
+	g.AnsiEscapeSequences = options.ControlSequences
+
+	for g.Next() {
+		gw := graphemeWidth(g.Value(), options)
+		if total+gw <= maxWidthWithoutTail {
+			pos = g.End()
+		}
+		total += gw
+		if total > maxWidth {
+			if options.ControlSequences {
+				// Build result with trailing 7-bit ANSI escape sequences preserved
+				result := make([]byte, 0, len(s)+len(tail)) // at most original + tail
+				result = append(result, s[:pos]...)
+				result = append(result, tail...)
+
+				rem := graphemes.FromBytes(s[pos:])
+				rem.AnsiEscapeSequences = options.ControlSequences
+
+				for rem.Next() {
+					v := rem.Value()
+					// Only preserve 7-bit escapes (ESC = 0x1B) that measure
+					// as zero-width on their own; some sequences (e.g. SOS)
+					// are only valid in their original context.
+					if len(v) > 0 && v[0] == 0x1B && options.Bytes(v) == 0 {
+						result = append(result, v...)
+					}
+				}
+				return result
+			}
+			result := make([]byte, 0, pos+len(tail))
+			result = append(result, s[:pos]...)
+			result = append(result, tail...)
+			return result
+		}
+	}
+	// No truncation
+	return s
+}
+
+// TruncateBytes truncates a []byte to the given maxWidth, and appends the
+// given tail if the []byte is truncated.
+//
+// It ensures the total width, including the width of the tail, is less than or
+// equal to maxWidth.
+func TruncateBytes(s []byte, maxWidth int, tail []byte) []byte {
+	return DefaultOptions.TruncateBytes(s, maxWidth, tail)
+}
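The truncation loop above tracks two thresholds: `pos` only advances while the running width still leaves room for the tail, and truncation is triggered once the running width exceeds `maxWidth`. The sketch below isolates that logic under a loud simplifying assumption: every rune is treated as exactly one column wide, whereas the real code measures grapheme clusters. `truncateNarrow` is a hypothetical name, not an API of the package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateNarrow is a simplified, hypothetical stand-in for TruncateString.
// ASSUMPTION: each rune occupies exactly one column. The real implementation
// iterates grapheme clusters and looks up their widths.
func truncateNarrow(s string, maxWidth int, tail string) string {
	maxWidthWithoutTail := maxWidth - utf8.RuneCountInString(tail)

	var pos, total int
	for i := 0; i < len(s); {
		_, size := utf8.DecodeRuneInString(s[i:])
		i += size

		// Remember the last cut point that still leaves room for the tail.
		if total+1 <= maxWidthWithoutTail {
			pos = i
		}
		total++

		// Only truncate once the text is known not to fit at all.
		if total > maxWidth {
			return s[:pos] + tail
		}
	}
	// The whole string fits; return it unchanged, without the tail.
	return s
}

func main() {
	fmt.Println(truncateNarrow("Hello, world", 8, "...")) // prints "Hello..."
	fmt.Println(truncateNarrow("Hello", 8, "..."))        // prints "Hello"
}
```

Keeping the cut point separate from the overflow check is what lets a string that fits exactly in `maxWidth` pass through untouched instead of losing characters to a tail it does not need.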
@@ -0,0 +1,239 @@
+package displaywidth
+
+import (
+	"unicode/utf8"
+
+	"github.com/clipperhouse/uax29/v2/graphemes"
+)
+
+// String calculates the display width of a string,
+// by iterating over grapheme clusters in the string
+// and summing their widths.
+func String(s string) int {
+	return DefaultOptions.String(s)
+}
+
+// String calculates the display width of a string, for the given options, by
+// iterating over grapheme clusters in the string and summing their widths.
+func (options Options) String(s string) int {
+	width := 0
+	pos := 0
+
+	for pos < len(s) {
+		// Try ASCII optimization
+		asciiLen := printableASCIILength(s[pos:])
+		if asciiLen > 0 {
+			width += asciiLen
+			pos += asciiLen
+			continue
+		}
+
+		// Not ASCII, use grapheme parsing
+		g := graphemes.FromString(s[pos:])
+		g.AnsiEscapeSequences = options.ControlSequences
+		g.AnsiEscapeSequences8Bit = options.ControlSequences8Bit
+
+		start := pos
+
+		for g.Next() {
+			v := g.Value()
+			width += graphemeWidth(v, options)
+			pos += len(v)
+
+			// Quick check: if remaining might have printable ASCII, break to outer loop
+			if pos < len(s) && s[pos] >= 0x20 && s[pos] <= 0x7E {
+				break
+			}
+		}
+
+		// Defensive, should not happen: if no progress was made,
+		// skip a byte to prevent infinite loop. Only applies if
+		// the grapheme parser misbehaves.
+		if pos == start {
+			pos++
+		}
+	}
+
+	return width
+}
+
+// Bytes calculates the display width of a []byte,
+// by iterating over grapheme clusters in the byte slice
+// and summing their widths.
+func Bytes(s []byte) int {
+	return DefaultOptions.Bytes(s)
+}
+
+// Bytes calculates the display width of a []byte, for the given options, by
+// iterating over grapheme clusters in the slice and summing their widths.
+func (options Options) Bytes(s []byte) int {
+	width := 0
+	pos := 0
+
+	for pos < len(s) {
+		// Try ASCII optimization
+		asciiLen := printableASCIILength(s[pos:])
+		if asciiLen > 0 {
+			width += asciiLen
+			pos += asciiLen
+			continue
+		}
+
+		// Not ASCII, use grapheme parsing
+		g := graphemes.FromBytes(s[pos:])
+		g.AnsiEscapeSequences = options.ControlSequences
+		g.AnsiEscapeSequences8Bit = options.ControlSequences8Bit
+
+		start := pos
+
+		for g.Next() {
+			v := g.Value()
+			width += graphemeWidth(v, options)
+			pos += len(v)
+
+			// Quick check: if remaining might have printable ASCII, break to outer loop
+			if pos < len(s) && s[pos] >= 0x20 && s[pos] <= 0x7E {
+				break
+			}
+		}
+
+		// Defensive, should not happen: if no progress was made,
+		// skip a byte to prevent infinite loop. Only applies if
+		// the grapheme parser misbehaves.
+		if pos == start {
+			pos++
+		}
+	}
+
+	return width
+}
+
+// Rune calculates the display width of a rune. You
+// should almost certainly use [String] or [Bytes] for
+// most purposes.
+//
+// The smallest unit of display width is a grapheme
+// cluster, not a rune. Iterating over runes to measure
+// width is incorrect in many cases.
+func Rune(r rune) int {
+	return DefaultOptions.Rune(r)
+}
+
+// Rune calculates the display width of a rune, for the given options.
+//
+// You should almost certainly use [String] or [Bytes] for most purposes.
+//
+// The smallest unit of display width is a grapheme cluster, not a rune.
+// Iterating over runes to measure width is incorrect in many cases.
+func (options Options) Rune(r rune) int {
+	if r < utf8.RuneSelf {
+		return asciiWidth(byte(r))
+	}
+
+	// Surrogates (U+D800-U+DFFF) are invalid UTF-8.
+	if r >= 0xD800 && r <= 0xDFFF {
+		return 0
+	}
+
+	var buf [4]byte
+	n := utf8.EncodeRune(buf[:], r)
+
+	// Skip the grapheme iterator
+	return graphemeWidth(buf[:n], options)
+}
+
+const _Default property = 0
+
+// graphemeWidth returns the display width of a grapheme cluster.
+// The passed string must be a single grapheme cluster.
+func graphemeWidth[T ~string | []byte](s T, options Options) int {
+	if len(s) == 0 {
+		return 0
+	}
+
+	// C1 controls (0x80-0x9F) are zero-width when 8-bit control sequences
+	// are enabled. This must be checked before the single-byte optimization
+	// below, which would otherwise return width 1 for these bytes.
+	if options.ControlSequences8Bit && s[0] >= 0x80 && s[0] <= 0x9F {
+		return 0
+	}
+
+	// Optimization: single-byte graphemes need no property lookup
+	if len(s) == 1 {
+		return asciiWidth(s[0])
+	}
+
+	// Multi-byte grapheme clusters led by a C0 control (0x00-0x1F)
+	if s[0] <= 0x1F {
+		return 0
+	}
+
+	p, sz := lookup(s)
+	prop := property(p)
+
+	// Variation Selector 16 (VS16) requests emoji presentation
+	if prop != _Wide && sz > 0 && len(s) >= sz+3 {
+		vs := s[sz : sz+3]
+		if isVS16(vs) {
+			prop = _Wide
+		}
+		// VS15 (U+FE0E, encoded EF B8 8E) requests text presentation but does
+		// not affect width, in my reading of Unicode TR51. Falls through to
+		// return the base character's property.
+	}
+
+	if options.EastAsianWidth && prop == _East_Asian_Ambiguous {
+		prop = _Wide
+	}
+
+	if prop > upperBound {
+		prop = _Default
+	}
+
+	return propertyWidths[prop]
+}
+
+func asciiWidth(b byte) int {
+	if b <= 0x1F || b == 0x7F {
+		return 0
+	}
+	return 1
+}
+
+// printableASCIILength returns the length of consecutive printable ASCII bytes
+// starting at the beginning of s.
+func printableASCIILength[T string | []byte](s T) int {
+	i := 0
+	for ; i < len(s); i++ {
+		b := s[i]
+		// Printable ASCII is 0x20-0x7E (space through tilde)
+		if b < 0x20 || b > 0x7E {
+			break
+		}
+	}
+
+	// If the next byte is non-ASCII (>= 0x80), back off by 1. The grapheme
+	// parser may group the last ASCII byte with subsequent non-ASCII bytes,
+	// such as combining marks.
+	if i > 0 && i < len(s) && s[i] >= 0x80 {
+		i--
+	}
+
+	return i
+}
+
+// isVS16 checks if the slice matches VS16 (U+FE0F) UTF-8 encoding
+// (EF B8 8F). It assumes len(s) >= 3.
+func isVS16[T ~string | []byte](s T) bool {
+	return s[0] == 0xEF && s[1] == 0xB8 && s[2] == 0x8F
+}
+
+// propertyWidths is a jump table of sorts, instead of a switch
+var propertyWidths = [4]int{
+	_Default:              1,
+	_Zero_Width:           0,
+	_Wide:                 2,
+	_East_Asian_Ambiguous: 1,
+}
+
+const upperBound = property(len(propertyWidths) - 1)
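The back-off step in `printableASCIILength` is easy to miss: when a run of printable ASCII is followed by a non-ASCII byte, the last ASCII byte is handed back to the grapheme parser, since it may be the base of a cluster such as a letter plus a combining mark. Below is a standalone sketch of the same logic, specialized to `string` and renamed `printableASCIIRun` to make clear it is an illustration rather than the package's function.

```go
package main

import "fmt"

// printableASCIIRun mirrors the logic of printableASCIILength shown in the
// diff, specialized to string. The name is hypothetical.
func printableASCIIRun(s string) int {
	i := 0
	for ; i < len(s); i++ {
		// Printable ASCII is 0x20-0x7E (space through tilde).
		if s[i] < 0x20 || s[i] > 0x7E {
			break
		}
	}

	// If the run stopped at a non-ASCII byte, give back the last ASCII byte
	// so the grapheme parser can group it with what follows.
	if i > 0 && i < len(s) && s[i] >= 0x80 {
		i--
	}
	return i
}

func main() {
	fmt.Println(printableASCIIRun("abc"))        // prints 3: all printable ASCII
	fmt.Println(printableASCIIRun("abe\u0301"))  // prints 2: "e" is left for the combining acute
	fmt.Println(printableASCIIRun("ab\tc"))      // prints 2: stops at the tab, no back-off
}
```

The back-off applies only when the stopping byte is 0x80 or above; a C0 control such as a tab ends the run without giving back a byte.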
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 Matt Sherman
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,120 @@
+An implementation of grapheme cluster boundaries from [Unicode text segmentation](https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries) (UAX 29), for Unicode 17.
+
+[![Documentation](https://pkg.go.dev/badge/github.com/clipperhouse/uax29/v2/graphemes.svg)](https://pkg.go.dev/github.com/clipperhouse/uax29/v2/graphemes)
+![Tests](https://github.com/clipperhouse/uax29/actions/workflows/gotest.yml/badge.svg)
+![Fuzz](https://github.com/clipperhouse/uax29/actions/workflows/gofuzz.yml/badge.svg)
+
+## Quick start
+
+```
+go get github.com/clipperhouse/uax29/v2/graphemes
+```
+
+```go
+import "github.com/clipperhouse/uax29/v2/graphemes"
+
+text := "Hello, 世界. Nice dog! 👍🐶"
+g := graphemes.FromString(text)
+
+for g.Next() {                     // Next() returns true until end of data
+	fmt.Println(g.Value())         // Do something with the current grapheme
+}
+```
+
+_A grapheme is a “single visible character”, which might be as simple as a single letter, or a complex emoji that consists of several Unicode code points._
+
+## Conformance
+
+We use the Unicode [test suite](https://unicode.org/reports/tr41/tr41-36.html#Tests29).
+
+![Tests](https://github.com/clipperhouse/uax29/actions/workflows/gotest.yml/badge.svg)
+![Fuzz](https://github.com/clipperhouse/uax29/actions/workflows/gofuzz.yml/badge.svg)
+
+## APIs
+
+### If you have a `string`
+
+```go
+text := "Hello, 世界. Nice dog! 👍🐶"
+g := graphemes.FromString(text)
+
+for g.Next() {                     // Next() returns true until end of data
+	fmt.Println(g.Value())         // Do something with the current grapheme
+}
+```
+
+### If you have an `io.Reader`
+
+`FromReader` returns a `Scanner` that embeds a [`bufio.Scanner`](https://pkg.go.dev/bufio#Scanner), so just use those methods.
+
+```go
+r := getYourReader()                    // from a file or network maybe
+g := graphemes.FromReader(r)
+
+for g.Scan() {                         // Scan() returns true until error or EOF
+	fmt.Println(g.Text())              // Do something with the current grapheme
+}
+
+if g.Err() != nil {                    // Check the error
+	log.Fatal(g.Err())
+}
+```
+
+### If you have a `[]byte`
+
+```go
+b := []byte("Hello, 世界. Nice dog! 👍🐶")
+
+g := graphemes.FromBytes(b)
+
+for g.Next() {                     // Next() returns true until end of data
+	fmt.Println(g.Value())         // Do something with the current grapheme
+}
+```
+
+### ANSI escape sequences
+
+By the UAX 29 specification, ANSI escape sequences are not grapheme clusters. To treat 7-bit ANSI escape sequences as a single cluster, set `AnsiEscapeSequences` to true.
+
+```go
+text := "Hello, \x1b[31mworld\x1b[0m!"
+g := graphemes.FromString(text)
+g.AnsiEscapeSequences = true
+
+for g.Next() {
+	fmt.Println(g.Value())
+}
+```
+
+To also parse 8-bit C1 controls (non-UTF-8 bytes), set `AnsiEscapeSequences8Bit` to true.
+
+```go
+g.AnsiEscapeSequences = true     // 7-bit forms (ESC ...)
+g.AnsiEscapeSequences8Bit = true // 8-bit C1 forms (0x80-0x9F), not valid UTF-8
+```
+
+For ESC-initiated (7-bit) control strings, only 7-bit terminators are recognized.
+For C1-initiated (8-bit) control strings, only C1 ST (`0x9C`) is recognized as ST.
+
+We implement [ECMA-48](https://ecma-international.org/publications-and-standards/standards/ecma-48/) control codes in both 7-bit and 8-bit representations. 8-bit control codes are raw bytes, not UTF-8 encoded, so input containing them is not valid UTF-8. Caveat emptor.
+
+### Benchmarks
+
+```
+goos: darwin
+goarch: arm64
+pkg: github.com/clipperhouse/uax29/graphemes/comparative
+cpu: Apple M2
+
+BenchmarkGraphemesMixed/clipperhouse/uax29-8  	    142635 ns/op	 245.12 MB/s    0 B/op	   0 allocs/op
+BenchmarkGraphemesMixed/rivo/uniseg-8         	   2018284 ns/op	  17.32 MB/s    0 B/op	   0 allocs/op
+
+BenchmarkGraphemesASCII/clipperhouse/uax29-8  	      8846 ns/op	 508.73 MB/s    0 B/op	   0 allocs/op
+BenchmarkGraphemesASCII/rivo/uniseg-8         	    366760 ns/op	  12.27 MB/s    0 B/op	   0 allocs/op
+```
+
+### Invalid inputs
+
+Invalid UTF-8 input is considered undefined behavior. We test to ensure that bad inputs will not cause pathological outcomes, such as a panic or infinite loop. Callers should expect “garbage-in, garbage-out”.
+
+Your pipeline should probably include a call to [`utf8.Valid()`](https://pkg.go.dev/unicode/utf8#Valid).
@@ -0,0 +1,138 @@
+package graphemes
+
+// ansiEscapeLength returns the byte length of a valid 7-bit ANSI escape
+// sequence at the start of data, or 0 if none.
+//
+// Recognized forms (ECMA-48 / ISO 6429):
+//   - CSI: ESC [ then parameter bytes (0x30-0x3F), intermediate (0x20-0x2F), final (0x40-0x7E)
+//   - OSC: ESC ] then payload until BEL (0x07), 7-bit ST (ESC \), CAN (0x18), or SUB (0x1A)
+//   - DCS, SOS, PM, APC: ESC P/X/^/_ then payload until 7-bit ST (ESC \), CAN, or SUB
+//   - Other escapes: ESC + Fe/Fs (0x40-0x7E excluding above) or Fp (0x30-0x3F), both two-byte; or nF (one or more 0x20-0x2F intermediates, then a final 0x30-0x7E)
+func ansiEscapeLength[T ~string | ~[]byte](data T) int {
+	n := len(data)
+	if n < 2 || data[0] != esc {
+		return 0
+	}
+
+	b1 := data[1]
+	switch b1 {
+	case '[': // CSI
+		body := csiBodyLength(data[2:])
+		if body == 0 {
+			return 0
+		}
+		return 2 + body
+	case ']': // OSC - allows BEL or 7-bit ST terminator
+		body := oscLength(data[2:])
+		if body < 0 {
+			return 0
+		}
+		return 2 + body
+	case 'P', 'X', '^', '_': // DCS, SOS, PM, APC
+		body := stSequenceLength(data[2:])
+		if body < 0 {
+			return 0
+		}
+		return 2 + body
+	}
+
+	if b1 >= 0x40 && b1 <= 0x7E {
+		// Fe/Fs two-byte; [ ] P X ^ _ handled above
+		return 2
+	}
+	if b1 >= 0x30 && b1 <= 0x3F {
+		// Fp (private) two-byte
+		return 2
+	}
+	if b1 >= 0x20 && b1 <= 0x2F {
+		// nF: intermediates then one final (0x30-0x7E)
+		i := 2
+		for i < n && data[i] >= 0x20 && data[i] <= 0x2F {
+			i++
+		}
+		if i < n && data[i] >= 0x30 && data[i] <= 0x7E {
+			return i + 1
+		}
+		return 0
+	}
+
+	return 0
+}
+
+// csiBodyLength returns the length of the CSI body (param/intermediate/final bytes),
+// or 0 if the body is invalid or has no final byte in data.
+// data is the slice after "ESC [".
+// Per ECMA-48, the CSI body has the form:
+//
+//	parameters (0x30–0x3F)*, intermediates (0x20–0x2F)*, final (0x40–0x7E)
+//
+// Once an intermediate byte is seen, subsequent parameter bytes are invalid.
+func csiBodyLength[T ~string | ~[]byte](data T) int {
+	seenIntermediate := false
+	for i := 0; i < len(data); i++ {
+		b := data[i]
+		if b >= 0x30 && b <= 0x3F {
+			if seenIntermediate {
+				return 0
+			}
+			continue
+		}
+		if b >= 0x20 && b <= 0x2F {
+			seenIntermediate = true
+			continue
+		}
+		if b >= 0x40 && b <= 0x7E {
+			return i + 1
+		}
+		return 0
+	}
+	return 0
+}
+
+// oscLength returns the length of the OSC body.
+// data is the slice after "ESC ]".
+//
+// Returns:
+//   - n >= 0: consumed body length (includes BEL/ST terminator when present)
+//   - -1: not terminated in the provided data
+//
+// OSC accepts BEL (0x07) or 7-bit ST (ESC \) as terminators by widespread convention.
+// Per ECMA-48, CAN (0x18) and SUB (0x1A) cancel the control string; in that
+// case they are not part of the OSC sequence length.
+func oscLength[T ~string | ~[]byte](data T) int {
+	for i := 0; i < len(data); i++ {
+		b := data[i]
+		if b == bel {
+			return i + 1
+		}
+		if b == can || b == sub {
+			return i
+		}
+		if b == esc && i+1 < len(data) && data[i+1] == '\\' {
+			return i + 2
+		}
+	}
+	return -1
+}
+
+// stSequenceLength returns the length of a control-string body.
+// data is the slice after "ESC x".
+//
+// Returns:
+//   - n >= 0: consumed body length (includes ST terminator when present)
+//   - -1: not terminated in the provided data
+//
+// Used for DCS, SOS, PM, and APC, which per ECMA-48 terminate with ST.
+// ST here is the 7-bit form (ESC \).
+// CAN (0x18) and SUB (0x1A) cancel the control string; in that case they are
+// not part of the sequence length.
+func stSequenceLength[T ~string | ~[]byte](data T) int {
+	for i := 0; i < len(data); i++ {
+		if data[i] == can || data[i] == sub {
+			return i
+		}
+		if data[i] == esc && i+1 < len(data) && data[i+1] == '\\' {
+			return i + 2
+		}
+	}
+	return -1
+}
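
Note on the control-string return convention: the difference between a terminator that is consumed (BEL, ST) and a cancel byte that is not (CAN, SUB) can be checked with a small standalone copy. `oscBody` below is a hypothetical string-only mirror of `oscLength`, written for illustration.

```go
package main

import "fmt"

const (
	esc = 0x1B
	bel = 0x07
	can = 0x18
	sub = 0x1A
)

// oscBody scans the bytes that follow "ESC ]" and returns the body length,
// or -1 if no terminator or cancel byte is found.
func oscBody(data string) int {
	for i := 0; i < len(data); i++ {
		b := data[i]
		if b == bel {
			return i + 1 // BEL is part of the sequence
		}
		if b == can || b == sub {
			return i // the cancel byte is not part of the sequence
		}
		if b == esc && i+1 < len(data) && data[i+1] == '\\' {
			return i + 2 // 7-bit ST (ESC \) is part of the sequence
		}
	}
	return -1
}

func main() {
	fmt.Println(oscBody("0;title\x07rest"))   // terminated by BEL
	fmt.Println(oscBody("0;title\x1b\\rest")) // terminated by ESC \
	fmt.Println(oscBody("0;ti\x18tle"))       // canceled by CAN
	fmt.Println(oscBody("0;title"))           // unterminated
}
```

The caller adds 2 for the `ESC ]` introducer, so a window title terminated by BEL in this example spans 10 bytes in total.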
@@ -0,0 +1,79 @@
+package graphemes
+
+// ansiEscapeLength8Bit returns the byte length of a valid 8-bit C1 ANSI
+// sequence at the start of data, or 0 if none.
+//
+// Recognized forms (ECMA-48 / ISO 6429):
+//   - C1 CSI (0x9B) body as parameter/intermediate/final bytes
+//   - C1 OSC (0x9D) body terminated by BEL, C1 ST, CAN, or SUB
+//   - C1 DCS/SOS/PM/APC (0x90/0x98/0x9E/0x9F) body terminated by C1 ST, CAN, or SUB
+//   - Standalone C1 controls (0x80..0x9F not listed above): single byte
+func ansiEscapeLength8Bit[T ~string | ~[]byte](data T) int {
+	if len(data) == 0 {
+		return 0
+	}
+
+	switch data[0] {
+	case 0x9B: // C1 CSI
+		body := csiBodyLength(data[1:])
+		if body == 0 {
+			return 0
+		}
+		return 1 + body
+	case 0x9D: // C1 OSC
+		body := oscLengthC1(data[1:])
+		if body < 0 {
+			return 0
+		}
+		return 1 + body
+	case 0x90, 0x98, 0x9E, 0x9F: // C1 DCS, SOS, PM, APC
+		body := stSequenceLengthC1(data[1:])
+		if body < 0 {
+			return 0
+		}
+		return 1 + body
+	default:
+		if data[0] >= 0x80 && data[0] <= 0x9F {
+			return 1
+		}
+	}
+
+	return 0
+}
+
+// oscLengthC1 returns the length of a C1 OSC body.
+// data is the slice after the C1 OSC initiator (0x9D).
+//
+// Returns:
+//   - n >= 0: consumed body length (includes BEL/ST terminator when present)
+//   - -1: not terminated in the provided data
+//
+// Terminators: BEL (0x07) or C1 ST (0x9C).
+// CAN (0x18) and SUB (0x1A) cancel the control string.
+func oscLengthC1[T ~string | ~[]byte](data T) int {
+	for i := 0; i < len(data); i++ {
+		b := data[i]
+		if b == bel || b == st {
+			return i + 1
+		}
+		if b == can || b == sub {
+			return i
+		}
+	}
+	return -1
+}
+
+// stSequenceLengthC1 parses DCS/SOS/PM/APC bodies that terminate with C1 ST
+// (0x9C), or are canceled by CAN/SUB.
+func stSequenceLengthC1[T ~string | ~[]byte](data T) int {
+	for i := 0; i < len(data); i++ {
+		b := data[i]
+		if b == can || b == sub {
+			return i
+		}
+		if b == st {
+			return i + 1
+		}
+	}
+	return -1
+}
@@ -0,0 +1,144 @@
+package graphemes
+
+import "unicode/utf8"
+
+// FromString returns an iterator for the grapheme clusters in the input string.
+// Iterate while Next() is true, and access the grapheme via Value().
+func FromString(s string) *Iterator[string] {
+	return &Iterator[string]{
+		split: splitFuncString,
+		data:  s,
+	}
+}
+
+// FromBytes returns an iterator for the grapheme clusters in the input bytes.
+// Iterate while Next() is true, and access the grapheme via Value().
+func FromBytes(b []byte) *Iterator[[]byte] {
+	return &Iterator[[]byte]{
+		split: splitFuncBytes,
+		data:  b,
+	}
+}
+
+// Iterator is a generic iterator for grapheme clusters in strings or byte slices,
+// with an ASCII hot path optimization.
+type Iterator[T ~string | ~[]byte] struct {
+	split func(T, bool) (int, T, error)
+	data  T
+	pos   int
+	start int
+	// AnsiEscapeSequences treats 7-bit ANSI escape sequences (ECMA-48) as
+	// single grapheme clusters when true. The default is false.
+	//
+	// 8-bit controls are not enabled by this option. See [AnsiEscapeSequences8Bit].
+	AnsiEscapeSequences bool
+	// AnsiEscapeSequences8Bit treats 8-bit C1 ANSI escape sequences (ECMA-48) as single
+	// grapheme clusters when true. The default is false.
+	//
+	// 8-bit control bytes are not UTF-8 encoded, i.e. not valid UTF-8. If you
+	// choose this option, you are choosing to interpret non-UTF-8 data, caveat
+	// emptor.
+	AnsiEscapeSequences8Bit bool
+}
+
+var (
+	splitFuncString = splitFunc[string]
+	splitFuncBytes  = splitFunc[[]byte]
+)
+
+const (
+	esc = 0x1B
+	cr  = 0x0D
+	bel = 0x07
+	can = 0x18
+	sub = 0x1A
+	st  = 0x9C
+)
+
+// Next advances the iterator to the next grapheme cluster.
+// Returns false when there are no more grapheme clusters.
+func (iter *Iterator[T]) Next() bool {
+	if iter.pos >= len(iter.data) {
+		return false
+	}
+	iter.start = iter.pos
+
+	b := iter.data[iter.pos]
+	if iter.AnsiEscapeSequences && b == esc {
+		if a := ansiEscapeLength(iter.data[iter.pos:]); a > 0 {
+			iter.pos += a
+			return true
+		}
+	}
+	if iter.AnsiEscapeSequences8Bit && b >= 0x80 && b <= 0x9F {
+		if a := ansiEscapeLength8Bit(iter.data[iter.pos:]); a > 0 {
+			iter.pos += a
+			return true
+		}
+	}
+
+	// ASCII hot path: any ASCII byte except CR is one grapheme when the next
+	// byte is ASCII or end of data. CR is excluded because CR LF must stay
+	// together (GB3).
+	if b < utf8.RuneSelf && b != cr {
+		if iter.pos+1 >= len(iter.data) || iter.data[iter.pos+1] < utf8.RuneSelf {
+			iter.pos++
+			return true
+		}
+	}
+
+	// Fall back to UAX29 grapheme parsing
+	remaining := iter.data[iter.pos:]
+	advance, _, err := iter.split(remaining, true)
+	if err != nil {
+		panic(err)
+	}
+	if advance <= 0 {
+		panic("splitFunc returned a zero or negative advance")
+	}
+	iter.pos += advance
+	if iter.pos > len(iter.data) {
+		panic("splitFunc advanced beyond end of data")
+	}
+	return true
+}
+
+// Value returns the current grapheme cluster.
+func (iter *Iterator[T]) Value() T {
+	return iter.data[iter.start:iter.pos]
+}
+
+// Start returns the byte position of the current grapheme in the original data.
+func (iter *Iterator[T]) Start() int {
+	return iter.start
+}
+
+// End returns the byte position after the current grapheme in the original data.
+func (iter *Iterator[T]) End() int {
+	return iter.pos
+}
+
+// Reset resets the iterator to the beginning of the data.
+func (iter *Iterator[T]) Reset() {
+	iter.start = 0
+	iter.pos = 0
+}
+
+// SetText sets the data for the iterator to operate on, and resets all state.
+func (iter *Iterator[T]) SetText(data T) {
+	iter.data = data
+	iter.start = 0
+	iter.pos = 0
+}
+
+// First returns the first grapheme cluster without advancing the iterator.
+func (iter *Iterator[T]) First() T {
+	if len(iter.data) == 0 {
+		return iter.data
+	}
+
+	// Use a copy to leverage Next()'s ASCII optimization
+	cp := *iter
+	cp.pos = 0
+	cp.start = 0
+	cp.Next()
+	return cp.Value()
+}
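
Note on the ASCII hot path in `Next`: the condition can be restated as a single predicate. `fastASCII` below is a hypothetical helper written for this sketch; it only mirrors the check and does not call into the package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const cr = 0x0D

// fastASCII reports whether the byte at pos can be emitted as a one-byte
// grapheme without running the full UAX 29 rules.
func fastASCII(data string, pos int) bool {
	b := data[pos]
	if b >= utf8.RuneSelf || b == cr {
		return false // non-ASCII, or CR which may pair with LF
	}
	// Safe only if nothing non-ASCII follows that could attach to b.
	return pos+1 >= len(data) || data[pos+1] < utf8.RuneSelf
}

func main() {
	fmt.Println(fastASCII("ab", 0))       // ASCII followed by ASCII
	fmt.Println(fastASCII("e\u0301", 0))  // ASCII followed by a combining mark
	fmt.Println(fastASCII("\r\n", 0))     // CR is left to the full parser
}
```

Inputs that fail the predicate are not errors; they simply take the slower `splitFunc` path.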
@@ -0,0 +1,25 @@
+// Package graphemes implements Unicode grapheme cluster boundaries: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
+package graphemes
+
+import (
+	"bufio"
+	"io"
+)
+
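+// Scanner splits the contents of an io.Reader into grapheme clusters. It
+// embeds a [bufio.Scanner], so Scan, Text, Bytes, and Err are available directly.
+type Scanner struct {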
+	*bufio.Scanner
+}
+
+// FromReader returns a Scanner, to split graphemes per
+// https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries.
+//
+// It embeds a [bufio.Scanner], so you can use its methods.
+//
+// Iterate through graphemes by calling Scan() until false, then check Err().
+func FromReader(r io.Reader) *Scanner {
+	sc := bufio.NewScanner(r)
+	sc.Split(SplitFunc)
+	return &Scanner{
+		Scanner: sc,
+	}
+}
@@ -0,0 +1,205 @@
+package graphemes
+
+import (
+	"bufio"
+)
+
+// is reports whether lookup shares at least one bit with properties.
+func (lookup property) is(properties property) bool {
+	return (lookup & properties) != 0
+}
+
+const _Ignore = _Extend
+
+// incbState tracks state for GB9c rule (Indic conjunct clusters)
+// Pattern: Consonant (Extend|Linker)* Linker (Extend|Linker)* × Consonant
+type incbState int
+
+const (
+	incbNone      incbState = iota // initial/reset
+	incbConsonant                  // seen Consonant, awaiting Linker
+	incbLinker                     // seen Consonant and Linker (conjunct ready)
+)
+
+// SplitFunc is a bufio.SplitFunc implementation of Unicode grapheme cluster segmentation, for use with bufio.Scanner.
+//
+// See https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries.
+var SplitFunc bufio.SplitFunc = splitFunc[[]byte]
+
+func splitFunc[T ~string | ~[]byte](data T, atEOF bool) (advance int, token T, err error) {
+	var empty T
+	if len(data) == 0 {
+		return 0, empty, nil
+	}
+
+	// These vars are stateful across loop iterations
+	var pos int
+	var lastExIgnore property = 0     // "last excluding ignored categories"
+	var lastLastExIgnore property = 0 // "last one before that"
+	var regionalIndicatorCount int
+
+	// GB9c state: tracking Indic conjunct clusters
+	var incb incbState
+
+	// Rules are usually of the form Cat1 × Cat2; "current" refers to the first property
+	// to the right of the ×, from which we look back or forward
+
+	current, w := lookup(data[pos:])
+	if w == 0 {
+		if !atEOF {
+			// Rune extends past current data, request more
+			return 0, empty, nil
+		}
+		pos = len(data)
+		return pos, data[:pos], nil
+	}
+
+	// https://unicode.org/reports/tr29/#GB1
+	// Start of text always advances
+	pos += w
+
+	for {
+		eot := pos == len(data) // "end of text"
+
+		if eot {
+			if !atEOF {
+				// Token extends past current data, request more
+				return 0, empty, nil
+			}
+
+			// https://unicode.org/reports/tr29/#GB2
+			break
+		}
+
+		/*
+			We've switched the evaluation order of GB1↓ and GB2↑. That's OK:
+			because we've checked for len(data) at the top of this function,
+			sot and eot are mutually exclusive, so the order doesn't matter.
+		*/
+
+		// Rules are usually of the form Cat1 × Cat2; "current" refers to the first property
+		// to the right of the ×, from which we look back or forward
+
+		// Remember previous properties to avoid lookups/lookbacks
+		last := current
+		if !last.is(_Ignore) {
+			lastLastExIgnore = lastExIgnore
+			lastExIgnore = last
+		}
+
+		// Update GB9c state based on what we just advanced past
+		if last.is(_InCBConsonant | _InCBLinker | _InCBExtend) {
+			switch {
+			case last.is(_InCBConsonant):
+				if incb != incbLinker {
+					incb = incbConsonant
+				}
+			case last.is(_InCBLinker):
+				if incb >= incbConsonant {
+					incb = incbLinker
+				}
+				// case last.is(_InCBExtend): stay in current state
+			}
+		} else {
+			incb = incbNone
+		}
+
+		current, w = lookup(data[pos:])
+		if w == 0 {
+			if atEOF {
+				// Just return the bytes, we can't do anything with them
+				pos = len(data)
+				break
+			}
+			// Rune extends past current data, request more
+			return 0, empty, nil
+		}
+
+		// Optimization: no rule can possibly apply
+		if current|last == 0 { // i.e. both are zero
+			break
+		}
+
+		// https://unicode.org/reports/tr29/#GB3
+		if current.is(_LF) && last.is(_CR) {
+			pos += w
+			continue
+		}
+
+		// https://unicode.org/reports/tr29/#GB4
+		// https://unicode.org/reports/tr29/#GB5
+		if (current | last).is(_Control | _CR | _LF) {
+			break
+		}
+
+		// https://unicode.org/reports/tr29/#GB6
+		if current.is(_L|_V|_LV|_LVT) && last.is(_L) {
+			pos += w
+			continue
+		}
+
+		// https://unicode.org/reports/tr29/#GB7
+		if current.is(_V|_T) && last.is(_LV|_V) {
+			pos += w
+			continue
+		}
+
+		// https://unicode.org/reports/tr29/#GB8
+		if current.is(_T) && last.is(_LVT|_T) {
+			pos += w
+			continue
+		}
+
+		// https://unicode.org/reports/tr29/#GB9
+		if current.is(_Extend | _ZWJ) {
+			pos += w
+			continue
+		}
+
+		// https://unicode.org/reports/tr29/#GB9a
+		if current.is(_SpacingMark) {
+			pos += w
+			continue
+		}
+
+		// https://unicode.org/reports/tr29/#GB9b
+		if last.is(_Prepend) {
+			pos += w
+			continue
+		}
+
+		// https://unicode.org/reports/tr29/#GB9c
+		// Do not break within certain combinations with Indic_Conjunct_Break (InCB)=Linker.
+		if incb == incbLinker && current.is(_InCBConsonant) {
+			// After matching the pattern, reset state to start tracking a new pattern
+			// The current Consonant becomes the start of the new pattern
+			incb = incbConsonant
+			pos += w
+			continue
+		}
+
+		// https://unicode.org/reports/tr29/#GB11
+		if current.is(_ExtendedPictographic) && last.is(_ZWJ) && lastLastExIgnore.is(_ExtendedPictographic) {
+			pos += w
+			continue
+		}
+
+		// https://unicode.org/reports/tr29/#GB12
+		// https://unicode.org/reports/tr29/#GB13
+		if (current & last).is(_RegionalIndicator) {
+			regionalIndicatorCount++
+
+			odd := regionalIndicatorCount%2 == 1
+			if odd {
+				pos += w
+				continue
+			}
+		}
+
+		// If we fall through all the above rules, it's a grapheme cluster break
+		break
+	}
+
+	// Return token
+	return pos, data[:pos], nil
+}
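
Note on GB12/GB13: the odd/even counter pairs regional indicators from the left, so an odd run leaves a lone trailing indicator. The sketch below shows only that pairing; `splitFlags` and `isRI` are hypothetical helpers, every other rule is ignored, and any non-indicator rune is emitted on its own.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isRI reports whether r is a Regional Indicator symbol (U+1F1E6..U+1F1FF).
func isRI(r rune) bool { return r >= 0x1F1E6 && r <= 0x1F1FF }

// splitFlags pairs adjacent regional indicators and emits every other rune
// as its own segment. It deliberately ignores all other UAX 29 rules.
func splitFlags(s string) []string {
	var out []string
	for len(s) > 0 {
		r, end := utf8.DecodeRuneInString(s)
		if isRI(r) {
			// Join with the next rune only if it is also an indicator.
			if r2, w2 := utf8.DecodeRuneInString(s[end:]); w2 > 0 && isRI(r2) {
				end += w2
			}
		}
		out = append(out, s[:end])
		s = s[end:]
	}
	return out
}

func main() {
	// Five indicators: D E F R J
	flags := "\U0001F1E9\U0001F1EA\U0001F1EB\U0001F1F7\U0001F1EF"
	for _, g := range splitFlags(flags) {
		fmt.Printf("%q (%d bytes)\n", g, len(g))
	}
}
```

Five indicators therefore yield two pairs and one single, which matches `splitFunc` breaking whenever `regionalIndicatorCount` becomes even.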