Strings and Runes
Go strings are immutable sequences of bytes, conventionally encoded in UTF-8. A rune is a Unicode code point (int32) - the distinction matters on any non-ASCII input.
In Go, a string is an immutable sequence of bytes - not characters. Most Go strings happen to be valid UTF-8, but the language does not enforce that. A rune is an alias for int32 and represents a single Unicode code point.
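To make the "does not enforce" point concrete, here is a minimal sketch of a string that holds a byte which is not valid UTF-8. The helper name describe is hypothetical and exists only for this illustration; the calls it uses are from the standard unicode/utf8 package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper for this illustration. It reports the
// byte length of s, whether s is valid UTF-8, and the runes that a range
// loop yields for it.
func describe(s string) (nbytes int, valid bool, runes []rune) {
	for _, r := range s {
		runes = append(runes, r)
	}
	return len(s), utf8.ValidString(s), runes
}

func main() {
	// "\xff" is a single byte that never appears in valid UTF-8,
	// yet the compiler accepts it inside a string literal.
	bad := "a\xffb"

	n, ok, rs := describe(bad)
	fmt.Println(n, ok)                   // 3 false
	fmt.Printf("%U\n", rs)               // [U+0061 U+FFFD U+0062]
	fmt.Println(rs[1] == utf8.RuneError) // true
}
```

Ranging over the string does not fail on the bad byte; it yields the replacement character U+FFFD (utf8.RuneError) and advances by one byte, so validation has to be an explicit step when the input is untrusted.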
len(s) returns the number of bytes, not characters. To count Unicode code points (runes), convert to []rune first or use the utf8 package.

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	s := "Hello, 世界"
	fmt.Println(len(s))                    // 13 - bytes
	fmt.Println(utf8.RuneCountInString(s)) // 9 - runes (code points)
	fmt.Println(len([]rune(s)))            // 9 - same, via conversion
}

Index a string to get a byte. Range over a string to get each rune together with the byte offset where it starts. Slicing uses byte offsets and returns a string - slicing in the middle of a multi-byte rune produces invalid UTF-8.

package main

import "fmt"

func main() {
	s := "café"

	// Indexing gives bytes - the 'é' spans bytes 3 and 4
	fmt.Printf("%x\n", s[3]) // c3 - first byte of 'é'

	// Range gives runes
	for i, r := range s {
		fmt.Printf("byte %d: %c (%d)\n", i, r, r)
	}
	// byte 0: c (99)
	// byte 1: a (97)
	// byte 2: f (102)
	// byte 3: é (233)
}

Convert between strings, byte slices, and rune slices explicitly. Each conversion copies the data. Use strings.Builder for efficient string construction in a loop - it avoids the repeated allocations of the + operator.

package main

import (
	"fmt"
	"strings"
)

func main() {
	s := "hello"

	// String <-> []byte (copy)
	b := []byte(s)
	b[0] = 'H'
	fmt.Println(string(b)) // Hello
	fmt.Println(s)         // hello - original unchanged

	// Efficient concatenation with strings.Builder
	var sb strings.Builder
	for i := 0; i < 5; i++ {
		fmt.Fprintf(&sb, "item%d ", i)
	}
	fmt.Println(sb.String()) // item0 item1 item2 item3 item4
}

A rune literal is a single Unicode code point in single quotes. The utf8 package provides utilities for validating and decoding UTF-8 without converting to a rune slice.

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	r := 'é'
	fmt.Println(r)        // 233 - the Unicode code point
	fmt.Printf("%c\n", r) // é

	// Decode the first rune from a string
	s := "café"
	r2, size := utf8.DecodeRuneInString(s)
	fmt.Printf("first rune: %c, byte size: %d\n", r2, size)
	// first rune: c, byte size: 1

	// Validate UTF-8
	fmt.Println(utf8.ValidString(s))      // true
	fmt.Println(utf8.ValidString("\xff")) // false
}

In production
len(s) returns bytes, not characters, and even a rune count is not a character count: a user-visible "character" may be a multi-rune grapheme cluster (a flag emoji is two runes, a family emoji can be seven). For user-facing string operations such as truncating at N display characters or counting words in a tweet, the stdlib strings package works at the byte and rune level and will silently produce wrong counts or broken output on international text. Use golang.org/x/text for locale-aware operations such as normalization and collation; grapheme-cluster segmentation is not in the standard library and generally needs a dedicated package (github.com/rivo/uniseg is one third-party option). The conversion []rune(s) is correct for most Latin + CJK work but still counts every code point of a multi-code-point grapheme separately. Know your input domain before choosing the level of abstraction.
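The gap between the three levels can be sketched with the standard library alone. The helper truncateRunes below is hypothetical, written for this illustration rather than taken from any package, and it deliberately stops at the rune level: it never splits a multi-byte rune, but it knows nothing about grapheme clusters.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateRunes is a hypothetical helper, not a stdlib function. It returns
// at most n runes of s, cutting only on rune boundaries. It is not
// grapheme-aware.
func truncateRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s { // i is the byte offset where each rune starts
		if count == n {
			return s[:i]
		}
		count++
	}
	return s // s has n runes or fewer
}

func main() {
	s := "café au lait"
	fmt.Println(utf8.ValidString(s[:4])) // false - the cut lands inside 'é'
	fmt.Println(truncateRunes(s, 4))     // café

	// Written as escapes so the individual code points are visible.
	flag := "\U0001F1EF\U0001F1F5" // two regional indicators, one flag
	family := "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
	fmt.Println(utf8.RuneCountInString(flag))   // 2
	fmt.Println(utf8.RuneCountInString(family)) // 7

	// Rune-level truncation yields valid UTF-8 but still splits the flag.
	half := truncateRunes(flag, 1)
	fmt.Println(utf8.ValidString(half), half == flag) // true false
}
```

Byte slicing can break the encoding itself, rune-level truncation keeps the encoding intact but can still cut a visible character in half, and only grapheme-aware segmentation avoids both. Which level is enough depends on what the input can contain.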