Go by Example

Strings and Runes

Go strings are immutable byte sequences, conventionally encoded in UTF-8. A rune is a Unicode code point (int32) - the distinction matters on any non-ASCII input.

In Go, a string is an immutable sequence of bytes - not characters. Most Go strings happen to be valid UTF-8, but the language does not enforce that. A rune is an alias for int32 and represents a single Unicode code point.

len(s) returns the number of bytes, not characters. To count Unicode code points (runes), convert to []rune first or use the utf8 package.

package main
 
import (
    "fmt"
    "unicode/utf8"
)
 
func main() {
    s := "Hello, 世界"
 
    fmt.Println(len(s))                    // 13 - bytes
    fmt.Println(utf8.RuneCountInString(s)) // 9  - runes (code points)
    fmt.Println(len([]rune(s)))            // 9  - same, via conversion
}

Index a string to get a byte. Range over a string to get the rune at each Unicode code point, along with the byte offset where it starts. Slicing a string yields another string, but the bounds are byte offsets - slicing in the middle of a multi-byte rune produces invalid UTF-8.

package main
 
import "fmt"
 
func main() {
    s := "café"
 
    // Indexing gives bytes - the 'é' spans bytes 3 and 4
    fmt.Printf("%x\n", s[3]) // c3 - first byte of 'é'
 
    // Range gives runes
    for i, r := range s {
        fmt.Printf("byte %d: %c (%d)\n", i, r, r)
    }
    // byte 0: c (99)
    // byte 1: a (97)
    // byte 2: f (102)
    // byte 3: é (233)
}

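Convert between strings, byte slices, and rune slices explicitly. Each conversion copies the data, so mutating the result never changes the original string. Use strings.Builder for efficient string construction in a loop - repeated use of the + operator allocates a new string on every iteration and copies everything built so far.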

package main
 
import (
    "fmt"
    "strings"
)
 
func main() {
    s := "hello"
 
    // String <-> []byte (copy)
    b := []byte(s)
    b[0] = 'H'
    fmt.Println(string(b)) // Hello
    fmt.Println(s)         // hello - original unchanged
 
    // Efficient concatenation with strings.Builder
    var sb strings.Builder
    for i := 0; i < 5; i++ {
        fmt.Fprintf(&sb, "item%d ", i)
    }
    fmt.Println(sb.String()) // item0 item1 item2 item3 item4
}
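
The listing above covers []byte but not the rune-slice conversion. As a small sketch of the same idea at the code-point level, the hypothetical helper reverseRunes below converts to []rune, edits the copy in place, and converts back. It reverses code points only, so it is not a grapheme-aware reversal.

```go
package main

import "fmt"

// reverseRunes returns s with its code points in reverse order.
// Hypothetical helper for illustration.
func reverseRunes(s string) string {
    r := []rune(s) // decodes s into a fresh slice of code points
    for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
        r[i], r[j] = r[j], r[i]
    }
    return string(r) // re-encodes the runes as UTF-8 in a new string
}

func main() {
    fmt.Println(reverseRunes("café"))       // éfac
    fmt.Println(reverseRunes("Hello, 世界")) // 界世 ,olleH
}
```

Reversing the bytes instead would scramble the two-byte 'é' and the three-byte CJK characters into invalid UTF-8.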

A rune literal is a single Unicode code point in single quotes. The utf8 package provides utilities for validating and decoding UTF-8 without converting to a rune slice.

package main
 
import (
    "fmt"
    "unicode/utf8"
)
 
func main() {
    r := 'é'
    fmt.Println(r)          // 233 - the Unicode code point
    fmt.Printf("%c\n", r)   // é
 
    // Decode the first rune from a string
    s := "café"
    r2, size := utf8.DecodeRuneInString(s)
    fmt.Printf("first rune: %c, byte size: %d\n", r2, size)
    // first rune: c, byte size: 1
 
    // Validate UTF-8
    fmt.Println(utf8.ValidString(s))      // true
    fmt.Println(utf8.ValidString("\xff")) // false
}

In production

len(s) returns bytes, not characters, and even a rune count can mislead: a user-visible "character" may be a multi-rune grapheme cluster (a flag emoji is two runes, a family emoji can be seven). The stdlib strings and unicode/utf8 packages work at the byte and rune level, so user-facing operations such as truncating at N display characters or counting characters in a tweet can silently produce wrong counts or broken output on international text. Use golang.org/x/text for normalization and locale-aware operations; grapheme-cluster segmentation (Unicode Standard Annex #29) is not in the standard library and needs a dedicated segmentation package. The conversion []rune(s) is correct for most Latin and CJK text but still miscounts multi-codepoint graphemes. Know your input domain before choosing a level of abstraction.
