
The Go Blog

Allocating on the Stack

Keith Randall
27 February 2026

The Go team continuously seeks performance improvements. Recent releases have targeted a specific bottleneck: heap allocations. Every heap allocation triggers substantial code execution and adds burden to the garbage collector. Despite advances like Green Tea, GC overhead remains significant.

The solution? Shift more allocations to the stack. Stack allocations cost less—sometimes nothing at all. They bypass the garbage collector entirely, getting cleaned up automatically when their stack frame ends. This approach also improves cache locality through immediate memory reuse.

Stack allocation of constant-sized slices

Take this common pattern for collecting tasks:

func process(c chan task) {
	var tasks []task
	for t := range c {
		tasks = append(tasks, t)
	}
	processAll(tasks)
}

Here's what happens during execution. The first append allocates a backing store of size 1. The second iteration finds that store full and allocates size 2, discarding the original. The third iteration allocates size 4, discarding size 2. The fourth iteration finally reuses existing space. The fifth iteration allocates size 8.

This doubling strategy eventually reduces allocation frequency, but the startup phase generates considerable overhead and garbage. For programs where slices stay small, this startup cost dominates.
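The growth sequence is easy to observe directly. The exact capacities are an implementation detail, but current toolchains follow the 1, 2, 4, 8 doubling described above (int stands in for the task type here):

```go
package main

import "fmt"

func main() {
	// Append one element at a time and report whenever the runtime had
	// to allocate a new, larger backing store (cap only changes when
	// the slice is reallocated).
	var s []int
	prevCap := 0
	for i := 0; i < 8; i++ {
		s = append(s, i)
		if cap(s) != prevCap {
			fmt.Printf("len=%d: new backing store, cap=%d\n", len(s), cap(s))
			prevCap = cap(s)
		}
	}
}
```

Each reported line corresponds to one heap allocation plus a copy of the existing elements, which is exactly the startup-phase garbage described above.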

A manual optimization might pre-allocate capacity:

func process2(c chan task) {
	tasks := make([]task, 0, 10) // probably at most 10 tasks
	for t := range c {
		tasks = append(tasks, t)
	}
	processAll(tasks)
}

This remains correct regardless of the guess. Too small means append handles overflow; too large wastes memory. With an accurate guess of 10 elements, you'd expect one allocation from make.

Benchmarking reveals something unexpected: zero allocations. Because the requested capacity is a small compile-time constant (10 tasks), the compiler places the backing store on the stack. This works only when escape analysis can prove the backing store doesn't escape to the heap — for example, processAll must not retain the slice beyond the call.
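You can check this yourself with testing.AllocsPerRun. The sketch below mirrors the body of process2 with int in place of the task type; on any reasonably recent toolchain it reports zero, because the constant-size, non-escaping backing store lives on the stack:

```go
package main

import (
	"fmt"
	"testing"
)

// fill mirrors process2: a make with a small constant capacity,
// followed by appends that stay within it. Nothing escapes.
func fill() int {
	tasks := make([]int, 0, 10)
	for i := 0; i < 10; i++ {
		tasks = append(tasks, i)
	}
	return len(tasks)
}

func main() {
	// AllocsPerRun averages heap allocations over many calls.
	allocs := testing.AllocsPerRun(100, func() { fill() })
	fmt.Println("allocations per run:", allocs)
}
```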

Stack allocation of variable-sized slices

Hard-coded sizes lack flexibility. What about parameterizing the estimate?

func process3(c chan task, lengthGuess int) {
	tasks := make([]task, 0, lengthGuess)
	for t := range c {
		tasks = append(tasks, t)
	}
	processAll(tasks)
}

This allows callers to provide context-appropriate sizes. Unfortunately, Go 1.24 can't stack-allocate variable-sized backing stores, forcing heap allocation. Still better than repeated append allocations, but not ideal.

Go 1.25 changes this. You might consider splitting the logic:

func process4(c chan task, lengthGuess int) {
	var tasks []task
	if lengthGuess <= 10 {
		tasks = make([]task, 0, 10)
	} else {
		tasks = make([]task, 0, lengthGuess)
	}
	for t := range c {
		tasks = append(tasks, t)
	}
	processAll(tasks)
}

This works but feels clunky. Go 1.25 automates this pattern. The compiler allocates a small (32-byte) backing store on the stack and uses it when the requested size fits. Otherwise, it falls back to heap allocation.

In Go 1.25, process3 achieves zero heap allocations when lengthGuess produces a slice fitting within 32 bytes—assuming the guess accurately reflects the channel contents.
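One way to observe the threshold, using a byte slice so that 32 elements fill exactly 32 bytes. The //go:noinline directive keeps the size from becoming a compile-time constant at the call site, so this really exercises the variable-size path. On a Go 1.25+ toolchain the benchmark should report zero allocations; older toolchains report one (the heap-allocated backing store):

```go
package main

import (
	"fmt"
	"testing"
)

// buildVar mirrors process3's variable-sized make.
//
//go:noinline
func buildVar(n int) int {
	buf := make([]byte, 0, n) // n is not a compile-time constant here
	for i := 0; i < n; i++ {
		buf = append(buf, byte(i))
	}
	return len(buf)
}

func main() {
	// 32 bytes fits the small stack buffer described above.
	allocs := testing.AllocsPerRun(100, func() { buildVar(32) })
	fmt.Println("allocations per run:", allocs)
}
```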

Upgrading to the latest Go release delivers surprising performance gains without code changes.

Stack allocation of append-allocated slices

But modifying APIs to accept length hints isn't always desirable. Go 1.26 offers another option.

func process(c chan task) {
	var tasks []task
	for t := range c {
		tasks = append(tasks, t)
	}
	processAll(tasks)
}

Go 1.26 allocates the same small stack buffer but applies it directly at append sites. The first iteration uses a stack-allocated backing store (say, capacity 4). The next three iterations append without allocation. The fifth iteration requires heap allocation, but you've avoided the size 1, 2, and 4 heap allocations and their associated garbage. Small slices may never touch the heap.
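A rough way to watch those growth-phase allocations, again with int standing in for the task type. The sink variable forces the slice to escape (as processAll might), and the exact counts depend on the Go version — on toolchains without the append-site optimization, expect one allocation per growth step (1, 2, 4, 8, 16):

```go
package main

import (
	"fmt"
	"testing"
)

// sink forces the built slice to escape to the heap.
var sink []int

//go:noinline
func grow(n int) {
	// No capacity hint: each growth step needs a new backing store.
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sink = s
}

func main() {
	for _, n := range []int{1, 4, 16} {
		allocs := testing.AllocsPerRun(100, func() { grow(n) })
		fmt.Printf("n=%2d: %v allocations per run\n", n, allocs)
	}
}
```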

Stack allocation of append-allocated escaping slices

What about returned slices? They can't live on the stack since the frame disappears at return.

func extract(c chan task) []task {
	var tasks []task
	for t := range c {
		tasks = append(tasks, t)
	}
	return tasks
}

The returned slice requires heap allocation, but what about intermediate slices that become garbage? You could manually optimize:

func extract2(c chan task) []task {
	var tasks []task
	for t := range c {
		tasks = append(tasks, t)
	}
	tasks2 := make([]task, len(tasks))
	copy(tasks2, tasks)
	return tasks2
}

Now tasks doesn't escape, enabling stack optimizations. At the end, one heap allocation of the exact size copies the data for return. But this adds error-prone boilerplate.

Go 1.26 automates this transformation, converting extract to something like:

func extract3(c chan task) []task {
	var tasks []task
	for t := range c {
		tasks = append(tasks, t)
	}
	tasks = runtime.move2heap(tasks)
	return tasks
}

runtime.move2heap is a compiler-runtime function that returns heap slices unchanged but copies stack slices to the heap. For the original extract, if items fit in the stack buffer, you get exactly one allocation of the correct size. If they exceed capacity, normal doubling occurs after the stack buffer fills.

This actually outperforms the manual optimization, which always copies: move2heap copies only when necessary, that is, when the slice is still stack-backed at return. The copy cost is offset by the startup-phase copies that no longer happen; compared with the old approach, each element is copied at most one extra time.

Wrapping up

Manual optimization remains valuable when you have good size estimates. But the compiler now handles many simple cases automatically, letting you focus on what truly matters.

These optimizations involve careful compiler work. If you suspect they're causing correctness or performance problems, disable them with -gcflags=all=-d=variablemakehash=n. If disabling helps, please file an issue for investigation.

Footnotes

1 Go stacks lack alloca-style mechanisms for dynamic sizing. All Go stack frames have constant size.
