Goroutine Leaks in Go and How to Prevent Them
Learn how goroutine leaks happen in Go services and how to prevent them with context cancellation, buffered channels, and clear goroutine ownership.
A goroutine leak happens when a goroutine is started but never gets a realistic path to finish.
That sounds small because goroutines are cheap. But cheap is not free. A leaked goroutine keeps its stack, references to heap objects, timers, network calls, channel waits, and sometimes file descriptors or database connections alive for longer than intended. One leaked goroutine is usually invisible. One leaked goroutine per request is a production incident waiting patiently.
This article starts with a realistic bug, then builds toward the production-grade habits that prevent it.
The mental model
A goroutine should have an owner.
The owner is the code that starts it, and the owner must know:
- when the goroutine should stop
- how it will be told to stop
- how the caller waits for it, ignores it safely, or lets it run as a deliberate background worker
If you cannot answer those three questions, the goroutine is probably under-designed.
A realistic leak
Imagine a checkout service. For each request, it calls a risk service to decide whether the order should be accepted.
The handler wants to return quickly if the client disconnects or the request times out, so it starts the risk check in a goroutine and waits on either the risk result or the request context.
This version looks reasonable at first glance:
package checkout

import (
	"context"
	"encoding/json"
	"net/http"
)

type RiskClient interface {
	Check(ctx context.Context, order Order) (RiskResult, error)
}

type Server struct {
	risk RiskClient
}

type Order struct {
	ID     string
	UserID string
	Total  int64
}

type RiskResult struct {
	Approved bool
	Reason   string
}

type riskResponse struct {
	result RiskResult
	err    error
}

func (s *Server) Checkout(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	order := parseOrder(r)
	riskCh := make(chan riskResponse)
	go func() {
		result, err := s.risk.Check(context.Background(), order)
		riskCh <- riskResponse{result: result, err: err}
	}()
	select {
	case res := <-riskCh:
		if res.err != nil {
			http.Error(w, "risk check failed", http.StatusBadGateway)
			return
		}
		if !res.result.Approved {
			http.Error(w, res.result.Reason, http.StatusForbidden)
			return
		}
		json.NewEncoder(w).Encode(map[string]string{"status": "accepted"})
	case <-ctx.Done():
		http.Error(w, "request cancelled", http.StatusGatewayTimeout)
	}
}
The leak is here:
riskCh := make(chan riskResponse)
go func() {
	result, err := s.risk.Check(context.Background(), order)
	riskCh <- riskResponse{result: result, err: err}
}()
There are two problems.
First, the goroutine uses context.Background(), so it ignores the request cancellation. If the client goes away, the risk check keeps running.
Second, riskCh is unbuffered. A send on an unbuffered channel waits until another goroutine receives. If the handler returns through case <-ctx.Done(), there is no receiver left. When the risk check eventually finishes, the child goroutine blocks forever on:
riskCh <- riskResponse{result: result, err: err}
That goroutine is now leaked.
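The leak can be reproduced outside a web server. The following sketch is a stripped-down stand-in for the handler, not the checkout code itself: slowCheck, leakyCall, and leakedAfter are hypothetical names, and the durations are arbitrary values chosen so that the timeout always wins.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// slowCheck stands in for the risk call. It ignores cancellation on purpose,
// like the handler that passes context.Background().
func slowCheck() string {
	time.Sleep(20 * time.Millisecond)
	return "approved"
}

// leakyCall mirrors the handler: an unbuffered channel and a caller that may leave first.
func leakyCall(ctx context.Context) (string, error) {
	ch := make(chan string) // unbuffered: a send needs a receiver
	go func() {
		ch <- slowCheck() // blocks forever once the caller has returned
	}()
	select {
	case res := <-ch:
		return res, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// leakedAfter runs n calls that all time out and reports how many
// additional goroutines are still alive afterwards.
func leakedAfter(n int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond)
		_, _ = leakyCall(ctx)
		cancel()
	}
	time.Sleep(100 * time.Millisecond) // give every slowCheck time to finish
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines still blocked on send:", leakedAfter(10))
}
```

Under these assumptions the count should match the number of timed-out calls, because each child is parked on its channel send with nobody left to receive.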
Why this hurts in production
The bug does not look dramatic in a local test because it only leaks on the timeout path. In production, timeout paths are not rare:
- clients disconnect
- load balancers cancel slow requests
- upstream services become slow during deploys
- mobile networks drop connections
- retrying clients create bursts of abandoned work
If each abandoned checkout leaks one goroutine, the service may run fine for hours and then slowly become unhealthy. Memory rises. The scheduler has more goroutines to track. Profiles show thousands of goroutines parked at the same channel send. The original cause may be hidden behind the later symptom: high memory, slow garbage collection, or container restarts.
That is why goroutine leaks are often found with runtime evidence, not by reading the final crash message.
The recommended solution
For request-scoped goroutines, use three rules:
- Pass the caller’s context.Context into the goroutine’s work.
- Give the goroutine a way to finish even if the caller stops waiting.
- Bound the number of goroutines if the operation can be triggered many times.
Here is a safer version of the checkout handler:
func (s *Server) Checkout(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	order := parseOrder(r)
	riskCh := make(chan riskResponse, 1)
	go func() {
		result, err := s.risk.Check(ctx, order)
		select {
		case riskCh <- riskResponse{result: result, err: err}:
		case <-ctx.Done():
		}
	}()
	select {
	case res := <-riskCh:
		if res.err != nil {
			http.Error(w, "risk check failed", http.StatusBadGateway)
			return
		}
		if !res.result.Approved {
			http.Error(w, res.result.Reason, http.StatusForbidden)
			return
		}
		json.NewEncoder(w).Encode(map[string]string{"status": "accepted"})
	case <-ctx.Done():
		http.Error(w, "request cancelled", http.StatusGatewayTimeout)
	}
}
The changes are small but important.
riskCh := make(chan riskResponse, 1)
The channel has capacity for one result. If the handler times out right before the child sends, the send can still complete and the goroutine can exit. This is useful when there is exactly one child result and the parent may stop waiting.
result, err := s.risk.Check(ctx, order)
The downstream call receives the request context. This only works if RiskClient.Check respects context cancellation, but that is the contract you want. HTTP clients, database calls, gRPC clients, and most serious Go libraries expose context-aware APIs.
select {
case riskCh <- riskResponse{result: result, err: err}:
case <-ctx.Done():
}
The child goroutine does not insist on sending after the request is gone. It either reports the result or exits when cancellation wins.
For this exact shape, the buffered channel alone is often enough. I still like the select because it documents the ownership rule: this goroutine is request-scoped and should not outlive the request once it can observe cancellation.
An even better option: avoid the goroutine
Before adding any goroutine, ask whether you need one.
If checkout cannot continue without the risk result, this is simpler:
func (s *Server) Checkout(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	order := parseOrder(r)
	result, err := s.risk.Check(ctx, order)
	if err != nil {
		http.Error(w, "risk check failed", http.StatusBadGateway)
		return
	}
	if !result.Approved {
		http.Error(w, result.Reason, http.StatusForbidden)
		return
	}
	json.NewEncoder(w).Encode(map[string]string{"status": "accepted"})
}
This is the best solution when the work is not actually concurrent. The request already runs in its own goroutine inside net/http. Starting a second goroutine just to wait for it immediately is usually accidental complexity.
Use a goroutine when you are doing real overlap:
- querying risk and inventory at the same time
- streaming work in the background after the response
- running a worker loop owned by process lifecycle
- fanning out to multiple replicas and using the first successful answer
If none of those are true, stay synchronous.
Handling multiple concurrent calls
Advanced bugs often appear when one request fans out to many goroutines.
Suppose checkout asks three independent systems: risk, inventory, and promotions. This is a reasonable use of concurrency, but you still want cancellation and waiting to be structured.
The errgroup package is a good fit:
import "golang.org/x/sync/errgroup"

// validateOrder assumes Server also holds inventory and promotions clients.
func (s *Server) validateOrder(ctx context.Context, order Order) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(3)
	g.Go(func() error {
		_, err := s.risk.Check(ctx, order)
		return err
	})
	g.Go(func() error {
		return s.inventory.Reserve(ctx, order)
	})
	g.Go(func() error {
		return s.promotions.Apply(ctx, order)
	})
	return g.Wait()
}
errgroup.WithContext gives the group a derived context. If one function returns an error, the context is canceled and the other functions get a signal to stop. g.Wait() makes ownership explicit: the caller waits until all started functions have returned.
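SetLimit matters when the fan-out count can grow. With exactly three calls and a limit of three it changes nothing in the example above, but if a request can start one goroutine per cart item, one goroutine per tenant, or one goroutine per downstream shard, you should usually put a ceiling on it. Once the limit is reached, g.Go blocks until a running function returns.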
Common leak patterns
Sending with no receiver
This is the checkout bug:
ch := make(chan Result)
go func() {
	ch <- slowWork()
}()
select {
case result := <-ch:
	use(result)
case <-ctx.Done():
	return ctx.Err()
}
If ctx.Done() wins, the child may block forever trying to send.
Use a buffered channel of size one, a cancellation-aware send, or structured waiting with errgroup.
Receiving from a channel nobody will close
This worker never exits if jobs is never closed and no cancellation path exists:
go func() {
	for job := range jobs {
		process(job)
	}
}()
A long-lived worker is fine, but it needs a lifecycle owner:
go func() {
	for {
		select {
		case job, ok := <-jobs:
			if !ok {
				return
			}
			process(job)
		case <-ctx.Done():
			return
		}
	}
}()
Tickers that are never stopped
This leaks both the goroutine and the ticker:
go func() {
	ticker := time.NewTicker(time.Minute)
	for range ticker.C {
		refreshCache()
	}
}()
Prefer:
go func() {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			refreshCache()
		case <-ctx.Done():
			return
		}
	}
}()
Background work with no shutdown
Some goroutines are supposed to live longer than one request: queue consumers, metrics loops, cache refreshers, subscription readers.
That is fine. They still need a process-level owner. In a server, that usually means a root context that is canceled during shutdown and a WaitGroup or errgroup that confirms the workers exited before the process stops.
How to find goroutine leaks
The first clue is usually a goroutine count that only moves up.
You can expose Go’s pprof handlers in internal environments:
import _ "net/http/pprof"
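The import registers the handlers on http.DefaultServeMux, so the process also needs an HTTP server serving that mux; the command below assumes one listening on localhost:6060. Then inspect goroutine stacks (the URL is quoted so the shell does not interpret the question mark):
curl 'http://localhost:6060/debug/pprof/goroutine?debug=2'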
What you are looking for is not just “many goroutines”. Some services naturally have many goroutines. The suspicious pattern is many goroutines blocked at the same source line:
goroutine 48291 [chan send]:
checkout.(*Server).Checkout.func1(...)
/app/checkout/handler.go:42
For regular metrics, track the goroutine count. In modern Go, you can read it through runtime metrics:
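// Requires: import "runtime/metrics"
samples := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}
metrics.Read(samples)
count := samples[0].Value.Uint64() // number of live goroutines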
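As a runnable version of that read, the sketch below wraps it in a hypothetical goroutineCount helper and checks the value kind first, since runtime/metrics reports an unsupported metric name as KindBad rather than returning an error.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// goroutineCount reads the live goroutine count from runtime/metrics.
// The boolean is false if this Go version does not support the metric.
func goroutineCount() (uint64, bool) {
	samples := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return samples[0].Value.Uint64(), true
}

func main() {
	if n, ok := goroutineCount(); ok {
		fmt.Println("live goroutines:", n)
	} else {
		fmt.Println("goroutine metric not available")
	}
}
```

Exporting this value on a timer to whatever metrics system the service already uses is usually enough to see the "only moves up" pattern on a dashboard.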
In tests, Uber’s goleak package is useful for catching goroutines that remain after a test finishes:
// Requires: import "go.uber.org/goleak"
func TestCheckoutDoesNotLeakOnTimeout(t *testing.T) {
	defer goleak.VerifyNone(t)
	// run the timeout path here
}
Leak tests are especially valuable around code that uses channels, timers, retries, streaming APIs, and background workers.
A practical checklist
When reviewing Go code that starts goroutines, ask these questions:
- Who owns this goroutine?
- What makes it return?
- Does it observe context.Context or another shutdown signal?
- Can it block forever on a channel send or receive?
- If the caller returns early, what happens to the child goroutine?
- If this runs once per request, is there a concurrency limit?
- Is there a test for the timeout, cancellation, and error paths?
The advanced version is the same checklist with less optimism. Every select branch, early return, timeout, retry, and channel close is part of the lifecycle contract.
Final thought
Goroutine leaks are not just a beginner problem. They happen because Go makes concurrency easy to start, while lifecycle ownership still has to be designed.
The recommended default is simple:
Do not start a goroutine unless you know who owns it, how it stops, and how the rest of the program waits for it or safely stops caring.
For request-scoped work, prefer synchronous code first. When concurrency is useful, pass context, avoid unbounded fan-out, make channel sends safe when the receiver may leave, and use errgroup or a clear worker lifecycle where it fits.
References
- Uber Engineering, LeakProf: Featherlight In-Production Goroutine Leak Detection. Uber reported that LeakProf found critical production goroutine leaks, and that fixing two defects reduced peak memory in affected services by 2.5x and 5x.
- Uber, goleak, a goroutine leak detector for Go tests.
- Go documentation, runtime/pprof, including the goroutine profile and debug=2 stack output.