Go Counting Semaphore Pattern

Learn the counting semaphore pattern in Go using buffered channels to limit concurrent work, protect services, and keep worker pools predictable.

A counting semaphore limits how many pieces of work can run at the same time.

In Go, the simplest version is a buffered channel. The channel capacity is the number of available permits. A goroutine sends into the channel before starting work and receives from it when the work is done.

This is a small pattern, but it solves a very real problem: protecting a service from doing too much at once.

A realistic example

Suppose a backend receives uploaded videos and needs to create thumbnails with ffmpeg.

Thumbnail generation is CPU-heavy. If 200 uploads arrive together and we start 200 ffmpeg processes, the machine may become slow enough that everything fails. What we actually want is a queue with a limit:

  • accept many jobs
  • run only 3 thumbnail jobs at once
  • start the next job when one finishes

That is a counting semaphore.

[Figure: Counting semaphore limiting thumbnail jobs]

The core pattern

The semaphore is just this:

sem := make(chan struct{}, 3)

To acquire a permit:

sem <- struct{}{}

To release it:

<-sem

The send blocks when the buffer is full. If three goroutines already hold permits, the fourth goroutine waits until one permit is released.

Complete example

Here is a small program that processes video jobs with at most three concurrent workers:

package main

import (
	"context"
	"fmt"
	"log"
	"os/exec"
	"sync"
	"time"
)

type VideoJob struct {
	ID       string
	Input    string
	ThumbOut string
}

func main() {
	jobs := []VideoJob{
		{ID: "101", Input: "uploads/101.mp4", ThumbOut: "thumbs/101.jpg"},
		{ID: "102", Input: "uploads/102.mp4", ThumbOut: "thumbs/102.jpg"},
		{ID: "103", Input: "uploads/103.mp4", ThumbOut: "thumbs/103.jpg"},
		{ID: "104", Input: "uploads/104.mp4", ThumbOut: "thumbs/104.jpg"},
		{ID: "105", Input: "uploads/105.mp4", ThumbOut: "thumbs/105.jpg"},
	}

	const maxConcurrent = 3
	sem := make(chan struct{}, maxConcurrent)

	var wg sync.WaitGroup

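	for _, job := range jobs {
		job := job // per-iteration copy; required before Go 1.22, harmless after
		wg.Add(1)

		go func() {
			defer wg.Done()

			sem <- struct{}{}        // acquire: blocks while 3 jobs are running
			defer func() { <-sem }() // release, even if createThumbnail fails

			if err := createThumbnail(context.Background(), job); err != nil {
				log.Printf("job %s failed: %v", job.ID, err)
				return
			}

			log.Printf("job %s complete", job.ID)
		}()
	}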

	wg.Wait()
}

func createThumbnail(ctx context.Context, job VideoJob) error {
	// Bound each ffmpeg run so a stuck process cannot hold a permit forever.
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	cmd := exec.CommandContext(
		ctx,
		"ffmpeg",
		"-y",
		"-i", job.Input,
		"-ss", "00:00:01",
		"-vframes", "1",
		job.ThumbOut,
	)

	if output, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("ffmpeg: %w: %s", err, output)
	}

	return nil
}

Only the section between acquire and release is limited:

sem <- struct{}{}
defer func() { <-sem }()

// limited work happens here

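That region is what the semaphore bounds: at most three goroutines can be inside it at once. Unlike a mutex's critical section, it admits several goroutines, just never more than the number of permits.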

Why use struct{}?

The semaphore channel does not need to carry data. It only needs to count permits. struct{} is useful because it has zero size:

chan struct{}

You may also see chan bool, but chan struct{} communicates the intent more clearly: the value is not important.

The common bug

The common mistake is forgetting to release the permit when the job fails.

This is risky:

sem <- struct{}{}

err := doWork()
if err != nil {
	return err
}

<-sem

If doWork fails, the function returns without releasing its permit. Each failure leaks one permit, and once every permit has leaked, every later acquire blocks forever.

Prefer this:

sem <- struct{}{}
defer func() { <-sem }()

return doWork()

Acquire, then immediately defer the release.
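The difference is easy to demonstrate in a single goroutine. In this sketch, leakyJob and safeJob are hypothetical stand-ins for the two versions above, and hasFreePermit probes the channel with select and default so it never blocks. After three failed jobs on a capacity-3 semaphore, the leaky version has no permits left; a fourth call to leakyJob would block forever.

```go
package main

import (
	"errors"
	"fmt"
)

var errFailed = errors.New("job failed")

// leakyJob releases its permit only on success (the bug).
func leakyJob(sem chan struct{}, fail bool) error {
	sem <- struct{}{}
	if fail {
		return errFailed // returns while still holding the permit
	}
	<-sem
	return nil
}

// safeJob defers the release, so every return path gives the permit back.
func safeJob(sem chan struct{}, fail bool) error {
	sem <- struct{}{}
	defer func() { <-sem }()
	if fail {
		return errFailed
	}
	return nil
}

// hasFreePermit reports whether a permit is available right now, without
// blocking. It returns any permit it takes, so it is only a probe.
func hasFreePermit(sem chan struct{}) bool {
	select {
	case sem <- struct{}{}:
		<-sem
		return true
	default:
		return false
	}
}

func main() {
	leaky := make(chan struct{}, 3)
	safe := make(chan struct{}, 3)
	for i := 0; i < 3; i++ {
		_ = leakyJob(leaky, true)
		_ = safeJob(safe, true)
	}
	fmt.Println("leaky semaphore has a free permit:", hasFreePermit(leaky)) // false
	fmt.Println("safe semaphore has a free permit:", hasFreePermit(safe))   // true
}
```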

Where this shows up in real systems

This pattern is not only for toy examples. The exact implementation may be a channel, a worker count, a connection pool, or a library semaphore, but the idea is the same: allow concurrency, but put a hard ceiling around the scarce thing.

Here are a few practical places where this shows up.

Terraform resource operations

Terraform applies infrastructure changes by walking a dependency graph. Independent resources can be created or updated in parallel, but Terraform still has a concurrency limit. The terraform apply command has a -parallelism=n flag that limits the number of concurrent operations as Terraform walks the graph. The default is 10.

That limit exists for the same reason as the thumbnail example: cloud APIs are shared external systems. If a large Terraform run creates too many AWS, Azure, GCP, or SaaS resources at once, the provider API may throttle or fail requests. HashiCorp’s own Terraform Enterprise guidance talks about tuning TFE_PARALLELISM when providers produce errors during concurrent operations or enforce non-standard rate limiting.

The useful lesson is that concurrency control belongs near the expensive external system. Terraform is not trying to make every operation serial. It is trying to keep the number of in-flight provider operations within a practical bound.

Kubernetes controllers

Kubernetes controllers use work queues heavily. A controller watches cluster state, enqueues keys for objects that need reconciliation, and runs a fixed number of worker goroutines to process the queue.

That fixed worker count is the semaphore idea in another form. If a controller runs 2 workers, only 2 reconcile loops are active at a time for that controller. If it runs 10 workers, up to 10 items can be processed concurrently. The queue may contain thousands of pending changes, but the worker count protects the API server, the controller process, and downstream systems from a burst.

The client-go workqueue package also provides the operational pieces production controllers need: a blocking Get, a Done call to mark an item as finished, requeueing, delayed queues, and rate-limited retries. A plain channel semaphore is enough for a small service, but a Kubernetes controller needs the queue plus retry behavior.

Go services talking to databases

Database connection pools are another practical version of the same pattern. In Go’s database/sql, db.SetMaxOpenConns(n) sets the maximum number of open connections to the database.

This matters in real services because goroutines are cheap but database connections are not. If a web service gets a traffic spike, thousands of handlers may try to query the database. The connection pool limit prevents the application from opening unlimited database connections and pushing the database over its own connection cap.

This is not a channel semaphore in your code, but it is the same resource-control idea: requests can wait, but the database should not be flooded.

Go library support

The Go ecosystem also has library support for this pattern. golang.org/x/sync/semaphore provides a weighted semaphore where callers can acquire different amounts of capacity. That is useful when jobs do not all cost the same amount.

For task groups, golang.org/x/sync/errgroup has SetLimit(n), which limits the number of active goroutines in a group. That is often nicer than combining sync.WaitGroup, an error channel, and a manual semaphore.

In production code, I usually choose the smallest tool that expresses the real constraint:

  • buffered channel for a simple local limit
  • errgroup.SetLimit when I also need error propagation and cancellation
  • x/sync/semaphore when jobs have different weights
  • a worker queue when jobs need retries, persistence, or backpressure metrics
  • a database or HTTP client pool when the limited resource is already managed by the library

When this pattern is enough

A buffered-channel semaphore is a good fit when:

  • all work has roughly the same cost
  • one process owns the concurrency limit
  • you want a small dependency-free solution
  • blocking is acceptable when the limit is reached

For more complex systems, I would consider a real queue, worker pool, rate limiter, or golang.org/x/sync/semaphore. The weighted semaphore fits jobs with different costs: a 4K video, for example, may need more permits than a short 720p clip.

Final thought

The counting semaphore pattern is one of the most practical Go concurrency tools because it is small and explicit.

The rule is:

Put a permit around the expensive thing, not around the whole program.

In the thumbnail example, the expensive thing is the ffmpeg process. The semaphore keeps that pressure under control while still letting the service accept and schedule many jobs.

References: Terraform’s apply -parallelism, HashiCorp’s notes on TFE_PARALLELISM, Kubernetes client-go workqueue, Go’s database/sql connection management, x/sync/semaphore, and errgroup.SetLimit.