Deep Dive into Go Scheduler: Complete GMP Model Explained
Go is renowned for its elegant concurrency model: goroutines make it easy to write highly concurrent programs. But have you ever wondered how thousands of goroutines run efficiently? How does the scheduler work behind the scenes?
Today, let’s dive deep into Go’s core scheduling model — GMP.
What is GMP?
GMP stands for the three core components of Go’s scheduler:
- G (Goroutine): User-level lightweight thread, created with go func()
- M (Machine): OS thread, where code actually executes
- P (Processor): Logical processor that manages scheduling context for G
Simply put: a G executes on an M, but an M must hold a P before it can run a G.
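To make the G's concrete, here is a minimal sketch (the helper countGs is my own, not part of the runtime) that creates a few G's and asks the runtime how many currently exist:

```go
package main

import (
	"fmt"
	"runtime"
)

// countGs spawns n G's that park on a channel, then asks the
// runtime how many goroutines exist (the spawned ones plus main).
func countGs(n int) int {
	block := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() { <-block }() // each go statement creates a new G
	}
	got := runtime.NumGoroutine()
	close(block)
	return got
}

func main() {
	fmt.Println(countGs(5)) // at least 6: five parked G's plus main
}
```

Each G is created immediately by the `go` statement, even before it is scheduled onto an M, which is why the count is visible right away.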
Why GMP?
In early versions (Go 1.0), Go’s scheduler was very simple: all goroutines were in a global queue, and all M’s would take G from this queue to execute.
This design had serious performance issues:
- Severe lock contention: All M’s accessing the global queue needed locks, terrible concurrency
- Poor multi-core utilization: No cooperation between M’s, low CPU utilization
- Bad cache locality: G frequently switching between M’s, low CPU cache hit rate
In 2012, Dmitry Vyukov redesigned the scheduler, introducing the P concept, forming today’s GMP model.
Core Design of GMP
1. Local Queue + Global Queue
Each P has a local queue (max 256 G’s). M prioritizes taking G from its bound P’s local queue.
The global queue serves as a fallback when P’s local queue is empty.
Benefits:
- No locking in most cases (local queue is lock-free)
- Reduced lock contention, improved concurrency
2. Work Stealing
When a P’s local queue is empty, it will:
- Try to get G from the global queue
- If global queue is also empty, steal half from other P’s local queues
This mechanism ensures load balancing, keeping all M’s busy.
3. Hand Off
When an M blocks in a syscall (e.g., file IO or a cgo call; most network IO goes through the netpoller and does not block an M):
- M unbinds from P
- P finds another idle M, or creates a new one
- New M binds to P, continues executing other G’s in P’s queue
This ensures that even if one M blocks, other G’s can still execute.
4. Preemptive Scheduling
Before Go 1.14: cooperative preemption. The sysmon thread marks any G that has run for more than 10ms as preemptible, but the check only fires at function-call boundaries (in the stack-growth preamble), so a G spinning in a loop with no function calls could never be forced off its P.
Go 1.14 introduced signal-based (asynchronous) preemption: if a G runs too long (like an infinite loop), the runtime sends a signal (SIGURG on Unix) to its thread, and the signal handler forcibly suspends the G.
This solves the problem of certain G's starving others by hogging the CPU.
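A small demonstration of the difference (illustrative; tightLoopPreempted is my own helper): on a single P, a G spinning with no function calls would hang the program under pre-1.14 cooperative preemption, while Go 1.14+ forcibly preempts it, so the function returns:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// tightLoopPreempted starts a G that spins in a loop containing no
// function calls, then sleeps on a single P. Returning at all proves
// the runtime preempted the spinning G (signal-based, Go 1.14+).
func tightLoopPreempted() bool {
	runtime.GOMAXPROCS(1)
	go func() {
		for {
		} // no function calls: no cooperative preemption points
	}()
	time.Sleep(100 * time.Millisecond)
	return true
}

func main() {
	if tightLoopPreempted() {
		fmt.Println("tight loop was preempted; main still runs")
	}
}
```

On Go 1.13 and earlier this program would never print, because the spinning G monopolizes the only P.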
Interactive Visualization
Below is an interactive animation I created showing the GMP scheduling process. You can switch between scenarios to observe the scheduling relationships between G, M, and P:
Instructions:
- Click top buttons to switch scheduling scenarios
- Use play/pause buttons for automatic demonstration
- Use prev/next buttons for step-by-step analysis
- Adjust speed slider to control playback speed
Key Scheduling Scenarios
Scenario 1: Normal Scheduling
Most common case: M takes G from its bound P’s local queue and executes.
P0: [G1, G2, G3] → M0
↓
M0 executes G1 → G1 completes → M0 takes G2 and continues
This is the ideal case with no lock contention and best performance.
Scenario 2: Work Stealing
When P1’s queue is empty but P0 still has many G’s:
P0: [G1, G2, G3, G4] (busy)
P1: [] (idle)
↓
P1 steals half from P0: [G3, G4]
This ensures CPU utilization, avoiding idle cores while others are overloaded.
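The steal-half rule can be sketched as a toy model (runq and stealHalf are illustrative names; the real scheduler steals from the head of the victim's queue with atomic operations, while this sketch simply mirrors the diagram above):

```go
package main

import "fmt"

// runq is a simplified P-local run queue of goroutine IDs.
type runq struct {
	gs []int
}

// stealHalf moves half of the victim's queued G's to the thief,
// mirroring how an idle P steals work from a busy one.
func stealHalf(thief, victim *runq) int {
	n := len(victim.gs) / 2
	if n == 0 {
		return 0
	}
	stolen := victim.gs[len(victim.gs)-n:]
	victim.gs = victim.gs[:len(victim.gs)-n]
	thief.gs = append(thief.gs, stolen...)
	return n
}

func main() {
	p0 := &runq{gs: []int{1, 2, 3, 4}} // busy P
	p1 := &runq{}                      // idle P
	stealHalf(p1, p0)
	fmt.Println(p0.gs, p1.gs) // [1 2] [3 4]
}
```

After the steal, both P's have work, matching the "P1 steals half from P0" scenario above.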
Scenario 3: Hand Off
M0 blocks on syscall while executing G1:
M0 + P0 + G1 → G1 makes syscall (blocks)
↓
M0 blocks, P0 unbinds
↓
P0 binds to M1 (idle or newly created)
↓
M1 continues executing other G's in P0's queue
This ensures that even if one G blocks, others can still be scheduled.
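A rough way to observe hand off in practice (Unix-only; blockInSyscalls is my own helper): park some G's in raw blocking reads that bypass the netpoller, and watch the thread-creation count grow as the runtime spins up new M's for the released P's:

```go
package main

import (
	"fmt"
	"runtime/pprof"
	"syscall"
	"time"
)

// blockInSyscalls parks n G's in raw blocking pipe reads (raw fds
// from syscall.Pipe are not registered with the netpoller, so each
// read really blocks its M) and reports thread creations before/after.
func blockInSyscalls(n int) (before, after int) {
	before = pprof.Lookup("threadcreate").Count()
	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		panic(err)
	}
	for i := 0; i < n; i++ {
		go func() {
			buf := make([]byte, 1)
			syscall.Read(fds[0], buf) // blocks this G's M in a syscall
		}()
	}
	time.Sleep(200 * time.Millisecond) // let sysmon hand off the P's
	after = pprof.Lookup("threadcreate").Count()
	syscall.Write(fds[1], make([]byte, n)) // release the readers
	return
}

func main() {
	before, after := blockInSyscalls(20)
	fmt.Printf("threads created: %d -> %d\n", before, after)
}
```

The second count is higher because each blocked M keeps its G, while the runtime creates fresh M's so the freed P's can keep running other work.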
Scenario 4: Global Queue Scheduling
When P’s local queue is empty and no other P can be stolen from:
P0: [] (local queue empty)
Global queue: [G1, G2, G3, G4]
↓
M0 batch fetches G from global queue to P0 local queue
↓
M0 executes G's in P0's queue
The scheduler also polls the global queue periodically (once every 61 scheduling ticks) so that G's parked there are not starved by busy local queues.
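The batch size follows a simple heuristic; this is a sketch of the logic in the runtime's globrunqget (the constants mirror the current Go source and may change between releases):

```go
package main

import "fmt"

// batchSize sketches the runtime's globrunqget heuristic: take an
// even share of the global queue plus one, never more than the queue
// holds, capped at half the 256-slot local queue.
func batchSize(globalLen, numP int) int {
	n := globalLen/numP + 1
	if n > globalLen {
		n = globalLen
	}
	if n > 128 { // half of the 256-entry local queue
		n = 128
	}
	return n
}

func main() {
	fmt.Println(batchSize(4, 4))    // 2
	fmt.Println(batchSize(1000, 4)) // 128
}
```

The "plus one" guarantees progress even when the global queue is shorter than the number of P's, and the cap keeps one P from draining the whole global queue into its local one.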
Advantages of GMP
- High concurrency: Lock-free local queues reduce contention
- Load balancing: Work Stealing maximizes CPU utilization
- Flexible scheduling: Hand Off prevents M blocking from affecting other G’s
- Scalability: The number of P's equals GOMAXPROCS, typically the number of CPU cores
Interesting Details
Number of P’s
By default, the P count equals the number of logical CPUs (runtime.NumCPU()). You can adjust it via the GOMAXPROCS environment variable or runtime.GOMAXPROCS().
runtime.GOMAXPROCS(4) // Limit to 4 P's
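A related detail worth knowing: GOMAXPROCS(0) queries the current value without changing it, and every call returns the previous setting (limitPs is an illustrative wrapper, not a runtime function):

```go
package main

import (
	"fmt"
	"runtime"
)

// limitPs changes the P count and reports the previous and new
// values. Passing 0 to GOMAXPROCS queries without changing anything.
func limitPs(n int) (prev, now int) {
	prev = runtime.GOMAXPROCS(n)
	now = runtime.GOMAXPROCS(0)
	return
}

func main() {
	prev, now := limitPs(4)
	fmt.Printf("P's: %d -> %d\n", prev, now)
}
```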
Number of M’s
M count is dynamic:
- Max 10000 M's by default (adjustable via debug.SetMaxThreads)
- New M's are created when existing M's block
- Idle M's get recycled (but some spare M's are kept)
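For example, debug.SetMaxThreads returns the previous limit, which is 10000 unless it has been changed (queryMaxThreads is an illustrative wrapper):

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// queryMaxThreads reads the current M limit by setting it and then
// restoring whatever was there before.
func queryMaxThreads() int {
	prev := debug.SetMaxThreads(10000) // returns the previous limit
	debug.SetMaxThreads(prev)          // restore, in case it differed
	return prev
}

func main() {
	fmt.Println("thread limit:", queryMaxThreads())
}
```

If a program ever exceeds this limit, the runtime crashes rather than silently creating more threads.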
G Creation and Destruction
Creating a G has minimal overhead (only a 2KB initial stack, which grows on demand). Dead G's go onto a free list for reuse.
This is why Go can easily support millions of goroutines.
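A quick sketch of that cheapness (spawn is an illustrative helper): launching 100,000 goroutines and waiting for them all completes in a fraction of a second on typical hardware:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawn launches n goroutines (each starting with a ~2KB stack)
// and waits until every one of them has run.
func spawn(n int) int64 {
	var wg sync.WaitGroup
	var count int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&count, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&count)
}

func main() {
	fmt.Println(spawn(100000)) // 100000
}
```

The same experiment with 100,000 OS threads would exhaust memory on most machines; 100,000 G's need only a few hundred megabytes of stack in total.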
Performance Optimization Tips
Understanding GMP helps write more efficient Go code:
- Avoid blocking syscalls: Use async IO when possible, reduce M blocking
- Control goroutine count: Though lightweight, don’t create unlimited goroutines
- Use worker pools: For CPU-intensive tasks, use fixed worker goroutines
- Set GOMAXPROCS wisely: For CPU-bound programs, equal to CPU cores is optimal; for IO-bound, can increase
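The worker-pool advice above can be sketched as follows (squareAll is an illustrative example; it sizes the pool to the P count instead of spawning one goroutine per job):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// squareAll processes jobs with a fixed pool of workers sized to
// the number of P's, rather than one goroutine per job.
func squareAll(jobs []int) []int {
	in := make(chan int)
	out := make(chan [2]int)
	var wg sync.WaitGroup
	for w := 0; w < runtime.GOMAXPROCS(0); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range in {
				out <- [2]int{i, jobs[i] * jobs[i]} // index, result
			}
		}()
	}
	go func() {
		for i := range jobs {
			in <- i
		}
		close(in)  // no more jobs
		wg.Wait()  // all workers done
		close(out) // safe to stop collecting
	}()
	res := make([]int, len(jobs))
	for r := range out {
		res[r[0]] = r[1]
	}
	return res
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // [1 4 9 16]
}
```

For CPU-bound work, more workers than P's only adds scheduling overhead; the pool here matches the number of G's that can actually run in parallel.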
Summary
The GMP scheduling model is the cornerstone of Go’s high concurrency capability. By introducing P, Go achieves:
- Reduced lock contention (local queues)
- Load balancing (Work Stealing)
- Efficient scheduling (Hand Off)
- Preemptive scheduling (prevent G starvation)
Understanding GMP not only helps us write more efficient Go code, but also guides us in performance optimization.
Hope this article and the interactive visualization help you better understand Go’s scheduler!
References
- The Go scheduler - A classic article with clear visualizations
- Scalable Go Scheduler Design Doc - Dmitry Vyukov’s original design document
- go tool trace - Official trace tool documentation