Deep Dive into Go Scheduler: Complete GMP Model Explained
Go is renowned for its elegant concurrency model: goroutines make it easy to write highly concurrent programs. But have you ever wondered how thousands of goroutines run efficiently? How does the scheduler work behind the scenes?
Today, let’s dive deep into Go’s core scheduling model — GMP.
What is GMP?
GMP stands for the three core components of Go’s scheduler:
- G (Goroutine): User-level lightweight thread, created with go func()
- M (Machine): OS thread, where code actually executes
- P (Processor): Logical processor that manages scheduling context for G
Simply put: a G executes on an M, but an M must hold a P before it can run a G.
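To make the G's concrete, here is a minimal sketch (the helper countGs is my own, not part of the runtime) that creates a few G's and asks the runtime how many currently exist:

```go
package main

import (
	"fmt"
	"runtime"
)

// countGs spawns n G's that park on a channel, then asks the
// runtime how many goroutines exist (the spawned ones plus main).
func countGs(n int) int {
	block := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() { <-block }() // each go statement creates a new G
	}
	got := runtime.NumGoroutine()
	close(block)
	return got
}

func main() {
	fmt.Println(countGs(5)) // at least 6: five parked G's plus main
}
```

Each G is created immediately by the `go` statement, even before it is scheduled onto an M, which is why the count is visible right away.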
Why GMP?
In early versions (Go 1.0), Go’s scheduler was very simple: all goroutines were in a global queue, and all M’s would take G from this queue to execute.
This design had serious performance issues:
- Severe lock contention: All M’s accessing the global queue needed locks, terrible concurrency
- Poor multi-core utilization: No cooperation between M’s, low CPU utilization
- Bad cache locality: G frequently switching between M’s, low CPU cache hit rate
In 2012, Dmitry Vyukov redesigned the scheduler, introducing the P concept, forming today’s GMP model.
Core Design of GMP
1. Local Queue + Global Queue
Each P has a local queue (max 256 G’s). M prioritizes taking G from its bound P’s local queue.
The global queue serves as a fallback when P’s local queue is empty.
Benefits:
- No locking in most cases (local queue is lock-free)
- Reduced lock contention, improved concurrency
2. Work Stealing
When a P’s local queue is empty, it will:
- Try to get G from the global queue
- If global queue is also empty, steal half from other P’s local queues
This mechanism ensures load balancing, keeping all M’s busy.
3. Hand Off
When an M blocks in a syscall (e.g., file IO or a cgo call; most network IO goes through the netpoller and does not block an M):
- M unbinds from P
- P finds another idle M, or creates a new one
- New M binds to P, continues executing other G’s in P’s queue
This ensures that even if one M blocks, other G’s can still execute.
4. Preemptive Scheduling
Before Go 1.14: cooperative preemption. The sysmon thread marks any G that has run for more than 10ms as preemptible, but the check only fires at function-call boundaries (in the stack-growth preamble), so a G spinning in a loop with no function calls could never be forced off its P.
Go 1.14 introduced signal-based (asynchronous) preemption: if a G runs too long (like an infinite loop), the runtime sends a signal (SIGURG on Unix) to its thread, and the signal handler forcibly suspends the G.
This solves the problem of certain G's starving others by hogging the CPU.
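A small demonstration of the difference (illustrative; tightLoopPreempted is my own helper): on a single P, a G spinning with no function calls would hang the program under pre-1.14 cooperative preemption, while Go 1.14+ forcibly preempts it, so the function returns:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// tightLoopPreempted starts a G that spins in a loop containing no
// function calls, then sleeps on a single P. Returning at all proves
// the runtime preempted the spinning G (signal-based, Go 1.14+).
func tightLoopPreempted() bool {
	runtime.GOMAXPROCS(1)
	go func() {
		for {
		} // no function calls: no cooperative preemption points
	}()
	time.Sleep(100 * time.Millisecond)
	return true
}

func main() {
	if tightLoopPreempted() {
		fmt.Println("tight loop was preempted; main still runs")
	}
}
```

On Go 1.13 and earlier this program would never print, because the spinning G monopolizes the only P.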
Interactive Visualization
Below is an interactive animation I created showing the GMP scheduling process. You can switch between scenarios to observe the scheduling relationships between G, M, and P:
Instructions:
- Click top buttons to switch scheduling scenarios
- Use play/pause buttons for automatic demonstration
- Use prev/next buttons for step-by-step analysis
- Adjust speed slider to control playback speed
Key Scheduling Scenarios
Scenario 1: Normal Scheduling
Most common case: M takes G from its bound P’s local queue and executes.
P0: [G1, G2, G3] → M0
↓
M0 executes G1 → G1 completes → M0 takes G2 and continues
This is the ideal case with no lock contention and best performance.
Scenario 2: Work Stealing
When P1’s queue is empty but P0 still has many G’s:
P0: [G1, G2, G3, G4] (busy)
P1: [] (idle)
↓
P1 steals half from P0: [G3, G4]
This ensures CPU utilization, avoiding idle cores while others are overloaded.
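The steal-half rule can be sketched as a toy model (runq and stealHalf are illustrative names; the real scheduler steals from the head of the victim's queue with atomic operations, while this sketch simply mirrors the diagram above):

```go
package main

import "fmt"

// runq is a simplified P-local run queue of goroutine IDs.
type runq struct {
	gs []int
}

// stealHalf moves half of the victim's queued G's to the thief,
// mirroring how an idle P steals work from a busy one.
func stealHalf(thief, victim *runq) int {
	n := len(victim.gs) / 2
	if n == 0 {
		return 0
	}
	stolen := victim.gs[len(victim.gs)-n:]
	victim.gs = victim.gs[:len(victim.gs)-n]
	thief.gs = append(thief.gs, stolen...)
	return n
}

func main() {
	p0 := &runq{gs: []int{1, 2, 3, 4}} // busy P
	p1 := &runq{}                      // idle P
	stealHalf(p1, p0)
	fmt.Println(p0.gs, p1.gs) // [1 2] [3 4]
}
```

After the steal, both P's have work, matching the "P1 steals half from P0" scenario above.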
Scenario 3: Hand Off
M0 blocks on syscall while executing G1:
M0 + P0 + G1 → G1 makes syscall (blocks)
↓
M0 blocks, P0 unbinds
↓
P0 binds to M1 (idle or newly created)
↓
M1 continues executing other G's in P0's queue
This ensures that even if one G blocks, others can still be scheduled.
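A rough way to observe hand off in practice (Unix-only; blockInSyscalls is my own helper): park some G's in raw blocking reads that bypass the netpoller, and watch the thread-creation count grow as the runtime spins up new M's for the released P's:

```go
package main

import (
	"fmt"
	"runtime/pprof"
	"syscall"
	"time"
)

// blockInSyscalls parks n G's in raw blocking pipe reads (raw fds
// from syscall.Pipe are not registered with the netpoller, so each
// read really blocks its M) and reports thread creations before/after.
func blockInSyscalls(n int) (before, after int) {
	before = pprof.Lookup("threadcreate").Count()
	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		panic(err)
	}
	for i := 0; i < n; i++ {
		go func() {
			buf := make([]byte, 1)
			syscall.Read(fds[0], buf) // blocks this G's M in a syscall
		}()
	}
	time.Sleep(200 * time.Millisecond) // let sysmon hand off the P's
	after = pprof.Lookup("threadcreate").Count()
	syscall.Write(fds[1], make([]byte, n)) // release the readers
	return
}

func main() {
	before, after := blockInSyscalls(20)
	fmt.Printf("threads created: %d -> %d\n", before, after)
}
```

The second count is higher because each blocked M keeps its G, while the runtime creates fresh M's so the freed P's can keep running other work.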
Scenario 4: Global Queue Scheduling
When P’s local queue is empty and no other P can be stolen from:
P0: [] (local queue empty)
Global queue: [G1, G2, G3, G4]
↓
M0 batch fetches G from global queue to P0 local queue
↓
M0 executes G's in P0's queue
The scheduler also polls the global queue periodically (once every 61 scheduling ticks) so that G's parked there are not starved by busy local queues.
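The batch size follows a simple heuristic; this is a sketch of the logic in the runtime's globrunqget (the constants mirror the current Go source and may change between releases):

```go
package main

import "fmt"

// batchSize sketches the runtime's globrunqget heuristic: take an
// even share of the global queue plus one, never more than the queue
// holds, capped at half the 256-slot local queue.
func batchSize(globalLen, numP int) int {
	n := globalLen/numP + 1
	if n > globalLen {
		n = globalLen
	}
	if n > 128 { // half of the 256-entry local queue
		n = 128
	}
	return n
}

func main() {
	fmt.Println(batchSize(4, 4))    // 2
	fmt.Println(batchSize(1000, 4)) // 128
}
```

The "plus one" guarantees progress even when the global queue is shorter than the number of P's, and the cap keeps one P from draining the whole global queue into its local one.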
Advantages of GMP
- High concurrency: Lock-free local queues reduce contention
- Load balancing: Work Stealing maximizes CPU utilization
- Flexible scheduling: Hand Off prevents M blocking from affecting other G’s
- Scalability: The number of P's equals GOMAXPROCS, typically the number of CPU cores
Interesting Details
Number of P’s
By default, the P count equals the number of logical CPUs (runtime.NumCPU()). You can adjust it via the GOMAXPROCS environment variable or runtime.GOMAXPROCS().
runtime.GOMAXPROCS(4) // Limit to 4 P's
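A related detail worth knowing: GOMAXPROCS(0) queries the current value without changing it, and every call returns the previous setting (limitPs is an illustrative wrapper, not a runtime function):

```go
package main

import (
	"fmt"
	"runtime"
)

// limitPs changes the P count and reports the previous and new
// values. Passing 0 to GOMAXPROCS queries without changing anything.
func limitPs(n int) (prev, now int) {
	prev = runtime.GOMAXPROCS(n)
	now = runtime.GOMAXPROCS(0)
	return
}

func main() {
	prev, now := limitPs(4)
	fmt.Printf("P's: %d -> %d\n", prev, now)
}
```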
Number of M’s
M count is dynamic:
- Max 10000 M's by default (adjustable via debug.SetMaxThreads)
- New M's are created when existing M's block
- Idle M's get recycled (but some spare M's are kept)
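For example, debug.SetMaxThreads returns the previous limit, which is 10000 unless it has been changed (queryMaxThreads is an illustrative wrapper):

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// queryMaxThreads reads the current M limit by setting it and then
// restoring whatever was there before.
func queryMaxThreads() int {
	prev := debug.SetMaxThreads(10000) // returns the previous limit
	debug.SetMaxThreads(prev)          // restore, in case it differed
	return prev
}

func main() {
	fmt.Println("thread limit:", queryMaxThreads())
}
```

If a program ever exceeds this limit, the runtime crashes rather than silently creating more threads.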
G Creation and Destruction
Creating a G has minimal overhead (only a 2KB initial stack, which grows on demand). Dead G's go onto a free list for reuse.
This is why Go can easily support millions of goroutines.
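A quick sketch of that cheapness (spawn is an illustrative helper): launching 100,000 goroutines and waiting for them all completes in a fraction of a second on typical hardware:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawn launches n goroutines (each starting with a ~2KB stack)
// and waits until every one of them has run.
func spawn(n int) int64 {
	var wg sync.WaitGroup
	var count int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&count, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&count)
}

func main() {
	fmt.Println(spawn(100000)) // 100000
}
```

The same experiment with 100,000 OS threads would exhaust memory on most machines; 100,000 G's need only a few hundred megabytes of stack in total.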
Performance Optimization Tips
Understanding GMP helps write more efficient Go code:
- Avoid blocking syscalls: Use async IO when possible, reduce M blocking
- Control goroutine count: Though lightweight, don’t create unlimited goroutines
- Use worker pools: For CPU-intensive tasks, use fixed worker goroutines
- Set GOMAXPROCS wisely: For CPU-bound programs, equal to CPU cores is optimal; for IO-bound, can increase
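The worker-pool advice above can be sketched as follows (squareAll is an illustrative example; it sizes the pool to the P count instead of spawning one goroutine per job):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// squareAll processes jobs with a fixed pool of workers sized to
// the number of P's, rather than one goroutine per job.
func squareAll(jobs []int) []int {
	in := make(chan int)
	out := make(chan [2]int)
	var wg sync.WaitGroup
	for w := 0; w < runtime.GOMAXPROCS(0); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range in {
				out <- [2]int{i, jobs[i] * jobs[i]} // index, result
			}
		}()
	}
	go func() {
		for i := range jobs {
			in <- i
		}
		close(in)  // no more jobs
		wg.Wait()  // all workers done
		close(out) // safe to stop collecting
	}()
	res := make([]int, len(jobs))
	for r := range out {
		res[r[0]] = r[1]
	}
	return res
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // [1 4 9 16]
}
```

For CPU-bound work, more workers than P's only adds scheduling overhead; the pool here matches the number of G's that can actually run in parallel.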
Summary
The GMP scheduling model is the cornerstone of Go’s high concurrency capability. By introducing P, Go achieves:
- Reduced lock contention (local queues)
- Load balancing (Work Stealing)
- Efficient scheduling (Hand Off)
- Preemptive scheduling (prevent G starvation)
Understanding GMP not only helps us write more efficient Go code, but also guides us in performance optimization.
Hope this article and the interactive visualization help you better understand Go’s scheduler!
References
- The Go scheduler - A classic article with clear visualizations
- Scalable Go Scheduler Design Doc - Dmitry Vyukov’s original design document
- go tool trace - Official trace tool documentation