docs/concepts/kernel/kernel_scheduling.md - fuchsia - Git at Google

 # Zircon scheduler

 ## Overview

 The Zircon scheduler is a hybrid scheduler that supports both **Fair** and
 **Deadline** scheduling disciplines. It runs independently on each logical
 CPU, managing its own set of run queues and coordinating with other CPUs
 via Inter-Processor Interrupts (IPIs).

 The primary goal of the scheduler is to share CPU bandwidth among competing
 threads while guaranteeing the timing requirements of critical workloads
 (such as audio, graphics, and high-frequency sensors).

 ## Scheduling disciplines

 ### Fair scheduling

 Fair scheduling is the primary scheduling discipline for general-purpose
 workloads in the system. It divides CPU bandwidth between competing threads
 such that each receives a weighted proportion of the CPU over time.

 * **Weight-Based Allocation**: Each thread is assigned a weight. A thread
   with twice the weight of another receives approximately twice the CPU time.
 * **Virtual Timeline**: Ready threads are ordered in a balanced binary tree
   (WAVL tree) based on a *virtual finish time*. Higher-weighted threads are
   ordered closer to the front, but lower-weighted threads are guaranteed
   timely execution to prevent starvation and unbounded priority inversion.
 * **Scheduling Period**: The scheduling period adapts to the number of
   competing threads. If too many threads compete, the period stretches to
   ensure each gets at least a **Minimum Granularity** slice, improving
   throughput.

 For a detailed look at the Fair Scheduling algorithm math and ordering
 criteria, see [Fair Scheduler Mechanics][fair-scheduler-mechanics].

 ### Deadline scheduling

 Deadline scheduling is supported for latency-critical tasks that require firm
 timing guarantees and must not partake in proportional slowdowns during overload
 conditions.

 * **Guarantees**: A deadline task specifies a **Capacity** (CPU time) and a
   **Deadline** (period). The scheduler guarantees that a task will receive its
   capacity within its deadline period if the overall deadline demand is
   feasible.
 * **Precedence**: Deadline tasks always take precedence over eligible fair
   tasks.
 * **Overload Response**: To ensure that all deadline tasks can be scheduled
   within their expected periods, the sum total of deadline demands must not
   exceed the processor's capacity. If the total deadline demand of a processor
   exceeds 100%, stochastic deadline misses are expected and the system will
   behave as though the periods of some or all task were multiplied by the
   overload factor.
 * **Critical Priority**: Deadline profiles can optionally be marked as
   **critical**. A critical deadline task receives absolute priority over normal
   deadline tasks during periods of CPU oversubscription, ensuring that essential
   workloads are not degraded by runaway standard deadline threads. In a
   correctly functioning system, total critical utilization should never exceed
   the limits of the processor, otherwise a kernel oops is emitted.

 ## Run queues and selection

 Each CPU maintains three primary run queues, implemented as augmented WAVL trees
 to satisfy efficient O(log n) operations:

 1. **Critical Deadline Run Queue**: Contains runnable critical deadline tasks.
 2. **Deadline Run Queue**: Contains runnable non-critical deadline tasks.
 3. **Fair Run Queue**: Contains runnable fair tasks.

 ### Thread selection

 When choosing the next thread to run, the scheduler evaluates queues in the
 following order:

 1. **Critical Deadline Task**: Picks the eligible critical deadline task with
    the earliest finish time.
 2. **Deadline Task**: Picks the eligible deadline task with the earliest finish
    time.
 3. **Fair Task**: Picks the eligible fair task with the earliest virtual finish
    time.
 4. **Idle/Power Thread**: Runs when no other threads are eligible.

 ### Preemption and timeslices

 The scheduler sets a CPU-local preemption timer based on the selected thread's
 timeslice or the arrival of a deadline task. When the timer fires, execution
 stops on the current thread, and the scheduler re-evaluates which thread is the
 most suitable to run next.

 ## CPU placement and migration

 The scheduler performs load balancing and task placement based on several
 factors to optimize performance and thermal efficiency.

 ### Placement criteria

 When a thread wakes up or is unblocked, the scheduler selects a target CPU based
 on:

 1. **Affinity**: The thread's user-defined CPU affinity mask is prioritized.
 2. **Last CPU**: Running on the last CPU helps preserve cache warmness
    (Intra-cluster affinity).
 3. **Idle States**: Highly prioritized if the core avoids context-switch
    overhead of a busy CPU.

 ### Work stealing

 To maintain load balance, a CPU with no eligible work will attempt to **steal**
 work from other busy CPUs.

 * **Cluster Awareness**: The scheduler is conscious of the CPU topology. CPU
   stealing prioritizes local clusters over distant clusters to capitalize on
   better shared cache performance levels and reduce migration penalties.

 ## Power and energy awareness

 The scheduler uses local energy models and power level controls to optimize
 power consumption and thermal performance.

 * **Performance Scales**: Account for individual CPU performance multipliers
   inside budget calculations.
 * **Dynamic Voltage and Frequency Scaling (DVFS)**: Adjust processor workloads
   supporting user-defined limits while striving to meet deadline schedules.

 <!-- Reference links -->
 [fair-scheduler-mechanics]: fair_scheduler.md
	# Zircon scheduler

	## Overview

	The Zircon scheduler is a hybrid scheduler that supports both Fair and
	Deadline scheduling disciplines. It runs independently on each logical
	CPU, managing its own set of run queues and coordinating with other CPUs
	via Inter-Processor Interrupts (IPIs).

	The primary goal of the scheduler is to share CPU bandwidth among competing
	threads while guaranteeing the timing requirements of critical workloads
	(such as audio, graphics, and high-frequency sensors).

	## Scheduling disciplines

	### Fair scheduling

	Fair scheduling is the primary scheduling discipline for general-purpose
	workloads in the system. It divides CPU bandwidth between competing threads
	such that each receives a weighted proportion of the CPU over time.

	* Weight-Based Allocation: Each thread is assigned a weight. A thread
	with twice the weight of another receives approximately twice the CPU time.
	* Virtual Timeline: Ready threads are ordered in a balanced binary tree
	(WAVL tree) based on a virtual finish time. Higher-weighted threads are
	ordered closer to the front, but lower-weighted threads are guaranteed
	timely execution to prevent starvation and unbounded priority inversion.
	* Scheduling Period: The scheduling period adapts to the number of
	competing threads. If too many threads compete, the period stretches to
	ensure each gets at least a Minimum Granularity slice, improving
	throughput.

	For a detailed look at the Fair Scheduling algorithm math and ordering
	criteria, see [Fair Scheduler Mechanics][fair-scheduler-mechanics].

	### Deadline scheduling

	Deadline scheduling is supported for latency-critical tasks that require firm
	timing guarantees and must not partake in proportional slowdowns during overload
	conditions.

	* Guarantees: A deadline task specifies a Capacity (CPU time) and a
	Deadline (period). The scheduler guarantees that a task will receive its
	capacity within its deadline period if the overall deadline demand is
	feasible.
	* Precedence: Deadline tasks always take precedence over eligible fair
	tasks.
	* Overload Response: To ensure that all deadline tasks can be scheduled
	within their expected periods, the sum total of deadline demands must not
	exceed the processor's capacity. If the total deadline demand of a processor
	exceeds 100%, stochastic deadline misses are expected and the system will
	behave as though the periods of some or all task were multiplied by the
	overload factor.
	* Critical Priority: Deadline profiles can optionally be marked as
	critical. A critical deadline task receives absolute priority over normal
	deadline tasks during periods of CPU oversubscription, ensuring that essential
	workloads are not degraded by runaway standard deadline threads. In a
	correctly functioning system, total critical utilization should never exceed
	the limits of the processor, otherwise a kernel oops is emitted.

	## Run queues and selection

	Each CPU maintains three primary run queues, implemented as augmented WAVL trees
	to satisfy efficient O(log n) operations:

	1. Critical Deadline Run Queue: Contains runnable critical deadline tasks.
	2. Deadline Run Queue: Contains runnable non-critical deadline tasks.
	3. Fair Run Queue: Contains runnable fair tasks.

	### Thread selection

	When choosing the next thread to run, the scheduler evaluates queues in the
	following order:

	1. Critical Deadline Task: Picks the eligible critical deadline task with
	the earliest finish time.
	2. Deadline Task: Picks the eligible deadline task with the earliest finish
	time.
	3. Fair Task: Picks the eligible fair task with the earliest virtual finish
	time.
	4. Idle/Power Thread: Runs when no other threads are eligible.

	### Preemption and timeslices

	The scheduler sets a CPU-local preemption timer based on the selected thread's
	timeslice or the arrival of a deadline task. When the timer fires, execution
	stops on the current thread, and the scheduler re-evaluates which thread is the
	most suitable to run next.

	## CPU placement and migration

	The scheduler performs load balancing and task placement based on several
	factors to optimize performance and thermal efficiency.

	### Placement criteria

	When a thread wakes up or is unblocked, the scheduler selects a target CPU based
	on:

	1. Affinity: The thread's user-defined CPU affinity mask is prioritized.
	2. Last CPU: Running on the last CPU helps preserve cache warmness
	(Intra-cluster affinity).
	3. Idle States: Highly prioritized if the core avoids context-switch
	overhead of a busy CPU.

	### Work stealing

	To maintain load balance, a CPU with no eligible work will attempt to steal
	work from other busy CPUs.

	* Cluster Awareness: The scheduler is conscious of the CPU topology. CPU
	stealing prioritizes local clusters over distant clusters to capitalize on
	better shared cache performance levels and reduce migration penalties.

	## Power and energy awareness

	The scheduler uses local energy models and power level controls to optimize
	power consumption and thermal performance.

	* Performance Scales: Account for individual CPU performance multipliers
	inside budget calculations.
	* Dynamic Voltage and Frequency Scaling (DVFS): Adjust processor workloads
	supporting user-defined limits while striving to meet deadline schedules.

	<!-- Reference links -->
	[fair-scheduler-mechanics]: fair_scheduler.md