Contents · Energy-aware scheduling and power states


Background: power, energy, and performance

  • Dynamic power ~ C·V²·f; leakage (static) power grows with temperature and supply voltage and depends on the process technology.
  • Energy = ∫ Power dt; the latency-versus-energy trade-off is shaped by utilization and by how often the CPU can sleep.
  • Goals: meet QoS while minimizing energy by choosing the right frequency and entering deeper sleep states when idle.
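// Simplified energy estimate (dynamic power only)
function energy(jobs, freq){
  if (freq <= 0) throw new RangeError("freq must be positive");
  // Proxy for C·V²·f: with V assumed to scale linearly with f, power grows as f³
  const power = 0.5 * Math.pow(freq, 3);
  const time = jobs / freq; // time to finish the work at this frequency
  return power * time;
}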
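
The estimate above ignores leakage and idle power, which is what makes the latency-versus-energy trade-off non-trivial. A minimal sketch, assuming voltage scales linearly with frequency (so dynamic power grows as f³) and using made-up constants `c`, `pStatic`, and `pIdle`, compares finishing early and sleeping against running slowly for the whole window:

```javascript
// Hypothetical model: all constants are illustrative, not measured.
// freq and work use arbitrary but consistent units (work / freq = time).
function windowEnergy(work, freq, windowTime, { c = 0.5, pStatic = 0.2, pIdle = 0.02 } = {}) {
  const busyTime = work / freq;
  if (busyTime > windowTime) {
    throw new RangeError("frequency too low to finish the work inside the window");
  }
  const pDynamic = c * Math.pow(freq, 3);            // C·V²·f with V assumed proportional to f
  const activeEnergy = (pDynamic + pStatic) * busyTime;
  const idleEnergy = pIdle * (windowTime - busyTime); // remainder of the window is spent asleep
  return activeEnergy + idleEnergy;
}

// Same work and deadline, three candidate frequencies.
for (const f of [0.5, 1.0, 2.0]) {
  console.log(f, windowEnergy(1, f, 2).toFixed(3));
}
```

With the default constants the lowest frequency is cheapest in this model. Raising the static power (for example `{ pStatic: 1 }`) makes the middle frequency win, which is the usual argument for finishing sooner and sleeping when leakage dominates.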

DVFS and P-states

  • Dynamic Voltage and Frequency Scaling chooses operating points (OPPs) per policy.
  • P-states (ACPI) expose discrete performance levels; on Linux the intel_pstate and amd-pstate drivers manage transitions.
  • Modern policy: schedutil translates scheduler utilization into target frequency.
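
As a rough illustration of how utilization becomes a frequency, the sketch below follows the mapping described in the kernel's schedutil documentation, approximately next_freq = 1.25 · max_freq · util / max, and then rounds up to an available operating point. The function name and the OPP table are hypothetical.

```javascript
// Sketch of a schedutil-style mapping; opps is a hypothetical OPP table in kHz.
function targetFreq(util, maxCapacity, opps) {
  const sorted = [...opps].sort((a, b) => a - b);
  const maxFreq = sorted[sorted.length - 1];
  const raw = 1.25 * maxFreq * (util / maxCapacity); // 25% headroom above current demand
  // Pick the lowest operating point that satisfies the request, else the highest.
  return sorted.find((f) => f >= raw) ?? maxFreq;
}

// A CPU at 50% utilization with four operating points.
console.log(targetFreq(512, 1024, [600000, 1200000, 1800000, 2400000]));
```

The headroom factor means the CPU reaches its top frequency at about 80% utilization rather than at 100%.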

Idle management and C-states

  • C-states define idle depth (C1 shallow → Cn deep); deeper states draw less power but have longer exit latency.
  • The cpuidle framework selects a state through a governor (menu, ladder, teo) that predicts the time until the next wakeup.
  • Timer slack and tickless idle (NO_HZ) create longer uninterrupted idle to reach deeper C-states.
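
The selection rule behind these bullets can be sketched as: take the deepest state whose target residency fits the predicted idle time and whose exit latency respects the current latency limit. This is a simplification of what the real governors do, and the state table below is invented for illustration.

```javascript
// Hypothetical idle-state table; all values are illustrative, in microseconds.
const idleStates = [
  { name: "C1", exitLatencyUs: 2, targetResidencyUs: 2 },
  { name: "C3", exitLatencyUs: 80, targetResidencyUs: 300 },
  { name: "C6", exitLatencyUs: 200, targetResidencyUs: 800 },
];

// states must be ordered from shallow to deep.
function selectIdleState(states, predictedIdleUs, latencyLimitUs) {
  let choice = states[0]; // the shallowest state is the fallback
  for (const s of states) {
    if (s.targetResidencyUs <= predictedIdleUs && s.exitLatencyUs <= latencyLimitUs) {
      choice = s; // a deeper state that still pays off and wakes up in time
    }
  }
  return choice.name;
}

console.log(selectIdleState(idleStates, 1000, 1000)); // long idle, relaxed latency limit
console.log(selectIdleState(idleStates, 1000, 100));  // long idle, tight latency limit
```

Longer predicted idle periods, which timer slack and tickless idle help create, are what let the loop reach the deeper entries.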

Linux schedutil, EAS, and cpusets

  • schedutil: maps per-CPU utilization tracked by the scheduler directly to a frequency, so it reacts faster and oscillates less than sampling-based governors such as ondemand.
  • Energy Aware Scheduling (EAS): on asymmetric systems, places tasks on the most energy-efficient CPU given utilization.
  • Use cpuset/sched_setaffinity to confine background tasks to efficient cores.
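// Sketch: pick the core with minimal estimated energy for a task's utilization
function pickCore(util, cores){
  // cores: [{eff: joules_per_cycle, capacity: optional max utilization (assumed field)}, ...]
  // util: the task's utilization, in the same units as capacity
  let best = -1, score = Infinity;
  cores.forEach((c, i) => {
    if (c.capacity !== undefined && util > c.capacity) return; // task does not fit here
    const s = c.eff * util; // energy proxy: cost per cycle times demanded cycles
    if (s < score){ score = s; best = i; }
  });
  return best; // -1 when no core can take the task
}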

CPUfreq governors and policies

  • Governors: schedutil (recommended), performance, powersave, ondemand, conservative.
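  • Policies apply per CPU or per cluster of CPUs sharing a clock; min/max caps (scaling_min_freq, scaling_max_freq) bound the range, and boost trades power for responsiveness.
  • Stability: filter transient load spikes and rate-limit frequency changes (for example schedutil's rate_limit_us) to avoid thrashing between operating points.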
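
The caps and rate limiting described above can be sketched as a small state machine. The class and field names are hypothetical; only the idea of clamping to a min/max range and ignoring changes that arrive too soon after the previous one comes from the text.

```javascript
// Sketch: clamp requests to policy limits and rate-limit changes.
// Times are in microseconds; frequencies in any consistent unit.
class FreqPolicy {
  constructor({ minFreq, maxFreq, rateLimitUs }) {
    this.minFreq = minFreq;
    this.maxFreq = maxFreq;
    this.rateLimitUs = rateLimitUs;
    this.current = minFreq;
    this.lastChangeUs = -Infinity; // no change has happened yet
  }

  // Returns the frequency in effect after considering a request made at nowUs.
  request(freq, nowUs) {
    const clamped = Math.min(this.maxFreq, Math.max(this.minFreq, freq));
    const tooSoon = nowUs - this.lastChangeUs < this.rateLimitUs;
    if (clamped !== this.current && !tooSoon) {
      this.current = clamped;
      this.lastChangeUs = nowUs;
    }
    return this.current;
  }
}

const policy = new FreqPolicy({ minFreq: 800, maxFreq: 2000, rateLimitUs: 1000 });
console.log(policy.request(3000, 0));    // clamped to the max cap
console.log(policy.request(800, 500));   // ignored: inside the rate-limit window
console.log(policy.request(800, 1500));  // accepted: window has elapsed
```

A longer rate limit gives steadier frequencies at the cost of slower reaction to load changes.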

SoC power domains and big.LITTLE

  • Power/clock domains gate blocks independently; runtime PM suspends unused devices.
  • big.LITTLE (Arm): energy-efficient LITTLE cores + high-performance big cores; EAS balances placement.
  • Memory/controller DVFS and uncore power states affect end-to-end energy and latency.
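
One common way to decide when a domain can be gated is usage counting: the domain stays powered while any device in it is active and is switched off when the last user releases it. The sketch below illustrates that idea only; the class is hypothetical and is not the kernel's runtime PM API.

```javascript
// Sketch: a power domain stays on while any device holds a reference to it.
class PowerDomain {
  constructor(name) {
    this.name = name;
    this.users = 0;
    this.on = false;
  }

  get() {
    if (this.users++ === 0) this.on = true;  // first user powers the domain up
  }

  put() {
    if (this.users === 0) throw new Error("unbalanced put on " + this.name);
    if (--this.users === 0) this.on = false; // last user gone: gate the domain
  }
}

const gpu = new PowerDomain("gpu");
gpu.get();            // display pipeline becomes active
gpu.get();            // render job starts
gpu.put();            // render job ends, display still active
console.log(gpu.on);  // domain is still powered
gpu.put();            // display goes idle
console.log(gpu.on);  // domain is now gated
```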

Measuring power and tuning

  • Tools: RAPL (read through the powercap energy_uj counters), hwmon sensors, perf stat/top, powertop, turbostat.
  • Workload shaping: coalesce timers/IO, batch work, avoid wakeups, increase timer slack.
  • Pin interrupts; isolate latency-sensitive tasks; verify with turbostat or powertop that the expected idle residency is actually reached.
// Coalesce periodic work windows (concept)
const periodMs = 100; // do work every 100ms, sleep otherwise
setInterval(() => {
  // batch IO and compute here
}, periodMs);
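
To quantify the effect of such changes, average power can be derived from two readings of an energy_uj counter, which reports cumulative microjoules and wraps at the value given by max_energy_range_uj (on Intel systems these typically live under /sys/class/powercap/intel-rapl:0/). The helper below is a sketch that takes the sampled values as arguments and assumes at most one wraparound between samples.

```javascript
// Average power in watts from two energy_uj samples taken dtSeconds apart.
// maxRangeUj is the counter's wrap value (max_energy_range_uj).
function averagePowerWatts(startUj, endUj, dtSeconds, maxRangeUj) {
  if (dtSeconds <= 0) throw new RangeError("dtSeconds must be positive");
  let deltaUj = endUj - startUj;
  if (deltaUj < 0) deltaUj += maxRangeUj; // assume the counter wrapped exactly once
  return deltaUj / 1e6 / dtSeconds;       // microjoules -> joules -> watts
}

// Illustrative sample values, not real measurements.
console.log(averagePowerWatts(1000000, 6000000, 1, 262143328850));
```

Sampling often enough that the counter cannot wrap more than once keeps the single-wrap assumption valid.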

Exercises

  1. Compare schedutil vs performance governors on a CPU-bound and an IO-bound workload.
  2. Measure idle residency with and without timer slack; quantify energy impact.
  3. On a big.LITTLE device, pin background tasks to LITTLE cores and benchmark energy/latency.
Energy efficiency emerges from coordinated frequency policy, smart task placement, and maximizing true idle.