Values related to processor utilization are displayed on the third line. They provide insight into exactly what the CPUs are doing.
us is the percent of time spent running user processes.
sy is the percent of time spent running the kernel.
ni is the percent of time spent running processes with manually configured nice values.
id is the percent of time idle (if low, the CPU may be overworked).
wa is the percent of wait time (if high, CPU is waiting for I/O access).
hi is the percent of time managing hardware interrupts.
si is the percent of time managing software interrupts.
st is the percent of time stolen from this virtual machine, i.e., time the virtual CPU spent waiting for access to a physical CPU.
In the context of the top command in Unix-like operating systems, the “wa” field represents the percentage of time the CPU spends waiting for I/O operations to complete. Specifically, it stands for “waiting for I/O.”
The calculation includes the time the CPU is idle while waiting for data to be read from or written to storage devices such as hard drives or SSDs. High values in the “wa” field may indicate that the system is spending a significant amount of time waiting for I/O operations to complete, which could be a bottleneck if not addressed.
The formula for calculating the “wa” value is as follows:

wa% = (time spent waiting for I/O / total CPU time) × 100

For example, if the CPU spent 80 ms of a 1,000 ms sampling interval waiting for I/O, wa = 80 / 1000 × 100 = 8%.
We can also use vmstat 1 to watch the wa state of the system. Note that vmstat's wa column is likewise a percentage of total CPU time, while its b column gives the raw count of processes blocked in uninterruptible (typically I/O) sleep:
Procs
- r: the number of processes waiting for run time.
- b: the number of processes in uninterruptible sleep.

Memory
- swpd: the amount of virtual memory used.
- free: the amount of idle memory.
- buff: the amount of memory used as buffers.
- cache: the amount of memory used as cache.
- inact: the amount of inactive memory. (-a option)
- active: the amount of active memory. (-a option)

Swap
- si: amount of memory swapped in from disk (/s).
- so: amount of memory swapped to disk (/s).

IO
- bi: blocks received from a block device (blocks/s).
- bo: blocks sent to a block device (blocks/s).

System
- in: the number of interrupts per second, including the clock.
- cs: the number of context switches per second.

CPU (these are percentages of total CPU time)
- us: time spent running non-kernel code (user time, including nice time).
- sy: time spent running kernel code (system time).
- id: time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
- wa: time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
- st: time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
In summary, a high “wa” value in the output of top suggests that your system is spending a considerable amount of time waiting for I/O operations to complete, which can impact overall system performance. Investigating and optimizing disk I/O can be beneficial in such cases: improving disk performance, optimizing file systems, or identifying and addressing specific I/O-intensive processes. We can use iotop (run as root; iotop -o shows only processes that are actually doing I/O) to find high-I/O or slow-I/O processes on the system:
```
Current DISK READ: 341.78 M/s | Current DISK WRITE: 340.53 M/s
    TID  PRIO  USER  DISK READ   DISK WRITE  SWAPIN / IO>   COMMAND
3487445  be/4  root  34.44 M/s   33.36 M/s   ?unavailable?  fio -filename=/mnt/hostroot/var/mnt/testfile -direct=1 -iodepth 64 -thread -rw=randrw -ioengine=libaio -bs=8k -size=20G -numjobs=10 -group_reporting --runtime=100 -name=mytest
3487446  be/4  root  33.40 M/s   34.27 M/s   ?unavailable?  fio -filename=/mnt/hostroot/var/mnt/testfile -direct=1 -iodepth 64 -thread -rw=randrw -ioengine=libaio -bs=8k -size=20G -numjobs=10 -group_reporting --runtime=100 -name=mytest
3487447  be/4  root  34.10 M/s   33.78 M/s   ?unavailable?  fio -filename=/mnt/hostroot/var/mnt/testfile -direct=1 -iodepth 64 -thread -rw=randrw -ioengine=libaio -bs=8k -size=20G -numjobs=10 -group_reporting --runtime=100 -name=mytest
3487448  be/4  root  32.21 M/s   31.64 M/s   ?unavailable?  fio -filename=/mnt/hostroot/var/mnt/testfile -direct=1 -iodepth 64 -thread -rw=randrw -ioengine=libaio -bs=8k -size=20G -numjobs=10 -group_reporting --runtime=100 -name=mytest
3487449  be/4  root  34.22 M/s   34.44 M/s   ?unavailable?  fio -filename=/mnt/hostroot/var/mnt/testfile -direct=1 -iodepth 64 -thread -rw=randrw -ioengine=libaio -bs=8k -size=20G -numjobs=10 -group_reporting --runtime=100 -name=mytest
3487450  be/4  root  34.46 M/s   34.33 M/s   ?unavailable?  fio -filename=/mnt/hostroot/var/mnt/testfile -direct=1 -iodepth 64 -thread -rw=randrw -ioengine=libaio -bs=8k -size=20G -numjobs=10 -group_reporting --runtime=100 -name=mytest
3487451  be/4  root  34.17 M/s   34.21 M/s   ?unavailable?  fio -filename=/mnt/hostroot/var/mnt/testfile -direct=1 -iodepth 64 -thread -rw=randrw -ioengine=libaio -bs=8k -size=20G -numjobs=10 -group_reporting --runtime=100 -name=mytest
3487452  be/4  root  33.05 M/s   32.95 M/s   ?unavailable?  fio -filename=/mnt/hostroot/var/mnt/testfile -direct=1 -iodepth 64 -thread -rw=randrw -ioengine=libaio -bs=8k -size=20G -numjobs=10 -group_reporting --runtime=100 -name=mytest
3487453  be/4  root  34.23 M/s   34.74 M/s   ?unavailable?  fio -filename=/mnt/hostroot/var/mnt/testfile -direct=1 -iodepth 64 -thread -rw=randrw -ioengine=libaio -bs=8k -size=20G -numjobs=10 -group_reporting --runtime=100 -name=mytest
3487454  be/4  root  37.51 M/s   36.79 M/s   ?unavailable?  fio -filename=/mnt/hostroot/var/mnt/testfile -direct=1 -iodepth 64 -thread -rw=randrw -ioengine=libaio -bs=8k -size=20G -numjobs=10 -group_reporting --runtime=100 -name=mytest
      1  be/4  root   0.00 B/s    0.00 B/s   ?unavailable?  systemd --switched-root --system --deserialize 29
      2  be/4  root   0.00 B/s    0.00 B/s   ?unavailable?  [kthreadd]
      3  be/4  root   0.00 B/s    0.00 B/s   ?unavailable?  [rcu_gp]
      4  be/4  root   0.00 B/s    0.00 B/s   ?unavailable?  [rcu_par_gp]
      6  be/4  root   0.00 B/s    0.00 B/s   ?unavailable?  [kworker/0:0H-events_highpri]
```
Code tracing
Running strace on the top command shows that the wa value is read from /proc/stat. That file is written by the proc ops in fs/proc/stat.c, which obtains the iowait time from kernel_cpustat.cpustat[CPUTIME_IOWAIT]:
```c
/* fs/proc/stat.c */
static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
{
	u64 iowait, iowait_usecs = -1ULL;

	if (cpu_online(cpu))
		iowait_usecs = get_cpu_iowait_time_us(cpu, NULL);

	if (iowait_usecs == -1ULL)
		/* !NO_HZ or cpu offline so we can rely on cpustat.iowait */
		iowait = kcs->cpustat[CPUTIME_IOWAIT];
	else
		iowait = iowait_usecs * NSEC_PER_USEC;

	return iowait;
}
```
In the Linux kernel, the calculation of the “wa” (waiting for I/O) value that is reported by tools like top is handled within the kernel’s scheduler. The specific code responsible for updating the CPU usage statistics can be found in the kernel/sched/cputime.c file.
One of the key functions related to this is account_idle_time().
```c
/* kernel/sched/cputime.c */

/*
 * Account for idle time.
 * @cputime: the CPU time spent in idle wait
 */
void account_idle_time(u64 cputime)
{
	u64 *cpustat = kcpustat_this_cpu->cpustat;
	struct rq *rq = this_rq();

	/* rq->nr_iowait: number of blocked threads (waiting for I/O) */
	if (atomic_read(&rq->nr_iowait) > 0)
		cpustat[CPUTIME_IOWAIT] += cputime;
	else
		cpustat[CPUTIME_IDLE] += cputime;
}
```
This function is part of the kernel's scheduler and updates the CPU time statistics for each idle tick. It distinguishes between two idle states: if any threads on this runqueue are blocked waiting for I/O (rq->nr_iowait > 0), the elapsed time is accumulated into CPUTIME_IOWAIT; otherwise it is accounted as plain idle time in CPUTIME_IDLE.
Here is a basic outline of how the “wa” time is accounted for in the Linux kernel:
Idle time accounting: The kernel keeps track of the time the CPU spends in an idle state, waiting for tasks to execute.
I/O wait time accounting: when a process is waiting for I/O operations (such as reading from or writing to a disk), the kernel accounts for this time in the “wa” state. The code paths below show how a task entering I/O sleep is tracked.
```c
/* block/blk-core.c */
void blk_io_schedule(void)
{
	/* Prevent hang_check timer from firing at us during very long I/O */
	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;

	if (timeout)
		io_schedule_timeout(timeout);
	else
		io_schedule();
}
EXPORT_SYMBOL_GPL(blk_io_schedule);
```

```c
/* kernel/sched/core.c */

/*
 * This task is about to go to sleep on IO. Increment rq->nr_iowait so
 * that process accounting knows that this is a task in IO wait state.
 */
long __sched io_schedule_timeout(long timeout)
{
	int token;
	long ret;

	token = io_schedule_prepare();
	ret = schedule_timeout(timeout);
	io_schedule_finish(token);

	return ret;
}
EXPORT_SYMBOL(io_schedule_timeout);

int io_schedule_prepare(void)
{
	int old_iowait = current->in_iowait;

	current->in_iowait = 1;	/* mark this task as waiting for I/O */
	...
	return old_iowait;
}
```

```c
/* kernel/time/timer.c */
signed long __sched schedule_timeout(signed long timeout)
{
	...
	switch (timeout) {
	case MAX_SCHEDULE_TIMEOUT:
		/*
		 * These two special cases are useful to be comfortable
		 * in the caller. Nothing more. We could take
		 * MAX_SCHEDULE_TIMEOUT from one of the negative value
		 * but I'd like to return a valid offset (>=0) to allow
		 * the caller to do everything it want with the retval.
		 */
		schedule();
		goto out;
	...
	}
}
```
```c
/* kernel/sched/core.c */
static int
try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
{
	...
	if (task_cpu(p) != cpu) {
		if (p->in_iowait) {
			delayacct_blkio_end(p);
			atomic_dec(&task_rq(p)->nr_iowait);
		}
	...
}
```