IOCost model causes CPU soft lockup
Reproduce
Environment:
Kernel 5.15
Steps:
- Enable cgroup v2 and BFQ, and set bfq as the I/O scheduler on the target device.
- Enable IOCost through /sys/fs/cgroup/io.cost.qos and set io.weight for the cgroups.
- Start two or more fio processes that read from or write to the target device, each in a different cgroup. At least one must use direct I/O and at least one must use buffered I/O.
- Wait 10-30 minutes.
Debug
Kernel dump 1 (with hardlockup_panic=1 set via sysctl):
[ 1408.828715] Call Trace:
Kernel dump 2:
[60977.577690] BUG: spinlock recursion on CPU#21, fio/11770
Kernel dump 3 (with lockdep enabled):
[ 4398.422037] WARNING: inconsistent lock state
Lock chain:
- bfq_bio_merge:
    spin_lock_irq(&bfqd->lock);
    blk_mq_sched_try_merge:
      adjust_inuse_and_calc_cost:
        spin_lock_irq(&ioc->lock);
        spin_unlock_irq(&ioc->lock);
- bfq_finish_requeue_request:
    spin_lock_irqsave(&bfqd->lock, flags);
    …
    spin_unlock_irqrestore(&bfqd->lock, flags);
A process running on a CPU takes the first lock with spin_lock_irq(&bfqd->lock), which also disables interrupts on that CPU. While still holding it, the process takes a second lock with spin_lock_irq(&ioc->lock). When the cost calculation finishes, it releases the second lock with spin_unlock_irq(&ioc->lock), which unconditionally re-enables interrupts even though bfqd->lock is still held. If an interrupt then fires on the same CPU and its handler reaches bfq_finish_requeue_request, the handler tries to take the first lock with spin_lock_irqsave(&bfqd->lock, flags). That lock is still held by the interrupted process, which cannot run again until the handler returns, so the CPU deadlocks. Lockdep detects this pattern and reports the risk in dmesg (kernel dump 3).
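The sequence above can be replayed in a small user-space model. This is only a sketch: the names toy_lock, toy_spin_lock_irq, toy_spin_unlock_irq and deadlock_window_with_unlock_irq are hypothetical, a single boolean stands in for the per-CPU interrupt state, and no real spinning or kernel API is involved. It only shows that the inner unlock leaves interrupts enabled while the outer lock is still held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU: a single flag says whether IRQs are enabled. */
static bool irqs_enabled = true;

/* Toy lock: only records whether it is held (no real spinning). */
struct toy_lock { bool held; };

/* Models spin_lock_irq(): disable IRQs, then take the lock. */
static void toy_spin_lock_irq(struct toy_lock *l)
{
    irqs_enabled = false;
    l->held = true;
}

/* Models spin_unlock_irq(): release the lock, then enable IRQs
 * unconditionally, whatever the state was before the lock was taken. */
static void toy_spin_unlock_irq(struct toy_lock *l)
{
    l->held = false;
    irqs_enabled = true;
}

/* Replays the lock chain and returns true if there is a window in which
 * the outer lock is held while IRQs are enabled. */
static bool deadlock_window_with_unlock_irq(void)
{
    struct toy_lock bfqd_lock = { false };
    struct toy_lock ioc_lock = { false };

    irqs_enabled = true;
    toy_spin_lock_irq(&bfqd_lock);   /* as in bfq_bio_merge */
    toy_spin_lock_irq(&ioc_lock);    /* as in adjust_inuse_and_calc_cost */
    toy_spin_unlock_irq(&ioc_lock);  /* IRQs are switched back on here */

    /* An interrupt handler that needs bfqd_lock could now run. */
    return bfqd_lock.held && irqs_enabled;
}
```

In this model, deadlock_window_with_unlock_irq() returns true, which corresponds to the point where an interrupt can enter bfq_finish_requeue_request on the same CPU.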
Solution
The following kernel patch, included in kernel 6.3.13, fixes the issue:
https://lwn.net/Articles/937933/
https://www.spinics.net/lists/stable/msg669695.html
The patch replaces spin_lock_irq with spin_lock_irqsave in adjust_inuse_and_calc_cost, so that unlocking restores the interrupt state that was in effect before the lock was taken instead of unconditionally re-enabling interrupts.
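The effect of the change can be illustrated with a user-space model of the interrupt state. This is a sketch under assumptions, not the kernel patch itself: all toy_* names and deadlock_window_with_irqsave are hypothetical, a boolean stands in for the saved flags, and the toy irqsave helper takes a pointer, whereas the real spin_lock_irqsave is a macro that takes flags by name.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU: a single flag says whether IRQs are enabled. */
static bool irqs_enabled = true;

/* Toy lock: only records whether it is held (no real spinning). */
struct toy_lock { bool held; };

/* Models spin_lock_irq(): disable IRQs, then take the lock. */
static void toy_spin_lock_irq(struct toy_lock *l)
{
    irqs_enabled = false;
    l->held = true;
}

/* Models spin_lock_irqsave(): remember the current IRQ state,
 * disable IRQs, then take the lock. */
static void toy_spin_lock_irqsave(struct toy_lock *l, bool *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
    l->held = true;
}

/* Models spin_unlock_irqrestore(): release the lock and put the IRQ
 * state back to what it was before the matching lock call. */
static void toy_spin_unlock_irqrestore(struct toy_lock *l, bool flags)
{
    l->held = false;
    irqs_enabled = flags;
}

/* Replays the lock chain with the inner lock switched to irqsave.
 * Returns true if the outer lock is ever held with IRQs enabled. */
static bool deadlock_window_with_irqsave(void)
{
    struct toy_lock bfqd_lock = { false };
    struct toy_lock ioc_lock = { false };
    bool flags;

    irqs_enabled = true;
    toy_spin_lock_irq(&bfqd_lock);                /* outer lock, unchanged */
    toy_spin_lock_irqsave(&ioc_lock, &flags);     /* saves "disabled" */
    toy_spin_unlock_irqrestore(&ioc_lock, flags); /* stays disabled */

    return bfqd_lock.held && irqs_enabled;
}
```

Here deadlock_window_with_irqsave() returns false: interrupts stay disabled after the inner unlock, so no handler can contend for bfqd->lock on the same CPU until the outer lock is released.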
We will cherry-pick the patch into our 5.15 kernel.













