Deep Dive: SCST Kernel Module Infinite Loop and Deadlock Analysis
SCST Infinite Loop Deadlock: Root Cause Analysis and Fix
Executive Summary
This post provides a deep technical analysis of a critical infinite loop bug in the SCST (SCSI Target) kernel module that caused complete system hangs during device cleanup. The issue manifested as kernel tasks stuck in uninterruptible sleep (D state), eventually triggering soft lockup detection and system instability.
The Problem: During device shutdown, the cleanup routine enters an infinite loop when zombie commands remain in the hash table but not in the ready queue, creating a deadlock condition.
The Solution: Add comprehensive diagnostics, a force-cleanup mechanism with timeout handling, and proper command lifecycle management to break the deadlock.
Background: The SCST User-Space Device Handler
SCST is a professional, clustered, high-performance storage target subsystem for Linux. The scst_user module allows user-space applications to implement SCSI target devices via a character device interface.
Architecture Overview
User Space Application
Key data structures:
- ucmd_hash: Hash table tracking all active commands
- ready_cmd_list: Queue of commands ready for user-space processing
- cleanup_cmpl: Completion for device cleanup synchronization
The Issue: Kernel Soft Lockup During Device Cleanup
Symptom
Multiple scst_usr_release kernel threads stuck in D state (uninterruptible sleep):
[Thu Jan 23 02:37:11 2025] task:scst_usr_releas state:D stack: 0 pid:334614
The threads are blocked in wait_for_completion(&dev->cleanup_cmpl), which is never signaled because the cleanup thread is stuck in an infinite loop.
Root Cause Analysis
The Cleanup Flow
When an SCST user device is being torn down, this sequence occurs:
1. dev_user_exit_dev() (scst_user.c:3747)
   - Unregisters device from SCST core
   - Sets dev->cleanup_done = 1
   - Waits for wait_for_completion(&dev->cleanup_cmpl) at line 3779
2. dev_user_process_cleanup() (scst_user.c:3851), runs in cleanup thread
   - Processes remaining commands
   - Eventually should call complete_all(&dev->cleanup_cmpl) at line 3898
   - This is where the infinite loop occurs
The Infinite Loop Logic
Let’s examine the problematic code in dev_user_process_cleanup():
while (1) {
The Deadlock Condition
The infinite loop occurs when all three conditions are simultaneously true:
| Condition | Meaning | Source |
|---|---|---|
| rc1 != 0 | Commands still exist in ucmd_hash | Non-zero count from dev_user_unjam_dev() |
| rc == -EAGAIN | No commands in ready_cmd_list | From dev_user_get_next_cmd() |
| dev->cleanup_done == 1 | Device teardown initiated | Set by dev_user_exit_dev() |
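The three conditions can be read as a single predicate over one loop iteration. The following is a minimal user-space sketch, not the module's code: the struct `cleanup_iter` and the function `cleanup_is_stuck()` are hypothetical names introduced here for illustration, while the field names follow the variables in the table.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of one loop iteration (not a real SCST type). */
struct cleanup_iter {
    int rc1;          /* count returned by the unjam pass */
    int rc;           /* result of fetching the next ready command */
    int cleanup_done; /* set once teardown has started */
};

/* True when the iteration can make no progress and has no exit path. */
static bool cleanup_is_stuck(const struct cleanup_iter *it)
{
    assert(it != NULL);
    return it->rc1 != 0 && it->rc == -EAGAIN && it->cleanup_done == 1;
}
```

If any one of the three terms is false, the loop has something left to do or a way out. Only the conjunction spins forever.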
Why Commands Get Stuck
Looking at dev_user_unjam_dev() (scst_user.c:2680):
static int dev_user_unjam_dev(struct scst_user_dev *dev)
Zombie commands can remain in ucmd_hash but not be unjammable if:
- ucmd->sent_to_user == 0 - Never delivered to user space
- ucmd_get_check(ucmd) fails - Reference count is 0 (being destroyed)
- Command is in an intermediate state that dev_user_unjam_cmd() can’t handle
Command State Machine
Commands go through these states:
UCMD_STATE_PARSING
If a command’s reference count reaches 0 while still in ucmd_hash, it becomes a zombie:
- ucmd_get_check() fails (atomic_inc_return returns 1 from 0)
- Command can’t be retrieved from ready list (already removed)
- Command can’t be unjammed (reference check fails)
- But dev_user_unjam_dev() still counts it in res++
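The reference check described above can be modeled in user space with C11 atomics. This is a sketch under stated assumptions, not the module's implementation: `sketch_ucmd` and `sketch_get_check()` are hypothetical names, and `atomic_fetch_add(...) + 1` stands in for the kernel's `atomic_inc_return()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a user command; only the refcount is modeled. */
struct sketch_ucmd {
    atomic_int ref;
};

/*
 * Try to take a reference. Returns true when the command is already being
 * destroyed (the count was 0 before the increment), false when the
 * reference was taken successfully.
 */
static bool sketch_get_check(struct sketch_ucmd *ucmd)
{
    assert(ucmd != NULL);
    /* atomic_fetch_add returns the old value, so old + 1 is the new count. */
    int now = atomic_fetch_add(&ucmd->ref, 1) + 1;
    if (now == 1) {
        /* The count went 0 -> 1: undo the increment, the object is dying. */
        atomic_fetch_sub(&ucmd->ref, 1);
        return true;
    }
    return false;
}
```

A command whose count is already 0 fails this check on every pass, so a loop that skips such commands but still counts them never sees the total drop.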
The Race Condition
Thread 1: dev_user_exit_dev()          Thread 2: dev_user_process_cleanup()
The Original Workaround: Panic
The initial workaround was to panic after 10,000 iterations:
loop_count++;
Why This “Worked”
- Prevents soft lockup detector from triggering
- Provides clear evidence via kernel panic
- Forces immediate attention to the problem
Why This Is Bad
- ❌ Crashes the entire system
- ❌ Loses all in-flight I/O across all devices
- ❌ No recovery possible
- ❌ Doesn’t fix the root cause
- ❌ No diagnostic information about stuck commands
The Proper Solution
Multi-Layered Approach
- Diagnostic Layer - Identify stuck commands
- Timeout Layer - Detect prolonged loops
- Force Cleanup Layer - Break the deadlock
- Graceful Degradation - Continue cleanup after recovery
Implementation
static int dev_user_process_cleanup(struct scst_user_dev *dev)
Key Improvements
1. Diagnostic Logging (100 iterations)
Captures the state when the loop is detected:
- Device pointer and iteration count
- Return codes (rc1, rc, cleanup_done)
- Every stuck command’s details:
  - State (parsing, executing, freeing, etc.)
  - Whether sent to user space
  - Jammed flag
  - Reference count
  - Associated SCST command pointer
This provides actionable debugging information instead of just panicking.
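As an illustration of what one such diagnostic line might carry, here is a small user-space sketch. The struct `stuck_cmd_info`, the function `format_stuck_cmd()`, and the exact message layout are assumptions made for this example. The kernel code would use its own logging macros and the real command structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical snapshot of a stuck command; fields mirror the list above. */
struct stuck_cmd_info {
    int state;            /* numeric command state */
    int sent_to_user;     /* 1 if delivered to user space */
    int jammed;           /* jammed flag */
    int refcount;         /* current reference count */
    const void *scst_cmd; /* associated SCST command pointer */
};

/* Format one diagnostic line into buf; returns the snprintf() result. */
static int format_stuck_cmd(char *buf, size_t len,
                            const struct stuck_cmd_info *c)
{
    assert(buf != NULL && c != NULL);
    return snprintf(buf, len,
                    "stuck ucmd: state=%d sent_to_user=%d jammed=%d ref=%d cmd=%p",
                    c->state, c->sent_to_user, c->jammed, c->refcount,
                    (void *)c->scst_cmd);
}
```

Putting every field on one line per command keeps the dump easy to grep when many commands are stuck at once.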
2. Force Cleanup (1000 iterations ≈ 1 second)
After a reasonable timeout:
- Removes all commands from ucmd_hash
- Forces reference counts down to prevent leaks
- Frees stuck commands
- Breaks the loop, allowing complete_all() to be called
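The drain step can be sketched in user space as follows. This is a simplified model, not the fix itself: a singly linked list stands in for ucmd_hash, and `tracked_cmd`, `track_cmd()`, and `force_cleanup()` are hypothetical names. In the kernel the same walk would have to run under the lock that protects the hash table, and it is only safe if no other context still holds a pointer to the commands being freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tracked command on a list standing in for ucmd_hash. */
struct tracked_cmd {
    struct tracked_cmd *next;
    int ref;
};

/* Add a heap-allocated command with the given refcount at the list head. */
static int track_cmd(struct tracked_cmd **head, int ref)
{
    struct tracked_cmd *c = malloc(sizeof(*c));
    if (c == NULL)
        return -1;
    c->ref = ref;
    c->next = *head;
    *head = c;
    return 0;
}

/* Unlink and free every remaining command; returns how many were reclaimed. */
static int force_cleanup(struct tracked_cmd **head)
{
    int freed = 0;

    assert(head != NULL);
    while (*head != NULL) {
        struct tracked_cmd *c = *head;
        *head = c->next; /* unlink first so no lookup can find it */
        c->ref = 0;      /* drop any leftover references */
        free(c);
        freed++;
    }
    return freed;
}
```

Once the list is empty, the count that kept the loop alive drops to zero and the normal exit path becomes reachable again.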
3. CPU Throttling
After 10 iterations, adds msleep(1) to:
- Prevent CPU spinning at 100%
- Allow other threads to run
- Reduce system load during cleanup
4. Guaranteed Progress
The loop will terminate either by:
- Normal completion (all commands cleaned up)
- Force cleanup after timeout
- Early exit if cleanup not yet initiated
Unlike the panic approach, this allows:
- System to continue running
- Other devices to function normally
- Diagnostic data to be collected
- Graceful degradation instead of catastrophic failure
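The interaction of the three thresholds can be checked with a small user-space simulation. It is a sketch under assumptions: `run_cleanup()` and `cleanup_stats` are hypothetical, sleeping is only counted rather than performed, and logging diagnostics once at iteration 100 (rather than periodically) is a guess at the intended behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Thresholds taken from the text above. */
enum { THROTTLE_AFTER = 10, DIAG_AFTER = 100, FORCE_AFTER = 1000 };

/* Hypothetical counters describing what one simulated cleanup run did. */
struct cleanup_stats {
    int iterations;
    int sleeps;     /* iterations where the kernel code would msleep(1) */
    int diag_dumps; /* times diagnostics would be logged */
    bool forced;    /* true if force cleanup ended the loop */
};

/*
 * Simulate the loop with `stuck` commands that never become unjammable.
 * With stuck == 0 the loop exits normally on the first pass.
 */
static struct cleanup_stats run_cleanup(int stuck)
{
    struct cleanup_stats st = { 0, 0, 0, false };

    assert(stuck >= 0);
    for (;;) {
        st.iterations++;
        if (stuck == 0)
            break;                   /* normal completion */
        if (st.iterations == DIAG_AFTER)
            st.diag_dumps++;         /* log the stuck commands once */
        if (st.iterations >= FORCE_AFTER) {
            stuck = 0;               /* force cleanup reclaims everything */
            st.forced = true;
            break;
        }
        if (st.iterations > THROTTLE_AFTER)
            st.sleeps++;             /* stand-in for msleep(1) */
    }
    return st;
}
```

The point of the model is that the iteration count is bounded by `FORCE_AFTER` regardless of how many commands are stuck, which is what guarantees that the completion is eventually signaled.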
Testing and Validation
Reproduction
The issue can be reproduced by:
- Creating an SCST user-space device
- Sending I/O commands
- Killing the user-space handler process abruptly
- Attempting to unload the device
This leaves commands in intermediate states, triggering the deadlock.
Expected Behavior After Fix
# dmesg output during cleanup with stuck commands
Device cleanup completes successfully without system crash!
Alternative Solutions Considered
Option 1: Fix the State Machine
Prevent commands from becoming zombies in the first place by ensuring proper reference counting and state transitions.
Pros:
- Addresses root cause
- No workarounds needed
Cons:
- Requires extensive refactoring
- High risk of introducing new bugs
- Difficult to guarantee all edge cases covered
Verdict: Long-term goal, but needs immediate fix first
Option 2: Timeout with Warning Only
Log an error but don’t force cleanup, just break the loop.
Pros:
- Simple implementation
- Non-intrusive
Cons:
- May leak memory (stuck commands not freed)
- Could accumulate over multiple cleanup attempts
- No guarantee of forward progress
Verdict: Too risky for resource leaks
Option 3: The Implemented Hybrid Approach
Diagnostics + Force cleanup + Graceful degradation
Pros:
- ✅ Prevents system crash
- ✅ Provides debugging information
- ✅ Guarantees cleanup completion
- ✅ Frees stuck resources
- ✅ Minimal code changes
- ✅ Low risk of side effects
Cons:
- Force cleanup may leak kernel memory in pathological cases
- Doesn’t fix underlying race conditions
Verdict: Best balance of safety, effectiveness, and maintainability
Lessons Learned
1. Infinite Loops in Kernel Space Are Deadly
Unlike user space, kernel infinite loops can:
- Trigger soft lockup detection
- Hang the entire system
- Require hard reboot
- Lose all in-flight operations
Always have an escape hatch!
2. Reference Counting Is Hard
The zombie command issue stems from a classic reference counting problem:
- Commands can reach refcount 0 while still in data structures
- ucmd_get_check() tries to increment from 0 → fails
- Command is unreachable but still “exists”
Solution: Ensure removal from data structures happens atomically with final refcount decrement.
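One way to get that atomicity is to perform the final decrement and the unlink under the same lock. The sketch below models this in user space with a pthread mutex. All names (`ref_cmd`, `table_add()`, `table_len()`, `cmd_put()`) are hypothetical and this is not how scst_user is written today. In kernel code the `atomic_dec_and_lock()` helper serves a similar purpose.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical command: the count and the list link share one lock here. */
struct ref_cmd {
    struct ref_cmd *next;
    int ref;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ref_cmd *table_head; /* stand-in for ucmd_hash */

/* Publish a command that already holds at least one reference. */
static void table_add(struct ref_cmd *c)
{
    assert(c != NULL && c->ref > 0);
    pthread_mutex_lock(&table_lock);
    c->next = table_head;
    table_head = c;
    pthread_mutex_unlock(&table_lock);
}

/* Number of commands a lookup could still find. */
static int table_len(void)
{
    int n = 0;

    pthread_mutex_lock(&table_lock);
    for (struct ref_cmd *c = table_head; c != NULL; c = c->next)
        n++;
    pthread_mutex_unlock(&table_lock);
    return n;
}

/*
 * Drop one reference. The final put unlinks the command before the lock
 * is released, so a zero-count entry is never visible to lookups.
 * Returns 1 when the caller now owns the command and may free it.
 */
static int cmd_put(struct ref_cmd *c)
{
    int last = 0;

    assert(c != NULL);
    pthread_mutex_lock(&table_lock);
    if (--c->ref == 0) {
        struct ref_cmd **pp = &table_head;
        while (*pp != NULL && *pp != c)
            pp = &(*pp)->next;
        if (*pp == c)
            *pp = c->next;
        last = 1;
    }
    pthread_mutex_unlock(&table_lock);
    return last;
}
```

With this shape, anything still countable in the table has a non-zero reference count, so the zombie state described above cannot be observed.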
3. Completions Need Guarantees
The wait_for_completion(&dev->cleanup_cmpl) in dev_user_exit_dev() will block forever if complete_all() is never called.
Always ensure completion paths are reachable, even in error scenarios.
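A complementary safeguard, which the fix described in this post does not necessarily use, is to bound the wait itself. The kernel offers `wait_for_completion_timeout()` for this. The user-space sketch below models the idea with a condition variable, and `completion_sketch` and its helper functions are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical user-space analog of a kernel completion. */
struct completion_sketch {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void completion_init(struct completion_sketch *c)
{
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Mark the completion done and wake every waiter. */
static void completion_complete_all(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Wait up to timeout_ms; returns true if completed, false on timeout. */
static bool completion_wait_timeout(struct completion_sketch *c, long timeout_ms)
{
    struct timespec deadline;
    bool done;

    /* Build an absolute deadline, normalizing the nanosecond field. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&c->lock);
    while (!c->done) {
        if (pthread_cond_timedwait(&c->cond, &c->lock, &deadline) != 0)
            break; /* ETIMEDOUT or error: stop waiting */
    }
    done = c->done;
    pthread_mutex_unlock(&c->lock);
    return done;
}
```

A waiter that gets `false` back can log the situation and decide how to proceed instead of sitting in D state indefinitely.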
4. Diagnostics Are Crucial
The panic approach provided no information about:
- What commands were stuck
- Why they were stuck
- How to prevent it
The new approach logs:
- Command states
- Reference counts
- Flags
This enables root cause analysis and eventual proper fix.
5. Graceful Degradation > Catastrophic Failure
A storage system that crashes the entire server on cleanup failure is worse than one that:
- Logs errors
- Forces cleanup
- Continues operation
Users prefer logs and potential minor resource leaks over complete system outages.
Future Work
Short Term
- Implement force cleanup (completed in this fix)
- Add metrics for stuck command frequency
- Create reproducer test case
- Upstream the fix to SCST maintainers
Long Term
- Refactor command lifecycle to eliminate zombie state possibility
- Implement per-command timeout tracking
- Add lockdep annotations for lock ordering
- Redesign cleanup sequence to avoid wait_for_completion
Conclusion
The SCST infinite loop deadlock is a textbook example of kernel synchronization gone wrong:
- Complex state machine with multiple threads
- Reference counting bugs creating zombie objects
- Infinite loop with unreachable exit condition
- Blocking wait for completion that never gets signaled
The fix demonstrates defensive programming principles:
- Detect - Identify when abnormal conditions occur
- Diagnose - Log detailed state for debugging
- Recover - Take corrective action to restore progress
- Degrade Gracefully - Prefer operation with warnings over complete failure
This approach transforms a system-crashing bug into a manageable edge case with clear diagnostics and automatic recovery.
Code Locations
- File: scst/src/dev_handlers/scst_user.c
- Function: dev_user_process_cleanup() (line 3851)
- Related Functions:
  - dev_user_unjam_dev() (line 2680)
  - dev_user_exit_dev() (line 3747)
  - dev_user_get_next_cmd() (line 2174)
References
- SCST Project: http://scst.sourceforge.net/
- Issue Tracker: https://github.corp.ebay.com/tess-contrib/scst/issues/5
- Fix Commit: (will be updated after merge)
Tags: #kernel #deadlock #scst #storage #debugging #linux #reference-counting #synchronization