Deep Dive: SCST Kernel Module Infinite Loop and Deadlock Analysis

SCST Infinite Loop Deadlock: Root Cause Analysis and Fix

Executive Summary

This post provides a deep technical analysis of a critical infinite loop bug in the SCST (SCSI Target) kernel module that caused complete system hangs during device cleanup. The issue manifested as kernel tasks stuck in uninterruptible sleep (D state), eventually triggering soft lockup detection and system instability.

The Problem: During device shutdown, the cleanup routine enters an infinite loop when zombie commands remain in the hash table but not in the ready queue, creating a deadlock condition.

The Solution: Add comprehensive diagnostics, a force-cleanup mechanism with timeout handling, and proper command lifecycle management to break the deadlock.


Background: The SCST User-Space Device Handler

SCST is a professional, clustered, high-performance storage target subsystem for Linux. The scst_user module allows user-space applications to implement SCSI target devices via a character device interface.

Architecture Overview

User Space Application
↕️ (ioctl/read/write)
/dev/scst_user
↕️
scst_user kernel module
↕️
ucmd_hash (command tracking)
ready_cmd_list (commands ready for processing)
↕️
SCST core → Target drivers → Initiators

Key data structures:

  • ucmd_hash: Hash table tracking all active commands
  • ready_cmd_list: Queue of commands ready for user-space processing
  • cleanup_cmpl: Completion for device cleanup synchronization

The Issue: Kernel Soft Lockup During Device Cleanup

Symptom

Multiple scst_usr_release kernel threads stuck in D state (uninterruptible sleep):

[Thu Jan 23 02:37:11 2025] task:scst_usr_releas state:D stack:    0 pid:334614
[Thu Jan 23 02:37:11 2025] Call Trace:
[Thu Jan 23 02:37:11 2025] __schedule+0x23d/0x590
[Thu Jan 23 02:37:11 2025] schedule+0x4e/0xb0
[Thu Jan 23 02:37:11 2025] schedule_timeout+0xfb/0x140
[Thu Jan 23 02:37:11 2025] wait_for_completion+0x24/0x30
[Thu Jan 23 02:37:11 2025] dev_user_exit_dev.isra.0+0x16a/0x1e0 [scst_user]

The threads are blocked in wait_for_completion(&dev->cleanup_cmpl), which is never signaled because the cleanup thread is stuck in an infinite loop.


Root Cause Analysis

The Cleanup Flow

When a SCST user device is being torn down, this sequence occurs:

  1. dev_user_exit_dev() (scst_user.c:3747)

    • Unregisters device from SCST core
    • Sets dev->cleanup_done = 1
    • Blocks in wait_for_completion(&dev->cleanup_cmpl) at line 3779
  2. dev_user_process_cleanup() (scst_user.c:3851), which runs in the cleanup thread

    • Processes remaining commands
    • Should eventually call complete_all(&dev->cleanup_cmpl) at line 3898
    • This is where the infinite loop occurs

The Infinite Loop Logic

Let’s examine the problematic code in dev_user_process_cleanup():

while (1) {
    int rc1;

    // Count commands in ucmd_hash and try to unjam them
    rc1 = dev_user_unjam_dev(dev);

    // Exit condition: no commands AND no ready commands AND cleanup done
    if (rc1 == 0 && rc == -EAGAIN && dev->cleanup_done)
        break;

    spin_lock_irq(&dev->udev_cmd_threads.cmd_list_lock);

    // Try to get next ready command
    rc = dev_user_get_next_cmd(dev, &ucmd, false);
    if (rc == 0)
        dev_user_unjam_cmd(ucmd, 1, NULL);

    spin_unlock_irq(&dev->udev_cmd_threads.cmd_list_lock);

    if (rc == -EAGAIN) {
        if (!dev->cleanup_done) {
            goto out; // Normal early exit
        }
        // DEADLOCK: rc1 != 0 && rc == -EAGAIN && dev->cleanup_done
        // Commands exist but can't be retrieved!
    }
}

complete_all(&dev->cleanup_cmpl); // Never reached in deadlock

The Deadlock Condition

The infinite loop occurs when all three conditions are simultaneously true:

Condition               | Meaning                           | Source
------------------------|-----------------------------------|------------------------------------------
rc1 != 0                | Commands still exist in ucmd_hash | Non-zero count from dev_user_unjam_dev()
rc == -EAGAIN           | No commands in ready_cmd_list     | Returned by dev_user_get_next_cmd()
dev->cleanup_done == 1  | Device teardown initiated         | Set by dev_user_exit_dev()
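
The three conditions can be checked against a small user-space model of the loop. This is a simplified sketch, not SCST code: `model_dev`, `model_unjam_dev()`, `model_get_next_cmd()`, and `run_cleanup_loop()` are hypothetical names, `zombies` stands for hash entries that can never be unjammed, and the iteration cap exists only so the model terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model of the cleanup loop; not SCST code. */
struct model_dev {
    int zombies;       /* entries in the hash that can never be unjammed */
    int ready;         /* entries on the ready list (also in the hash) */
    bool cleanup_done; /* set by the teardown path */
};

/* Models dev_user_unjam_dev(): counts every entry still in the hash. */
static int model_unjam_dev(const struct model_dev *dev)
{
    return dev->zombies + dev->ready;
}

/* Models dev_user_get_next_cmd(): takes one ready entry or returns -EAGAIN. */
static int model_get_next_cmd(struct model_dev *dev)
{
    assert(dev->ready >= 0);
    if (dev->ready == 0)
        return -EAGAIN;
    dev->ready--; /* the command is unjammed and leaves the hash as well */
    return 0;
}

/*
 * Runs the loop for at most max_iters iterations.
 * Returns the iteration at which the loop exited on its own (break or
 * early exit), or -1 if the cap was hit, meaning the real loop would
 * have kept spinning.
 */
static int run_cleanup_loop(struct model_dev *dev, int max_iters)
{
    int rc = 0;

    for (int iter = 1; iter <= max_iters; iter++) {
        int rc1 = model_unjam_dev(dev);

        if (rc1 == 0 && rc == -EAGAIN && dev->cleanup_done)
            return iter; /* normal exit */

        rc = model_get_next_cmd(dev);
        if (rc == -EAGAIN && !dev->cleanup_done)
            return iter; /* early exit: cleanup not yet requested */
    }
    return -1; /* no exit condition was ever satisfied */
}
```

With `zombies = 2`, `ready = 0`, and `cleanup_done = true`, neither exit is reachable and the model returns -1 for any cap. With no zombies, the loop drains the ready entries and then exits normally.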

Why Commands Get Stuck

Looking at dev_user_unjam_dev() (scst_user.c:2680):

static int dev_user_unjam_dev(struct scst_user_dev *dev)
{
    int i, res = 0;
    struct scst_user_cmd *ucmd;

    // ...

repeat:
    for (i = 0; i < ARRAY_SIZE(dev->ucmd_hash); i++) {
        struct list_head *head = &dev->ucmd_hash[i];

        list_for_each_entry(ucmd, head, hash_list_entry) {
            res++; // Count EVERY command in hash

            if (!ucmd->sent_to_user)
                continue; // Skip but still counted!

            if (ucmd_get_check(ucmd))
                continue; // Reference check failed, still counted!

            // Try to unjam...
            dev_user_unjam_cmd(ucmd, 0, NULL);
            // ...
            goto repeat;
        }
    }

    return res; // Returns count of commands in hash
}

Zombie commands can remain in ucmd_hash but not be unjammable if:

  • ucmd->sent_to_user == 0 - Never delivered to user space
  • ucmd_get_check(ucmd) fails - Reference count is 0 (being destroyed)
  • Command is in an intermediate state that dev_user_unjam_cmd() can’t handle

Command State Machine

Commands go through these states:

UCMD_STATE_PARSING
        ↓
UCMD_STATE_BUF_ALLOCING
        ↓
UCMD_STATE_EXECING ← Zombie commands can get stuck here
        ↓
UCMD_STATE_ON_FREEING
        ↓
[freed from ucmd_hash]

If a command’s reference count reaches 0 while still in ucmd_hash, it becomes a zombie:

  • ucmd_get_check() fails (atomic_inc_return returns 1 from 0)
  • Command can’t be retrieved from ready list (already removed)
  • Command can’t be unjammed (reference check fails)
  • But dev_user_unjam_dev() still counts it in res++
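
The increment-from-zero check can be reproduced in user space with C11 atomics. The sketch below is a model of the behavior described above, not the SCST source: `model_ucmd` and `model_ucmd_get_check()` are hypothetical names, and `atomic_fetch_add(...) + 1` stands in for the kernel's atomic_inc_return().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct scst_user_cmd; only the refcount matters. */
struct model_ucmd {
    atomic_int ucmd_ref;
};

/*
 * Try to take a reference. If the incremented value is 1, the count was 0
 * beforehand, so the object is already being destroyed: undo the increment
 * and report failure. As in the article, a true return means
 * "no reference was taken".
 */
static bool model_ucmd_get_check(struct model_ucmd *ucmd)
{
    int r = atomic_fetch_add(&ucmd->ucmd_ref, 1) + 1;

    assert(r >= 1); /* a negative count before the increment would be a bug */

    if (r == 1) {
        atomic_fetch_sub(&ucmd->ucmd_ref, 1);
        return true;  /* zombie: count stays at 0 */
    }
    return false;     /* reference taken; the caller must drop it later */
}
```

A live command (count 1) ends up with count 2 and a false return. A zombie (count 0) returns true with its count unchanged, so every pass over the hash skips it while still counting it.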

The Race Condition

Thread 1: dev_user_exit_dev()          Thread 2: dev_user_process_cleanup()
─────────────────────────────          ────────────────────────────────────
unregister device
                                       loop iteration:
                                         rc1 = dev_user_unjam_dev()
                                           → finds 2 zombie commands
                                           → returns rc1 = 2
set cleanup_done = 1
                                         rc = dev_user_get_next_cmd()
                                           → no ready commands
                                           → returns rc = -EAGAIN

                                       Check: rc1=2 && rc=-EAGAIN && cleanup_done=1
                                           → INFINITE LOOP!

wait_for_completion(&cleanup_cmpl)
  → Blocks forever                     complete_all() never called

The Original Workaround: Panic

The initial workaround was to panic after 10,000 iterations:

loop_count++;
if (unlikely(loop_count > 10000)) {
    panic("SCST panic: DeadLoop error occurred!");
}

Why This “Worked”

  • Prevents soft lockup detector from triggering
  • Provides clear evidence via kernel panic
  • Forces immediate attention to the problem

Why This Is Bad

  • ❌ Crashes the entire system
  • ❌ Loses all in-flight I/O across all devices
  • ❌ No recovery possible
  • ❌ Doesn’t fix the root cause
  • ❌ No diagnostic information about stuck commands

The Proper Solution

Multi-Layered Approach

  1. Diagnostic Layer - Identify stuck commands
  2. Timeout Layer - Detect prolonged loops
  3. Force Cleanup Layer - Break the deadlock
  4. Graceful Degradation - Continue cleanup after recovery

Implementation

static int dev_user_process_cleanup(struct scst_user_dev *dev)
{
    struct scst_user_cmd *ucmd;
    int rc = 0, res = 1;
    int i; /* hash bucket index for the diagnostic and force-cleanup walks */
    int loop_count = 0;
    int force_cleanup_threshold = 1000; /* at least ~1 second: msleep(1) sleeps >= 1 ms */
    int diagnostic_threshold = 100;

    // ... initialization ...

    while (1) {
        int rc1;

        rc1 = dev_user_unjam_dev(dev);
        if (rc1 == 0 && rc == -EAGAIN && dev->cleanup_done)
            break; // Normal exit

        // ... command processing ...

        if (rc == -EAGAIN) {
            if (!dev->cleanup_done) {
                goto out; // Normal early exit
            }

            loop_count++;

            /* DIAGNOSTIC LAYER: Print diagnostics after 100 iterations */
            if (unlikely(loop_count == diagnostic_threshold)) {
                PRINT_WARNING("Cleanup loop detected for dev %p after %d iterations, "
                              "rc1=%d (stuck commands), rc=%d, cleanup_done=%d",
                              dev, loop_count, rc1, rc, dev->cleanup_done);

                // Log all stuck commands with their states
                spin_lock_irq(&dev->udev_cmd_threads.cmd_list_lock);
                for (i = 0; i < ARRAY_SIZE(dev->ucmd_hash); i++) {
                    struct list_head *head = &dev->ucmd_hash[i];
                    struct scst_user_cmd *stuck_ucmd;

                    list_for_each_entry(stuck_ucmd, head, hash_list_entry) {
                        PRINT_WARNING("Stuck ucmd %p: state=%x, sent_to_user=%d, "
                                      "jammed=%d, ref=%d, scst_cmd=%p",
                                      stuck_ucmd, stuck_ucmd->state,
                                      stuck_ucmd->sent_to_user,
                                      stuck_ucmd->jammed,
                                      atomic_read(&stuck_ucmd->ucmd_ref),
                                      stuck_ucmd->cmd);
                    }
                }
                spin_unlock_irq(&dev->udev_cmd_threads.cmd_list_lock);
            }

            /* FORCE CLEANUP LAYER: Break deadlock after 1000 iterations */
            if (unlikely(loop_count >= force_cleanup_threshold)) {
                PRINT_ERROR("Force cleanup triggered after %d iterations for dev %p. "
                            "Breaking deadlock by forcefully releasing %d stuck commands.",
                            loop_count, dev, rc1);

                spin_lock_irq(&dev->udev_cmd_threads.cmd_list_lock);

                // Forcefully release all stuck commands
                for (i = 0; i < ARRAY_SIZE(dev->ucmd_hash); i++) {
                    struct list_head *head = &dev->ucmd_hash[i];
                    struct scst_user_cmd *stuck_ucmd, *tmp;

                    list_for_each_entry_safe(stuck_ucmd, tmp, head, hash_list_entry) {
                        PRINT_ERROR("Force releasing stuck ucmd %p (state %x, ref %d)",
                                    stuck_ucmd, stuck_ucmd->state,
                                    atomic_read(&stuck_ucmd->ucmd_ref));

                        // Remove from hash
                        list_del(&stuck_ucmd->hash_list_entry);

                        // Force reference count to 1 then free.
                        // A zombie with ref == 0 matches neither branch
                        // below, so it is only unlinked here.
                        while (atomic_read(&stuck_ucmd->ucmd_ref) > 1)
                            ucmd_put(stuck_ucmd);

                        if (atomic_read(&stuck_ucmd->ucmd_ref) == 1)
                            ucmd_put(stuck_ucmd);
                    }
                }

                spin_unlock_irq(&dev->udev_cmd_threads.cmd_list_lock);

                break; // Exit loop after force cleanup
            }

            /* CPU THROTTLING: Prevent tight spinning */
            if (loop_count > 10)
                msleep(1);
        }
    }

    dev_user_check_lost_ucmds(dev);
    complete_all(&dev->cleanup_cmpl); // Reached on every exit via break
    res = 0;

out:
    return res;
}

Key Improvements

1. Diagnostic Logging (100 iterations)

Captures the state when the loop is detected:

  • Device pointer and iteration count
  • Return codes (rc1, rc, cleanup_done)
  • Every stuck command’s details:
    • State (parsing, executing, freeing, etc.)
    • Whether sent to user space
    • Jammed flag
    • Reference count
    • Associated SCST command pointer

This provides actionable debugging information instead of just panicking.

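2. Force Cleanup (1000 iterations, at least about 1 second)

After this timeout (msleep(1) sleeps for at least 1 ms and often longer, so 1000 iterations is a lower bound of roughly one second):

  • Removes all commands from ucmd_hash
  • Forces reference counts down to prevent leaks
  • Frees stuck commands that still hold references; a command whose count is already 0 is only unlinked from the hash, which is the leak risk listed under Option 3
  • Breaks the loop, allowing complete_all() to be called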

3. CPU Throttling

After 10 iterations, adds msleep(1) to:

  • Prevent CPU spinning at 100%
  • Allow other threads to run
  • Reduce system load during cleanup

4. Guaranteed Progress

The loop will terminate either by:

  • Normal completion (all commands cleaned up)
  • Force cleanup after timeout
  • Early exit if cleanup not yet initiated

Unlike the panic approach, this allows:

  • System to continue running
  • Other devices to function normally
  • Diagnostic data to be collected
  • Graceful degradation instead of catastrophic failure
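
The escalation order can be written down as a small decision function. This is an illustrative user-space sketch: `cleanup_policy`, `next_action()`, `should_log_diagnostics()`, and `min_sleep_ms_before_force()` are hypothetical names, and the thresholds are the ones used in the implementation above (10, 100, 1000).

```c
#include <assert.h>
#include <stdbool.h>

enum cleanup_action {
    ACTION_RETRY,           /* retry immediately */
    ACTION_RETRY_THROTTLED, /* sleep briefly, then retry */
    ACTION_FORCE_CLEANUP,   /* stop waiting and release stuck commands */
};

/* Hypothetical container for the three thresholds. */
struct cleanup_policy {
    int throttle_after; /* iterations before sleeping between retries */
    int diagnostic_at;  /* iteration at which stuck commands are logged */
    int force_at;       /* iteration at which force cleanup starts */
};

/* Diagnostics are printed once, when the counter hits the threshold. */
static bool should_log_diagnostics(const struct cleanup_policy *p, int loop_count)
{
    return loop_count == p->diagnostic_at;
}

/* Same ordering as the loop body: force cleanup is checked before throttling. */
static enum cleanup_action next_action(const struct cleanup_policy *p, int loop_count)
{
    assert(p->throttle_after < p->force_at);

    if (loop_count >= p->force_at)
        return ACTION_FORCE_CLEANUP;
    if (loop_count > p->throttle_after)
        return ACTION_RETRY_THROTTLED;
    return ACTION_RETRY;
}

/*
 * Lower bound on the time spent sleeping before force cleanup, assuming
 * every throttled retry sleeps at least sleep_ms milliseconds.
 */
static int min_sleep_ms_before_force(const struct cleanup_policy *p, int sleep_ms)
{
    int throttled = p->force_at - 1 - p->throttle_after;

    return throttled > 0 ? throttled * sleep_ms : 0;
}
```

With the article's thresholds, iterations 11 through 999 are throttled, which gives 989 sleeps and a floor of 989 ms before force cleanup. The real delay depends on how long msleep(1) actually sleeps on the running kernel.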

Testing and Validation

Reproduction

The issue can be reproduced by:

  1. Creating a SCST user-space device
  2. Sending I/O commands
  3. Killing the user-space handler process abruptly
  4. Attempting to unload the device

This leaves commands in intermediate states, triggering the deadlock.

Expected Behavior After Fix

# dmesg output during cleanup with stuck commands

[ 123.456] scst_user: Cleanup loop detected for dev ffff888123456789 after 100 iterations,
rc1=2 (stuck commands), rc=-11, cleanup_done=1
[ 123.457] scst_user: Stuck ucmd ffff888abcd1234: state=3, sent_to_user=1,
jammed=0, ref=0, scst_cmd=ffff888def5678
[ 123.458] scst_user: Stuck ucmd ffff888abcd5678: state=3, sent_to_user=0,
jammed=0, ref=0, scst_cmd=0000000000000000
[ 124.456] scst_user: Force cleanup triggered after 1000 iterations for dev ffff888123456789.
Breaking deadlock by forcefully releasing 2 stuck commands.
[ 124.457] scst_user: Force releasing stuck ucmd ffff888abcd1234 (state 3, ref 0)
[ 124.458] scst_user: Force releasing stuck ucmd ffff888abcd5678 (state 3, ref 0)
[ 124.459] scst_user: Cleanuping done (dev ffff888123456789)

Device cleanup completes successfully without system crash!


Alternative Solutions Considered

Option 1: Fix the State Machine

Prevent commands from becoming zombies in the first place by ensuring proper reference counting and state transitions.

Pros:

  • Addresses root cause
  • No workarounds needed

Cons:

  • Requires extensive refactoring
  • High risk of introducing new bugs
  • Difficult to guarantee all edge cases covered

Verdict: Long-term goal, but needs immediate fix first

Option 2: Timeout with Warning Only

Log an error but don’t force cleanup, just break the loop.

Pros:

  • Simple implementation
  • Non-intrusive

Cons:

  • May leak memory (stuck commands not freed)
  • Could accumulate over multiple cleanup attempts
  • No guarantee of forward progress

Verdict: Too risky for resource leaks

Option 3: The Implemented Hybrid Approach

Diagnostics + Force cleanup + Graceful degradation

Pros:

  • ✅ Prevents system crash
  • ✅ Provides debugging information
  • ✅ Guarantees cleanup completion
  • ✅ Frees stuck resources
  • ✅ Minimal code changes
  • ✅ Low risk of side effects

Cons:

  • Force cleanup may leak kernel memory in pathological cases
  • Doesn’t fix underlying race conditions

Verdict: Best balance of safety, effectiveness, and maintainability


Lessons Learned

1. Infinite Loops in Kernel Space Are Deadly

Unlike user space, kernel infinite loops can:

  • Trigger soft lockup detection
  • Hang the entire system
  • Require hard reboot
  • Lose all in-flight operations

Always have an escape hatch!

2. Reference Counting Is Hard

The zombie command issue stems from a classic reference counting problem:

  • Commands can reach refcount 0 while still in data structures
  • ucmd_get_check() tries to increment from 0, sees a result of 1, and fails
  • Command is unreachable but still “exists”

Solution: Ensure removal from data structures happens atomically with final refcount decrement.
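
One way to picture this rule is a user-space model in which the final decrement and the unlink happen under the same lock, so a lookup can never find an object whose count is already 0. The sketch below is an assumption-laden illustration, not a patch for SCST: `node`, `table`, and all function names are hypothetical, and a C11 `atomic_flag` spinlock stands in for cmd_list_lock. In the kernel, helpers such as atomic_dec_and_lock() serve the same purpose while taking the lock only on the final decrement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one singly linked bucket guarded by a spinlock. */
struct node {
    atomic_int ref;
    struct node *next;
    bool freed; /* stands in for the actual free */
};

struct table {
    atomic_flag lock;
    struct node *head;
};

static void table_init(struct table *t)
{
    atomic_flag_clear(&t->lock);
    t->head = NULL;
}

static void table_lock(struct table *t)
{
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ; /* spin until the flag is released */
}

static void table_unlock(struct table *t)
{
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Insert a node that starts with one reference, held by its creator. */
static void table_insert(struct table *t, struct node *n)
{
    atomic_init(&n->ref, 1);
    n->freed = false;
    table_lock(t);
    n->next = t->head;
    t->head = n;
    table_unlock(t);
}

/* Lookup under the lock; any node found here has ref >= 1. */
static struct node *table_get_first(struct table *t)
{
    table_lock(t);
    struct node *n = t->head;
    if (n)
        atomic_fetch_add(&n->ref, 1);
    table_unlock(t);
    return n;
}

/* Drop a reference; the final decrement and the unlink share one critical section. */
static void node_put(struct table *t, struct node *n)
{
    table_lock(t);
    int old = atomic_fetch_sub(&n->ref, 1);
    assert(old >= 1);
    if (old == 1) {
        struct node **pp = &t->head;
        while (*pp && *pp != n)
            pp = &(*pp)->next;
        if (*pp)
            *pp = n->next; /* unlink before anyone can look it up again */
        n->next = NULL;
    }
    table_unlock(t);
    if (old == 1)
        n->freed = true; /* safe: the node is no longer reachable */
}
```

Because a node leaves the table in the same critical section that drops its count to 0, the "in the hash but refcount 0" state that defines a zombie cannot be observed by a lookup.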

3. Completions Need Guarantees

The wait_for_completion(&dev->cleanup_cmpl) in dev_user_exit_dev() will block forever if complete_all() is never called.

Always ensure completion paths are reachable, even in error scenarios.

4. Diagnostics Are Crucial

The panic approach provided no information about:

  • What commands were stuck
  • Why they were stuck
  • How to prevent it

The new approach logs:

  • Command states
  • Reference counts
  • Flags

This enables root cause analysis and eventual proper fix.

5. Graceful Degradation > Catastrophic Failure

A storage system that crashes the entire server on cleanup failure is worse than one that:

  • Logs errors
  • Forces cleanup
  • Continues operation

Users prefer logs and potential minor resource leaks over complete system outages.


Future Work

Short Term

  • Implement force cleanup (completed in this fix)
  • Add metrics for stuck command frequency
  • Create reproducer test case
  • Upstream the fix to SCST maintainers

Long Term

  • Refactor command lifecycle to eliminate zombie state possibility
  • Implement per-command timeout tracking
  • Add lockdep annotations for lock ordering
  • Redesign cleanup sequence to avoid wait_for_completion

Conclusion

The SCST infinite loop deadlock is a textbook example of kernel synchronization gone wrong:

  • Complex state machine with multiple threads
  • Reference counting bugs creating zombie objects
  • Infinite loop with unreachable exit condition
  • Blocking wait for completion that never gets signaled

The fix demonstrates defensive programming principles:

  1. Detect - Identify when abnormal conditions occur
  2. Diagnose - Log detailed state for debugging
  3. Recover - Take corrective action to restore progress
  4. Degrade Gracefully - Prefer operation with warnings over complete failure

This approach transforms a system-crashing bug into a manageable edge case with clear diagnostics and automatic recovery.

Code Locations

  • File: scst/src/dev_handlers/scst_user.c
  • Function: dev_user_process_cleanup() (line 3851)
  • Related Functions:
    • dev_user_unjam_dev() (line 2680)
    • dev_user_exit_dev() (line 3747)
    • dev_user_get_next_cmd() (line 2174)


Tags: #kernel #deadlock #scst #storage #debugging #linux #reference-counting #synchronization