SCST Infinite Loop: Root Cause Analysis and Fix

Posted on 2026-06-01 | Edited on 2026-06-10 | In Linux Kernel

Summary

During device shutdown, dev_user_process_cleanup() spins at ~2 million
iterations per second, pins one CPU core, and triggers the kernel soft-lockup
detector within seconds. The root cause is a command stuck in ucmd_hash
with ref=1 because sgv_pool_free() caches the scatter-gather buffer on
the pool LRU instead of freeing it — so the allocator’s ucmd_put() callback
never fires and the reference count never reaches zero.

The fix: two sgv_pool_flush() calls added after the unjam loop in
dev_user_unjam_dev().

Background: The SCST User-Space Device Handler

SCST is a high-performance SCSI target subsystem for Linux. Its scst_user
module lets user-space applications implement SCSI target devices through a
character device interface.

Key data structures:

ucmd_hash: hash table tracking all active scst_user_cmd objects
ready_cmd_list: queue of commands ready for user-space processing
cleanup_cmpl: completion for device cleanup synchronization
ucmd_ref: per-command reference count; dev_user_free_ucmd() →
cmd_remove_hash() fires only when atomic_dec_and_test() returns true

Normal command lifecycle:

dev_user_alloc_ucmd()        ucmd_ref = 1
dev_user_alloc_pages()       ucmd_ref++ (ucmd_get) for each SG allocation
sent to user space           sent_to_user = 1
reply from user space        processing begins
dev_user_on_free_cmd()       SGV freed, one ucmd_put()
dev_user_free_ucmd()         ref reaches 0 → cmd_remove_hash()
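The lifecycle above is ordinary reference counting, and a small user-space model may make the invariant easier to see. This is only a sketch: struct model_ucmd and the model_* helpers are hypothetical stand-ins written for this post, not SCST code, and C11 atomics substitute for the kernel's atomic_t helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct scst_user_cmd. */
struct model_ucmd {
    atomic_int ref;   /* models ucmd_ref */
    bool in_hash;     /* models membership in ucmd_hash */
};

/* Models dev_user_alloc_ucmd(): the creator holds the first reference. */
static void model_alloc(struct model_ucmd *u)
{
    atomic_init(&u->ref, 1);
    u->in_hash = true;
}

/* Models ucmd_get(): take one more reference. */
static void model_get(struct model_ucmd *u)
{
    atomic_fetch_add(&u->ref, 1);
}

/* Models ucmd_put(): only the last put removes the command from the hash. */
static void model_put(struct model_ucmd *u)
{
    assert(atomic_load(&u->ref) > 0);
    /* atomic_fetch_sub() returns the old value, so 1 means we reached 0. */
    if (atomic_fetch_sub(&u->ref, 1) == 1)
        u->in_hash = false;
}

/* Replays a balanced lifecycle; returns true if the command left the hash. */
bool model_normal_lifecycle(void)
{
    struct model_ucmd u;

    model_alloc(&u);   /* ref = 1 */
    model_get(&u);     /* SG allocation takes a reference: ref = 2 */
    model_put(&u);     /* allocator's free callback: ref = 1 */
    model_put(&u);     /* final put on the free path: ref = 0 */
    return !u.in_hash;
}
```

Every model_get() needs a matching model_put(). If one put is skipped, the counter stops at 1 and in_hash stays true, which is the shape of the bug analyzed in the rest of this post.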

The Symptom

Multiple scst_usr_release threads stuck in D state:

[Thu Jan 23 02:37:11 2025] task:scst_usr_releas state:D stack:    0 pid:334614
[Thu Jan 23 02:37:11 2025] Call Trace:
[Thu Jan 23 02:37:11 2025]  __schedule+0x23d/0x590
[Thu Jan 23 02:37:11 2025]  schedule+0x4e/0xb0
[Thu Jan 23 02:37:11 2025]  schedule_timeout+0xfb/0x140
[Thu Jan 23 02:37:11 2025]  wait_for_completion+0x24/0x30
[Thu Jan 23 02:37:11 2025]  dev_user_exit_dev.isra.0+0x16a/0x1e0 [scst_user]

The threads block on wait_for_completion(&dev->cleanup_cmpl), which is never
signaled because the cleanup thread is spinning in an infinite loop and never
reaches complete_all(&dev->cleanup_cmpl).

The Cleanup Flow

When an SCST user device is torn down:

1. dev_user_exit_dev() unregisters the device, sets
   dev->cleanup_done = 1, then blocks on
   wait_for_completion(&dev->cleanup_cmpl).
2. dev_user_process_cleanup() runs in a separate thread, drains
   remaining commands, and calls complete_all(&dev->cleanup_cmpl) to
   unblock step 1.

The exit condition requires rc1 == 0 (hash empty) and rc == -EAGAIN
(ready list empty) and cleanup_done:

while (1) {
    rc1 = dev_user_unjam_dev(dev);   /* returns number of cmds in hash */

    if (rc1 == 0 && rc == -EAGAIN && dev->cleanup_done)
        break;   /* normal exit */

    spin_lock_irq(&dev->udev_cmd_threads.cmd_list_lock);
    rc = dev_user_get_next_cmd(dev, &ucmd, false);
    if (rc == 0)
        dev_user_unjam_cmd(ucmd, 1, NULL);
    spin_unlock_irq(&dev->udev_cmd_threads.cmd_list_lock);
}
complete_all(&dev->cleanup_cmpl);   /* never reached */

If any command remains in ucmd_hash but is not in ready_cmd_list,
rc1 > 0 and rc == -EAGAIN simultaneously, and the loop has no exit.
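To make the missing exit concrete, here is a bounded user-space model of the loop. It is a sketch under simplifying assumptions: struct model_dev and model_cleanup_loop() are hypothetical, each pass completes at most one ready command, and a completed command is assumed to leave the hash immediately.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state the cleanup loop inspects. */
struct model_dev {
    int cmds_in_hash;   /* what dev_user_unjam_dev() would return */
    int cmds_ready;     /* length of ready_cmd_list */
    bool cleanup_done;
};

/*
 * Runs at most max_iters passes. Returns the pass number on which the
 * exit condition was met, or -1 if the loop was still running.
 */
int model_cleanup_loop(struct model_dev *dev, int max_iters)
{
    int rc = 0;

    assert(dev != NULL);
    for (int i = 1; i <= max_iters; i++) {
        int rc1 = dev->cmds_in_hash;

        if (rc1 == 0 && rc == -EAGAIN && dev->cleanup_done)
            return i;                 /* normal exit */

        if (dev->cmds_ready > 0) {
            dev->cmds_ready--;        /* command taken from the ready list */
            dev->cmds_in_hash--;      /* assumed to leave the hash as well */
            rc = 0;
        } else {
            rc = -EAGAIN;             /* ready list empty */
        }
    }
    return -1;                        /* never satisfied the exit condition */
}
```

With two ready commands the model exits on the fourth pass. With one command that is in the hash but not on the ready list, it returns -1 for any iteration budget, because nothing in the loop body can lower cmds_in_hash.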

Root Cause: SGV Pool Caching Strands a Reference

The stuck command

The command stuck in ucmd_hash has:

state        = UCMD_STATE_ON_FREE_SKIPPED (7)
scst_cmd     = NULL
ucmd_ref     = 1
sent_to_user = 0

State 7 is set in dev_user_on_free_cmd() when on_free_cmd_type is
SCST_USER_ON_FREE_CMD_IGNORE:

if (ucmd->dev->on_free_cmd_type == SCST_USER_ON_FREE_CMD_IGNORE) {
    ucmd->state = UCMD_STATE_ON_FREE_SKIPPED;
    goto out_reply;
}
...
out_reply:
    dev_user_process_reply_on_free(ucmd);

dev_user_process_reply_on_free() frees the SGV buffer and drops a reference:

static int dev_user_process_reply_on_free(struct scst_user_cmd *ucmd)
{
    dev_user_free_sgv(ucmd);   /* free the scatter-gather buffer */
    ucmd_put(ucmd);            /* drop one reference */
    return 0;
}

This looks correct. The problem is what dev_user_free_sgv() actually does.

`sgv_pool_free()` is a cache return, not a free

static void dev_user_free_sgv(struct scst_user_cmd *ucmd)
{
    if (ucmd->sgv) {
        sgv_pool_free(ucmd->sgv, &ucmd->dev->udev_mem_lim);
        ucmd->sgv = NULL;
    } else if (ucmd->data_pages) {
        ucmd_get(ucmd);
        __dev_user_free_sg_entries(ucmd);
    }
}

The SGV (scatter-gather vector) pool is a performance cache: it holds
recently freed SG buffers so future commands can reuse them without hitting
the page allocator. When sgv_pool_free() is called:

The SGV object is placed on the pool’s LRU cache.
The allocator’s free callback — dev_user_free_sg_entries() — is not called.
The ucmd_get() reference taken in dev_user_alloc_pages() is not released.

dev_user_free_sg_entries() (and its ucmd_put()) only fires when the pool
evicts a cached object — via an explicit sgv_pool_flush().
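The deferred callback can be sketched with a toy pool. Everything here is hypothetical (struct model_pool, model_pool_free(), model_pool_flush(), a fixed-size array instead of an LRU list); it only illustrates the pattern described above, not the real sgv_pool internals.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_CACHE_MAX 8

/* Hypothetical cached buffer carrying the allocator's free callback. */
struct model_sgv {
    void (*free_cb)(void *priv);   /* role of dev_user_free_sg_entries() */
    void *priv;
};

/* Hypothetical pool: a small array stands in for the LRU list. */
struct model_pool {
    struct model_sgv *cached[MODEL_CACHE_MAX];
    size_t ncached;
};

/* Models sgv_pool_free(): park the buffer; the callback does NOT run. */
void model_pool_free(struct model_pool *pool, struct model_sgv *sgv)
{
    assert(sgv->free_cb != NULL);
    if (pool->ncached < MODEL_CACHE_MAX) {
        pool->cached[pool->ncached++] = sgv;
        return;
    }
    sgv->free_cb(sgv->priv);       /* cache full: release immediately */
}

/* Models sgv_pool_flush(): evict everything, running each callback inline. */
void model_pool_flush(struct model_pool *pool)
{
    while (pool->ncached > 0) {
        struct model_sgv *sgv = pool->cached[--pool->ncached];

        sgv->free_cb(sgv->priv);
    }
}

/* Example callback: drops one reference on a plain int counter. */
void model_put_ref(void *priv)
{
    int *ref = priv;

    (*ref)--;
}
```

Calling model_pool_free() on a buffer whose callback is model_put_ref leaves the counter untouched. The counter drops only once model_pool_flush() runs, which mirrors the gap between sgv_pool_free() and the eventual ucmd_put().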

The complete reference count trace

| Event | Operation | ucmd_ref |
| --- | --- | --- |
| `dev_user_alloc_ucmd()` | `atomic_set(&ucmd_ref, 1)` | 1 |
| `dev_user_alloc_pages()` | `ucmd_get()` for first SG page | 2 |
| `dev_user_unjam_dev()`: `ucmd_get_check()` | bump to verify not zombie | 3 |
| `dev_user_unjam_cmd()` → `scst_cmd_done()` → `dev_user_on_free_cmd()` → `dev_user_free_sgv()` → `sgv_pool_free()` | SGV goes to pool LRU; `dev_user_free_sg_entries()` not called; alloc_pages ref not released | 3 |
| `dev_user_process_reply_on_free()`: `ucmd_put()` | 3 → 2 | 2 |
| `dev_user_unjam_dev()`: `ucmd_put()` for `ucmd_get_check` ref | 2 → 1 | 1 |

cmd_remove_hash() fires only when atomic_dec_and_test() returns true (ref
reaches 0). It never does — the alloc_pages reference is never released because
dev_user_free_sg_entries() never fires. The ucmd stays in ucmd_hash
indefinitely.
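The arithmetic of the trace can be replayed in a few lines. model_trace_final_ref() is a hypothetical helper written for this post; its flag selects whether the allocator's free callback (and the ucmd_put() inside it) runs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Replays the reference count trace and returns the final count.
 * callback_fires models whether dev_user_free_sg_entries() is invoked.
 */
int model_trace_final_ref(bool callback_fires)
{
    int ref = 1;        /* dev_user_alloc_ucmd() */

    ref++;              /* dev_user_alloc_pages(): ucmd_get()      -> 2 */
    ref++;              /* dev_user_unjam_dev(): ucmd_get_check()  -> 3 */
    if (callback_fires)
        ref--;          /* dev_user_free_sg_entries(): ucmd_put() */
    ref--;              /* dev_user_process_reply_on_free(): ucmd_put() */
    ref--;              /* dev_user_unjam_dev(): ucmd_put() */

    assert(ref >= 0);   /* a negative count would mean an unbalanced put */
    return ref;
}
```

model_trace_final_ref(false) yields 1, the stranded reference seen on the stuck command, while model_trace_final_ref(true) yields 0, the count at which cmd_remove_hash() would run.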

Why the loop spins at 2 million iterations per second

After unjamming, the stuck ucmd has sent_to_user = 0 and is not in
ready_cmd_list. On every subsequent pass:

list_for_each_entry(ucmd, head, hash_list_entry) {
    res++;                   /* always incremented — hash is not empty */
    if (!ucmd->sent_to_user)
        continue;            /* always taken — sent_to_user == 0 */
}

res is non-zero (rc1 > 0) but no command is unjammed.
dev_user_get_next_cmd() returns -EAGAIN (ucmd not in ready_cmd_list).
Both functions acquire and release a spinlock in under a microsecond.
Result: ~2 million iterations per second, 100% CPU on one core, soft-lockup
detector fires within seconds.

The Fix: Post-Unjam SGV Pool Flush

Why the existing pre-unjam flush was not enough

dev_user_unjam_dev() already calls sgv_pool_flush() before the unjam
loop:

static int dev_user_unjam_dev(struct scst_user_dev *dev)
{
    sgv_pool_flush(dev->pool);       /* before unjamming */
    sgv_pool_flush(dev->pool_clust);

    spin_lock_irq(&dev->udev_cmd_threads.cmd_list_lock);
    /* ... unjam loop ... */
    spin_unlock_irq(&dev->udev_cmd_threads.cmd_list_lock);

    return res;
}

SGV objects are placed into the pool cache during unjamming — when
scst_cmd_done → dev_user_on_free_cmd → dev_user_free_sgv →
sgv_pool_free executes inside the unjam loop. A flush that precedes the loop
cannot evict objects that do not yet exist in the cache.

The fix

static int dev_user_unjam_dev(struct scst_user_dev *dev)
{
    sgv_pool_flush(dev->pool);       /* existing flush — before unjamming */
    sgv_pool_flush(dev->pool_clust);

    spin_lock_irq(&dev->udev_cmd_threads.cmd_list_lock);
    /* ... unjam loop ... */
    spin_unlock_irq(&dev->udev_cmd_threads.cmd_list_lock);

    /*
     * Flush again after unjamming. Unjamming calls sgv_pool_free(), which
     * caches the SGV object on the pool LRU instead of freeing it directly.
     * The pre-unjam flush above misses these objects. Without this second
     * flush, dev_user_free_sg_entries() never fires, the alloc_pages
     * ucmd_get() ref is never balanced, and the ucmd stays in ucmd_hash
     * indefinitely — causing dev_user_process_cleanup() to loop forever.
     */
    sgv_pool_flush(dev->pool);
    sgv_pool_flush(dev->pool_clust);

    return res;
}

sgv_pool_flush() is fully synchronous — it calls sgv_dtor_and_free()
inline in a while loop, so by the time it returns all eviction callbacks have
already fired. The call chain on eviction:

sgv_pool_flush()
  → dev_user_free_sg_entries()
      → __dev_user_free_sg_entries()
          → ucmd_put()               ← releases the alloc_pages ref
              → atomic_dec_and_test() → 0 → dev_user_free_ucmd()
                  → cmd_remove_hash()    ← ucmd removed from hash

On the next iteration dev_user_unjam_dev() returns res = 0, and
dev_user_process_cleanup() breaks normally — within 2–3 iterations.
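The ordering argument can be checked with a single-command model of one dev_user_unjam_dev() call. This is a sketch, not the driver code: struct model_cmd and the model_* functions are hypothetical, and the command is assumed to enter the call with ref = 2 (the allocation reference plus the dev_user_alloc_pages() reference).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-command model of one dev_user_unjam_dev() call. */
struct model_cmd {
    int ref;           /* models ucmd_ref */
    bool sgv_cached;   /* SGV is on the pool LRU, callback still pending */
};

/* Models sgv_pool_flush(): evicting a cached SGV runs the callback's put. */
static void model_flush(struct model_cmd *c)
{
    if (c->sgv_cached) {
        c->sgv_cached = false;
        c->ref--;          /* ucmd_put() from the free callback */
    }
}

/* Models unjamming one command inside the loop. */
static void model_unjam(struct model_cmd *c)
{
    c->ref++;              /* ucmd_get_check() */
    c->sgv_cached = true;  /* sgv_pool_free(): cached, callback not run */
    c->ref--;              /* dev_user_process_reply_on_free(): ucmd_put() */
    c->ref--;              /* ucmd_put() for the ucmd_get_check() ref */
}

/* Returns the reference count left after one call, with or without the fix. */
int model_unjam_dev_once(bool post_flush)
{
    struct model_cmd c = { .ref = 2, .sgv_cached = false };

    model_flush(&c);       /* existing pre-unjam flush: nothing cached yet */
    model_unjam(&c);
    if (post_flush)
        model_flush(&c);   /* the fix: flush after the unjam loop */

    assert(c.ref >= 0);
    return c.ref;
}
```

Without the post-unjam flush the call returns with one reference outstanding. With it, the count reaches 0 before the function returns, so in this model the command is gone by the time the hash is counted again.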

Summary

| Aspect | Detail |
| --- | --- |
| Symptom | `dev_user_process_cleanup()` loops at ~2M iter/s; soft lockup |
| Stuck ucmd | `state=7 (ON_FREE_SKIPPED)`, `ref=1`, not in ready list |
| Why ref stays at 1 | `sgv_pool_free()` caches the SGV on the pool LRU; `dev_user_free_sg_entries()` never fires; the `ucmd_get()` from `dev_user_alloc_pages()` is never balanced |
| Why pre-unjam flush failed | Runs before unjamming; SGV objects are cached during unjamming |
| Fix | `sgv_pool_flush()` for both pools after the unjam loop |
| Fix size | 2 function calls |

Lesson: Pool Caches Decouple Free from Callback

The SGV pool decouples sgv_pool_free() from the actual page release. Code
that relies on “free → callback → ucmd_put” must account for the callback
firing on eviction, not on free. At teardown time, an explicit
sgv_pool_flush() is required to force eviction and drain all outstanding
references before checking whether the hash is empty.

Tags: #kernel #scst #storage #debugging #linux #memory-management #sgv-pool #reference-counting
