Kubernetes Core Components Learning Series: Complete Guide and Learning Roadmap
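Navigation for the in-depth Kubernetes core components article series, providing a systematic learning path and an interview preparation guide.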
Cgroup V2 offers a unified hierarchy, better IO QoS (including buffer IO throttling), and cleaner semantics compared to V1. This post documents the end-to-end process of migrating a production Kubernetes cluster to Cgroup V2: component version requirements, kernel boot parameters, and compatibility verification results for the CPU, memory, PID, hugetlb, and IO controllers.
When a large number of pods using local CSI inline volumes are created and deleted concurrently on a single node, LVM commands hang and disk operations on the entire node become unavailable. This post analyzes the root cause, unbounded goroutine concurrency, and describes the fix: a FIFO queue with a bounded worker pool.
When a pod is terminating and the kubelet restarts or shuts down at the same time, a CSI inline (ephemeral) volume can be left as an orphan — kubelet skips both the unmount and the cleanup of the volume after it restarts. This post walks through the root cause and the fix.
Systematic benchmarks of the Cgroup V2 IO controller on production-grade Kubernetes nodes, covering io.max (hard bandwidth/IOPS limits), io.weight (proportional scheduling), and io.cost.qos (latency-based QoS), across Direct IO vs Buffer IO, raw disk vs LVM, and ext4 vs xfs combinations.
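For readers unfamiliar with io.max, each line in that file has the form `MAJ:MIN rbps=... wbps=... riops=... wiops=...`, where `max` means no limit. The sketch below builds such a line; ioMaxLine is a hypothetical helper, the convention that 0 means "unlimited" is an assumption of this example, and the device numbers and limits are illustrative values rather than figures from the benchmarks.

```go
package main

import (
	"fmt"
	"strings"
)

// ioMaxLine builds one line for a cgroup v2 io.max file.
// In this sketch a limit of 0 is treated as "no limit" and written as "max".
// (Hypothetical helper for illustration.)
func ioMaxLine(major, minor int, rbps, wbps, riops, wiops uint64) string {
	val := func(v uint64) string {
		if v == 0 {
			return "max"
		}
		return fmt.Sprintf("%d", v)
	}
	fields := []string{
		fmt.Sprintf("%d:%d", major, minor), // block device as MAJ:MIN
		"rbps=" + val(rbps),                // read bytes per second
		"wbps=" + val(wbps),                // write bytes per second
		"riops=" + val(riops),              // read IO operations per second
		"wiops=" + val(wiops),              // write IO operations per second
	}
	return strings.Join(fields, " ")
}

func main() {
	// Limit device 8:16 to 2 MiB/s reads and 120 write IOPS.
	fmt.Println(ioMaxLine(8, 16, 2097152, 0, 0, 120))
	// prints: 8:16 rbps=2097152 wbps=max riops=max wiops=120
}
```

Writing the resulting line into a cgroup's io.max file applies the limit to that cgroup, provided the io controller is enabled for it.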
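This post documents systematic benchmarks of the Cgroup V2 IO controller on production-grade Kubernetes nodes. It covers three control mechanisms, io.max (hard bandwidth/IOPS limits), io.weight (weight-based scheduling), and io.cost.qos (latency-model QoS), with measured data for combinations such as Direct IO vs Buffer IO, raw disk vs LVM, and ext4 vs xfs.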
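Compared with V1, Cgroup V2 provides a unified hierarchy and more complete IO QoS support, in particular the ability to throttle Buffer IO, which makes it an important foundation for improving resource utilization in Kubernetes clusters. This post documents the complete process of migrating a production Kubernetes cluster to Cgroup V2: dependency version requirements, enablement steps, and compatibility verification results for the CPU, memory, PID, IO, and other resource controllers.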
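On nodes that use CSI Local Inline Volumes, when a large number of Pods are created and deleted concurrently, LVM commands hang for long periods and disk operations on the entire node become unavailable. This post analyzes the root cause and documents the solution: introducing a FIFO queue to limit concurrency.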
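When kubelet restarts while a Pod is Terminating, a CSI Inline Volume (ephemeral volume) can end up in an "orphan" state: after the restart, kubelet neither unmounts the volume nor cleans up the mount point, which leaks the underlying LVM resources. This post analyzes the root cause and presents the fix.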
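During node reboots or OS patching, local disks can be briefly lost, causing Pods that depend on Local PVs to fail to start: kubelet cannot find the device path and the mount operation fails. This post documents an engineering approach that unblocks Pod startup by creating a "fake device" from a loop device, along with handling strategies for various node repair scenarios.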