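System Design: Designing a Payment System (Alipay/Stripe Scale)
Payment systems are among the most critical and complex business systems on the internet, and they bear directly on the safety of funds. This post takes a system-design view of how to build a payment platform at Alipay/Stripe scale, covering the core payment flow, idempotency, reconciliation, and risk control.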
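E-commerce is one of the classic system-design interview topics, covering nearly every core subsystem: products, orders, inventory, search, recommendation, and payment. This post breaks down the design of a Taobao/Amazon-scale e-commerce platform from a global perspective.
Search engines are among the internet's core infrastructure. This post takes a system-design view of how to build a Google-like large-scale search system from scratch, covering the full pipeline of crawling, indexing, retrieval, and ranking.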
VM volume attacher enhancement
Volume attach failed
| 2021-09-28T16:00:03.816682107-07:00 stderr F I0928 23:00:03.816558 1 cephvolume.go:73] attach disk &{XMLName:{Space: Local:} Device:disk RawIO: SGIO: Snapshot: Model: Driver:0xc00073b420 Auth:0xc000f49e40 Source:0xc000159860 BackingStore:<nil> Geometry:<nil> BlockIO:<nil> Mirror:<nil> Target:0xc000a58b40 IOTune:0xc00019b970 ReadOnly:<nil> Shareable:<nil> Transient:<nil> Serial:pvc-33003998-6624-4ac9-a923-d94f9401abdf WWN: Vendor: Product: Encryption:<nil> Boot:<nil> Alias:<nil> Address:0xc001420b40} error: virError(Code=27, Domain=20, Message='XML error: target 'vdf' duplicated for disk sources 'volume-0aab375c-1858-4f09-b276-ea297cd29a3d' and 'volume-63ef92c4-a027-476c-a2de-9fcf501dd4de'') <disk type='network' device='disk'> <driver name='qemu' type='raw' cache='none' io='native'/> <auth username='cinder'> <secret type='ceph' uuid='1bcf1a49-b42f-4bc2-6e70-9ea7a4006740'/> </auth> <source protocol='rbd' name='volumes/volume-0aab375c-1858-4f09-b276-ea297cd29a3d'> <host name='10.166.77.141' port='6789'/> <host name='10.78.225.48' port='6789'/> <host name='10.33.212.26' port='6789'/> <host name='10.212.255.52' port='6789'/> <host name='10.164.134.166' port='6789'/> </source> <target dev='vdf' bus='virtio'/> <iotune> <total_bytes_sec>157286400</total_bytes_sec> <total_iops_sec>300</total_iops_sec> </iotune> <serial>pvc-22650282-34fe-412c-a5a3-1df0bdb3cadd</serial> <alias name='virtio-disk5'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> </disk> ➜ ~ k. 130 tSwitched to context "130". ➜ ~ tk get pv pvc-22650282-34fe-412c-a5a3-1df0bdb3cadd Error from server (NotFound): persistentvolumes "pvc-22650282-34fe-412c-a5a3-1df0bdb3cadd" not found |
|---|
The domain XML dump already contains a vdf device, but that device is not present in the persistent XML.
| I0721 03:39:04.965504 1 cephvolume.go:73] attach disk &{XMLName:{Space: Local:} Device:disk RawIO: SGIO: Snapshot: Model: Driver:0xc0015061c0 Auth:0xc00294c880 Source:0xc0004db310 BackingStore:<nil> Geometry:<nil> BlockIO:<nil> Mirror:<nil> Target:0xc00685fd00 IOTune:0xc0014fc8f0 ReadOnly:<nil> Shareable:<nil> Transient:<nil> Serial:pvc-20e1dd78-f543-40ae-bdb0-3eb74f0ffb1c WWN: Vendor: Product: Encryption:<nil> Boot:<nil> Alias:<nil> Address:0xc001508cc0} error: virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU command 'device_add': Property 'virtio-blk-device.drive' can't find value 'drive-virtio-disk15'') |
|---|
Use rbd info to look up the volume; if it is not found, report the error.
| 2023-03-28T16:46:39.34066771-07:00 stderr F I0328 23:46:39.340535 1 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF 2023-03-28T16:46:39.340722015-07:00 stderr F I0328 23:46:39.340600 1 reflector.go:371] tess.io/ebay/vm-volume/pkg/controller/attach/attach_controller.go:285: Watch close - *v1.VmVolumeAttachment total 2 items received 2023-03-28T16:47:39.094070215-07:00 stderr F I0328 23:47:39.093952 1 reflector.go:371] tess.io/ebay/vm-volume/pkg/controller/attach/attach_controller.go:286: Watch close - *v1.Node total 1007 items received 2023-03-28T16:49:39.350681116-07:00 stderr F I0328 23:49:39.350572 1 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF 2023-03-28T16:49:39.35071423-07:00 stderr F I0328 23:49:39.350619 1 reflector.go:371] tess.io/ebay/vm-volume/pkg/controller/attach/attach_controller.go:285: Watch close - *v1.VmVolumeAttachment total 0 items received 2023-03-28T16:51:29.857528981-07:00 stderr F I0328 23:51:29.857397 1 attach_controller.go:453] VmVolumeAttachment pvc-891a1e4f-fe72-447d-ac29-efea9675bc51.tess-node-7fm8k is already attached phase 2023-03-28T16:51:29.857578201-07:00 stderr F I0328 23:51:29.857451 1 attach_controller.go:252] tess140/pvc-891a1e4f-fe72-447d-ac29-efea9675bc51.tess-node-7fm8k added to queue 2023-03-28T16:51:32.341806809-07:00 stderr F I0328 23:51:32.341711 1 attach_controller.go:453] VmVolumeAttachment pvc-891a1e4f-fe72-447d-ac29-efea9675bc51.tess-node-7fm8k is already attached phase 2023-03-28T16:51:32.341832149-07:00 stderr F I0328 23:51:32.341743 1 attach_controller.go:252] tess140/pvc-891a1e4f-fe72-447d-ac29-efea9675bc51.tess-node-7fm8k added to queue 2023-03-28T16:51:32.341835164-07:00 stderr F E0328 23:51:32.341758 1 attach_controller.go:323] vmvolumeattachment "tess140/pvc-891a1e4f-fe72-447d-ac29-efea9675bc51.tess-node-7fm8k" in work queue no longer exists 2023-03-28T16:54:35.827164294-07:00 stderr F I0328 23:54:35.827044 1 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF 2023-03-28T16:54:35.827197159-07:00 stderr F I0328 23:54:35.827105 1 reflector.go:371] tess.io/ebay/vm-volume/pkg/controller/attach/attach_controller.go:285: Watch close - *v1.VmVolumeAttachment total 3 items received 2023-03-28T16:54:37.095045006-07:00 stderr F I0328 23:54:37.094917 1 reflector.go:371] tess.io/ebay/vm-volume/pkg/controller/attach/attach_controller.go:286: Watch close - *v1.Node total 1056 items received 2023-03-28T16:54:40.55082912-07:00 stderr F I0328 23:54:40.550708 1 attach_controller.go:453] VmVolumeAttachment pvc-d0c67ff7-aeb0-4e13-b373-507b335a67fb.tess-node-2nb47 is already attached phase |
|---|
All attachers hit the unexpected EOF at the same time.
| I0330 00:28:55.070331 1 cephvolume.go:73] attach disk &{XMLName:{Space: Local:} Device:disk RawIO: SGIO: Snapshot: Model: Driver:0xc000024380 Auth:0xc006a51600 Source:0xc000640cd0 BackingStore:<nil> Geometry:<nil> BlockIO:<nil> Mirror:<nil> Target:0xc00342db80 IOTune:0xc005071c30 ReadOnly:<nil> Shareable:<nil> Transient:<nil> Serial:pvc-39c80157-0862-433c-a1ec-49475db818cf WWN: Vendor: Product: Encryption:<nil> Boot:<nil> Alias:<nil> Address:0xc000f5e240} error: virError(Code=27, Domain=20, Message='XML error: Attempted double use of PCI Address 0000:00:0a.0') |
|---|
Auto-recovered the first time: https://tessio.slack.com/archives/C03JP804D/p1670377925841209
The dumpxml output does not contain the 0x0a address, but the address is in the persistent file and is used by a PV. Detaching that PV failed because the disk was not found in the dump, as in detach-failed case 1.
The address 0000:00:0a.0 corresponds to this PCI address element:
<address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
So this is a slot conflict, and the attacher needs to skip that slot.
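One way to skip a conflicting slot is to collect the PCI slots used in both the live dump and the persistent XML, then pick the first slot that appears in neither. The following is a minimal sketch under that assumption. The function names collectSlots and nextFreeSlot are hypothetical, and the sketch only looks at bus 0x00, as in the logs above.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// collectSlots records every slot used by an <address type='pci' bus='0x00'>
// element anywhere in the given domain XML.
func collectSlots(domainXML string, used map[uint64]bool) error {
	dec := xml.NewDecoder(strings.NewReader(domainXML))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "address" {
			continue
		}
		attrs := map[string]string{}
		for _, a := range start.Attr {
			attrs[a.Name.Local] = a.Value
		}
		if attrs["type"] != "pci" || attrs["bus"] != "0x00" {
			continue
		}
		// Base 0 lets ParseUint accept the "0x0a" form used by libvirt.
		slot, err := strconv.ParseUint(attrs["slot"], 0, 8)
		if err != nil {
			return fmt.Errorf("bad slot %q: %w", attrs["slot"], err)
		}
		used[slot] = true
	}
}

// nextFreeSlot returns the lowest slot in [first, 0x1f] that is unused in
// both the live and the persistent domain XML.
func nextFreeSlot(liveXML, persistentXML string, first uint64) (uint64, error) {
	used := map[uint64]bool{}
	for _, doc := range []string{liveXML, persistentXML} {
		if err := collectSlots(doc, used); err != nil {
			return 0, err
		}
	}
	for s := first; s <= 0x1f; s++ {
		if !used[s] {
			return s, nil
		}
	}
	return 0, errors.New("no free PCI slot on bus 0x00")
}
```

With 0x09 in the live dump and 0x0a only in the persistent file, nextFreeSlot(live, persistent, 0x09) returns 0x0b, so the stale 0x0a entry is skipped rather than reused.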
Volume detach failed
| I0330 00:38:54.615254 1 cephvolume.go:227] detach disk &{XMLName:{Space: Local:disk} Device:disk RawIO: SGIO: Snapshot: Model: Driver:0xc000736540 Auth:0xc0027b68a0 Source:0xc000641040 BackingStore:<nil> Geometry:<nil> BlockIO:<nil> Mirror:<nil> Target:0xc004557640 IOTune:0xc0006e4370 ReadOnly:<nil> Shareable:<nil> Transient:<nil> Serial:pvc-febae406-15ad-4d05-9c93-b2d09c197840 WWN: Vendor: Product: Encryption:<nil> Boot:<nil> Alias:<nil> Address:0xc00709cba0} error: virError(Code=9, Domain=10, Message='operation failed: disk vde not found') |
|---|
This case cannot return nil directly, because the device still exists in the VM. Returning success here could leave the system in a complicated, inconsistent state. Instead, the attacher retries the check and returns success automatically once the disk is actually detached.
| I0417 07:57:33.764883 1 iscsivolume.go:176] detach disk &{XMLName:{Space: Local:disk} Device:disk RawIO: SGIO: Snapshot: Model: Driver:0xc004d3b0a0 Auth:<nil> Source:0xc000a66b90 BackingStore:<nil> Geometry:<nil> BlockIO:<nil> Mirror:<nil> Target:0xc009aefdc0 IOTune:<nil> ReadOnly:<nil> Shareable:<nil> Transient:<nil> Serial:pvc-9ad2511a-d8d9-4b14-b285-7acb4ef33800 WWN: Vendor: Product: Encryption:<nil> Boot:<nil> Alias:<nil> Address:0xc0048ce780} error: virError(Code=55, Domain=20, Message='Requested operation is not valid: domain is not running') root@tess-node-bf7rf-tess93:/# virsh list --all Id Name State ---------------------------------------------------- - virtlet-19c7472c-40a1-tess-node-69wjv shut off |
|---|
Cgroup V2 offers a unified hierarchy, better IO QoS — including buffer IO throttling — and cleaner semantics compared to V1. This post documents the end-to-end process of migrating a production Kubernetes cluster to cgroup v2: component version requirements, kernel boot parameters, and compatibility verification results for CPU, memory, PID, hugetlb, and IO controllers.
When a large number of pods using local CSI inline volumes are created and deleted concurrently on a single node, the LVM command hangs and the entire node’s disk operations become unavailable. This post analyzes the root cause — unbounded goroutine concurrency — and describes the fix: a FIFO queue with a bounded worker pool.
When a pod is terminating and the kubelet restarts or shuts down at the same time, a CSI inline (ephemeral) volume can be left as an orphan — kubelet skips both the unmount and the cleanup of the volume after it restarts. This post walks through the root cause and the fix.
Systematic benchmarks of the Cgroup V2 IO controller on production-grade Kubernetes nodes, covering io.max (hard bandwidth/IOPS limits), io.weight (proportional scheduling), and io.cost.qos (latency-based QoS), across Direct IO vs Buffer IO, raw disk vs LVM, and ext4 vs xfs combinations.