BPF CPU Usage High Issue: Root Cause Analysis and Fix
Objective
Problem Statement
https://docs.google.com/document/d/1HlvGBoT8gL3LToCIB8KH88EG4toh7YiiQnNApNyosOo/edit
We have repeatedly hit high BPF CPU usage after the Cilium conntrack table became full. The issue degrades L4/L7 traffic reliability, and it undermines confidence in Cilium's reliability and in the Cilium rollout on tlb/gateway nodes.
We have taken some measures to keep the conntrack table from filling up. However, if the load increases further or garbage collection is delayed for any reason, the table can still become full and trigger this issue. We therefore aim to fix the problem at its root.