Virtlet Deep-Dive Notes: Kubernetes VM Runtime Architecture and Code Analysis
Key Components
- Virtlet Manager:
  - Implements the CRI interface for virtualization and image handling
  - Processes requests from kubelet
  - Sets up the libvirt VM environment (virtual drives, network interfaces, resources)
  - Manages the VM lifecycle through libvirt
- Tapmanager:
  - Controls VM networking using CNI
  - Takes setup requests from the Virtlet manager
  - Runs a DHCP server for each active VM
  - Serves requests from vmwrapper
  - Tears down VM networks upon Virtlet manager requests
- VMWrapper:
  - Run by libvirt; wraps the emulator (QEMU/KVM)
  - Requests tap file descriptors from tapmanager
  - Adds command line arguments for the emulator
  - Execs the emulator
- Libvirt:
  - Manages the VM lifecycle
  - Provides an API for VM operations
- QEMU/KVM:
  - The actual emulator that runs the VMs
- CRI Proxy:
  - Allows running multiple CRI implementations on the same node
  - Routes requests to the appropriate runtime (Virtlet or dockershim)
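The routing idea behind CRI Proxy can be pictured with a small sketch. The annotation key and values below are illustrative assumptions for the example, not necessarily the exact metadata the real proxy inspects:

```go
package main

import "fmt"

// runtimeForPod sketches how a CRI proxy might pick a runtime for a pod.
// The annotation key used here is an assumption for illustration.
func runtimeForPod(annotations map[string]string) string {
	if annotations["kubernetes.io/target-runtime"] == "virtlet.cloud" {
		return "virtlet"
	}
	return "dockershim"
}

func main() {
	vmPod := map[string]string{"kubernetes.io/target-runtime": "virtlet.cloud"}
	fmt.Println(runtimeForPod(vmPod))               // virtlet
	fmt.Println(runtimeForPod(map[string]string{})) // dockershim
}
```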
Volume Management
Virtlet supports various volume types:
- Root Volumes: The main VM disk
- Cloud-Init Volumes: For VM configuration
- Raw Devices: Direct access to host devices
- Kubernetes Volumes: Integration with Kubernetes volume system
The volume management is handled through:
- The VMVolumeSource interface
- Various volume implementations (rootfs, cloudinit, raw, etc.)
- Libvirt storage pools
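A minimal Go sketch of the volume abstraction described above. The VMVolume method set and the DiskDef type here are simplified stand-ins invented for the example, not Virtlet's actual definitions:

```go
package main

import "fmt"

// DiskDef is an invented, simplified disk definition for this sketch.
type DiskDef struct {
	Source string // backing file or device on the host
	Device string // "disk" or "cdrom"
}

// VMVolume is a simplified stand-in for Virtlet's volume interface:
// each volume kind knows how to produce a disk definition for the domain.
type VMVolume interface {
	Setup() (*DiskDef, error)
}

type rootVolume struct{ imagePath string }

func (v rootVolume) Setup() (*DiskDef, error) {
	return &DiskDef{Source: v.imagePath, Device: "disk"}, nil
}

type cloudInitVolume struct{ isoPath string }

func (v cloudInitVolume) Setup() (*DiskDef, error) {
	return &DiskDef{Source: v.isoPath, Device: "cdrom"}, nil
}

func main() {
	vols := []VMVolume{
		rootVolume{imagePath: "/var/lib/virtlet/root.qcow2"},
		cloudInitVolume{isoPath: "/var/lib/virtlet/cloud-init.iso"},
	}
	for _, v := range vols {
		d, _ := v.Setup()
		fmt.Printf("%s -> %s\n", d.Device, d.Source)
	}
}
```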
Networking Architecture
VM Lifecycle Management
Code Flow
Detailed Code Flow Explanation
1. Pod Creation and Annotation Parsing
- Starting Point: Kubernetes creates a pod with Virtlet-specific annotations
- Key Files: pkg/metadata/types/annotations.go
- Process:
  - The VirtletDiskDriver annotation specifies the disk driver type (virtio, scsi, or nvme)
  - Annotations are parsed in the parsePodAnnotations method
2. Disk Driver Selection
- Key Files: pkg/libvirttools/diskdriver.go
- Process:
  - getDiskDriverFactory selects the appropriate driver factory based on the annotation
  - Driver factories: virtioBlkDriverFactory, scsiDriverFactory, or nvmeDriverFactory
  - Each factory creates a driver implementing the diskDriver interface
3. Volume Source and Volume Creation
- Key Files: pkg/libvirttools/volumes.go, pkg/libvirttools/virtualization.go
- Process:
  - The volumeSource function creates VMVolume objects for each required volume
  - Volumes include the root disk, cloud-init config, and additional volumes from flexvolume
4. Disk List Setup
- Key Files: pkg/libvirttools/disklist.go
- Process:
  - newDiskList creates a list of disk items, each with a driver and a volume
  - diskList.setup calls volume.Setup() for each volume to get disk definitions
  - Each disk definition gets its target from the corresponding driver
5. Domain Creation
- Key Files: pkg/libvirttools/virtualization.go
- Process:
  - createDomain builds the libvirt domain XML structure
  - Disk definitions from diskList.setup are added to the domain devices
  - For NVMe disks, the target is set to nvmeXn1 with bus type nvme
6. Domain Definition and Start
- Key Files: pkg/libvirttools/virtualization.go
- Process:
  - The domain is defined in libvirt using DefineDomain
  - diskList.writeImages writes any necessary disk images (e.g., cloud-init)
  - The domain is started, launching QEMU with the configured devices
Device Mapping Details
For SCSI disks (default):
- The domain includes a SCSI controller
- Disks are attached to this controller with names like sda, sdb
- SCSI addressing is used to connect disks to the controller
For virtio-blk disks:
- Disks are attached directly to the PCI bus
- Disk names follow the pattern vda, vdb
- No controller is needed, simplifying the setup
Key Insights
- Modular Design: Virtlet uses a modular architecture with clear separation of concerns between components.
- Integration with Kubernetes: Fully implements the CRI interface, allowing seamless integration with Kubernetes.
- Networking: Uses CNI for network setup and runs a DHCP server for each VM.
- Volume Management: Flexible volume system supporting various volume types and Kubernetes volume integration.
- Resource Management: Supports CPU and memory limits, CPU pinning, and NUMA topology.
This architecture allows Virtlet to run VMs as if they were containers from Kubernetes’ perspective, providing a way to run legacy applications or workloads that require full virtualization in a Kubernetes environment.
Questions:
Core Interface: FDClient/FDServer Protocol
The communication between vmwrapper and tapmanager goes through the FDClient/FDServer protocol, which passes file descriptors across process boundaries.
Key Functions in vmwrapper
In cmd/vmwrapper/vmwrapper.go:
```go
// Main function code that retrieves network FDs from tapmanager
if netFdKey != "" {
	c := tapmanager.NewFDClient(fdSocketPath)
	fds, marshaledData, err := c.GetFDs(netFdKey)
	if err != nil {
		glog.Errorf("Failed to obtain tap fds for key %q: %v", netFdKey, err)
		os.Exit(1)
	}
	var descriptions []tapmanager.InterfaceDescription
	if err := json.Unmarshal(marshaledData, &descriptions); err != nil {
		glog.Errorf("Failed to unmarshal network interface info: %v", err)
		os.Exit(1)
	}
	// ...
}
```
This code in vmwrapper:
1. Creates a new FDClient connected to tapmanager's socket
2. Calls GetFDs() with the network key to retrieve file descriptors
3. Unmarshals the interface descriptions
4. Uses these FDs to configure QEMU network devices
Key Functions in tapmanager
In pkg/tapmanager/fdserver.go:
```go
// FDServer.serveGet handles GetFDs requests from clients
func (s *FDServer) serveGet(c *net.UnixConn, hdr *fdHeader) (*fdHeader, []byte, []byte, error) {
	key := hdr.getKey()
	fds, err := s.getFDs(key)
	if err != nil {
		return nil, nil, nil, err
	}
	info, err := s.source.GetInfo(key)
	if err != nil {
		return nil, nil, nil, fmt.Errorf("can't get key info: %v", err)
	}
	// ... prepare and return file descriptors ...
	rights := syscall.UnixRights(fds...)
	return &fdHeader{ /* ... */ }, info, rights, nil
}
```
In pkg/tapmanager/tapfdsource.go:
```go
// GetFDs implements the GetFDs method of the FDSource interface
func (s *TapFDSource) GetFDs(key string, data []byte) ([]int, []byte, error) {
	var payload GetFDPayload
	if err := json.Unmarshal(data, &payload); err != nil {
		return nil, nil, fmt.Errorf("error unmarshalling GetFD payload: %v", err)
	}
	// ... network namespace and CNI setup ...
	// Set up the container-side network
	csn, err := nettools.SetupContainerSideNetwork(netConfig, netNSPath, allLinks, s.enableSriov, hostNS)
	if err != nil {
		return nil, nil, err
	}
	// Marshal the network configuration to return to the client
	respData, err := json.Marshal(csn)
	if err != nil {
		return nil, nil, fmt.Errorf("error marshalling net config: %v", err)
	}
	// Collect the file descriptors of the tap devices
	var fds []int
	for _, i := range csn.Interfaces {
		fds = append(fds, int(i.Fo.Fd()))
	}
	return fds, respData, nil
}
```
Key Interfaces
FDSource Interface
```go
// FDSource denotes an 'executive' part for FDServer which
// creates and destroys (closes) the file descriptors and
// associated resources
type FDSource interface {
	// GetFDs sets up file descriptors based on key and extra data
	GetFDs(key string, data []byte) ([]int, []byte, error)
	// Release destroys the file descriptors and associated resources
	Release(key string) error
	// GetInfo returns information to propagate back to FDClient
	GetInfo(key string) ([]byte, error)
	// Recover recovers FDSource's state after restart
	Recover(key string, data []byte) error
	// RetrieveFDs retrieves file descriptors
	RetrieveFDs(key string) ([]int, error)
	// Stop stops goroutines associated with FDSource
	Stop() error
}
```
FDManager Interface
```go
// FDManager denotes an object that provides 'master'-side
// functionality of FDClient
type FDManager interface {
	// AddFDs adds new file descriptors to the FDManager
	AddFDs(key string, data interface{}) ([]byte, error)
	// ReleaseFDs makes FDManager close the file descriptors
	ReleaseFDs(key string) error
	// Recover recovers the state regarding the specified key
	Recover(key string, data interface{}) error
}
```
InterfaceDescription Struct
```go
// InterfaceDescription contains the interface type with additional
// data needed to identify it
type InterfaceDescription struct {
	Type         network.InterfaceType `json:"type"`
	HardwareAddr net.HardwareAddr      `json:"mac"`
	FdIndex      int                   `json:"fdIndex"`
	PCIAddress   string                `json:"pciAddress"`
}
```
Network Setup Functions
The actual network setup is handled by nettools.SetupContainerSideNetwork(), which:
- Creates tap devices
- Sets up bridges
- Configures networking
- Returns a ContainerSideNetwork structure with interface descriptions
Data Flow Between Components
1. Virtlet creates a VM and generates a network key
2. Virtlet passes this key to vmwrapper via environment variables
3. vmwrapper connects to tapmanager's socket and calls GetFDs(key)
4. tapmanager calls TapFDSource.GetFDs() to set up networking
5. TapFDSource uses nettools to create and configure tap devices
6. TapFDSource returns file descriptors and interface descriptions
7. vmwrapper uses these to configure QEMU network devices
8. QEMU uses the file descriptors to communicate with the tap devices
This architecture allows for clean separation between the VM process and the network setup, with file descriptors being the primary interface between the components.
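The fd-passing mechanism underneath FDServer/FDClient is the standard SCM_RIGHTS facility of Unix domain sockets. The following is a self-contained sketch of that primitive only, not Virtlet's actual wire protocol (which adds a header and key lookup on top); here a pipe's read end stands in for a tap device fd:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// passFD sends the read end of a pipe carrying msg across a Unix socket
// pair as SCM_RIGHTS ancillary data, then reads msg back through the
// received descriptor, demonstrating the fd-passing roundtrip.
func passFD(msg string) (string, error) {
	sp, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(sp[0])
	defer syscall.Close(sp[1])

	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	if _, err := w.WriteString(msg); err != nil {
		return "", err
	}
	w.Close()

	// "Server" side: the pipe's read fd travels as ancillary data;
	// at least one byte of regular data must accompany it.
	rights := syscall.UnixRights(int(r.Fd()))
	if err := syscall.Sendmsg(sp[0], []byte{0}, rights, nil, 0); err != nil {
		return "", err
	}
	r.Close() // the receiver now holds its own copy of the descriptor

	// "Client" side: pull the fd back out of the control message.
	buf := make([]byte, 1)
	oob := make([]byte, 128)
	_, oobn, _, _, err := syscall.Recvmsg(sp[1], buf, oob, 0)
	if err != nil {
		return "", err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return "", err
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return "", err
	}
	passed := os.NewFile(uintptr(fds[0]), "passed-fd")
	defer passed.Close()
	data := make([]byte, len(msg))
	n, err := passed.Read(data)
	if err != nil {
		return "", err
	}
	return string(data[:n]), nil
}

func main() {
	out, err := passFD("hello from tapmanager")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // hello from tapmanager
}
```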
Key Functions and Interfaces Between Virtlet and vmwrapper
Overview
The interaction between Virtlet and vmwrapper is primarily through environment variables and the libvirt domain definition. Virtlet configures the VM domain and sets vmwrapper as the emulator, passing necessary configuration through environment variables.
Key Functions in Virtlet (VirtualizationTool)
Domain Creation
In pkg/libvirttools/virtualization.go, the CreateContainer method is responsible for defining the libvirt domain with vmwrapper as the emulator:
```go
func (v *VirtualizationTool) CreateContainer(config *types.VMConfig, netFdKey string) (string, error) {
	// ...
	settings := domainSettings{
		domainUUID: domainUUID,
		domainName: "virtlet-" + domainUUID[:13] + "-" + config.Name,
		netFdKey:   netFdKey,
		// ... other settings
	}
	domainDef := settings.createDomain(config)
	// ...
}
```
Domain Definition
The createDomain method in domainSettings sets up the domain definition with vmwrapper as the emulator and passes configuration through environment variables:
```go
func (ds *domainSettings) createDomain(config *types.VMConfig) *libvirtxml.Domain {
	// ...
	domain := &libvirtxml.Domain{
		Devices: &libvirtxml.DomainDeviceList{
			Emulator: "/vmwrapper",
			// ... other devices
		},
		// ... other domain settings
		QEMUCommandline: &libvirtxml.DomainQEMUCommandline{
			Envs: []libvirtxml.DomainQEMUCommandlineEnv{
				{Name: vconfig.EmulatorEnvVarName, Value: emulator},
				{Name: vconfig.NetKeyEnvVarName, Value: ds.netFdKey},
				{Name: vconfig.ContainerIDEnvVarName, Value: config.DomainUUID},
				{Name: vconfig.LogPathEnvVarName, Value: filepath.Join(config.LogDirectory, config.LogPath)},
				{Name: vconfig.NetworkDeviceEnvVarName, Value: config.ParsedAnnotations.NetworkDevice},
			},
		},
	}
	// ...
	return domain
}
```
Environment Variables (Communication Interface)
The key environment variables used for communication between Virtlet and vmwrapper are defined in pkg/config/constants.go:
```go
const (
	// ContainerIDEnvVarName contains name of env variable passed from virtlet to vmwrapper
	ContainerIDEnvVarName = "VIRTLET_CONTAINER_ID"
	// CpusetsEnvVarName contains name of env variable passed from virtlet to vmwrapper
	CpusetsEnvVarName = "VIRTLET_CPUSETS"
	// EmulatorEnvVarName contains name of env variable passed from virtlet to vmwrapper
	EmulatorEnvVarName = "VIRTLET_EMULATOR"
	// LogPathEnvVarName contains name of env variable passed from virtlet to vmwrapper
	LogPathEnvVarName = "VIRTLET_CONTAINER_LOG_PATH"
	// NetKeyEnvVarName contains name of env variable passed from virtlet to vmwrapper
	NetKeyEnvVarName = "VIRTLET_NET_KEY"
	// NetworkDeviceEnvVarName contains name of env variable for the network device
	NetworkDeviceEnvVarName = "VIRTLET_NETWORK_DEVICE"
)
```
Key Functions in vmwrapper
In cmd/vmwrapper/vmwrapper.go, the main function processes these environment variables:
```go
func main() {
	// ...
	emulator := os.Getenv(config.EmulatorEnvVarName)
	emulatorArgs := os.Args[1:]
	var netArgs []string
	if emulator == "" {
		// this happens during 'qemu -help' invocation by libvirt
		// (capability check)
		emulator = defaultEmulator
	} else {
		netFdKey := os.Getenv(config.NetKeyEnvVarName)
		// ...
		if netFdKey != "" {
			c := tapmanager.NewFDClient(fdSocketPath)
			fds, marshaledData, err := c.GetFDs(netFdKey)
			// ...
			// Process network interfaces
			// ...
		}
	}
	args := append([]string{emulator}, emulatorArgs...)
	args = append(args, netArgs...)
	env := os.Environ()
	if err := syscall.Exec(args[0], args, env); err != nil {
		glog.Errorf("Can't exec emulator: %v", err)
		os.Exit(1)
	}
}
```
Data Flow Between Components
1. Virtlet creates a VM configuration with a unique domain UUID
2. Virtlet generates a network key for the VM
3. Virtlet defines a libvirt domain with:
   - /vmwrapper as the emulator
   - Environment variables containing the configuration:
     - VIRTLET_EMULATOR: the actual QEMU emulator path
     - VIRTLET_NET_KEY: the key used to retrieve network interfaces
     - VIRTLET_CONTAINER_ID: the domain UUID
     - VIRTLET_CONTAINER_LOG_PATH: the path for VM logs
     - VIRTLET_NETWORK_DEVICE: the network device type (e.g., "virtio")
     - VIRTLET_CPUSETS: CPU sets for the VM (optional)
4. libvirt starts the domain, executing /vmwrapper with the environment variables
5. vmwrapper:
   - Reads the environment variables
   - Connects to tapmanager using the socket path
   - Retrieves network interfaces using the network key
   - Constructs QEMU command line arguments
   - Executes the actual QEMU emulator with the arguments
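The last step, turning the received tap fds into QEMU arguments, might look roughly like the sketch below. These notes do not show the exact flags Virtlet emits, so treat the helper and its flag layout as an assumption (though `-netdev tap,fd=N` plus a virtio-net device is the standard QEMU pattern for pre-opened tap descriptors):

```go
package main

import "fmt"

// netArgsForTap sketches building QEMU network arguments for one tap fd;
// the netdev id scheme ("tapnetN") is invented for this example.
func netArgsForTap(fd int, index int, mac string) []string {
	netdevID := fmt.Sprintf("tapnet%d", index)
	return []string{
		"-netdev", fmt.Sprintf("tap,id=%s,fd=%d", netdevID, fd),
		"-device", fmt.Sprintf("virtio-net-pci,netdev=%s,mac=%s", netdevID, mac),
	}
}

func main() {
	args := netArgsForTap(3, 0, "52:54:00:12:34:56")
	fmt.Println(args)
}
```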
Key Interfaces
The primary interface between Virtlet and vmwrapper is the set of environment variables passed through the libvirt domain definition. These variables provide vmwrapper with all the information it needs to:
- Identify the VM (container ID)
- Locate the actual emulator to use
- Set up networking by retrieving the appropriate file descriptors
- Configure logging
- Set CPU affinity (if specified)
This design allows Virtlet to remain in control of the VM configuration while delegating the actual execution and network setup to vmwrapper, which runs in a separate process.