Virtlet Deep-Dive Notes: Kubernetes VM Runtime Architecture and Code Analysis

Virtlet learning

Key Components

  1. Virtlet Manager:
    • Implements the CRI interface for virtualization and image handling
    • Processes requests from kubelet
    • Sets up libvirt VM environment (virtual drives, network interfaces, resources)
    • Manages VM lifecycle through libvirt
  2. Tapmanager:
    • Controls VM networking using CNI
    • Takes setup requests from Virtlet manager
    • Runs DHCP server for each active VM
    • Serves requests from vmwrapper
    • Tears down VM networks upon Virtlet manager requests
  3. VMWrapper:
    • Run by libvirt, wraps the emulator (QEMU/KVM)
    • Requests tap file descriptor from tapmanager
    • Adds command line arguments for the emulator
    • Execs the emulator
  4. Libvirt:
    • Manages VM lifecycle
    • Provides API for VM operations
  5. QEMU/KVM:
    • The actual emulator that runs VMs
  6. CRI Proxy:
    • Allows running multiple CRI implementations on the same node
    • Routes requests to appropriate runtime (Virtlet or dockershim)
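
The routing idea behind CRI Proxy can be pictured as a tiny dispatcher. The sketch below is an illustrative reduction, not CRI Proxy's actual code: the function name routeRuntime is made up, though the virtlet.cloud/ image-name prefix is the convention Virtlet uses to mark VM pods.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical dispatcher illustrating the CRI Proxy idea: requests whose
// image name carries the VM prefix (virtlet.cloud/) go to Virtlet,
// everything else falls through to the default container runtime.
func routeRuntime(image string) string {
	if strings.HasPrefix(image, "virtlet.cloud/") {
		return "virtlet"
	}
	return "dockershim"
}

func main() {
	fmt.Println(routeRuntime("virtlet.cloud/cirros")) // a VM image
	fmt.Println(routeRuntime("nginx:latest"))         // an ordinary container image
}
```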

Volume Management

Virtlet supports various volume types:

  1. Root Volumes: The main VM disk
  2. Cloud-Init Volumes: For VM configuration
  3. Raw Devices: Direct access to host devices
  4. Kubernetes Volumes: Integration with Kubernetes volume system

The volume management is handled through:

  • VMVolumeSource interface
  • Various volume implementations (rootfs, cloudinit, raw, etc.)
  • Libvirt storage pools
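
The shape of this volume system can be sketched with a drastically simplified stand-in for the VMVolume idea: each volume type knows how to produce its own disk definition. Names and fields here are illustrative; the real interface returns libvirtxml.DomainDisk definitions.

```go
package main

import "fmt"

// diskDef is a simplified stand-in for a libvirt disk definition.
type diskDef struct {
	Source string
	Device string
}

// vmVolume mirrors the idea behind Virtlet's VMVolume: every volume
// type (root, cloud-init, raw, ...) produces its own disk definition.
type vmVolume interface {
	Setup() (diskDef, error)
}

type rootVolume struct{ imagePath string }

func (v rootVolume) Setup() (diskDef, error) {
	return diskDef{Source: v.imagePath, Device: "disk"}, nil
}

type cloudInitVolume struct{ isoPath string }

func (v cloudInitVolume) Setup() (diskDef, error) {
	return diskDef{Source: v.isoPath, Device: "cdrom"}, nil
}

func main() {
	vols := []vmVolume{
		rootVolume{imagePath: "/var/lib/virtlet/root.qcow2"},
		cloudInitVolume{isoPath: "/var/lib/virtlet/cloud-init.iso"},
	}
	for _, v := range vols {
		d, _ := v.Setup()
		fmt.Printf("%s -> %s\n", d.Device, d.Source)
	}
}
```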

Networking Architecture

VM Lifecycle Management

Code Flow

Detailed Code Flow Explanation

1. Pod Creation and Annotation Parsing

  • Starting Point: Kubernetes creates a pod with Virtlet-specific annotations
  • Key Files: pkg/metadata/types/annotations.go
  • Process:
    • The VirtletDiskDriver annotation specifies the disk driver type (virtio, scsi, or nvme)
    • Annotations are parsed in parsePodAnnotations method
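
As a rough sketch of what this annotation check amounts to (the function name and error text are assumptions; the real logic lives in parsePodAnnotations):

```go
package main

import "fmt"

// parseDiskDriver is an illustrative reduction of the VirtletDiskDriver
// annotation check: unset means the scsi default, and anything outside
// the supported set is rejected.
func parseDiskDriver(annotations map[string]string) (string, error) {
	driver, ok := annotations["VirtletDiskDriver"]
	if !ok {
		return "scsi", nil // scsi is the default driver
	}
	switch driver {
	case "virtio", "scsi", "nvme":
		return driver, nil
	}
	return "", fmt.Errorf("unknown disk driver %q", driver)
}

func main() {
	d, _ := parseDiskDriver(map[string]string{"VirtletDiskDriver": "virtio"})
	fmt.Println(d)
}
```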

2. Disk Driver Selection

  • Key Files: pkg/libvirttools/diskdriver.go
  • Process:
    • getDiskDriverFactory selects the appropriate driver factory based on the annotation
    • Driver factories: virtioBlkDriverFactory, scsiDriverFactory, or nvmeDriverFactory
    • Each factory creates a driver implementing the diskDriver interface

3. Volume Source and Volume Creation

  • Key Files: pkg/libvirttools/volumes.go, pkg/libvirttools/virtualization.go
  • Process:
    • volumeSource function creates VMVolume objects for each required volume
    • Volumes include root disk, cloud-init config, and additional volumes from flexvolume

4. Disk List Setup

  • Key Files: pkg/libvirttools/disklist.go
  • Process:
    • newDiskList creates a list of disk items, each with a driver and volume
    • diskList.setup calls volume.Setup() for each volume to get disk definitions
    • Each disk definition gets its target from the corresponding driver
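
The pairing of volumes with driver-assigned targets can be illustrated with a stripped-down loop. The types and names here are assumptions, not the actual disklist.go code:

```go
package main

import "fmt"

// disk pairs a volume's source (what volume.Setup() yields) with a
// target name handed out by the disk driver.
type disk struct {
	source string // produced by the volume
	target string // assigned by the disk driver
}

// setupDisks assigns sequential target names, mimicking how each disk
// definition gets its target from the corresponding driver.
func setupDisks(sources []string, prefix string) []disk {
	var disks []disk
	for i, src := range sources {
		disks = append(disks, disk{
			source: src,
			target: fmt.Sprintf("%s%c", prefix, 'a'+i), // e.g. sda, sdb
		})
	}
	return disks
}

func main() {
	for _, d := range setupDisks([]string{"root.qcow2", "cloud-init.iso"}, "sd") {
		fmt.Printf("%s -> %s\n", d.target, d.source)
	}
}
```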

5. Domain Creation

  • Key Files: pkg/libvirttools/virtualization.go
  • Process:
    • createDomain builds the libvirt domain XML structure
    • Disk definitions from diskList.setup are added to the domain devices
    • For NVMe disks, the target is set to nvmeXn1 with bus type nvme

6. Domain Definition and Start

  • Key Files: pkg/libvirttools/virtualization.go
  • Process:
    • Domain is defined in libvirt using DefineDomain
    • diskList.writeImages writes any necessary disk images (e.g., cloud-init)
    • Domain is started, launching QEMU with the configured devices
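
To make the end result concrete, here is a minimal hand-rolled sketch of the disk fragment of such a domain XML, using encoding/xml instead of the real libvirtxml types:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Minimal structs standing in for libvirtxml — just enough to show the
// <disk> fragment of the generated domain XML. The real code marshals
// a complete libvirtxml.Domain.
type target struct {
	Dev string `xml:"dev,attr"`
	Bus string `xml:"bus,attr"`
}

type diskXML struct {
	XMLName xml.Name `xml:"disk"`
	Device  string   `xml:"device,attr"`
	Target  target   `xml:"target"`
}

func diskToXML(d diskXML) string {
	out, err := xml.Marshal(d)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// A SCSI root disk gets target dev="sda" bus="scsi".
	fmt.Println(diskToXML(diskXML{Device: "disk", Target: target{Dev: "sda", Bus: "scsi"}}))
}
```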

Device Mapping Details

For SCSI Disks (default):

  1. The domain includes a SCSI controller
  2. Disks are attached to this controller with names like sda, sdb
  3. SCSI addressing is used to connect disks to the controller

For virtio-blk Disks:

  1. Disks are attached directly to the PCI bus
  2. Disk names follow the pattern vda, vdb
  3. No controller is needed, simplifying the setup
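
The naming rules above can be captured in a small illustrative helper (the function is made up; it just encodes the conventions described in this section):

```go
package main

import "fmt"

// diskTarget encodes the target naming conventions: virtio-blk disks
// become vda, vdb, ...; SCSI disks become sda, sdb, ...; NVMe targets
// follow the nvmeXn1 pattern.
func diskTarget(bus string, index int) string {
	switch bus {
	case "virtio":
		return fmt.Sprintf("vd%c", 'a'+index)
	case "scsi":
		return fmt.Sprintf("sd%c", 'a'+index)
	case "nvme":
		return fmt.Sprintf("nvme%dn1", index)
	}
	return ""
}

func main() {
	fmt.Println(diskTarget("virtio", 0), diskTarget("scsi", 1), diskTarget("nvme", 0))
}
```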

Key Insights

  1. Modular Design: Virtlet uses a modular architecture with clear separation of concerns between components.
  2. Integration with Kubernetes: Fully implements the CRI interface, allowing seamless integration with Kubernetes.
  3. Networking: Uses CNI for network setup and runs a DHCP server for each VM.
  4. Volume Management: Flexible volume system supporting various volume types and Kubernetes volume integration.
  5. Resource Management: Supports CPU and memory limits, CPU pinning, and NUMA topology.

This architecture allows Virtlet to run VMs as if they were containers from Kubernetes’ perspective, providing a way to run legacy applications or workloads that require full virtualization in a Kubernetes environment.

Questions:

  1. Key Functions and Interfaces Between vmwrapper and tapmanager

Core Interface: FDClient/FDServer Protocol
The communication between vmwrapper and tapmanager is primarily facilitated through the FDClient/FDServer protocol, which allows passing file descriptors across process boundaries.

Key Functions in vmwrapper
In cmd/vmwrapper/vmwrapper.go:

// Main function that retrieves network FDs from tapmanager
if netFdKey != "" {
	c := tapmanager.NewFDClient(fdSocketPath)
	fds, marshaledData, err := c.GetFDs(netFdKey)
	if err != nil {
		glog.Errorf("Failed to obtain tap fds for key %q: %v", netFdKey, err)
		os.Exit(1)
	}

	var descriptions []tapmanager.InterfaceDescription
	if err := json.Unmarshal(marshaledData, &descriptions); err != nil {
		glog.Errorf("Failed to unmarshal network interface info: %v", err)
		os.Exit(1)
	}
	// ...
}
This code in vmwrapper:

  • Creates a new FDClient connected to tapmanager's socket
  • Calls GetFDs() with the network key to retrieve the file descriptors
  • Unmarshals the interface descriptions
  • Uses these FDs to configure QEMU network devices

Key Functions in tapmanager
In pkg/tapmanager/fdserver.go:

// FDServer.serveGet - handles GetFDs requests from clients
func (s *FDServer) serveGet(c *net.UnixConn, hdr *fdHeader) (*fdHeader, []byte, []byte, error) {
	key := hdr.getKey()
	fds, err := s.getFDs(key)
	if err != nil {
		return nil, nil, nil, err
	}
	info, err := s.source.GetInfo(key)
	if err != nil {
		return nil, nil, nil, fmt.Errorf("can't get key info: %v", err)
	}

	// ... prepare and return file descriptors ...
	rights := syscall.UnixRights(fds...)
	return &fdHeader{/*...*/}, info, rights, nil
}

In pkg/tapmanager/tapfdsource.go:

// GetFDs implements the GetFDs method of the FDSource interface
func (s *TapFDSource) GetFDs(key string, data []byte) ([]int, []byte, error) {
	var payload GetFDPayload
	if err := json.Unmarshal(data, &payload); err != nil {
		return nil, nil, fmt.Errorf("error unmarshalling GetFD payload: %v", err)
	}

	// ... network namespace and CNI setup ...

	// Set up the container-side network
	csn, err := nettools.SetupContainerSideNetwork(netConfig, netNSPath, allLinks, s.enableSriov, hostNS)
	if err != nil {
		return nil, nil, err
	}

	// Marshal the network configuration to return to the client
	respData, err := json.Marshal(csn)
	if err != nil {
		return nil, nil, fmt.Errorf("error marshalling net config: %v", err)
	}

	// Collect file descriptors for the tap devices
	var fds []int
	for _, i := range csn.Interfaces {
		fds = append(fds, int(i.Fo.Fd()))
	}

	return fds, respData, nil
}

Key Interfaces
FDSource Interface
// FDSource denotes an 'executive' part for FDServer which
// creates and destroys (closes) the file descriptors and
// associated resources
type FDSource interface {
	// GetFDs sets up file descriptors based on key and extra data
	GetFDs(key string, data []byte) ([]int, []byte, error)

	// Release destroys the file descriptor and associated resources
	Release(key string) error

	// GetInfo returns information to propagate back to FDClient
	GetInfo(key string) ([]byte, error)

	// Recover recovers FDSource's state after restart
	Recover(key string, data []byte) error

	// RetrieveFDs retrieves file descriptors
	RetrieveFDs(key string) ([]int, error)

	// Stop stops goroutines associated with FDSource
	Stop() error
}

FDManager Interface
// FDManager denotes an object that provides 'master'-side
// functionality of FDClient
type FDManager interface {
	// AddFDs adds new file descriptors to the FDManager
	AddFDs(key string, data interface{}) ([]byte, error)

	// ReleaseFDs makes FDManager close the file descriptors
	ReleaseFDs(key string) error

	// Recover recovers the state regarding the specified key
	Recover(key string, data interface{}) error
}

InterfaceDescription Struct
// InterfaceDescription contains interface type with additional data
// needed to identify it
type InterfaceDescription struct {
	Type         network.InterfaceType `json:"type"`
	HardwareAddr net.HardwareAddr      `json:"mac"`
	FdIndex      int                   `json:"fdIndex"`
	PCIAddress   string                `json:"pciAddress"`
}

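
vmwrapper receives these descriptions as a JSON array. The following is a simplified, runnable version of that unmarshalling step; the struct keeps the MAC as a plain string, since net.HardwareAddr has no built-in JSON unmarshalling of the usual "aa:bb:cc:..." form.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ifaceDesc is an illustrative simplification of InterfaceDescription.
type ifaceDesc struct {
	Type       int    `json:"type"`
	Mac        string `json:"mac"`
	FdIndex    int    `json:"fdIndex"`
	PCIAddress string `json:"pciAddress"`
}

// parseDescs mirrors the json.Unmarshal step vmwrapper performs on the
// data returned alongside the file descriptors.
func parseDescs(data []byte) ([]ifaceDesc, error) {
	var descs []ifaceDesc
	err := json.Unmarshal(data, &descs)
	return descs, err
}

func main() {
	data := []byte(`[{"type":0,"mac":"52:54:00:12:34:56","fdIndex":0,"pciAddress":""}]`)
	descs, err := parseDescs(data)
	if err != nil {
		panic(err)
	}
	// FdIndex says which of the received fds backs the NIC with this MAC.
	fmt.Println(descs[0].Mac, descs[0].FdIndex)
}
```
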
Network Setup Functions
The actual network setup is handled by nettools.SetupContainerSideNetwork(), which:

  • Creates tap devices
  • Sets up bridges
  • Configures networking
  • Returns a ContainerSideNetwork structure with interface descriptions

Data Flow Between Components
  1. Virtlet creates a VM and generates a network key
  2. Virtlet passes this key to vmwrapper via environment variables
  3. vmwrapper connects to tapmanager's socket and calls GetFDs(key)
  4. tapmanager calls TapFDSource.GetFDs() to set up networking
  5. TapFDSource uses nettools to create and configure the tap devices
  6. TapFDSource returns file descriptors and interface descriptions
  7. vmwrapper uses these to configure QEMU network devices
  8. QEMU uses the file descriptors to communicate with the tap devices

This architecture allows for clean separation between the VM process and the network setup, with file descriptors being the primary interface between the components.
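
The mechanism underneath GetFDs is SCM_RIGHTS fd passing over a Unix socket. The following self-contained sketch demonstrates that mechanism with a socketpair standing in for tapmanager's listening socket and a temp file standing in for a tap device fd:

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
)

// passFD sends an open file descriptor across a Unix socketpair using
// an SCM_RIGHTS control message, then reads the file back through the
// received descriptor — the same primitive FDClient/FDServer rely on.
func passFD() (string, error) {
	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])

	// "Server" side: open a file and send its descriptor as ancillary data.
	f, err := os.CreateTemp("", "fakeTap")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if _, err := f.WriteString("hello from tap"); err != nil {
		return "", err
	}
	rights := syscall.UnixRights(int(f.Fd()))
	if err := syscall.Sendmsg(pair[0], []byte("key"), rights, nil, 0); err != nil {
		return "", err
	}

	// "Client" side: extract the descriptor from the control message.
	buf := make([]byte, 16)
	oob := make([]byte, syscall.CmsgSpace(4))
	_, oobn, _, _, err := syscall.Recvmsg(pair[1], buf, oob, 0)
	if err != nil {
		return "", err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return "", err
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return "", err
	}
	received := os.NewFile(uintptr(fds[0]), "tap")
	defer received.Close()

	// The duplicated fd shares the file offset, so rewind before reading.
	if _, err := received.Seek(0, io.SeekStart); err != nil {
		return "", err
	}
	content, err := io.ReadAll(received)
	return string(content), err
}

func main() {
	s, err := passFD()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```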

  2. Key Functions and Interfaces Between Virtlet and vmwrapper

Overview
The interaction between Virtlet and vmwrapper is primarily through environment variables and the libvirt domain definition. Virtlet configures the VM domain and sets vmwrapper as the emulator, passing necessary configuration through environment variables.

Key Functions in Virtlet (VirtualizationTool)
Domain Creation
In pkg/libvirttools/virtualization.go, the CreateContainer method is responsible for defining the libvirt domain with vmwrapper as the emulator:

func (v *VirtualizationTool) CreateContainer(config *types.VMConfig, netFdKey string) (string, error) {
	// ...
	settings := domainSettings{
		domainUUID: domainUUID,
		domainName: "virtlet-" + domainUUID[:13] + "-" + config.Name,
		netFdKey:   netFdKey,
		// ... other settings
	}

	domainDef := settings.createDomain(config)
	// ...
}

Domain Definition
The createDomain method in domainSettings sets up the domain definition with vmwrapper as the emulator and passes configuration through environment variables:

func (ds *domainSettings) createDomain(config *types.VMConfig) *libvirtxml.Domain {
	// ...
	domain := &libvirtxml.Domain{
		Devices: &libvirtxml.DomainDeviceList{
			Emulator: "/vmwrapper",
			// ... other devices
		},
		// ... other domain settings

		QEMUCommandline: &libvirtxml.DomainQEMUCommandline{
			Envs: []libvirtxml.DomainQEMUCommandlineEnv{
				{Name: vconfig.EmulatorEnvVarName, Value: emulator},
				{Name: vconfig.NetKeyEnvVarName, Value: ds.netFdKey},
				{Name: vconfig.ContainerIDEnvVarName, Value: config.DomainUUID},
				{Name: vconfig.LogPathEnvVarName, Value: filepath.Join(config.LogDirectory, config.LogPath)},
				{Name: vconfig.NetworkDeviceEnvVarName, Value: config.ParsedAnnotations.NetworkDevice},
			},
		},
	}
	// ...
	return domain
}

Environment Variables (Communication Interface)
The key environment variables used for communication between Virtlet and vmwrapper are defined in pkg/config/constants.go:

const (
	// ContainerIDEnvVarName contains name of env variable passed from virtlet to vmwrapper
	ContainerIDEnvVarName = "VIRTLET_CONTAINER_ID"
	// CpusetsEnvVarName contains name of env variable passed from virtlet to vmwrapper
	CpusetsEnvVarName = "VIRTLET_CPUSETS"
	// EmulatorEnvVarName contains name of env variable passed from virtlet to vmwrapper
	EmulatorEnvVarName = "VIRTLET_EMULATOR"
	// LogPathEnvVarName contains name of env variable passed from virtlet to vmwrapper
	LogPathEnvVarName = "VIRTLET_CONTAINER_LOG_PATH"
	// NetKeyEnvVarName contains name of env variable passed from virtlet to vmwrapper
	NetKeyEnvVarName = "VIRTLET_NET_KEY"
	// NetworkDeviceEnvVarName contains name of env variable passed from virtlet to vmwrapper
	NetworkDeviceEnvVarName = "VIRTLET_NETWORK_DEVICE"
)

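
When libvirt probes the emulator for capabilities (e.g. a qemu -help invocation), VIRTLET_EMULATOR is unset and vmwrapper must fall back to a default emulator. That decision can be isolated in a tiny sketch; the defaultEmulator value below is an assumption for illustration:

```go
package main

import (
	"fmt"
	"os"
)

const (
	emulatorEnvVarName = "VIRTLET_EMULATOR"
	// defaultEmulator is an illustrative path, not Virtlet's actual default.
	defaultEmulator = "/usr/bin/qemu-system-x86_64"
)

// resolveEmulator returns the configured emulator, or the default when
// the environment variable was not set (the capability-check case).
func resolveEmulator(envValue string) string {
	if envValue != "" {
		return envValue
	}
	return defaultEmulator
}

func main() {
	fmt.Println(resolveEmulator(os.Getenv(emulatorEnvVarName)))
}
```
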
Key Functions in vmwrapper
In cmd/vmwrapper/vmwrapper.go, the main function processes these environment variables:

func main() {
	// ...
	emulator := os.Getenv(config.EmulatorEnvVarName)
	emulatorArgs := os.Args[1:]
	var netArgs []string
	if emulator == "" {
		// this happens during 'qemu -help' invocation by libvirt
		// (capability check)
		emulator = defaultEmulator
	} else {
		netFdKey := os.Getenv(config.NetKeyEnvVarName)
		// ...
		if netFdKey != "" {
			c := tapmanager.NewFDClient(fdSocketPath)
			fds, marshaledData, err := c.GetFDs(netFdKey)
			// ...
			// Process network interfaces
			// ...
		}
	}

	args := append([]string{emulator}, emulatorArgs...)
	args = append(args, netArgs...)
	env := os.Environ()
	if err := syscall.Exec(args[0], args, env); err != nil {
		glog.Errorf("Can't exec emulator: %v", err)
		os.Exit(1)
	}
}

Data Flow Between Components

  1. Virtlet creates a VM configuration with a unique domain UUID
  2. Virtlet generates a network key for the VM
  3. Virtlet defines a libvirt domain with:
    • /vmwrapper as the emulator
    • Environment variables containing the configuration:
      • VIRTLET_EMULATOR: the actual QEMU emulator path
      • VIRTLET_NET_KEY: key used to retrieve network interfaces
      • VIRTLET_CONTAINER_ID: the domain UUID
      • VIRTLET_CONTAINER_LOG_PATH: path for VM logs
      • VIRTLET_NETWORK_DEVICE: network device type (e.g., "virtio")
      • VIRTLET_CPUSETS: CPU sets for the VM (optional)
  4. libvirt starts the domain, executing /vmwrapper with these environment variables
  5. vmwrapper:
    • Reads the environment variables
    • Connects to tapmanager using the socket path
    • Retrieves network interfaces using the network key
    • Constructs QEMU command line arguments
    • Execs the actual QEMU emulator with those arguments

Key Interfaces
The primary interface between Virtlet and vmwrapper is the set of environment variables passed through the libvirt domain definition. These variables provide vmwrapper with all the information it needs to:

  • Identify the VM (container ID)
  • Locate the actual emulator to use
  • Set up networking by retrieving the appropriate file descriptors
  • Configure logging
  • Set CPU affinity (if specified)

This design allows Virtlet to remain in control of the VM configuration while delegating the actual execution and network setup to vmwrapper, which runs in a separate process.

Function Relationship