Linux Networking Deep Dive: Part 2 - Network Layer & IP Processing
Series Navigation
This is Part 2 of the Linux Networking Deep Dive series:
- Part 1: Foundation - Physical & Link Layer ← Previous
- Part 2: Network Layer & IP Processing ← You are here
- Part 3: Transport Layer & Socket Processing → Coming Next
Introduction
In Part 1, we explored the physical and link layer foundations of Linux networking, including network device registration, sk_buff management, and the transition from link layer to network layer. Now we dive into the network layer, where Linux makes critical routing decisions and applies powerful packet filtering and manipulation capabilities.
The network layer is where packets begin their journey through the IP protocol stack, determining whether they should be delivered locally, forwarded to another host, or processed by specialized services. This layer implements the core Internet Protocol (IPv4/IPv6) processing, sophisticated routing algorithms, and the netfilter framework that powers iptables, NAT, and firewall functionality.
IP Packet Processing Pipeline
The Journey Begins: ip_rcv()
Every IPv4 packet’s journey through the Linux network stack begins in the ip_rcv() function, located in net/ipv4/ip_input.c. This is where the link layer hands off packets to the network layer for processing.
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
This function demonstrates several key design principles:
- Network Namespace Awareness: Uses dev_net(dev) to operate within the correct network namespace
- Two-Stage Processing: Core validation followed by netfilter hook processing
- Early Exit Strategy: Invalid packets are dropped before expensive netfilter processing
Core IP Validation
The ip_rcv_core() function performs essential sanity checks that protect the system from malformed packets:
Promiscuous Mode Filtering
if (skb->pkt_type == PACKET_OTHERHOST) {
This check ensures that packets captured in promiscuous mode but not destined for this host are immediately dropped, preventing unnecessary processing.
Header Length Validation
if (!pskb_may_pull(skb, sizeof(struct iphdr)))
These checks ensure that the packet contains a valid IPv4 header with the minimum required 20 bytes.
Packet Length and Checksum Verification
len = ntohs(iph->tot_len);
The kernel checks that the total length in the header is consistent with the bytes actually received, and verifies the header checksum with ip_fast_csum(), which many architectures implement in hand-optimized assembly.
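To make the order of these checks concrete, here is a minimal user-space sketch. It is not the kernel's code: `ipv4_check()`, `hdr_sum()`, and the `ip_check` enum are hypothetical names, and the checksum uses the portable RFC 1071 algorithm instead of the architecture-specific `ip_fast_csum()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, loosely mirroring the kernel's drop reasons. */
enum ip_check { IP_OK, IP_TOO_SHORT, IP_BAD_HEADER, IP_BAD_CSUM, IP_TRUNCATED };

/* RFC 1071 one's-complement sum over the header.
 * A header with a correct checksum field folds to 0xffff. */
static uint16_t hdr_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)p[i] << 8 | p[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Validate an IPv4 packet held in pkt[0..len). */
enum ip_check ipv4_check(const uint8_t *pkt, size_t len)
{
    if (len < 20)                       /* minimum header must be present */
        return IP_TOO_SHORT;

    unsigned version = pkt[0] >> 4;
    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;   /* header length in bytes */

    if (version != 4 || ihl < 20)
        return IP_BAD_HEADER;
    if (len < ihl)                      /* options must be present too */
        return IP_TOO_SHORT;
    if (hdr_sum(pkt, ihl) != 0xffff)
        return IP_BAD_CSUM;

    size_t tot_len = (size_t)pkt[2] << 8 | pkt[3];

    if (len < tot_len)                  /* fewer bytes than the header claims */
        return IP_TRUNCATED;
    if (tot_len < ihl)                  /* total length cannot be below header length */
        return IP_BAD_HEADER;
    return IP_OK;
}
```

The cheap structural checks run first, so a malformed packet is rejected before any checksum work is done, which is the same early-exit idea described above.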
Drop Reasons and Statistics
Linux maintains detailed statistics about why packets are dropped, which is invaluable for troubleshooting:
| Drop Reason | SNMP Counter | Common Cause |
|---|---|---|
| SKB_DROP_REASON_OTHERHOST | - | Promiscuous mode filter |
| SKB_DROP_REASON_PKT_TOO_SMALL | IPSTATS_MIB_INTRUNCATEDPKTS | Packet shorter than the length its header indicates |
| SKB_DROP_REASON_IP_CSUM | IPSTATS_MIB_INHDRERRORS | Invalid IP header checksum |
| SKB_DROP_REASON_IP_INHDR | IPSTATS_MIB_INHDRERRORS | Malformed IP header |
Routing System Deep Dive
After successful validation, packets enter the routing subsystem through the ip_rcv_finish() function. This is where Linux determines the packet’s ultimate destination.
Route Lookup Optimization
Modern Linux implements several optimizations to minimize routing lookup overhead:
1. Route Hint Optimization
if (ip_can_use_hint(skb, iph, hint)) {
When consecutive packets in a received batch are headed to the same destination, the kernel can reuse the previous packet's routing decision instead of repeating the lookup, which reduces CPU overhead for packet streams.
2. Early Demux for Performance
if (READ_ONCE(net->ipv4.sysctl_ip_early_demux) &&
Early demux allows established TCP connections to bypass full routing lookup by associating packets directly with their sockets.
FIB Trie Structure
Linux uses a level-compressed trie (LC-trie), a Patricia-style structure with path and level compression, for efficient route lookup, implemented in net/ipv4/fib_trie.c:
struct key_vector {
This structure provides several advantages:
- O(log n) lookup time: Efficient for large routing tables
- Memory efficient: Compressed representation reduces memory usage
- Cache-friendly: Locality of reference improves performance
- Lock-free reads: RCU protection allows concurrent access
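Whatever the data structure, the result a FIB lookup must produce is the longest matching prefix. The sketch below shows that rule with a plain linear scan. It is deliberately not a trie, and `struct route`, `prefix_mask()`, and `lpm_lookup()` are hypothetical names chosen for illustration; the trie exists to reach the same answer without visiting every route.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry; addresses are in host byte order. */
struct route {
    uint32_t prefix;   /* network address */
    unsigned plen;     /* prefix length, 0..32 */
    int      oif;      /* illustrative output interface index */
};

/* Build a netmask from a prefix length. plen == 0 is special-cased
 * because shifting a 32-bit value by 32 is undefined in C. */
static uint32_t prefix_mask(unsigned plen)
{
    return plen == 0 ? 0 : ~(uint32_t)0 << (32 - plen);
}

/* Return the most specific route covering dst, or NULL if none matches. */
const struct route *lpm_lookup(const struct route *tbl, size_t n, uint32_t dst)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if ((dst & prefix_mask(tbl[i].plen)) != tbl[i].prefix)
            continue;                      /* dst is outside this prefix */
        if (!best || tbl[i].plen > best->plen)
            best = &tbl[i];                /* longer prefix wins */
    }
    return best;
}
```

With routes for 0.0.0.0/0, 10.0.0.0/8, and 10.1.0.0/16, a lookup for 10.1.2.3 selects the /16 entry, while 10.2.0.1 falls back to the /8.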
Route Lookup Process
The core lookup function fib_lookup() implements a layered approach:
int fib_lookup(struct net *net, struct flowi4 *flp,
This approach checks tables in priority order:
- Local table: Routes to local interfaces and addresses
- Main table: Normal routing entries
- Policy routing: Rule-based routing for advanced configurations
Policy Routing and Multiple Tables
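Linux supports multiple routing tables, enabling sophisticated policy-based routing. Table IDs were historically limited to 1-255; current kernels accept 32-bit table IDs, with 253 (default), 254 (main), and 255 (local) reserved:
# Default routing rule priority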
Policy rules can make routing decisions based on:
- Source address: Route packets from specific networks differently
- Type of Service: Prioritize traffic based on TOS field
- Input interface: Apply different policies per interface
- Packet marks: Route based on netfilter marks
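The rule walk can be sketched as follows. This is a simplified model under stated assumptions, not the kernel's fib_rules code: `struct rule`, `policy_lookup()`, and `demo_table_lookup()` are hypothetical, only source prefix and packet mark are matched, and the per-table lookup is a stand-in callback. What it does preserve is the important behavior that a matching rule whose table has no route falls through to the next rule.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified policy rule. */
struct rule {
    unsigned prio;      /* lower value is evaluated first */
    uint32_t src;       /* source network, host byte order */
    uint32_t src_mask;  /* 0 matches any source */
    uint32_t fwmark;    /* 0 means "do not match on mark" in this sketch */
    unsigned table;     /* routing table to consult */
};

/* Stand-in for a per-table FIB lookup: nonzero if the table has a route. */
typedef int (*table_lookup_fn)(unsigned table, uint32_t dst);

/* Toy tables for illustration: table 255 (local) only knows 127.0.0.0/8,
 * every other table is assumed to hold a default route. */
int demo_table_lookup(unsigned table, uint32_t dst)
{
    if (table == 255)
        return (dst >> 24) == 127;
    return 1;
}

/* Walk rules (assumed sorted by prio) and return the table that resolved
 * dst, or 0 if no rule produced a route. */
unsigned policy_lookup(const struct rule *rules, size_t n,
                       uint32_t src, uint32_t mark, uint32_t dst,
                       table_lookup_fn lookup)
{
    for (size_t i = 0; i < n; i++) {
        const struct rule *r = &rules[i];

        if ((src & r->src_mask) != r->src)
            continue;                       /* source selector mismatch */
        if (r->fwmark && r->fwmark != mark)
            continue;                       /* mark selector mismatch */
        if (lookup(r->table, dst))
            return r->table;                /* first table with a route wins */
    }
    return 0;
}
```

With a rule set of priority 0 (table 255), a rule for source 192.168.1.0/24 pointing at a hypothetical table 100, and priority 32766 (table 254), traffic from 192.168.1.5 to a remote address resolves in table 100, while other sources fall through to main.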
Example policy routing configuration:
# Route traffic from 192.168.1.0/24 via different gateway
Netfilter Framework Architecture
The netfilter framework is the cornerstone of Linux packet filtering, providing hooks at strategic points in the packet processing pipeline.
The Five Netfilter Hooks
Netfilter defines five hook points in the IPv4 stack, each serving specific purposes:
1. NF_INET_PRE_ROUTING
Location: After packet arrival, before routing decision
Use Cases:
- DNAT (Destination NAT): Redirect packets to different destinations
- Connection Tracking: Establish connection state
- Early packet filtering: Drop malicious packets before routing overhead
return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
2. NF_INET_LOCAL_IN
Location: After routing, for packets destined locally
Use Cases:
- Local packet filtering: Control access to local services
- Service protection: Implement rate limiting and access controls
- Logging and auditing: Monitor incoming connections
3. NF_INET_FORWARD
Location: After routing, for packets to be forwarded
Use Cases:
- Firewall rules: Control packet forwarding between networks
- Bandwidth management: Shape traffic flowing through the system
- VPN processing: Handle tunneled traffic
4. NF_INET_LOCAL_OUT
Location: For locally generated packets, after the initial route lookup (a packet may be rerouted if a hook changes its addressing or mark)
Use Cases:
- Outbound filtering: Control local application traffic
- DNAT for local traffic: Redirect locally generated packets to a different destination (SNAT itself is applied at NF_INET_POST_ROUTING)
- Traffic shaping: Apply QoS to outbound traffic
5. NF_INET_POST_ROUTING
Location: After routing, just before packet transmission
Use Cases:
- SNAT and masquerading: Rewrite source addresses once the outgoing interface is known
- Traffic accounting: Count and classify outbound traffic
- QoS marking: Set DSCP/TOS values for traffic prioritization
Hook Processing and Priorities
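Hooks registered at the same hook point run in ascending priority order. The kernel defines standard priorities (the NF_IP_PRI_* constants) so that the raw table, connection tracking, mangle, NAT, and filter hooks run in a predictable sequence.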
This ordering ensures that:
- Connection tracking establishes state before filtering
- Destination NAT occurs before routing decisions
- Source NAT happens after routing is complete
- Raw table can bypass connection tracking when needed
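The ordering above follows directly from the numeric priority values. The sketch below copies a subset of the values of the kernel's NF_IP_PRI_* constants (from include/uapi/linux/netfilter_ipv4.h) into local `PRI_*` names and sorts a hook list by them. `struct hook_entry` and `sort_hooks()` are hypothetical; the kernel keeps its hook arrays ordered at registration time rather than sorting on demand.

```c
#include <stdlib.h>

/* Local copies of a subset of the kernel's NF_IP_PRI_* values. */
enum {
    PRI_RAW       = -300,  /* raw table: may exempt packets from conntrack */
    PRI_CONNTRACK = -200,  /* connection tracking */
    PRI_MANGLE    = -150,  /* mangle table */
    PRI_NAT_DST   = -100,  /* destination NAT */
    PRI_FILTER    =    0,  /* filter table */
    PRI_SECURITY  =   50,  /* security table */
    PRI_NAT_SRC   =  100   /* source NAT */
};

/* Hypothetical hook descriptor for illustration. */
struct hook_entry {
    const char *name;
    int priority;
};

/* qsort comparator: ascending priority, written to avoid integer overflow. */
static int by_priority(const void *a, const void *b)
{
    int pa = ((const struct hook_entry *)a)->priority;
    int pb = ((const struct hook_entry *)b)->priority;

    return (pa > pb) - (pa < pb);
}

/* Order hooks the way they would run at a single hook point. */
void sort_hooks(struct hook_entry *hooks, size_t n)
{
    qsort(hooks, n, sizeof(*hooks), by_priority);
}
```

Sorting a list containing filter, source NAT, raw, conntrack, and destination NAT yields raw, conntrack, destination NAT, filter, source NAT, which is why filter rules already see a packet's connection state and its translated destination.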
Connection Tracking System
Connection tracking (conntrack) maintains state information for network connections, enabling stateful packet filtering:
struct nf_conn {
Connection States
Different protocols maintain different state information:
TCP States: SYN_SENT, SYN_RECV, ESTABLISHED, FIN_WAIT, CLOSE_WAIT, TIME_WAIT, etc.
UDP States: NEW, ESTABLISHED, UNREPLIED
ICMP States: NEW, ESTABLISHED (for request/reply pairs)
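The step from NEW to ESTABLISHED can be modeled with a toy tracker. This is a sketch under heavy simplification: `struct tuple`, `struct ct_table`, and `ct_track()` are hypothetical, the table is a fixed array with a linear search, and there are no timeouts or protocol-specific state machines. It only illustrates the rule that a flow is treated as established once traffic has been seen in the reply direction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-tuple identifying one direction of a flow. */
struct tuple {
    uint32_t src, dst;
    uint16_t sport, dport;
    uint8_t  proto;
};

enum ct_state { CT_NEW, CT_ESTABLISHED };

struct conn {
    struct tuple orig;   /* tuple of the first packet seen */
    int seen_reply;      /* set once the reverse direction is observed */
    int used;
};

#define CT_MAX 64
struct ct_table { struct conn conns[CT_MAX]; };

static int tuple_eq(const struct tuple *a, const struct tuple *b)
{
    return a->src == b->src && a->dst == b->dst &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

/* Swap endpoints to get the tuple a reply packet would carry. */
static struct tuple invert(const struct tuple *t)
{
    struct tuple r = { t->dst, t->src, t->dport, t->sport, t->proto };
    return r;
}

/* Classify a packet and update the table. Returns an enum ct_state value,
 * or -1 if the packet starts a new flow and the table is full. */
int ct_track(struct ct_table *tbl, const struct tuple *pkt)
{
    struct tuple rev = invert(pkt);
    struct conn *free_slot = NULL;

    for (size_t i = 0; i < CT_MAX; i++) {
        struct conn *c = &tbl->conns[i];

        if (!c->used) {
            if (!free_slot)
                free_slot = c;
            continue;
        }
        if (tuple_eq(&c->orig, &rev)) {      /* reply direction */
            c->seen_reply = 1;
            return CT_ESTABLISHED;
        }
        if (tuple_eq(&c->orig, pkt))         /* original direction */
            return c->seen_reply ? CT_ESTABLISHED : CT_NEW;
    }
    if (!free_slot)
        return -1;
    free_slot->orig = *pkt;                  /* first packet of a new flow */
    free_slot->seen_reply = 0;
    free_slot->used = 1;
    return CT_NEW;
}
```

A DNS query from 10.0.0.1 is NEW, repeats of it stay NEW until the answer arrives, and every packet after the answer is ESTABLISHED in both directions.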
Connection tracking enables powerful filtering rules:
# Allow established connections
NAT Implementation
Network Address Translation integrates deeply with connection tracking:
Source NAT (SNAT)
# Basic SNAT for private network
Destination NAT (DNAT)
# Port forwarding to internal server
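The symmetry that makes port forwarding work can be sketched in a few lines. This is an illustrative model only: `struct pkt`, `struct dnat_map`, `dnat_forward()`, and `dnat_reply()` are hypothetical names, and a single static mapping stands in for the per-connection state that the kernel stores alongside each conntrack entry.

```c
#include <stdint.h>

/* Hypothetical packet view: just the fields NAT rewrites. */
struct pkt {
    uint32_t src, dst;
    uint16_t sport, dport;
};

/* One port-forwarding mapping: public endpoint to internal endpoint. */
struct dnat_map {
    uint32_t pub_ip;
    uint16_t pub_port;
    uint32_t priv_ip;
    uint16_t priv_port;
};

/* Original direction: rewrite the destination before routing. */
void dnat_forward(const struct dnat_map *m, struct pkt *p)
{
    if (p->dst == m->pub_ip && p->dport == m->pub_port) {
        p->dst = m->priv_ip;
        p->dport = m->priv_port;
    }
}

/* Reply direction: restore the public endpoint as the source, so the
 * client sees answers coming from the address it originally contacted. */
void dnat_reply(const struct dnat_map *m, struct pkt *p)
{
    if (p->src == m->priv_ip && p->sport == m->priv_port) {
        p->src = m->pub_ip;
        p->sport = m->pub_port;
    }
}
```

The reverse rewrite is the part that depends on connection tracking: without remembered state, the reply from the internal server could not be mapped back to the public address.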
ICMP Processing and Network Diagnostics
The Internet Control Message Protocol (ICMP) provides essential diagnostic and error reporting capabilities.
ICMP Message Processing
ICMP processing begins in icmp_rcv(), which validates and dispatches messages:
int icmp_rcv(struct sk_buff *skb)
Message Type Handlers
Different ICMP message types have specialized handlers:
| Type | Handler | Purpose |
|---|---|---|
| 0 (Echo Reply) | ping_rcv | Ping responses |
| 3 (Dest Unreachable) | icmp_unreach | Network/host/port unreachable |
| 5 (Redirect) | icmp_redirect | Route optimization |
| 8 (Echo Request) | icmp_echo | Ping requests |
| 11 (Time Exceeded) | icmp_unreach | TTL expiration |
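A type-indexed table is a natural way to express this dispatch. The sketch below is a user-space illustration, not the kernel's table: `struct icmp_ctl`, `icmp_table`, and `icmp_dispatch()` are hypothetical, and handlers are represented by their names rather than by function pointers so the example stays self-contained. Types without an entry fall back to a discard handler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dispatch entry: handler name plus an "is error message" flag. */
struct icmp_ctl {
    const char *handler;
    int is_error;
};

/* Indexed by ICMP type; unlisted types are left zero-initialized. */
static const struct icmp_ctl icmp_table[] = {
    [0]  = { "ping_rcv",      0 },  /* Echo Reply */
    [3]  = { "icmp_unreach",  1 },  /* Destination Unreachable */
    [5]  = { "icmp_redirect", 1 },  /* Redirect */
    [8]  = { "icmp_echo",     0 },  /* Echo Request */
    [11] = { "icmp_unreach",  1 },  /* Time Exceeded */
};

/* Return the handler name for a type, or the discard handler if none. */
const char *icmp_dispatch(uint8_t type)
{
    size_t n = sizeof(icmp_table) / sizeof(icmp_table[0]);

    if (type >= n || !icmp_table[type].handler)
        return "icmp_discard";
    return icmp_table[type].handler;
}
```

Note that Destination Unreachable and Time Exceeded share one handler, which matches the table above: both carry the header of the offending packet and are processed the same way.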
Path MTU Discovery
ICMP Type 3, Code 4 (Fragmentation Needed) is the signal that drives Path MTU Discovery:
static void icmp_unreach(struct sk_buff *skb)
This mechanism allows the sending host to learn the largest packet size a path can carry without fragmentation, improving efficiency.
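The update rule applied when such a message arrives can be sketched as below. `pmtu_update()` is a hypothetical helper, not a kernel function. The floor of 552 bytes is assumed from the default of the net.ipv4.route.min_pmtu sysctl, and treating a reported value of 0 as "no usable hint" is a simplification made for this sketch.

```c
#include <stdint.h>

/* Assumed floor, taken from the default of net.ipv4.route.min_pmtu. */
#define MIN_PMTU 552

/* Return the path MTU to use after a Fragmentation Needed message
 * that reported next-hop MTU `reported_mtu`. */
uint32_t pmtu_update(uint32_t current_pmtu, uint32_t reported_mtu)
{
    /* Path MTU only ever shrinks in response to ICMP; ignore anything
     * that would keep or raise it, and ignore an empty hint. */
    if (reported_mtu == 0 || reported_mtu >= current_pmtu)
        return current_pmtu;

    /* Clamp implausibly small values to the floor, without ever
     * raising the MTU above its current value. */
    if (reported_mtu < MIN_PMTU)
        return current_pmtu < MIN_PMTU ? current_pmtu : MIN_PMTU;

    return reported_mtu;
}
```

The floor matters for security: without it, a forged ICMP message could push the path MTU down to a tiny value and degrade throughput.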
Rate Limiting and Security
The kernel rate-limits the ICMP messages it sends, using a token bucket, to prevent abuse:
static bool icmp_global_allow(struct net *net, int *credit)
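The token-bucket idea behind this limiter can be sketched in user space. `struct token_bucket` and `tb_allow()` are hypothetical names, time is passed in as milliseconds for testability, and the example values of 1000 messages per second with a burst of 50 are assumed from the defaults of the icmp_msgs_per_sec and icmp_msgs_burst sysctls. The kernel's own implementation differs in detail.

```c
#include <stdint.h>

/* Hypothetical token bucket. */
struct token_bucket {
    uint32_t rate_per_sec;  /* tokens added per second */
    uint32_t burst;         /* maximum tokens that can accumulate */
    uint32_t credit;        /* tokens currently available */
    uint64_t stamp_ms;      /* time of the last refill */
};

/* Return 1 if a message may be sent at time now_ms, else 0.
 * now_ms is assumed to be monotonic (never less than stamp_ms). */
int tb_allow(struct token_bucket *tb, uint64_t now_ms)
{
    uint64_t elapsed = now_ms - tb->stamp_ms;
    uint64_t earned = elapsed * tb->rate_per_sec / 1000;

    if (earned > 0) {
        /* Refill, capped at the burst size. The timestamp only moves
         * when tokens were earned, so short intervals still add up. */
        uint64_t credit = tb->credit + earned;

        tb->credit = credit > tb->burst ? tb->burst : (uint32_t)credit;
        tb->stamp_ms = now_ms;
    }
    if (tb->credit == 0)
        return 0;           /* bucket empty: suppress this message */
    tb->credit--;
    return 1;
}
```

Starting from a full bucket of 50, a burst of 60 requests in the same millisecond lets 50 through; ten milliseconds later only 10 more tokens have been earned.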
Configuration options for ICMP security:
# Configure ICMP rate limiting
Practical Examples and Exercises
Network Layer Debugging Tools
1. Route Analysis
# Show routing table
2. Packet Processing Analysis
# View IP processing statistics
3. Connection Tracking Monitoring
# View active connections
Performance Tuning Examples
1. Routing Performance
# Enable early demux for better performance
2. Netfilter Optimization
# Increase connection tracking table size
Troubleshooting Network Layer Issues
Common Problems and Solutions
1. Packet Drops at IP Layer
# Check for header errors
2. Routing Issues
# Verify route lookup
3. Netfilter Problems
# Enable netfilter packet tracing
What’s Next: Transport Layer Preview
In Part 3, we’ll explore the transport layer where Linux implements TCP and UDP protocols. We’ll cover:
- TCP State Management: Connection establishment, data transfer, and termination
- Socket Layer Architecture: How applications interact with the network stack
- Buffer Management: Socket buffers, congestion control, and flow control
- Performance Optimization: TCP optimizations, zero-copy techniques, and offloading
- UDP Processing: Connectionless protocol handling and multicast support
The transport layer builds upon the network layer foundation we’ve covered here, using the routing decisions and connection tracking state to efficiently deliver data to applications.
Key Takeaways
The network layer represents a sophisticated balance of performance, security, and flexibility:
- IP Processing Pipeline: Efficient validation and processing with detailed error tracking
- Routing System: Scalable lookup algorithms with policy-based routing capabilities
- Netfilter Framework: Powerful packet filtering and manipulation with minimal overhead
- ICMP Integration: Essential diagnostics and error reporting with security protections
Understanding these subsystems is crucial for:
- Network Performance: Optimizing packet processing pipelines
- Security Implementation: Effective firewall and NAT configurations
- Troubleshooting: Diagnosing connectivity and routing issues
- System Design: Making informed networking architecture decisions
The Linux network layer’s careful design enables it to scale from embedded devices to high-performance routers and servers, while maintaining the flexibility needed for modern networking requirements including containers, virtualization, and cloud environments.
This post is part of the Linux Networking Deep Dive series. Stay tuned for Part 3 where we’ll explore the transport layer and socket processing!