Date:      Mon, 5 Sep 2011 02:59:52 -0400
From:      Arnaud Lacombe <lacombar@gmail.com>
To:        freebsd-net@freebsd.org
Cc:        Jack Vogel <jfvogel@gmail.com>
Subject:   FreeBSD 7-STABLE mbuf corruption
Message-ID:  <CACqU3MUs9Z9GeuGe=8iVp=MWV6eG-tO%2BkHb1znatsTq2uEqwvA@mail.gmail.com>

[-- Attachment #1 --]
Hi folks,

We have been trying to track down an mbuf management bug for about two
weeks on a customized 7.1 base. I have finally been able to reproduce
it with a stock FreeBSD 7-STABLE (kernel from r225276, userland from
7.4).

With the help of the attached patches, I have just been able to
trigger the following panic:

panic: Corrupted unused flags, expected 0xffffffff00000000, got 0x0, flags 0x3
cpuid = 1
Uptime: 3d10h5m3s
Cannot dump. No dump device defined

The patches taint the high-order 32 bits of m_flags (extended to 64
bits; those bits are thus unused) right before the mbuf is referenced
in the TX ring of the igb(4) driver. I expect this value to never
change until right before the mbuf is freed in igb_txeof(). [About
this, Jack, am I correct in expecting the mbuf's flags not to be
touched between igb_xmit() and igb_txeof()?]

I strongly suspect that this is the cause of PR/155597, and possibly
of a few other PRs. My current assumption about the root cause of this
behavior is that the same mbuf ends up being queued for TX twice.
After the first TX, it gets released and possibly reused in a socket
buffer, but it ends up being freed again after the second TX,
corrupting the chains in the process and leading to crashes of the
box.
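
The suspected sequence can be replayed with a toy model. This is only
a sketch of the hypothesis, not the kernel allocator: struct toy_buf,
toy_alloc(), toy_free() and toy_replay_double_tx() are hypothetical
names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer states for this toy model. */
enum toy_state { TOY_FREE, TOY_IN_USE };

struct toy_buf {
	enum toy_state state;
	int double_frees;	/* frees seen while already free */
};

/* Allocate: hand the buffer to a new owner. */
static void
toy_alloc(struct toy_buf *b)
{
	assert(b != NULL && b->state == TOY_FREE);
	b->state = TOY_IN_USE;
}

/* Free: count a double free instead of corrupting a free list. */
static void
toy_free(struct toy_buf *b)
{
	assert(b != NULL);
	if (b->state == TOY_FREE)
		b->double_frees++;
	b->state = TOY_FREE;
}

/*
 * Replay the suspected sequence: the same buffer sits in two TX slots,
 * and another subsystem reuses it between the two completions.
 * Returns the state the second owner is left with.
 */
static enum toy_state
toy_replay_double_tx(struct toy_buf *b)
{
	struct toy_buf *ring[2];

	toy_alloc(b);		/* original owner builds the packet */
	ring[0] = b;		/* queued for TX ... */
	ring[1] = b;		/* ... and queued a second time */

	toy_free(ring[0]);	/* first completion: legitimate free */
	toy_alloc(b);		/* reused, e.g. in a socket buffer */
	toy_free(ring[1]);	/* second completion: frees a live buffer */

	return (b->state);
}
```

In this model the allocator never sees a free of an already free
buffer, because the reuse happens in between. The second owner simply
finds its buffer released underneath it, which would be consistent
with crashes showing up far away from the driver.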

The crashes happen in multiple locations. We have seen crashes
(both clean panic() and NULL-pointer dereferences) in various places
over the last weeks: first an almost daily panic() in sbdrop() or
sbsndptr(), then NULL-pointer dereferences in hfsc_dequeue(),
m_ext_free(), m_copym(), and the list goes on.

From the driver p.o.v., crashes happened with igb(4) at the end of last
year. At the time, dropping the number of queues to 1 mitigated the
problem... so far. Now, the daily crashes happen with em(4) (single
queue by default on 7). We also have records of crashes in sbsndptr()
on vr(4)-based devices. Crashes seen on em(4) configurations are almost
always preceded by one or more:

emX: discard frame w/o packet header

which, we agree, should not happen.

From the traffic p.o.v., crashes happen about every 24h with a box
handling about 30Mbps over a couple of thousand TCP connections. I
have been able to get consistent crashes in about 1h with ALTQ
enabled, proxying about 200 TCP connections over 200Mbps of traffic
(unidirectional, sub-ms RTT). Crashes come a lot faster with ALTQ
disabled, down to a reliable crash within 5min. The box is running a
few hundred ipfw rules. Without any ipfw rules loaded, crashes
happen within 30min.

Now, the FreeBSD 7-STABLE machine has been able to handle about
900Mbps of traffic, over 2*500 TCP connections (500 receiving, 500
sending), for 24h without crashing. It crashed almost instantly when
I restarted the test today. The kernel has been built with INVARIANTS
and INVARIANT_SUPPORT. The hardware enumerates as follows:

CPU: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz (2493.76-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x1067a  Family = 6  Model = 17  Stepping = 10
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x40ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE>
  AMD Features=0x20100000<NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 4
real memory  = 3757834240 (3583 MB)
avail memory = 3678064640 (3507 MB)
ACPI APIC Table: <100509 APIC1714>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
  cpu (AP): APIC ID:  4 (disabled)
  cpu (AP): APIC ID:  5 (disabled)
  cpu (AP): APIC ID:  6 (disabled)
  cpu (AP): APIC ID:  7 (disabled)
[...]
igb0: <Intel(R) PRO/1000 Network Connection version - 2.2.5> port 0xec00-0xec1f mem 0xfdf60000-0xfdf7ffff,0xfdf40000-0xfdf5ffff,0xfdfb8000-0xfdfbbfff irq 16 at device 0.0 on pci7
igb0: Using MSIX interrupts with 5 vectors
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: Ethernet address: 00:15:b2:xx:xx:xx
igb1: <Intel(R) PRO/1000 Network Connection version - 2.2.5> port 0xec80-0xec9f mem 0xfdfe0000-0xfdffffff,0xfdfc0000-0xfdfdffff,0xfdfbc000-0xfdfbffff irq 17 at device 0.1 on pci7
igb1: Using MSIX interrupts with 5 vectors
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: Ethernet address: 00:15:b2:xx:xx:xx

em(4) and igb(4) are a direct backport from HEAD, plus the build fix I
posted a few days ago.

Custom mbuf debugging is attached, as well as the config of the
kernel. The goal of the changes was first to enforce mbuf trashing,
then to locate the bogus m_freem()[0], hence modifying the mbuf free
path to taint the mbuf not with the static 0xdeadc0de, but with the IP
(instruction pointer) of the call site. Some consistency checks were
then added.
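
A small user-space sketch of the call-site tainting, for illustration
only. The bit folding mirrors the expression used in m_freem() in the
attached patch 2/4, and the fill loop mirrors what trash_dtor() does
there; toy_site_tag() and toy_trash() are hypothetical names, and the
address used in the note below is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fold a call-site address into a 32-bit tag: keep bits 8..23 in
 * place and move the low byte into the top byte. The original top
 * byte of the address is not preserved.
 */
static uint32_t
toy_site_tag(uintptr_t ip)
{
	return ((uint32_t)((ip & 0x00ffff00u) | ((ip & 0xffu) << 24)));
}

/* Overwrite a freed object with the tag, one 32-bit word at a time. */
static void
toy_trash(void *mem, size_t size, uint32_t tag)
{
	uint32_t *p = mem;
	size_t cnt;

	assert(mem != NULL);
	for (cnt = size / sizeof(tag); cnt > 0; cnt--, p++)
		*p = tag;
}
```

For a made-up call site at 0xc0123456, toy_site_tag() yields
0x56123400. When such a word shows up in a crash dump, the low 24 bits
of the freeing call site can be read back as (tag & 0x00ffff00) |
(tag >> 24), here 0x123456.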

At this point, any help is appreciated!

Thanks in advance,
 - Arnaud

[0]: the reason behind that is that I first got tons of crashes
related to 0xdeadc0de; in particular, an mbuf being tagged M_PROMISC
while the interface was not in promiscuous mode, crashing on an
unwanted call to m_freem() in ether_demux().

[-- Attachment #2 --]
#
# GENERIC -- Generic kernel configuration file for FreeBSD/i386
#
# For more information on this file, please read the handbook section on
# Kernel Configuration Files:
#
#    http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html
#
# The handbook is also available locally in /usr/share/doc/handbook
# if you've installed the doc distribution, otherwise always see the
# FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the
# latest information.
#
# An exhaustive list of options and more detailed explanations of the
# device lines is also present in the ../../conf/NOTES and NOTES files.
# If you are in doubt as to the purpose or necessity of a line, check first
# in NOTES.
#
# $FreeBSD$

#cpu		I486_CPU
#cpu		I586_CPU
cpu		I686_CPU
ident		GENERIC

# To statically compile in device wiring instead of /boot/device.hints
#hints		"GENERIC.hints"		# Default places to look for devices.

makeoptions	DEBUG=-g		# Build kernel with gdb(1) debug symbols

options 	SCHED_ULE		# ULE scheduler
options 	PREEMPTION		# Enable kernel thread preemption
options 	INET			# InterNETworking
options 	INET6			# IPv6 communications protocols
#options 	SCTP			# Stream Control Transmission Protocol
options 	FFS			# Berkeley Fast Filesystem
options 	SOFTUPDATES		# Enable FFS soft updates support
options 	UFS_ACL			# Support for access control lists
options 	UFS_DIRHASH		# Improve performance on big directories
options 	UFS_GJOURNAL		# Enable gjournal-based UFS journaling
options 	MD_ROOT			# MD is a potential root device
#options 	NFSCLIENT		# Network Filesystem Client
#options 	NFSSERVER		# Network Filesystem Server
#options 	NFSLOCKD		# Network Lock Manager
#options 	NFS_ROOT		# NFS usable as /, requires NFSCLIENT
#options 	MSDOSFS			# MSDOS Filesystem
options 	CD9660			# ISO 9660 Filesystem
options 	PROCFS			# Process filesystem (requires PSEUDOFS)
options 	PSEUDOFS		# Pseudo-filesystem framework
options 	GEOM_PART_GPT		# GUID Partition Tables.
options 	GEOM_LABEL		# Provides labelization
options 	COMPAT_43TTY		# BSD 4.3 TTY compat [KEEP THIS!]
#options 	COMPAT_FREEBSD4		# Compatible with FreeBSD4
#options 	COMPAT_FREEBSD5		# Compatible with FreeBSD5
#options 	COMPAT_FREEBSD6		# Compatible with FreeBSD6
#options 	SCSI_DELAY=5000		# Delay (in ms) before probing SCSI
options 	KTRACE			# ktrace(1) support
options 	STACK			# stack(9) support
options 	SYSVSHM			# SYSV-style shared memory
options 	SYSVMSG			# SYSV-style message queues
options 	SYSVSEM			# SYSV-style semaphores
options 	P1003_1B_SEMAPHORES	# POSIX-style semaphores
options 	_KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
options 	ADAPTIVE_GIANT		# Giant mutex is adaptive.
options 	STOP_NMI		# Stop CPUS using NMI instead of IPI
options 	AUDIT			# Security event auditing
#options 	KDTRACE_HOOKS		# Kernel DTrace hooks
options 	INCLUDE_CONFIG_FILE     # Include this file in kernel

# To make an SMP kernel, the next two lines are needed
options 	SMP			# Symmetric MultiProcessor Kernel
device		apic			# I/O APIC

# CPU frequency control
device		cpufreq

# Bus support.
device		eisa
device		pci

# Floppy drives
#device		fdc

# ATA and ATAPI devices
device		ata
device		atadisk		# ATA disk drives
device		ataraid		# ATA RAID drives
#device		atapicd		# ATAPI CDROM drives
#device		atapifd		# ATAPI floppy drives
#device		atapist		# ATAPI tape drives
options 	ATA_STATIC_ID	# Static device numbering

# SCSI Controllers
#device		ahb		# EISA AHA1742 family
#device		ahc		# AHA2940 and onboard AIC7xxx devices
#options 	AHC_REG_PRETTY_PRINT	# Print register bitfields in debug
					# output.  Adds ~128k to driver.
#device		ahd		# AHA39320/29320 and onboard AIC79xx devices
#options 	AHD_REG_PRETTY_PRINT	# Print register bitfields in debug
					# output.  Adds ~215k to driver.
#device		amd		# AMD 53C974 (Tekram DC-390(T))
#device		hptiop		# Highpoint RocketRaid 3xxx series
#device		isp		# Qlogic family
#device 	ispfw		# Firmware for QLogic HBAs- normally a module
#device		mpt		# LSI-Logic MPT-Fusion
#device		ncr		# NCR/Symbios Logic
#device		sym		# NCR/Symbios Logic (newer chipsets + those of `ncr')
#device		trm		# Tekram DC395U/UW/F DC315U adapters

#device		adv		# Advansys SCSI adapters
#device		adw		# Advansys wide SCSI adapters
#device		aha		# Adaptec 154x SCSI adapters
#device		aic		# Adaptec 15[012]x SCSI adapters, AIC-6[23]60.
#device		bt		# Buslogic/Mylex MultiMaster SCSI adapters

#device		ncv		# NCR 53C500
#device		nsp		# Workbit Ninja SCSI-3
#device		stg		# TMC 18C30/18C50

# SCSI peripherals
device		scbus		# SCSI bus (required for SCSI)
device		ch		# SCSI media changers
device		da		# Direct Access (disks)
#device		sa		# Sequential Access (tape etc)
#device		cd		# CD
device		pass		# Passthrough device (direct SCSI access)
device		ses		# SCSI Environmental Services (and SAF-TE)

# RAID controllers interfaced to the SCSI subsystem
#device		amr		# AMI MegaRAID
#device		arcmsr		# Areca SATA II RAID
#device		asr		# DPT SmartRAID V, VI and Adaptec SCSI RAID
#device		ciss		# Compaq Smart RAID 5*
#device		dpt		# DPT Smartcache III, IV - See NOTES for options
#device		hptmv		# Highpoint RocketRAID 182x
#device		hptrr		# Highpoint RocketRAID 17xx, 22xx, 23xx, 25xx
#device		iir		# Intel Integrated RAID
#device		ips		# IBM (Adaptec) ServeRAID
#device		mly		# Mylex AcceleRAID/eXtremeRAID
#device		twa		# 3ware 9000 series PATA/SATA RAID

# RAID controllers
#device		aac		# Adaptec FSA RAID
#device		aacp		# SCSI passthrough for aac (requires CAM)
#device		ida		# Compaq Smart RAID
#device		mfi		# LSI MegaRAID SAS
#device		mlx		# Mylex DAC960 family
#device		pst		# Promise Supertrak SX6000
#device		twe		# 3ware ATA RAID

# atkbdc0 controls both the keyboard and the PS/2 mouse
device		atkbdc		# AT keyboard controller
device		atkbd		# AT keyboard
device		psm		# PS/2 mouse

device		kbdmux		# keyboard multiplexer

device		vga		# VGA video card driver

#device		splash		# Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device		sc

device		agp		# support several AGP chipsets

# Power management support (see NOTES for more options)
#device		apm
# Add suspend/resume support for the i8254.
device		pmtimer

# PCCARD (PCMCIA) support
# PCMCIA and cardbus bridge support
#device		cbb		# cardbus (yenta) bridge
#device		pccard		# PC Card (16-bit) bus
#device		cardbus		# CardBus (32-bit) bus

# Serial (COM) ports
device		sio		# 8250, 16[45]50 based serial ports
device		uart		# Generic UART driver

# Parallel port
#device		ppc
#device		ppbus		# Parallel port bus (required)
#device		lpt		# Printer
#device		plip		# TCP/IP over parallel
#device		ppi		# Parallel port interface device
#device		vpo		# Requires scbus and da

# If you've got a "dumb" serial or parallel PCI card that is
# supported by the puc(4) glue driver, uncomment the following
# line to enable it (connects to sio, uart and/or ppc drivers):
#device		puc

# PCI Ethernet NICs.
#device		de		# DEC/Intel DC21x4x (``Tulip'')
device		em		# Intel PRO/1000 Gigabit Ethernet Family
device		igb		# Intel PRO/1000 PCIE Server Gigabit Family
#device		ixgb		# Intel PRO/10GbE Ethernet Card
#device		le		# AMD Am7900 LANCE and Am79C9xx PCnet
#device		txp		# 3Com 3cR990 (``Typhoon'')
#device		vx		# 3Com 3c590, 3c595 (``Vortex'')

# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device		miibus		# MII bus support
#device		age		# Attansic/Atheros L1 Gigabit Ethernet
#device		alc		# Atheros AR8131/AR8132 Ethernet
#device		ale		# Atheros AR8121/AR8113/AR8114 Ethernet
#device		bce		# Broadcom BCM5706/BCM5708 Gigabit Ethernet
#device		bfe		# Broadcom BCM440x 10/100 Ethernet
#device		bge		# Broadcom BCM570xx Gigabit Ethernet
#device		dc		# DEC/Intel 21143 and various workalikes
#device		et		# Agere ET1310 10/100/Gigabit Ethernet
#device		fxp		# Intel EtherExpress PRO/100B (82557, 82558)
#device		jme		# JMicron JMC250 Gigabit/JMC260 Fast Ethernet
#device		lge		# Level 1 LXT1001 gigabit Ethernet
#device		msk		# Marvell/SysKonnect Yukon II Gigabit Ethernet
#device		nfe		# nVidia nForce MCP on-board Ethernet
#device		nge		# NatSemi DP83820 gigabit Ethernet
#device		nve		# nVidia nForce MCP on-board Ethernet Networking
#device		pcn		# AMD Am79C97x PCI 10/100 (precedence over 'le')
#device		re		# RealTek 8139C+/8169/8169S/8110S
#device		rl		# RealTek 8129/8139
#device		sf		# Adaptec AIC-6915 (``Starfire'')
#device		sge		# Silicon Integrated Systems SiS190/191
#device		sis		# Silicon Integrated Systems SiS 900/SiS 7016
#device		sk		# SysKonnect SK-984x & SK-982x gigabit Ethernet
#device		ste		# Sundance ST201 (D-Link DFE-550TX)
#device		stge		# Sundance/Tamarack TC9021 gigabit Ethernet
#device		ti		# Alteon Networks Tigon I/II gigabit Ethernet
#device		tl		# Texas Instruments ThunderLAN
#device		tx		# SMC EtherPower II (83c170 ``EPIC'')
#device		vge		# VIA VT612x gigabit Ethernet
#device		vr		# VIA Rhine, Rhine II
#device		vte		# DM&P Vortex86 RDC R6040 Fast Ethernet
#device		wb		# Winbond W89C840F
#device		xl		# 3Com 3c90x (``Boomerang'', ``Cyclone'')

# ISA Ethernet NICs.  pccard NICs included.
#device		cs		# Crystal Semiconductor CS89x0 NIC
# 'device ed' requires 'device miibus'
#device		ed		# NE[12]000, SMC Ultra, 3c503, DS8390 cards
#device		ex		# Intel EtherExpress Pro/10 and Pro/10+
#device		ep		# Etherlink III based cards
#device		fe		# Fujitsu MB8696x based cards
#device		ie		# EtherExpress 8/16, 3C507, StarLAN 10 etc.
#device		sn		# SMC's 9000 series of Ethernet chips
#device		xe		# Xircom pccard Ethernet

# Wireless NIC cards
#device		wlan		# 802.11 support
#device		wlan_wep	# 802.11 WEP support
#device		wlan_ccmp	# 802.11 CCMP support
#device		wlan_tkip	# 802.11 TKIP support
#device		wlan_amrr	# AMRR transmit rate control algorithm
#device		wlan_scan_ap	# 802.11 AP mode scanning
#device		wlan_scan_sta	# 802.11 STA mode scanning
#device		an		# Aironet 4500/4800 802.11 wireless NICs.
#device		ath		# Atheros pci/cardbus NIC's
#device		ath_hal		# Atheros HAL (Hardware Access Layer)
#options		AH_SUPPORT_AR5416	# enable AR5416 tx/rx descriptors
#device		ath_rate_sample	# SampleRate tx rate control for ath
#device		awi		# BayStack 660 and others
#device		ral		# Ralink Technology RT2500 wireless NICs.
#device		wi		# WaveLAN/Intersil/Symbol 802.11 wireless NICs.
#device		wl		# Older non 802.11 Wavelan wireless NIC.

# Pseudo devices.
device		loop		# Network loopback
device		random		# Entropy device
device		ether		# Ethernet support
device		vlan		# 802.1Q VLAN support
device		sl		# Kernel SLIP
device		ppp		# Kernel PPP
device		tun		# Packet tunnel.
device		pty		# Pseudo-ttys (telnet etc)
device		md		# Memory "disks"
device		gif		# IPv6 and IPv4 tunneling
device		faith		# IPv6-to-IPv4 relaying (translation)
device		firmware	# firmware assist module

# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
# Note that 'bpf' is required for DHCP.
device		bpf		# Berkeley packet filter

#
options		INVARIANTS
options		INVARIANT_SUPPORT

#
options		IPFIREWALL

options		IPFIREWALL_DEFAULT_TO_ACCEPT
options		IPFIREWALL_FORWARD
options		IPFIREWALL_VERBOSE

#
device		pf		#

#
options		ALTQ
options		ALTQ_CBQ        # Class Based Queueing
options		ALTQ_RED        # Random Early Detection
options		ALTQ_RIO        # RED In/Out
options		ALTQ_HFSC       # Hierarchical Packet Scheduler
options		ALTQ_CDNR       # Traffic conditioner
options		ALTQ_PRIQ       # Priority Queueing
options		ALTQ_NOPCC      # Required if the TSC is unusable

# USB support
#device		uhci		# UHCI PCI->USB interface
#device		ohci		# OHCI PCI->USB interface
#device		ehci		# EHCI PCI->USB interface (USB 2.0)
#device		usb		# USB Bus (required)
#device		udbp		# USB Double Bulk Pipe devices
#device		ugen		# Generic
#device		uhid		# "Human Interface Devices"
#device		ukbd		# Keyboard
#device		ulpt		# Printer
#device		umass		# Disks/Mass storage - Requires scbus and da
#device		ums		# Mouse
#device		urio		# Diamond Rio 500 MP3 player
#device		uscanner	# Scanners
# USB Serial devices
#device		ucom		# Generic com ttys
#device		uark		# Technologies ARK3116 based serial adapters
#device		ubsa		# Belkin F5U103 and compatible serial adapters
#device		ubser		# BWCT console serial adapters
#device		uftdi		# For FTDI usb serial adapters
#device		uipaq		# Some WinCE based devices
#device		uplcom		# Prolific PL-2303 serial adapters
#device		uslcom		# SI Labs CP2101/CP2102 serial adapters
#device		uvisor		# Visor and Palm devices
#device		uvscom		# USB serial support for DDI pocket's PHS
# USB Ethernet, requires miibus
#device		aue		# ADMtek USB Ethernet
#device		axe		# ASIX Electronics USB Ethernet
#device		cdce		# Generic USB over Ethernet
#device		cue		# CATC USB Ethernet
#device		kue		# Kawasaki LSI USB Ethernet
#device		rue		# RealTek RTL8150 USB Ethernet
# USB Wireless
#device		rum		# Ralink Technology RT2501USB wireless NICs
#device		ural		# Ralink Technology RT2500USB wireless NICs

# FireWire support
#device		firewire	# FireWire bus code
#device		sbp		# SCSI over FireWire (Requires scbus and da)
#device		fwe		# Ethernet over FireWire (non-standard!)
#device		fwip		# IP over FireWire (RFC 2734,3146)
#device		dcons		# Dumb console driver
#device		dcons_crom	# Configuration ROM for dcons

[-- Attachment #3 --]
From 5221519fc7d4c31b6c1ea2c7581bd998b7505e65 Mon Sep 17 00:00:00 2001
From: Arnaud Lacombe <lacombar@gmail.com>
Date: Tue, 30 Aug 2011 21:55:47 -0400
Subject: [PATCH 1/4] inlinize m_freem()

---
 sys/kern/uipc_mbuf.c |   12 ------------
 sys/sys/mbuf.h       |   14 +++++++++++++-
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/sys/kern/uipc_mbuf.c b/sys/kern/uipc_mbuf.c
index f823e26..9f9288f 100644
--- a/sys/kern/uipc_mbuf.c
+++ b/sys/kern/uipc_mbuf.c
@@ -151,18 +151,6 @@ m_getm2(struct mbuf *m, int len, int how, short type, int flags)
 	return (m);
 }
 
-/*
- * Free an entire chain of mbufs and associated external buffers, if
- * applicable.
- */
-void
-m_freem(struct mbuf *mb)
-{
-
-	while (mb != NULL)
-		mb = m_free(mb);
-}
-
 /*-
  * Configure a provided mbuf to refer to the provided external storage
  * buffer and setup a reference count for said buffer.  If the setting
diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h
index 8529cca..f3f98b0 100644
--- a/sys/sys/mbuf.h
+++ b/sys/sys/mbuf.h
@@ -358,6 +358,7 @@ static __inline struct mbuf	*m_getjcl(int how, short type, int flags,
 				    int size);
 static __inline struct mbuf	*m_getclr(int how, short type);	/* XXX */
 static __inline struct mbuf	*m_free(struct mbuf *m);
+static __inline void		m_freem(struct mbuf *m);
 static __inline void		 m_clget(struct mbuf *m, int how);
 static __inline void		*m_cljget(struct mbuf *m, int how, int size);
 static __inline void		 m_chtype(struct mbuf *m, short new_type);
@@ -520,6 +521,18 @@ m_free(struct mbuf *m)
 	return (n);
 }
 
+/*
+ * Free an entire chain of mbufs and associated external buffers, if
+ * applicable.
+ */
+static __inline void
+m_freem(struct mbuf *m)
+{
+
+	while (m != NULL)
+		m = m_free(m);
+}
+
 static __inline void
 m_clget(struct mbuf *m, int how)
 {
@@ -767,7 +780,6 @@ struct mbuf	*m_dup(struct mbuf *, int);
 int		 m_dup_pkthdr(struct mbuf *, struct mbuf *, int);
 u_int		 m_fixhdr(struct mbuf *);
 struct mbuf	*m_fragment(struct mbuf *, int, int);
-void		 m_freem(struct mbuf *);
 struct mbuf	*m_getm2(struct mbuf *, int, int, short, int);
 struct mbuf	*m_getptr(struct mbuf *, int, int *);
 u_int		 m_length(struct mbuf *, struct mbuf **);
-- 
1.7.6.153.g78432


[-- Attachment #4 --]
From 0f2ada7a85c9e665be23f4801e1722e776492fdc Mon Sep 17 00:00:00 2001
From: Arnaud Lacombe <lacombar@gmail.com>
Date: Wed, 31 Aug 2011 12:16:58 -0400
Subject: [PATCH 2/4] mbuf use-after-free marking

---
 sys/kern/uipc_mbuf.c |    4 ++--
 sys/sys/mbuf.h       |   37 ++++++++++++++++++++++++++++++++-----
 sys/vm/uma_dbg.c     |    6 ++++--
 3 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/sys/kern/uipc_mbuf.c b/sys/kern/uipc_mbuf.c
index 9f9288f..9b92fb0 100644
--- a/sys/kern/uipc_mbuf.c
+++ b/sys/kern/uipc_mbuf.c
@@ -197,7 +197,7 @@ m_extadd(struct mbuf *mb, caddr_t buf, u_int size,
  * storage attached to them if the reference count hits 1.
  */
 void
-mb_free_ext(struct mbuf *m)
+mb_free_ext(struct mbuf *m, void *arg)
 {
 	int skipmbuf;
 	
@@ -264,7 +264,7 @@ mb_free_ext(struct mbuf *m)
 	m->m_ext.ext_size = 0;
 	m->m_ext.ext_type = 0;
 	m->m_flags &= ~M_EXT;
-	uma_zfree(zone_mbuf, m);
+	uma_zfree_arg(zone_mbuf, m, arg);
 }
 
 /*
diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h
index f3f98b0..19ea5e8 100644
--- a/sys/sys/mbuf.h
+++ b/sys/sys/mbuf.h
@@ -362,7 +362,7 @@ static __inline void		m_freem(struct mbuf *m);
 static __inline void		 m_clget(struct mbuf *m, int how);
 static __inline void		*m_cljget(struct mbuf *m, int how, int size);
 static __inline void		 m_chtype(struct mbuf *m, short new_type);
-void				 mb_free_ext(struct mbuf *);
+void				 mb_free_ext(struct mbuf *, void *);
 static __inline struct mbuf	*m_last(struct mbuf *m);
 
 static __inline int
@@ -506,21 +506,39 @@ m_free_fast(struct mbuf *m)
 {
 	KASSERT(SLIST_EMPTY(&m->m_pkthdr.tags), ("doing fast free of mbuf with tags"));
 
+#ifdef INVARIANTS
+	uma_zfree_arg(zone_mbuf, m, (void *)(0xf00f0000 | MB_NOTAGS));
+#else
 	uma_zfree_arg(zone_mbuf, m, (void *)MB_NOTAGS);
+#endif
 }
 
 static __inline struct mbuf *
-m_free(struct mbuf *m)
+m_free_arg(struct mbuf *m, void *arg)
 {
 	struct mbuf *n = m->m_next;
 
 	if (m->m_flags & M_EXT)
-		mb_free_ext(m);
+		mb_free_ext(m, arg);
 	else if ((m->m_flags & M_NOFREE) == 0)
-		uma_zfree(zone_mbuf, m);
+		uma_zfree_arg(zone_mbuf, m, arg);
+
 	return (n);
 }
 
+static __inline struct mbuf *
+m_free(struct mbuf *m)
+{
+
+	return m_free_arg(m, 0);
+}
+
+#ifdef INVARIANTS
+#define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })
+#else
+#define _THIS_IP_ 0
+#endif
+
 /*
  * Free an entire chain of mbufs and associated external buffers, if
  * applicable.
@@ -528,9 +546,18 @@ m_free(struct mbuf *m)
 static __inline void
 m_freem(struct mbuf *m)
 {
+	unsigned long this_ip = (_THIS_IP_ & 0x00ffff00) | (_THIS_IP_ & 0xff) << 24;
+
+	while (m != NULL)
+		m = m_free_arg(m, (void *)this_ip);
+}
+
+static __inline void
+m_freem_arg(struct mbuf *m, void *arg)
+{
 
 	while (m != NULL)
-		m = m_free(m);
+		m = m_free_arg(m, arg);
 }
 
 static __inline void
diff --git a/sys/vm/uma_dbg.c b/sys/vm/uma_dbg.c
index 9075bf9..669b1f5 100644
--- a/sys/vm/uma_dbg.c
+++ b/sys/vm/uma_dbg.c
@@ -61,6 +61,7 @@ static const u_int32_t uma_junk = 0xdeadc0de;
 int
 trash_ctor(void *mem, int size, void *arg, int flags)
 {
+#if 0
 	int cnt;
 	u_int32_t *p;
 
@@ -72,6 +73,7 @@ trash_ctor(void *mem, int size, void *arg, int flags)
 			    mem, size, *p, p);
 			return (0);
 		}
+#endif
 	return (0);
 }
 
@@ -90,7 +92,7 @@ trash_dtor(void *mem, int size, void *arg)
 	cnt = size / sizeof(uma_junk);
 
 	for (p = mem; cnt > 0; cnt--, p++)
-		*p = uma_junk;
+		*p = (unsigned long)arg;
 }
 
 /*
@@ -102,7 +104,7 @@ trash_dtor(void *mem, int size, void *arg)
 int
 trash_init(void *mem, int size, int flags)
 {
-	trash_dtor(mem, size, NULL);
+	trash_dtor(mem, size, (void *)uma_junk);
 	return (0);
 }
 
-- 
1.7.6.153.g78432


[-- Attachment #5 --]
From c58e8728e4dd75b542fa870bfc7420a3c555d2ca Mon Sep 17 00:00:00 2001
From: Arnaud Lacombe <lacombar@gmail.com>
Date: Wed, 31 Aug 2011 22:08:04 -0400
Subject: [PATCH 3/4] 64bits mbuf flags

---
 sys/kern/uipc_mbuf.c |    6 +++++-
 sys/sys/mbuf.h       |   46 +++++++++++++++++++++++-----------------------
 2 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/sys/kern/uipc_mbuf.c b/sys/kern/uipc_mbuf.c
index 9b92fb0..24f9ec8 100644
--- a/sys/kern/uipc_mbuf.c
+++ b/sys/kern/uipc_mbuf.c
@@ -50,6 +50,8 @@ __FBSDID("$FreeBSD$");
 
 #include <security/mac/mac_framework.h>
 
+#include <machine/_inttypes.h>
+
 int	max_linkhdr;
 int	max_protohdr;
 int	max_hdr;
@@ -1405,10 +1407,12 @@ m_print(const struct mbuf *m, int maxlen)
 		pdata = m2->m_len;
 		if (maxlen != -1 && pdata > maxlen)
 			pdata = maxlen;
+#if 0
 		printf("mbuf: %p len: %d, next: %p, %b%s", m2, m2->m_len,
 		    m2->m_next, m2->m_flags, "\20\20freelist\17skipfw"
 		    "\11proto5\10proto4\7proto3\6proto2\5proto1\4rdonly"
 		    "\3eor\2pkthdr\1ext", pdata ? "" : "\n");
+#endif
 		if (pdata)
 			printf(", %*D\n", pdata, (u_char *)m2->m_data, "-");
 		if (len != -1)
@@ -1835,7 +1839,7 @@ m_unshare(struct mbuf *m0, int how)
 		 * it anyway, we try to reduce the number of mbufs and
 		 * clusters so that future work is easier).
 		 */
-		KASSERT(m->m_flags & M_EXT, ("m_flags 0x%x", m->m_flags));
+		KASSERT(m->m_flags & M_EXT, ("m_flags 0x%" PRIx64, m->m_flags));
 		/* NB: we only coalesce into a cluster or larger */
 		if (mprev != NULL && (mprev->m_flags & M_EXT) &&
 		    m->m_len <= M_TRAILINGSPACE(mprev)) {
diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h
index 19ea5e8..8ad09bb 100644
--- a/sys/sys/mbuf.h
+++ b/sys/sys/mbuf.h
@@ -91,7 +91,7 @@ struct m_hdr {
 	struct mbuf	*mh_nextpkt;	/* next chain in queue/record */
 	caddr_t		 mh_data;	/* location of data */
 	int		 mh_len;	/* amount of data in this mbuf */
-	int		 mh_flags;	/* flags; see below */
+	uint64_t	 mh_flags;	/* flags; see below */
 	short		 mh_type;	/* type of data in this mbuf */
 	uint8_t          pad[M_HDR_PAD];/* word align                  */
 };
@@ -169,28 +169,28 @@ struct mbuf {
 /*
  * mbuf flags.
  */
-#define	M_EXT		0x00000001 /* has associated external storage */
-#define	M_PKTHDR	0x00000002 /* start of record */
-#define	M_EOR		0x00000004 /* end of record */
-#define	M_RDONLY	0x00000008 /* associated data is marked read-only */
-#define	M_PROTO1	0x00000010 /* protocol-specific */
-#define	M_PROTO2	0x00000020 /* protocol-specific */
-#define	M_PROTO3	0x00000040 /* protocol-specific */
-#define	M_PROTO4	0x00000080 /* protocol-specific */
-#define	M_PROTO5	0x00000100 /* protocol-specific */
-#define	M_BCAST		0x00000200 /* send/received as link-level broadcast */
-#define	M_MCAST		0x00000400 /* send/received as link-level multicast */
-#define	M_FRAG		0x00000800 /* packet is a fragment of a larger packet */
-#define	M_FIRSTFRAG	0x00001000 /* packet is first fragment */
-#define	M_LASTFRAG	0x00002000 /* packet is last fragment */
-#define	M_SKIP_FIREWALL	0x00004000 /* skip firewall processing */
-#define	M_FREELIST	0x00008000 /* mbuf is on the free list */
-#define	M_VLANTAG	0x00010000 /* ether_vtag is valid */
-#define	M_PROMISC	0x00020000 /* packet was not for us */
-#define	M_NOFREE	0x00040000 /* do not free mbuf, embedded in cluster */
-#define	M_PROTO6	0x00080000 /* protocol-specific */
-#define	M_PROTO7	0x00100000 /* protocol-specific */
-#define	M_PROTO8	0x00200000 /* protocol-specific */
+#define	M_EXT		0x00000001ULL /* has associated external storage */
+#define	M_PKTHDR	0x00000002ULL /* start of record */
+#define	M_EOR		0x00000004ULL /* end of record */
+#define	M_RDONLY	0x00000008ULL /* associated data is marked read-only */
+#define	M_PROTO1	0x00000010ULL /* protocol-specific */
+#define	M_PROTO2	0x00000020ULL /* protocol-specific */
+#define	M_PROTO3	0x00000040ULL /* protocol-specific */
+#define	M_PROTO4	0x00000080ULL /* protocol-specific */
+#define	M_PROTO5	0x00000100ULL /* protocol-specific */
+#define	M_BCAST		0x00000200ULL /* send/received as link-level broadcast */
+#define	M_MCAST		0x00000400ULL /* send/received as link-level multicast */
+#define	M_FRAG		0x00000800ULL /* packet is a fragment of a larger packet */
+#define	M_FIRSTFRAG	0x00001000ULL /* packet is first fragment */
+#define	M_LASTFRAG	0x00002000ULL /* packet is last fragment */
+#define	M_SKIP_FIREWALL	0x00004000ULL /* skip firewall processing */
+#define	M_FREELIST	0x00008000ULL /* mbuf is on the free list */
+#define	M_VLANTAG	0x00010000ULL /* ether_vtag is valid */
+#define	M_PROMISC	0x00020000ULL /* packet was not for us */
+#define	M_NOFREE	0x00040000ULL /* do not free mbuf, embedded in cluster */
+#define	M_PROTO6	0x00080000ULL /* protocol-specific */
+#define	M_PROTO7	0x00100000ULL /* protocol-specific */
+#define	M_PROTO8	0x00200000ULL /* protocol-specific */
 /*
  * For RELENG_{6,7} steal these flags for limited multiple routing table
  * support. In RELENG_8 and beyond, use just one flag and a tag.
-- 
1.7.6.153.g78432


[-- Attachment #6 --]
From d0530867a44e8029a4e64e69cacdcaa998a7e76f Mon Sep 17 00:00:00 2001
From: Arnaud Lacombe <lacombar@gmail.com>
Date: Wed, 31 Aug 2011 22:10:56 -0400
Subject: [PATCH 4/4] verify mbuf's flags consistency after tx

---
 sys/dev/e1000/if_em.c  |   14 +++++++++++++-
 sys/dev/e1000/if_igb.c |   14 +++++++++++++-
 sys/sys/mbuf.h         |    3 +++
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/sys/dev/e1000/if_em.c b/sys/dev/e1000/if_em.c
index c5e936d..ca1647e 100644
--- a/sys/dev/e1000/if_em.c
+++ b/sys/dev/e1000/if_em.c
@@ -74,6 +74,7 @@
 #include <netinet/udp.h>
 
 #include <machine/in_cksum.h>
+#include <machine/_inttypes.h>
 #include <dev/led/led.h>
 #include <dev/pci/pcivar.h>
 #include <dev/pci/pcireg.h>
@@ -1907,6 +1908,8 @@ em_xmit(struct tx_ring *txr, struct mbuf **m_headp)
                 ctxd->lower.data |= htole32(E1000_TXD_CMD_VLE);
         }
 
+	m_head->m_flags |= M_UNUSED;
+
         tx_buffer->m_head = m_head;
 	tx_buffer_mapped->map = tx_buffer->map;
 	tx_buffer->map = map;
@@ -3535,7 +3538,16 @@ em_txeof(struct tx_ring *txr)
 				    BUS_DMASYNC_POSTWRITE);
 				bus_dmamap_unload(txr->txtag,
 				    tx_buffer->map);
-                        	m_freem(tx_buffer->m_head);
+				{
+					KASSERT((tx_buffer->m_head->m_flags & M_UNUSED) == M_UNUSED,
+					    ("Corrupted unused flags, expected 0x%" PRIx64 ", got 0x%" PRIx64 ", flags 0x%" PRIx64,
+					        M_UNUSED, tx_buffer->m_head->m_flags & M_UNUSED,
+					        tx_buffer->m_head->m_flags));
+					tx_buffer->m_head->m_flags &= ~M_UNUSED;
+
+					m_freem_arg(tx_buffer->m_head, (void *)0x00babe00);
+				}
+
                         	tx_buffer->m_head = NULL;
                 	}
 			tx_buffer->next_eop = -1;
diff --git a/sys/dev/e1000/if_igb.c b/sys/dev/e1000/if_igb.c
index 4700829..9ad616c 100644
--- a/sys/dev/e1000/if_igb.c
+++ b/sys/dev/e1000/if_igb.c
@@ -80,6 +80,7 @@
 #include <netinet/udp.h>
 
 #include <machine/in_cksum.h>
+#include <machine/_inttypes.h>
 #include <dev/led/led.h>
 #include <dev/pci/pcivar.h>
 #include <dev/pci/pcireg.h>
@@ -1655,6 +1656,8 @@ igb_xmit(struct tx_ring *txr, struct mbuf **m_headp)
 	txr->next_avail_desc = i;
 	txr->tx_avail -= nsegs;
 
+	m_head->m_flags |= M_UNUSED;
+
         tx_buffer->m_head = m_head;
 	tx_buffer_mapped->map = tx_buffer->map;
 	tx_buffer->map = map;
@@ -3374,7 +3377,16 @@ igb_txeof(struct tx_ring *txr)
 				bus_dmamap_unload(txr->txtag,
 				    tx_buffer->map);
 
-                        	m_freem(tx_buffer->m_head);
+				{
+					KASSERT((tx_buffer->m_head->m_flags & M_UNUSED) == M_UNUSED,
+					    ("Corrupted unused flags, expected 0x%" PRIx64 ", got 0x%" PRIx64 ", flags 0x%" PRIx64,
+					        M_UNUSED, tx_buffer->m_head->m_flags & M_UNUSED,
+					        tx_buffer->m_head->m_flags));
+					tx_buffer->m_head->m_flags &= ~M_UNUSED;
+
+					m_freem_arg(tx_buffer->m_head, (void *)0x00babe00);
+				}
+
                         	tx_buffer->m_head = NULL;
                 	}
 			tx_buffer->next_eop = -1;
diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h
index 8ad09bb..299e691 100644
--- a/sys/sys/mbuf.h
+++ b/sys/sys/mbuf.h
@@ -191,6 +191,9 @@ struct mbuf {
 #define	M_PROTO6	0x00080000ULL /* protocol-specific */
 #define	M_PROTO7	0x00100000ULL /* protocol-specific */
 #define	M_PROTO8	0x00200000ULL /* protocol-specific */
+
+#define M_UNUSED	(~(0xffffffffULL))
+
 /*
  * For RELENG_{6,7} steal these flags for limited multiple routing table
  * support. In RELENG_8 and beyond, use just one flag and a tag.
-- 
1.7.6.153.g78432
