Date: Mon, 5 Sep 2011 02:59:52 -0400 From: Arnaud Lacombe <lacombar@gmail.com> To: freebsd-net@freebsd.org Cc: Jack Vogel <jfvogel@gmail.com> Subject: FreeBSD 7-STABLE mbuf corruption Message-ID: <CACqU3MUs9Z9GeuGe=8iVp=MWV6eG-tO%2BkHb1znatsTq2uEqwvA@mail.gmail.com>
index | next in thread | raw e-mail
[-- Attachment #1 --] Hi folks, We have been trying to track down a bad mbuf management for about two weeks on a customized 7.1 base. I have finally been able to reproduce it with a stock FreeBSD 7-STABLE (kernel from r225276, userland from 7.4). With the help of the attached patches, I have just been able to trigger the following panic: panic: Corrupted unused flags, expected 0xffffffff00000000, got 0x0, flags 0x3 cpuid = 1 Uptime: 3d10h5m3s Cannot dump. No dump device defined The patches taints high order 32bits of m_flags (extended to 64bits; those are thus unused) right before the mbuf is referenced in the TX ring of the igb(4) driver, I expect this value to never change until right before the mbuf is freed igb_txeof(). [About this, Jack, am I correct expecting the mbuf's flags not to be touched between igb_xmit() and igb_txeof() ?] I have strong suspicions that this is the cause of PR/155597, eventually a few others PR. My current assumption about the root cause of this behavior is that the same mbuf ends up being queued for TX twice. After the first TX, it gets released, eventually reused in socket's buffer, but ends up being freed again after the second TX, screwing the chains at the same time, leading to crashes of the box. On the crashes happens in multiple locations. We have seen crashes (both clean panic() and NULL-pointer dereference) in various places over the last weeks, first an almost daily panic() in sbdrop() or sbsndptr(), and NULL-pointer dereferences in hfsc_dequeue(), m_ext_free(), m_copym, and the list goes on. On the driver p.o.v, crashes happened with igb(4) end of last year. At the time, dropping the number of queue to 1 mitigated the problem... so far. Now, the daily crashes happens with em(4) (single queue by default on 7). We also have records of crashes in sbsndptr() on vr(4)-based devices. Crashes seen on em(4) configuration are almost always preceded by one or many: emX: discard frame w/o packet header which we agree, should not happen. On the traffic p.o.v., crashes happens on a 24h basis with a box handling about 30Mbps over a couple of thousands TCP connections. I have been able to get consistent crashes in about 1h with ALTQ enabled, proxying about 200 TCP connection over 200Mbps of traffic (unidirectional, sub-ms RTT). Crashes becomes a lot faster with ALTQ disabled, down to a reliable crash within 5min. The box is running a few hundreds ipfw rules. Without any ipfw rules loaded, crashes happens within 30min. Now, the FreeBSD 7-STABLE machine has been able to handle about 900Mbps of traffic, over 2*500 TCP connections (500 receiving, 500 sending), over 24h without crashing. It crashed almost instantly when I restarted the test today. The kernel has been built with INVARIANTS and INVARIANT_SUPPORT. Hardware enumerate as follow: CPU: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (2493.76-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x1067a Family = 6 Model = 17 Stepping = 10 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0x40ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE> AMD Features=0x20100000<NX,LM> AMD Features2=0x1<LAHF> Cores per package: 4 real memory = 3757834240 (3583 MB) avail memory = 3678064640 (3507 MB) ACPI APIC Table: <100509 APIC1714> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 cpu (AP): APIC ID: 4 (disabled) cpu (AP): APIC ID: 5 (disabled) cpu (AP): APIC ID: 6 (disabled) cpu (AP): APIC ID: 7 (disabled) [...] igb0: <Intel(R) PRO/1000 Network Connection version - 2.2.5> port 0xec00-0xec1f mem 0xfdf60000-0xfdf7ffff,0xfdf40000-0xfdf5ffff,0xfdfb8000-0xfdfbbfff irq 16 at device 0.0 on pci7 igb0: Using MSIX interrupts with 5 vectors igb0: [ITHREAD] igb0: [ITHREAD] igb0: [ITHREAD] igb0: [ITHREAD] igb0: [ITHREAD] igb0: Ethernet address: 00:15:b2:xx:xx:xx igb1: <Intel(R) PRO/1000 Network Connection version - 2.2.5> port 0xec80-0xec9f mem 0xfdfe0000-0xfdffffff,0xfdfc0000-0xfdfdffff,0xfdfbc000-0xfdfbffff irq 17 at device 0.1 on pci7 igb1: Using MSIX interrupts with 5 vectors igb1: [ITHREAD] igb1: [ITHREAD] igb1: [ITHREAD] igb1: [ITHREAD] igb1: [ITHREAD] igb1: Ethernet address: 00:15:b2:xx:xx:xx em(4) and igb(4) are a direct backport from HEAD, plus the build fix I posted a few days ago. Custom mbuf debugging is attached, as well as the config of the kernel. The goal of the changes was first to enforce mbuf trashing, then locate the bogus m_freem()[0], thus modifying the mbuf free path to taint the mbuf not with the static 0xdeadc0de, but with the IP of the call-site. Then add some consistency check. At this point, any help is appreciated! Thanks in advance, - Arnaud [0]: the reason behind that is that I first got tons of crashes related 0xdeadc0de, in particular, an mbuf being tagged M_PROMISC, while the interface was not in promiscuous mode crashing on an unwanted call to m_freem() in ether_demub() [-- Attachment #2 --] # # GENERIC -- Generic kernel configuration file for FreeBSD/i386 # # For more information on this file, please read the handbook section on # Kernel Configuration Files: # # http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html # # The handbook is also available locally in /usr/share/doc/handbook # if you've installed the doc distribution, otherwise always see the # FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the # latest information. # # An exhaustive list of options and more detailed explanations of the # device lines is also present in the ../../conf/NOTES and NOTES files. # If you are in doubt as to the purpose or necessity of a line, check first # in NOTES. # # $FreeBSD$ #cpu I486_CPU #cpu I586_CPU cpu I686_CPU ident GENERIC # To statically compile in device wiring instead of /boot/device.hints #hints "GENERIC.hints" # Default places to look for devices. makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols options SCHED_ULE # ULE scheduler options PREEMPTION # Enable kernel thread preemption options INET # InterNETworking options INET6 # IPv6 communications protocols #options SCTP # Stream Control Transmission Protocol options FFS # Berkeley Fast Filesystem options SOFTUPDATES # Enable FFS soft updates support options UFS_ACL # Support for access control lists options UFS_DIRHASH # Improve performance on big directories options UFS_GJOURNAL # Enable gjournal-based UFS journaling options MD_ROOT # MD is a potential root device #options NFSCLIENT # Network Filesystem Client #options NFSSERVER # Network Filesystem Server #options NFSLOCKD # Network Lock Manager #options NFS_ROOT # NFS usable as /, requires NFSCLIENT #options MSDOSFS # MSDOS Filesystem options CD9660 # ISO 9660 Filesystem options PROCFS # Process filesystem (requires PSEUDOFS) options PSEUDOFS # Pseudo-filesystem framework options GEOM_PART_GPT # GUID Partition Tables. options GEOM_LABEL # Provides labelization options COMPAT_43TTY # BSD 4.3 TTY compat [KEEP THIS!] #options COMPAT_FREEBSD4 # Compatible with FreeBSD4 #options COMPAT_FREEBSD5 # Compatible with FreeBSD5 #options COMPAT_FREEBSD6 # Compatible with FreeBSD6 #options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI options KTRACE # ktrace(1) support options STACK # stack(9) support options SYSVSHM # SYSV-style shared memory options SYSVMSG # SYSV-style message queues options SYSVSEM # SYSV-style semaphores options P1003_1B_SEMAPHORES # POSIX-style semaphores options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions options KBD_INSTALL_CDEV # install a CDEV entry in /dev options ADAPTIVE_GIANT # Giant mutex is adaptive. options STOP_NMI # Stop CPUS using NMI instead of IPI options AUDIT # Security event auditing #options KDTRACE_HOOKS # Kernel DTrace hooks options INCLUDE_CONFIG_FILE # Include this file in kernel # To make an SMP kernel, the next two lines are needed options SMP # Symmetric MultiProcessor Kernel device apic # I/O APIC # CPU frequency control device cpufreq # Bus support. device eisa device pci # Floppy drives #device fdc # ATA and ATAPI devices device ata device atadisk # ATA disk drives device ataraid # ATA RAID drives #device atapicd # ATAPI CDROM drives #device atapifd # ATAPI floppy drives #device atapist # ATAPI tape drives options ATA_STATIC_ID # Static device numbering # SCSI Controllers #device ahb # EISA AHA1742 family #device ahc # AHA2940 and onboard AIC7xxx devices #options AHC_REG_PRETTY_PRINT # Print register bitfields in debug # output. Adds ~128k to driver. #device ahd # AHA39320/29320 and onboard AIC79xx devices #options AHD_REG_PRETTY_PRINT # Print register bitfields in debug # output. Adds ~215k to driver. #device amd # AMD 53C974 (Tekram DC-390(T)) #device hptiop # Highpoint RocketRaid 3xxx series #device isp # Qlogic family #device ispfw # Firmware for QLogic HBAs- normally a module #device mpt # LSI-Logic MPT-Fusion #device ncr # NCR/Symbios Logic #device sym # NCR/Symbios Logic (newer chipsets + those of `ncr') #device trm # Tekram DC395U/UW/F DC315U adapters #device adv # Advansys SCSI adapters #device adw # Advansys wide SCSI adapters #device aha # Adaptec 154x SCSI adapters #device aic # Adaptec 15[012]x SCSI adapters, AIC-6[23]60. #device bt # Buslogic/Mylex MultiMaster SCSI adapters #device ncv # NCR 53C500 #device nsp # Workbit Ninja SCSI-3 #device stg # TMC 18C30/18C50 # SCSI peripherals device scbus # SCSI bus (required for SCSI) device ch # SCSI media changers device da # Direct Access (disks) #device sa # Sequential Access (tape etc) #device cd # CD device pass # Passthrough device (direct SCSI access) device ses # SCSI Environmental Services (and SAF-TE) # RAID controllers interfaced to the SCSI subsystem #device amr # AMI MegaRAID #device arcmsr # Areca SATA II RAID #device asr # DPT SmartRAID V, VI and Adaptec SCSI RAID #device ciss # Compaq Smart RAID 5* #device dpt # DPT Smartcache III, IV - See NOTES for options #device hptmv # Highpoint RocketRAID 182x #device hptrr # Highpoint RocketRAID 17xx, 22xx, 23xx, 25xx #device iir # Intel Integrated RAID #device ips # IBM (Adaptec) ServeRAID #device mly # Mylex AcceleRAID/eXtremeRAID #device twa # 3ware 9000 series PATA/SATA RAID # RAID controllers #device aac # Adaptec FSA RAID #device aacp # SCSI passthrough for aac (requires CAM) #device ida # Compaq Smart RAID #device mfi # LSI MegaRAID SAS #device mlx # Mylex DAC960 family #device pst # Promise Supertrak SX6000 #device twe # 3ware ATA RAID # atkbdc0 controls both the keyboard and the PS/2 mouse device atkbdc # AT keyboard controller device atkbd # AT keyboard device psm # PS/2 mouse device kbdmux # keyboard multiplexer device vga # VGA video card driver #device splash # Splash screen and screen saver support # syscons is the default console driver, resembling an SCO console device sc device agp # support several AGP chipsets # Power management support (see NOTES for more options) #device apm # Add suspend/resume support for the i8254. device pmtimer # PCCARD (PCMCIA) support # PCMCIA and cardbus bridge support #device cbb # cardbus (yenta) bridge #device pccard # PC Card (16-bit) bus #device cardbus # CardBus (32-bit) bus # Serial (COM) ports device sio # 8250, 16[45]50 based serial ports device uart # Generic UART driver # Parallel port #device ppc #device ppbus # Parallel port bus (required) #device lpt # Printer #device plip # TCP/IP over parallel #device ppi # Parallel port interface device #device vpo # Requires scbus and da # If you've got a "dumb" serial or parallel PCI card that is # supported by the puc(4) glue driver, uncomment the following # line to enable it (connects to sio, uart and/or ppc drivers): #device puc # PCI Ethernet NICs. #device de # DEC/Intel DC21x4x (``Tulip'') device em # Intel PRO/1000 Gigabit Ethernet Family device igb # Intel PRO/1000 PCIE Server Gigabit Family #device ixgb # Intel PRO/10GbE Ethernet Card #device le # AMD Am7900 LANCE and Am79C9xx PCnet #device txp # 3Com 3cR990 (``Typhoon'') #device vx # 3Com 3c590, 3c595 (``Vortex'') # PCI Ethernet NICs that use the common MII bus controller code. # NOTE: Be sure to keep the 'device miibus' line in order to use these NICs! device miibus # MII bus support #device age # Attansic/Atheros L1 Gigabit Ethernet #device alc # Atheros AR8131/AR8132 Ethernet #device ale # Atheros AR8121/AR8113/AR8114 Ethernet #device bce # Broadcom BCM5706/BCM5708 Gigabit Ethernet #device bfe # Broadcom BCM440x 10/100 Ethernet #device bge # Broadcom BCM570xx Gigabit Ethernet #device dc # DEC/Intel 21143 and various workalikes #device et # Agere ET1310 10/100/Gigabit Ethernet #device fxp # Intel EtherExpress PRO/100B (82557, 82558) #device jme # JMicron JMC250 Gigabit/JMC260 Fast Ethernet #device lge # Level 1 LXT1001 gigabit Ethernet #device msk # Marvell/SysKonnect Yukon II Gigabit Ethernet #device nfe # nVidia nForce MCP on-board Ethernet #device nge # NatSemi DP83820 gigabit Ethernet #device nve # nVidia nForce MCP on-board Ethernet Networking #device pcn # AMD Am79C97x PCI 10/100 (precedence over 'le') #device re # RealTek 8139C+/8169/8169S/8110S #device rl # RealTek 8129/8139 #device sf # Adaptec AIC-6915 (``Starfire'') #device sge # Silicon Integrated Systems SiS190/191 #device sis # Silicon Integrated Systems SiS 900/SiS 7016 #device sk # SysKonnect SK-984x & SK-982x gigabit Ethernet #device ste # Sundance ST201 (D-Link DFE-550TX) #device stge # Sundance/Tamarack TC9021 gigabit Ethernet #device ti # Alteon Networks Tigon I/II gigabit Ethernet #device tl # Texas Instruments ThunderLAN #device tx # SMC EtherPower II (83c170 ``EPIC'') #device vge # VIA VT612x gigabit Ethernet #device vr # VIA Rhine, Rhine II #device vte # DM&P Vortex86 RDC R6040 Fast Ethernet #device wb # Winbond W89C840F #device xl # 3Com 3c90x (``Boomerang'', ``Cyclone'') # ISA Ethernet NICs. pccard NICs included. #device cs # Crystal Semiconductor CS89x0 NIC # 'device ed' requires 'device miibus' #device ed # NE[12]000, SMC Ultra, 3c503, DS8390 cards #device ex # Intel EtherExpress Pro/10 and Pro/10+ #device ep # Etherlink III based cards #device fe # Fujitsu MB8696x based cards #device ie # EtherExpress 8/16, 3C507, StarLAN 10 etc. #device sn # SMC's 9000 series of Ethernet chips #device xe # Xircom pccard Ethernet # Wireless NIC cards #device wlan # 802.11 support #device wlan_wep # 802.11 WEP support #device wlan_ccmp # 802.11 CCMP support #device wlan_tkip # 802.11 TKIP support #device wlan_amrr # AMRR transmit rate control algorithm #device wlan_scan_ap # 802.11 AP mode scanning #device wlan_scan_sta # 802.11 STA mode scanning #device an # Aironet 4500/4800 802.11 wireless NICs. #device ath # Atheros pci/cardbus NIC's #device ath_hal # Atheros HAL (Hardware Access Layer) #options AH_SUPPORT_AR5416 # enable AR5416 tx/rx descriptors #device ath_rate_sample # SampleRate tx rate control for ath #device awi # BayStack 660 and others #device ral # Ralink Technology RT2500 wireless NICs. #device wi # WaveLAN/Intersil/Symbol 802.11 wireless NICs. #device wl # Older non 802.11 Wavelan wireless NIC. # Pseudo devices. device loop # Network loopback device random # Entropy device device ether # Ethernet support device vlan # 802.1Q VLAN support device sl # Kernel SLIP device ppp # Kernel PPP device tun # Packet tunnel. device pty # Pseudo-ttys (telnet etc) device md # Memory "disks" device gif # IPv6 and IPv4 tunneling device faith # IPv6-to-IPv4 relaying (translation) device firmware # firmware assist module # The `bpf' device enables the Berkeley Packet Filter. # Be aware of the administrative consequences of enabling this! # Note that 'bpf' is required for DHCP. device bpf # Berkeley packet filter # options INVARIANTS options INVARIANT_SUPPORT # options IPFIREWALL options IPFIREWALL_DEFAULT_TO_ACCEPT options IPFIREWALL_FORWARD options IPFIREWALL_VERBOSE # device pf # # options ALTQ options ALTQ_CBQ # Class Based Queueing options ALTQ_RED # Random Early Detection options ALTQ_RIO # RED In/Out options ALTQ_HFSC # Hierarchical Packet Scheduler options ALTQ_CDNR # Traffic conditioner options ALTQ_PRIQ # Priority Queueing options ALTQ_NOPCC # Required if the TSC is unusable # USB support #device uhci # UHCI PCI->USB interface #device ohci # OHCI PCI->USB interface #device ehci # EHCI PCI->USB interface (USB 2.0) #device usb # USB Bus (required) #device udbp # USB Double Bulk Pipe devices #device ugen # Generic #device uhid # "Human Interface Devices" #device ukbd # Keyboard #device ulpt # Printer #device umass # Disks/Mass storage - Requires scbus and da #device ums # Mouse #device urio # Diamond Rio 500 MP3 player #device uscanner # Scanners # USB Serial devices #device ucom # Generic com ttys #device uark # Technologies ARK3116 based serial adapters #device ubsa # Belkin F5U103 and compatible serial adapters #device ubser # BWCT console serial adapters #device uftdi # For FTDI usb serial adapters #device uipaq # Some WinCE based devices #device uplcom # Prolific PL-2303 serial adapters #device uslcom # SI Labs CP2101/CP2102 serial adapters #device uvisor # Visor and Palm devices #device uvscom # USB serial support for DDI pocket's PHS # USB Ethernet, requires miibus #device aue # ADMtek USB Ethernet #device axe # ASIX Electronics USB Ethernet #device cdce # Generic USB over Ethernet #device cue # CATC USB Ethernet #device kue # Kawasaki LSI USB Ethernet #device rue # RealTek RTL8150 USB Ethernet # USB Wireless #device rum # Ralink Technology RT2501USB wireless NICs #device ural # Ralink Technology RT2500USB wireless NICs # FireWire support #device firewire # FireWire bus code #device sbp # SCSI over FireWire (Requires scbus and da) #device fwe # Ethernet over FireWire (non-standard!) #device fwip # IP over FireWire (RFC 2734,3146) #device dcons # Dumb console driver #device dcons_crom # Configuration ROM for dcons [-- Attachment #3 --] From 5221519fc7d4c31b6c1ea2c7581bd998b7505e65 Mon Sep 17 00:00:00 2001 From: Arnaud Lacombe <lacombar@gmail.com> Date: Tue, 30 Aug 2011 21:55:47 -0400 Subject: [PATCH 1/4] inlinize m_freem() --- sys/kern/uipc_mbuf.c | 12 ------------ sys/sys/mbuf.h | 14 +++++++++++++- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/sys/kern/uipc_mbuf.c b/sys/kern/uipc_mbuf.c index f823e26..9f9288f 100644 --- a/sys/kern/uipc_mbuf.c +++ b/sys/kern/uipc_mbuf.c @@ -151,18 +151,6 @@ m_getm2(struct mbuf *m, int len, int how, short type, int flags) return (m); } -/* - * Free an entire chain of mbufs and associated external buffers, if - * applicable. - */ -void -m_freem(struct mbuf *mb) -{ - - while (mb != NULL) - mb = m_free(mb); -} - /*- * Configure a provided mbuf to refer to the provided external storage * buffer and setup a reference count for said buffer. If the setting diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h index 8529cca..f3f98b0 100644 --- a/sys/sys/mbuf.h +++ b/sys/sys/mbuf.h @@ -358,6 +358,7 @@ static __inline struct mbuf *m_getjcl(int how, short type, int flags, int size); static __inline struct mbuf *m_getclr(int how, short type); /* XXX */ static __inline struct mbuf *m_free(struct mbuf *m); +static __inline void m_freem(struct mbuf *m); static __inline void m_clget(struct mbuf *m, int how); static __inline void *m_cljget(struct mbuf *m, int how, int size); static __inline void m_chtype(struct mbuf *m, short new_type); @@ -520,6 +521,18 @@ m_free(struct mbuf *m) return (n); } +/* + * Free an entire chain of mbufs and associated external buffers, if + * applicable. + */ +static __inline void +m_freem(struct mbuf *m) +{ + + while (m != NULL) + m = m_free(m); +} + static __inline void m_clget(struct mbuf *m, int how) { @@ -767,7 +780,6 @@ struct mbuf *m_dup(struct mbuf *, int); int m_dup_pkthdr(struct mbuf *, struct mbuf *, int); u_int m_fixhdr(struct mbuf *); struct mbuf *m_fragment(struct mbuf *, int, int); -void m_freem(struct mbuf *); struct mbuf *m_getm2(struct mbuf *, int, int, short, int); struct mbuf *m_getptr(struct mbuf *, int, int *); u_int m_length(struct mbuf *, struct mbuf **); -- 1.7.6.153.g78432 [-- Attachment #4 --] From 0f2ada7a85c9e665be23f4801e1722e776492fdc Mon Sep 17 00:00:00 2001 From: Arnaud Lacombe <lacombar@gmail.com> Date: Wed, 31 Aug 2011 12:16:58 -0400 Subject: [PATCH 2/4] mbuf use-after-free marking --- sys/kern/uipc_mbuf.c | 4 ++-- sys/sys/mbuf.h | 37 ++++++++++++++++++++++++++++++++----- sys/vm/uma_dbg.c | 6 ++++-- 3 files changed, 38 insertions(+), 9 deletions(-) diff --git a/sys/kern/uipc_mbuf.c b/sys/kern/uipc_mbuf.c index 9f9288f..9b92fb0 100644 --- a/sys/kern/uipc_mbuf.c +++ b/sys/kern/uipc_mbuf.c @@ -197,7 +197,7 @@ m_extadd(struct mbuf *mb, caddr_t buf, u_int size, * storage attached to them if the reference count hits 1. */ void -mb_free_ext(struct mbuf *m) +mb_free_ext(struct mbuf *m, void *arg) { int skipmbuf; @@ -264,7 +264,7 @@ mb_free_ext(struct mbuf *m) m->m_ext.ext_size = 0; m->m_ext.ext_type = 0; m->m_flags &= ~M_EXT; - uma_zfree(zone_mbuf, m); + uma_zfree_arg(zone_mbuf, m, arg); } /* diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h index f3f98b0..19ea5e8 100644 --- a/sys/sys/mbuf.h +++ b/sys/sys/mbuf.h @@ -362,7 +362,7 @@ static __inline void m_freem(struct mbuf *m); static __inline void m_clget(struct mbuf *m, int how); static __inline void *m_cljget(struct mbuf *m, int how, int size); static __inline void m_chtype(struct mbuf *m, short new_type); -void mb_free_ext(struct mbuf *); +void mb_free_ext(struct mbuf *, void *); static __inline struct mbuf *m_last(struct mbuf *m); static __inline int @@ -506,21 +506,39 @@ m_free_fast(struct mbuf *m) { KASSERT(SLIST_EMPTY(&m->m_pkthdr.tags), ("doing fast free of mbuf with tags")); +#ifdef INVARIANTS + uma_zfree_arg(zone_mbuf, m, (void *)(0xf00f0000 | MB_NOTAGS)); +#else uma_zfree_arg(zone_mbuf, m, (void *)MB_NOTAGS); +#endif } static __inline struct mbuf * -m_free(struct mbuf *m) +m_free_arg(struct mbuf *m, void *arg) { struct mbuf *n = m->m_next; if (m->m_flags & M_EXT) - mb_free_ext(m); + mb_free_ext(m, arg); else if ((m->m_flags & M_NOFREE) == 0) - uma_zfree(zone_mbuf, m); + uma_zfree_arg(zone_mbuf, m, arg); + return (n); } +static __inline struct mbuf * +m_free(struct mbuf *m) +{ + + return m_free_arg(m, 0); +} + +#ifdef INVARIANTS +#define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; }) +#else +#define _THIS_IP_ 0 +#endif + /* * Free an entire chain of mbufs and associated external buffers, if * applicable. @@ -528,9 +546,18 @@ m_free(struct mbuf *m) static __inline void m_freem(struct mbuf *m) { + unsigned long this_ip = (_THIS_IP_ & 0x00ffff00) | (_THIS_IP_ & 0xff) << 24; + + while (m != NULL) + m = m_free_arg(m, (void *)this_ip); +} + +static __inline void +m_freem_arg(struct mbuf *m, void *arg) +{ while (m != NULL) - m = m_free(m); + m = m_free_arg(m, arg); } static __inline void diff --git a/sys/vm/uma_dbg.c b/sys/vm/uma_dbg.c index 9075bf9..669b1f5 100644 --- a/sys/vm/uma_dbg.c +++ b/sys/vm/uma_dbg.c @@ -61,6 +61,7 @@ static const u_int32_t uma_junk = 0xdeadc0de; int trash_ctor(void *mem, int size, void *arg, int flags) { +#if 0 int cnt; u_int32_t *p; @@ -72,6 +73,7 @@ trash_ctor(void *mem, int size, void *arg, int flags) mem, size, *p, p); return (0); } +#endif return (0); } @@ -90,7 +92,7 @@ trash_dtor(void *mem, int size, void *arg) cnt = size / sizeof(uma_junk); for (p = mem; cnt > 0; cnt--, p++) - *p = uma_junk; + *p = (unsigned long)arg; } /* @@ -102,7 +104,7 @@ trash_dtor(void *mem, int size, void *arg) int trash_init(void *mem, int size, int flags) { - trash_dtor(mem, size, NULL); + trash_dtor(mem, size, (void *)uma_junk); return (0); } -- 1.7.6.153.g78432 [-- Attachment #5 --] From c58e8728e4dd75b542fa870bfc7420a3c555d2ca Mon Sep 17 00:00:00 2001 From: Arnaud Lacombe <lacombar@gmail.com> Date: Wed, 31 Aug 2011 22:08:04 -0400 Subject: [PATCH 3/4] 64bits mbuf flags --- sys/kern/uipc_mbuf.c | 6 +++++- sys/sys/mbuf.h | 46 +++++++++++++++++++++++----------------------- 2 files changed, 28 insertions(+), 24 deletions(-) diff --git a/sys/kern/uipc_mbuf.c b/sys/kern/uipc_mbuf.c index 9b92fb0..24f9ec8 100644 --- a/sys/kern/uipc_mbuf.c +++ b/sys/kern/uipc_mbuf.c @@ -50,6 +50,8 @@ __FBSDID("$FreeBSD$"); #include <security/mac/mac_framework.h> +#include <machine/_inttypes.h> + int max_linkhdr; int max_protohdr; int max_hdr; @@ -1405,10 +1407,12 @@ m_print(const struct mbuf *m, int maxlen) pdata = m2->m_len; if (maxlen != -1 && pdata > maxlen) pdata = maxlen; +#if 0 printf("mbuf: %p len: %d, next: %p, %b%s", m2, m2->m_len, m2->m_next, m2->m_flags, "\20\20freelist\17skipfw" "\11proto5\10proto4\7proto3\6proto2\5proto1\4rdonly" "\3eor\2pkthdr\1ext", pdata ? "" : "\n"); +#endif if (pdata) printf(", %*D\n", pdata, (u_char *)m2->m_data, "-"); if (len != -1) @@ -1835,7 +1839,7 @@ m_unshare(struct mbuf *m0, int how) * it anyway, we try to reduce the number of mbufs and * clusters so that future work is easier). */ - KASSERT(m->m_flags & M_EXT, ("m_flags 0x%x", m->m_flags)); + KASSERT(m->m_flags & M_EXT, ("m_flags 0x%" PRIx64, m->m_flags)); /* NB: we only coalesce into a cluster or larger */ if (mprev != NULL && (mprev->m_flags & M_EXT) && m->m_len <= M_TRAILINGSPACE(mprev)) { diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h index 19ea5e8..8ad09bb 100644 --- a/sys/sys/mbuf.h +++ b/sys/sys/mbuf.h @@ -91,7 +91,7 @@ struct m_hdr { struct mbuf *mh_nextpkt; /* next chain in queue/record */ caddr_t mh_data; /* location of data */ int mh_len; /* amount of data in this mbuf */ - int mh_flags; /* flags; see below */ + uint64_t mh_flags; /* flags; see below */ short mh_type; /* type of data in this mbuf */ uint8_t pad[M_HDR_PAD];/* word align */ }; @@ -169,28 +169,28 @@ struct mbuf { /* * mbuf flags. */ -#define M_EXT 0x00000001 /* has associated external storage */ -#define M_PKTHDR 0x00000002 /* start of record */ -#define M_EOR 0x00000004 /* end of record */ -#define M_RDONLY 0x00000008 /* associated data is marked read-only */ -#define M_PROTO1 0x00000010 /* protocol-specific */ -#define M_PROTO2 0x00000020 /* protocol-specific */ -#define M_PROTO3 0x00000040 /* protocol-specific */ -#define M_PROTO4 0x00000080 /* protocol-specific */ -#define M_PROTO5 0x00000100 /* protocol-specific */ -#define M_BCAST 0x00000200 /* send/received as link-level broadcast */ -#define M_MCAST 0x00000400 /* send/received as link-level multicast */ -#define M_FRAG 0x00000800 /* packet is a fragment of a larger packet */ -#define M_FIRSTFRAG 0x00001000 /* packet is first fragment */ -#define M_LASTFRAG 0x00002000 /* packet is last fragment */ -#define M_SKIP_FIREWALL 0x00004000 /* skip firewall processing */ -#define M_FREELIST 0x00008000 /* mbuf is on the free list */ -#define M_VLANTAG 0x00010000 /* ether_vtag is valid */ -#define M_PROMISC 0x00020000 /* packet was not for us */ -#define M_NOFREE 0x00040000 /* do not free mbuf, embedded in cluster */ -#define M_PROTO6 0x00080000 /* protocol-specific */ -#define M_PROTO7 0x00100000 /* protocol-specific */ -#define M_PROTO8 0x00200000 /* protocol-specific */ +#define M_EXT 0x00000001ULL /* has associated external storage */ +#define M_PKTHDR 0x00000002ULL /* start of record */ +#define M_EOR 0x00000004ULL /* end of record */ +#define M_RDONLY 0x00000008ULL /* associated data is marked read-only */ +#define M_PROTO1 0x00000010ULL /* protocol-specific */ +#define M_PROTO2 0x00000020ULL /* protocol-specific */ +#define M_PROTO3 0x00000040ULL /* protocol-specific */ +#define M_PROTO4 0x00000080ULL /* protocol-specific */ +#define M_PROTO5 0x00000100ULL /* protocol-specific */ +#define M_BCAST 0x00000200ULL /* send/received as link-level broadcast */ +#define M_MCAST 0x00000400ULL /* send/received as link-level multicast */ +#define M_FRAG 0x00000800ULL /* packet is a fragment of a larger packet */ +#define M_FIRSTFRAG 0x00001000ULL /* packet is first fragment */ +#define M_LASTFRAG 0x00002000ULL /* packet is last fragment */ +#define M_SKIP_FIREWALL 0x00004000ULL /* skip firewall processing */ +#define M_FREELIST 0x00008000ULL /* mbuf is on the free list */ +#define M_VLANTAG 0x00010000ULL /* ether_vtag is valid */ +#define M_PROMISC 0x00020000ULL /* packet was not for us */ +#define M_NOFREE 0x00040000ULL /* do not free mbuf, embedded in cluster */ +#define M_PROTO6 0x00080000ULL /* protocol-specific */ +#define M_PROTO7 0x00100000ULL /* protocol-specific */ +#define M_PROTO8 0x00200000ULL /* protocol-specific */ /* * For RELENG_{6,7} steal these flags for limited multiple routing table * support. In RELENG_8 and beyond, use just one flag and a tag. -- 1.7.6.153.g78432 [-- Attachment #6 --] From d0530867a44e8029a4e64e69cacdcaa998a7e76f Mon Sep 17 00:00:00 2001 From: Arnaud Lacombe <lacombar@gmail.com> Date: Wed, 31 Aug 2011 22:10:56 -0400 Subject: [PATCH 4/4] verify mbuf's flags consistency after tx --- sys/dev/e1000/if_em.c | 14 +++++++++++++- sys/dev/e1000/if_igb.c | 14 +++++++++++++- sys/sys/mbuf.h | 3 +++ 3 files changed, 29 insertions(+), 2 deletions(-) diff --git a/sys/dev/e1000/if_em.c b/sys/dev/e1000/if_em.c index c5e936d..ca1647e 100644 --- a/sys/dev/e1000/if_em.c +++ b/sys/dev/e1000/if_em.c @@ -74,6 +74,7 @@ #include <netinet/udp.h> #include <machine/in_cksum.h> +#include <machine/_inttypes.h> #include <dev/led/led.h> #include <dev/pci/pcivar.h> #include <dev/pci/pcireg.h> @@ -1907,6 +1908,8 @@ em_xmit(struct tx_ring *txr, struct mbuf **m_headp) ctxd->lower.data |= htole32(E1000_TXD_CMD_VLE); } + m_head->m_flags |= M_UNUSED; + tx_buffer->m_head = m_head; tx_buffer_mapped->map = tx_buffer->map; tx_buffer->map = map; @@ -3535,7 +3538,16 @@ em_txeof(struct tx_ring *txr) BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(txr->txtag, tx_buffer->map); - m_freem(tx_buffer->m_head); + { + KASSERT((tx_buffer->m_head->m_flags & M_UNUSED) == M_UNUSED, + ("Corrupted unused flags, expected 0x%" PRIx64 ", got 0x%" PRIx64 ", flags 0x%" PRIx64, + M_UNUSED, tx_buffer->m_head->m_flags & M_UNUSED, + tx_buffer->m_head->m_flags)); + tx_buffer->m_head->m_flags &= ~M_UNUSED; + + m_freem_arg(tx_buffer->m_head, (void *)0x00babe00); + } + tx_buffer->m_head = NULL; } tx_buffer->next_eop = -1; diff --git a/sys/dev/e1000/if_igb.c b/sys/dev/e1000/if_igb.c index 4700829..9ad616c 100644 --- a/sys/dev/e1000/if_igb.c +++ b/sys/dev/e1000/if_igb.c @@ -80,6 +80,7 @@ #include <netinet/udp.h> #include <machine/in_cksum.h> +#include <machine/_inttypes.h> #include <dev/led/led.h> #include <dev/pci/pcivar.h> #include <dev/pci/pcireg.h> @@ -1655,6 +1656,8 @@ igb_xmit(struct tx_ring *txr, struct mbuf **m_headp) txr->next_avail_desc = i; txr->tx_avail -= nsegs; + m_head->m_flags |= M_UNUSED; + tx_buffer->m_head = m_head; tx_buffer_mapped->map = tx_buffer->map; tx_buffer->map = map; @@ -3374,7 +3377,16 @@ igb_txeof(struct tx_ring *txr) bus_dmamap_unload(txr->txtag, tx_buffer->map); - m_freem(tx_buffer->m_head); + { + KASSERT((tx_buffer->m_head->m_flags & M_UNUSED) == M_UNUSED, + ("Corrupted unused flags, expected 0x%" PRIx64 ", got 0x%" PRIx64 ", flags 0x%" PRIx64, + M_UNUSED, tx_buffer->m_head->m_flags & M_UNUSED, + tx_buffer->m_head->m_flags)); + tx_buffer->m_head->m_flags &= ~M_UNUSED; + + m_freem_arg(tx_buffer->m_head, (void *)0x00babe00); + } + tx_buffer->m_head = NULL; } tx_buffer->next_eop = -1; diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h index 8ad09bb..299e691 100644 --- a/sys/sys/mbuf.h +++ b/sys/sys/mbuf.h @@ -191,6 +191,9 @@ struct mbuf { #define M_PROTO6 0x00080000ULL /* protocol-specific */ #define M_PROTO7 0x00100000ULL /* protocol-specific */ #define M_PROTO8 0x00200000ULL /* protocol-specific */ + +#define M_UNUSED (~(0xffffffffULL)) + /* * For RELENG_{6,7} steal these flags for limited multiple routing table * support. In RELENG_8 and beyond, use just one flag and a tag. -- 1.7.6.153.g78432help
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CACqU3MUs9Z9GeuGe=8iVp=MWV6eG-tO%2BkHb1znatsTq2uEqwvA>
