From owner-freebsd-stable@FreeBSD.ORG Mon Aug 30 11:08:47 2010
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 71B1510656A7
    for ; Mon, 30 Aug 2010 11:08:47 +0000 (UTC)
    (envelope-from jdc@koitsu.dyndns.org)
Received: from qmta06.emeryville.ca.mail.comcast.net (qmta06.emeryville.ca.mail.comcast.net [76.96.30.56])
    by mx1.freebsd.org (Postfix) with ESMTP id 592CF8FC20
    for ; Mon, 30 Aug 2010 11:08:47 +0000 (UTC)
Received: from omta05.emeryville.ca.mail.comcast.net ([76.96.30.43])
    by qmta06.emeryville.ca.mail.comcast.net with comcast
    id 0b8X1f0050vp7WLA6b8nEf; Mon, 30 Aug 2010 11:08:47 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
    by omta05.emeryville.ca.mail.comcast.net with comcast
    id 0b8l1f0063LrwQ28Rb8mlx; Mon, 30 Aug 2010 11:08:46 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
    id D6A719B42E; Mon, 30 Aug 2010 04:08:45 -0700 (PDT)
Date: Mon, 30 Aug 2010 04:08:45 -0700
From: Jeremy Chadwick
To: Greg Byshenk
Message-ID: <20100830110845.GA31629@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: PYUN YongHyeon, freebsd-stable@freebsd.org, Jack Vogel
Subject: Re: Crashes on X7SPE-HF with em
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 30 Aug 2010 11:08:47 -0000

Bcc:
Subject: Re: igb related(?) panics on 7.3-STABLE
Reply-To:
In-Reply-To: <20100830094631.GD12467@core.byshenk.net>

On Mon, Aug 30, 2010 at 11:46:31AM +0200, Greg Byshenk wrote:
> On Sun, Aug 29, 2010 at 08:16:59PM +0200, Greg Byshenk wrote:
> >
> > I've begun seeing problems on a machine running FreeBSD-7.3-STABLE, 64-bit,
> > with two igb nics in use.
> > Previously the machine was fine, running earlier
> > versions of 7-STABLE, although the load on the network has increased due
> > to additional machines being added to the network (the machine functions
> > as a fileserver, serving files to compute machines via NFS(v3)).
> >
> > Any advice is much appreciated. System info is below.
> >
> Followup with more information. The machine just panic'ed again, with
> a lot of load on the network.
>
> Output from the 'systat' that was running at the time:
>
> 3 users Load 54.47 42.35 24.25 Aug 30 11:17
>
> Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
> Tot Share Tot Share Free in out in out
> Act 46232 5504 868140 10548 943324 count
> All 456484 7852 1074772k 27740 pages
> Proc: Interrupts
> r p d s w Csw Trp Sys Int Sof Flt cow 54220 total
> 1 170 392k 8 278 22k 195 1 zfod sio0 irq4
> ozfod fdc0 irq6
> 70.4%Sys 3.1%Intr 0.0%User 0.0%Nice 26.5%Idle %ozfod 27 twa0 uhci0
> | | | | | | | | | | | daefr 2001 cpu0: time
> ===================================++ prcfr igb0 256
> 9938 dtbuf 1247 totfr igb0 257
> Namei Name-cache Dir-cache 100000 desvn react igb0 258
> Calls hits % hits % 34443 numvn 1 pdwak igb0 259
> 24996 frevn 112852 pdpgs igb0 262
> intrn igb0 263
> Disks da0 da1 pass0 pass1 2570672 wire igb0 264
> KB/t 0.00 12.23 0.00 0.00 46760 act igb0 265
> tps 0 26 0 0 14706896 inact 19449 igb1 266
> MB/s 0.00 0.31 0.00 0.00 0 769796 26585
> 0 21 0 0 173528
>
>
> -greg
>
>
> > Machine:
> > =======
> >
> > FreeBSD server.example.com 7.3-STABLE FreeBSD 7.3-STABLE #36: Wed Aug 25 11:01:07 CEST 2010 root@server.example.com:/usr/obj/usr/src/sys/KERNEL amd64
> >
> > Kernel was csup'd earlier in the day on 25 August, immediately prior to
> > the build.
> >
> >
> > Panic:
> > ======
> >
> > Fatal trap 9: general protection fault while in kernel mode
> > cpuid = 2; apic id = 02
> > instruction pointer = 0x8:0xffffffff8052f40c
> > stack pointer = 0x10:0xffffff82056819d0
> > frame pointer = 0x10:0xffffff82056819f0
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process = 65 (igb1 que)
> > trap number = 9
> > panic: general protection fault
> > cpuid = 2
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > panic() at panic+0x182
> > trap_fatal() at trap_fatal+0x294
> > trap() at trap+0x106
> > calltrap() at calltrap+0x8
> > --- trap 0x9, rip = 0xffffffff8052f40c, rsp = 0xffffff82056819d0, rbp = 0xffffff82056819f0 ---
> > m_tag_delete_chain() at m_tag_delete_chain+0x1c
> > uma_zfree_arg() at uma_zfree_arg+0x41
> > m_freem() at m_freem+0x54
> > ether_demux() at ether_demux+0x85
> > ether_input() at ether_input+0x1bb
> > igb_rxeof() at igb_rxeof+0x29d
> > igb_handle_que() at igb_handle_que+0x9a
> > taskqueue_run() at taskqueue_run+0xac
> > taskqueue_thread_loop() at taskqueue_thread_loop+0x46
> > fork_exit() at fork_exit+0x122
> > fork_trampoline() at fork_trampoline+0xe
> > --- trap 0, rip = 0, rsp = 0xffffff8205681d30, rbp = 0 ---
> > Uptime: 11h57m6s
> > Physical memory: 18411 MB
> > Dumping 3770 MB:
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address = 0x8000000000
> > fault code = supervisor write data, page not present
> > instruction pointer = 0x8:0xffffffff80188b5f
> > stack pointer = 0x10:0xffffff82056811f0
> > frame pointer = 0x10:0xffffff82056812f0
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process = 65 (igb1 que)
> > trap number = 12
> >
> >
> > pciconf:
> > =======
> >
> > igb0@pci0:10:0:0: class=0x020000 card=0x10c915d9 chip=0x10c98086 rev=0x01 hdr=0x00
> >     vendor     = 'Intel Corporation'
> >     class      = network
> >     subclass   = ethernet
> > igb1@pci0:10:0:1: class=0x020000 card=0x10c915d9 chip=0x10c98086 rev=0x01 hdr=0x00
> >     vendor     = 'Intel Corporation'
> >     class      = network
> >     subclass   = ethernet
> >
> >
> > dmesg:
> > =====
> >
> > igb0: port 0xe880-0xe89f mem 0xfbe60000-0xfbe7ffff,0xfbe40000-0xfbe5ffff,0xfbeb8000-0xfbebbfff irq 16 at device 0.0 on pci10
> > igb0: Using MSIX interrupts with 10 vectors
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: Ethernet address: 00:30:48:ca:cd:72
> > igb1: port 0xec00-0xec1f mem 0xfbee0000-0xfbefffff,0xfbec0000-0xfbedffff,0xfbebc000-0xfbebffff irq 17 at device 0.1 on pci10
> > igb1: Using MSIX interrupts with 10 vectors
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: Ethernet address: 00:30:48:ca:cd:73

Adding Jack Vogel of Intel and Yong-Hyeon PYUN to the mix...

I don't know if this is possible for you to do, but do you see the same
problem when running 8.1-STABLE? I know there has been a lot of
positive work on igb(4) in RELENG_8, but not too many of the fixes and
improvements are backported to RELENG_7.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/e1000/if_igb.c

Be sure to check out Revision 1.54 there (which is for HEAD/CURRENT, but
I'm not sure if it's been backported/incorporated in some other way).

Otherwise, as a test/workaround you might try disabling MSI-X support
entirely to see if there's any improvement. This could degrade system
performance a bit (under heavy interrupt load). In /boot/loader.conf,
set hw.pci.enable_msix="0" and reboot.
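
As a rough sketch, the loader.conf entry might look like the following.
The comments are only a suggested reminder and are not required; the
tunable name and value are the ones given above.

```conf
# /boot/loader.conf
# Temporary test: disable MSI-X to see whether the igb panics stop.
# Remove this line again if it makes no difference.
hw.pci.enable_msix="0"
```

After the reboot, the boot messages for igb0/igb1 should no longer
report "Using MSIX interrupts" if the setting took effect, assuming the
driver falls back to MSI or legacy interrupts in that case.
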
If there's no improvement, be sure to remove this.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |