From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 21 01:00:49 2005
Return-Path:
X-Original-To: freebsd-hackers@freebsd.org
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A62C116A425
	for ; Thu, 21 Jul 2005 01:00:49 +0000 (GMT)
	(envelope-from edwin@verolan.com)
Received: from ns11.webmasters.com (ns11.webmasters.com [66.118.156.2])
	by mx1.FreeBSD.org (Postfix) with SMTP id 2823A43D48
	for ; Thu, 21 Jul 2005 01:00:46 +0000 (GMT)
	(envelope-from edwin@verolan.com)
Received: (qmail 13623 invoked from network); 21 Jul 2005 00:57:38 -0000
Received: from unknown (HELO localhost.localdomain) (204.9.60.14)
	by ns11.webmasters.com with SMTP; 21 Jul 2005 00:57:38 -0000
Received: from localhost.localdomain (asx01 [127.0.0.1])
	by localhost.localdomain (8.13.1/8.13.1) with ESMTP id j6L10hBs005399;
	Wed, 20 Jul 2005 21:00:43 -0400
Received: (from edwin@localhost)
	by localhost.localdomain (8.13.1/8.13.1/Submit) id j6L10dWt005398;
	Wed, 20 Jul 2005 21:00:39 -0400
Date: Wed, 20 Jul 2005 21:00:39 -0400
From: Edwin
To: freebsd-hackers@freebsd.org
Message-ID: <20050721010039.GA5310@asx01.verolan.com>
References: <20050719034215.GB20752@asx01.verolan.com>
	<200507191120.37526.jhb@FreeBSD.org>
	<20050720020302.GA24474@asx01.verolan.com>
	<20050720100623.GA1470@beatrix.daedalusnetworks.priv>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050720100623.GA1470@beatrix.daedalusnetworks.priv>
User-Agent: Mutt/1.4.1i
X-Operating-System: Linux/(i686)
Cc: Edwin
Subject: Re: help w/panic under heavy load - 5.4
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 21 Jul 2005 01:00:49 -0000

Giorgos/John/et al. :)

I have compiled/tested/traced about 15 separate kernels for
this, and am happy to provide crash dumps etc. to anyone interested :)

I decided to start over: build a GENERIC kernel (with
DDB/KDB/INVARIANTS/INVARIANT_SUPPORT) and see whether I could reproduce
the problem more specifically.

With just the GENERIC debug kernel I did make it crash, although it took
some handholding: lots of packets thrown at it while running processes
on the box, for about 5-10 minutes. I didn't really try to reproduce
that one, since it wasn't the fast panic I was concerned about before.
I've included the panic below anyhow.

What I did notice is that, with no other options, turning on the
ip.fastforwarding sysctl (net.inet.ip.fastforwarding) made the crash
consistently reproducible with the (pretty much) GENERIC kernel, with
basically the same kernel traces as before. I also got an 'interrupt
storm' message on the console in the ip.fastforwarding run. I have seen
that message a few times in the past, before it panicked, when polling
was not enabled.

I welcome all comments/thoughts/directions. I'm happy to
poke/prod/compile/debug; I just really don't know where to go from here.

Thanks for your help!
/Edwin

Kernel: DDB8-GENDBG (GENERIC + options DDB/KDB/INVARIANTS/INVARIANT_SUPPORT)
sysctl: ip.fastforwarding=0    <--- turned off

ospfd# panic: m_copym, offset > size of mbuf chain
KDB: enter: panic
[thread pid 27 tid 100021 ]
Stopped at      kdb_enter+0x2b: nop
db> where
Tracing pid 27 tid 100021 td 0xc0ed0180
kdb_enter(c0821a6a) at kdb_enter+0x2b
panic(c0826049,0,c076b79c,c102bb00,100) at panic+0xbb
m_copym(0,5dc,5c8,1,14) at m_copym+0x60
ip_fragment(c124100e,c76d1a04,5dc,0,1) at ip_fragment+0x214
ip_output(c1201200,0,c76d19d0,1,0,0) at ip_output+0x74c
ip_forward(c1201200,0) at ip_forward+0x2d4
ip_input(c1201200) at ip_input+0x4a7
netisr_processqueue(c08ec138) at netisr_processqueue+0x6e
swi_net(0) at swi_net+0xc2
ithread_loop(c0ec6580,c76d1d48,c0ec6580,c060030c,0) at ithread_loop+0x124
fork_exit(c060030c,c0ec6580,c76d1d48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc76d1d7c, ebp = 0 ---
db> call doadump
Dumping 128 MB
 16 32 48 64 80 96 112
Dump complete
0xf
db>

Kernel: DDB8-GENDBG (GENERIC + options DDB/KDB/INVARIANTS/INVARIANT_SUPPORT)
Sysctl: ip.fastforwarding=1

fb54c# Interrupt storm detected on "irq10: sis0 sis1+"; throttling interrupt source
fb54c#
fb54c#
fb54c#
fb54c# panic: m_copym, offset > size of mbuf chain
KDB: enter: panic
[thread pid 21 tid 100015 ]
Stopped at      kdb_enter+0x2b: nop
db> where
Tracing pid 21 tid 100015 td 0xc0ecc780
kdb_enter(c08165b2) at kdb_enter+0x2b
panic(c081ab91,0,c0760a0c,c1028800,100) at panic+0xbb
m_copym(0,5dc,5c8,1,14) at m_copym+0x60
ip_fragment(c121880e,c76bfc6c,5dc,0,1) at ip_fragment+0x214
ip_fastforward(c11f2600) at ip_fastforward+0x6ed
ether_demux(c0f90000,c11f2600,52,c0f8b8d8,a) at ether_demux+0x259
ether_input(c0f90000,c11f2600,c0f902cc,0,c0826fc6) at ether_input+0x25d
sis_rxeof(c0f90000) at sis_rxeof+0x18b
sis_intr(c0f90000) at sis_intr+0xa3
ithread_loop(c0ec6880,c76bfd48,c0ec6880,c05feb3c,0) at ithread_loop+0x124
fork_exit(c05feb3c,c0ec6880,c76bfd48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc76bfd7c, ebp = 0 ---
db> doadump
No such command
db> call doadump
Dumping 128 MB
 16 32 48 64 80 96 112
Dump complete
0xf
db> reset
.

Giorgos Keramidas (keramida@freebsd.org) wrote:
> On 2005-07-19 22:03, Edwin wrote:
> > Hi John,
> >
> > Updated the kernel, same crash under load, looks like m is null, you're right.
> >
> > Not quite sure where to go from here. I'm happy to do the footwork - just still real
> > hazy on the BSD kernel part of things.
> >
> > panic: m_copym, offset > size of mbuf chain
> > KDB: enter: panic
> > [thread pid 27 tid 100021 ]
> > Stopped at      kdb_enter+0x2b: nop
> > db> where
> > Tracing pid 27 tid 100021 td 0xc0ed0180
> > kdb_enter(c0821a6a) at kdb_enter+0x2b
> > panic(c0826049,0,c076b79c,c102d600,100) at panic+0xbb
> > m_copym(0,5dc,5c8,1,14) at m_copym+0x60
> > ip_fragment(c123180e,c76d1c38,5dc,0,1) at ip_fragment+0x214
> > ip_fastforward(c11fee00) at ip_fastforward+0x6ed
> > ether_demux(c0f90000,c11fee00,52,c0f8aad0,1f) at ether_demux+0x259
> > ether_input(c0f90000,c11fee00,c0f902d0,0,c08336ab) at ether_input+0x25d
> > sis_rxeof(c0f90000,1,5,c08e5500,c76d1ce0) at sis_rxeof+0x1ab
> > sis_poll(c0f90000,0,5) at sis_poll+0x7f
> > netisr_poll(0) at netisr_poll+0x188
> > swi_net(0) at swi_net+0x81
> > ithread_loop(c0ec6580,c76d1d48,c0ec6580,c060030c,0) at ithread_loop+0x124
> > fork_exit(c060030c,c0ec6580,c76d1d48) at fork_exit+0xa4
> > fork_trampoline() at fork_trampoline+0x8
> > --- trap 0x1, eip = 0, esp = 0xc76d1d7c, ebp = 0 ---
>
> Both tracebacks contain sis_poll() somewhere in the call stack? Are you
> using POLLING? If yes, can you try without POLLING and see if the crash
> can still be reproduced?
>
> - Giorgos
>