From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 21 01:00:49 2005
Date: Wed, 20 Jul 2005 21:00:39 -0400
From: Edwin <edwin@verolan.com>
To: freebsd-hackers@freebsd.org
Message-ID: <20050721010039.GA5310@asx01.verolan.com>
References: <20050719034215.GB20752@asx01.verolan.com>
	<200507191120.37526.jhb@FreeBSD.org>
	<20050720020302.GA24474@asx01.verolan.com>
	<20050720100623.GA1470@beatrix.daedalusnetworks.priv>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050720100623.GA1470@beatrix.daedalusnetworks.priv>
User-Agent: Mutt/1.4.1i
X-Operating-System: Linux/(i686)
Cc: Edwin <edwin@verolan.com>
Subject: Re: help w/panic under heavy load - 5.4

Giorgos/John/et al. :)

I have compiled/tested/traced about 15 separate kernels for this, and am happy
to provide crashdumps/etc to anyone interested :)

I decided to start over: build a GENERIC kernel
(w/ DDB/KDB/INVARIANTS/INVARIANT_SUPPORT) and see whether I could
reproduce the problem more specifically.

With just the GENERIC + debug kernel, I did make it crash, although it took some
hand-holding: lots of throwing packets at it and running processes on the box for about
5-10 minutes. I didn't really try to reproduce it, since it wasn't the fast
panic that I was concerned about before. I've included the panic below anyhow.

What I did notice was that with no extra options, and with ip.fastforwarding turned
on via sysctl, the crash was consistently reproducible with the (pretty much) generic
kernel, with basically the same kernel traces as before. I also got an 'interrupt storm'
message on the console during the ip.fastforwarding run; I have seen that a few times
in the past, before it panicked, when polling was not enabled.

I welcome all comments/thoughts/directions - happy to poke/prod/compile/debug - 
just really don't know where to go from here.

Thanks for your help!
/Edwin


Kernel: DDB8-GENDBG (GENERIC + options DDB/KDB/INVARIANTS/INVARIANT_SUPPORT)
sysctl: ip.fastforwarding=0 <--- turned off

ospfd# panic: m_copym, offset > size of mbuf chain
KDB: enter: panic
[thread pid 27 tid 100021 ]
Stopped at      kdb_enter+0x2b: nop
db> where
Tracing pid 27 tid 100021 td 0xc0ed0180
kdb_enter(c0821a6a) at kdb_enter+0x2b
panic(c0826049,0,c076b79c,c102bb00,100) at panic+0xbb
m_copym(0,5dc,5c8,1,14) at m_copym+0x60
ip_fragment(c124100e,c76d1a04,5dc,0,1) at ip_fragment+0x214
ip_output(c1201200,0,c76d19d0,1,0,0) at ip_output+0x74c
ip_forward(c1201200,0) at ip_forward+0x2d4
ip_input(c1201200) at ip_input+0x4a7
netisr_processqueue(c08ec138) at netisr_processqueue+0x6e
swi_net(0) at swi_net+0xc2
ithread_loop(c0ec6580,c76d1d48,c0ec6580,c060030c,0) at ithread_loop+0x124
fork_exit(c060030c,c0ec6580,c76d1d48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc76d1d7c, ebp = 0 ---
db> call doadump
Dumping 128 MB
 16 32 48 64 80 96 112
Dump complete
0xf
db>
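
As a reading aid for the trace above, the hex arguments in the m_copym() and
ip_fragment() frames can be decoded with a few lines of Python. This is only a
sketch. It assumes that the third ip_fragment() argument is the MTU, that the
second and third m_copym() arguments are the offset and length, and that the IP
header carries no options (20 bytes). The variable names are illustrative and
are not taken from the kernel source.

```python
# Decode the hex arguments shown in the ddb frames:
#   m_copym(0,5dc,5c8,1,14)
#   ip_fragment(...,5dc,0,1)
# Assumptions (labelled, not verified against the 5.4 source here):
#   - arg 3 of ip_fragment() is the MTU
#   - args 2 and 3 of m_copym() are the offset and the length
#   - the IP header is 20 bytes (no options)

mtu = int("5dc", 16)        # third ip_fragment() argument
copy_off = int("5dc", 16)   # second m_copym() argument
copy_len = int("5c8", 16)   # third m_copym() argument
ip_hlen = 20                # assumed header length

# Every fragment except the last carries a payload that is a multiple of 8.
frag_payload = (mtu - ip_hlen) & ~7

print(f"mtu={mtu} copy_off={copy_off} copy_len={copy_len}")
print(f"payload per fragment={frag_payload}")
print(f"start of second fragment's data={ip_hlen + frag_payload}")
```

If those assumptions hold, the requested offset matches the start of the second
fragment's data, which would suggest the mbuf chain holds fewer bytes than the
packet is believed to contain.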

Kernel: DDB8-GENDBG (GENERIC + options DDB/KDB/INVARIANTS/INVARIANT_SUPPORT)
Sysctl: ip.fastforwarding=1

fb54c# Interrupt storm detected on "irq10: sis0 sis1+"; throttling interrupt source
fb54c#
fb54c#
fb54c#
fb54c# panic: m_copym, offset > size of mbuf chain
KDB: enter: panic
[thread pid 21 tid 100015 ]
Stopped at      kdb_enter+0x2b: nop
db> where
Tracing pid 21 tid 100015 td 0xc0ecc780
kdb_enter(c08165b2) at kdb_enter+0x2b
panic(c081ab91,0,c0760a0c,c1028800,100) at panic+0xbb
m_copym(0,5dc,5c8,1,14) at m_copym+0x60
ip_fragment(c121880e,c76bfc6c,5dc,0,1) at ip_fragment+0x214
ip_fastforward(c11f2600) at ip_fastforward+0x6ed
ether_demux(c0f90000,c11f2600,52,c0f8b8d8,a) at ether_demux+0x259
ether_input(c0f90000,c11f2600,c0f902cc,0,c0826fc6) at ether_input+0x25d
sis_rxeof(c0f90000) at sis_rxeof+0x18b
sis_intr(c0f90000) at sis_intr+0xa3
ithread_loop(c0ec6880,c76bfd48,c0ec6880,c05feb3c,0) at ithread_loop+0x124
fork_exit(c05feb3c,c0ec6880,c76bfd48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc76bfd7c, ebp = 0 ---
db> doadump
No such command
db> call doadump
Dumping 128 MB
 16 32 48 64 80 96 112
Dump complete
0xf
db> reset
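
To illustrate what the panic string "offset > size of mbuf chain" describes, here
is a toy user-space model of skipping an offset into a chain of buffers. It is a
sketch under assumptions, not the kernel's m_copym(): the Mbuf class and
walk_to_offset() are hypothetical names, and only the fields m_len and m_next
mirror the real structure.

```python
class Mbuf:
    """Toy stand-in for a kernel mbuf: a byte count and a link to the next one."""
    def __init__(self, m_len, m_next=None):
        self.m_len = m_len
        self.m_next = m_next


def walk_to_offset(m, off):
    """Skip 'off' bytes into the chain before copying would start.

    Returns the buffer containing the offset and the remaining offset inside
    it. Raises if the chain runs out while there are still bytes to skip.
    """
    while off > 0:
        if m is None:
            raise RuntimeError("offset > size of mbuf chain")
        if off < m.m_len:
            break
        off -= m.m_len
        m = m.m_next
    return m, off


# A chain of 600 + 400 bytes cannot satisfy an offset of 1500 (0x5dc).
short_chain = Mbuf(600, Mbuf(400))
try:
    walk_to_offset(short_chain, 1500)
except RuntimeError as err:
    print("short chain:", err)

# A chain with at least 1500 bytes before the second buffer is fine.
second = Mbuf(1480)
m, off = walk_to_offset(Mbuf(1500, second), 1500)
print("long chain ok:", m is second, off)
```

In this model the error is raised only when the chain is strictly shorter than
the requested offset, which fits the first argument showing as 0 in the
m_copym() frame.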



Giorgos Keramidas (keramida@freebsd.org) wrote:
> On 2005-07-19 22:03, Edwin <edwin@verolan.com> wrote:
> > Hi John,
> >
> > Updated the kernel, same crash under load, looks like m is null, you're right.
> >
> > Not quite sure where to go from here. I'm happy to do the footwork - just still real
> > hazy on the BSD kernel part of things.
> >
> > panic: m_copym, offset > size of mbuf chain
> > KDB: enter: panic
> > [thread pid 27 tid 100021 ]
> > Stopped at      kdb_enter+0x2b: nop
> > db> where
> > Tracing pid 27 tid 100021 td 0xc0ed0180
> > kdb_enter(c0821a6a) at kdb_enter+0x2b
> > panic(c0826049,0,c076b79c,c102d600,100) at panic+0xbb
> > m_copym(0,5dc,5c8,1,14) at m_copym+0x60
> > ip_fragment(c123180e,c76d1c38,5dc,0,1) at ip_fragment+0x214
> > ip_fastforward(c11fee00) at ip_fastforward+0x6ed
> > ether_demux(c0f90000,c11fee00,52,c0f8aad0,1f) at ether_demux+0x259
> > ether_input(c0f90000,c11fee00,c0f902d0,0,c08336ab) at ether_input+0x25d
> > sis_rxeof(c0f90000,1,5,c08e5500,c76d1ce0) at sis_rxeof+0x1ab
> > sis_poll(c0f90000,0,5) at sis_poll+0x7f
> > netisr_poll(0) at netisr_poll+0x188
> > swi_net(0) at swi_net+0x81
> > ithread_loop(c0ec6580,c76d1d48,c0ec6580,c060030c,0) at ithread_loop+0x124
> > fork_exit(c060030c,c0ec6580,c76d1d48) at fork_exit+0xa4
> > fork_trampoline() at fork_trampoline+0x8
> > --- trap 0x1, eip = 0, esp = 0xc76d1d7c, ebp = 0 ---
> 
> Both tracebacks contain sis_poll() somewhere in the call stack?  Are you
> using POLLING?  If yes, can you try without POLLING and see if the crash
> can still be reproduced?
> 
> - Giorgos
>