Date: Wed, 30 Mar 2005 16:06:28 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@delplex.bde.org
To: Oleg Tarasov <subscriber@osk.com.ua>
In-Reply-To: <1101884216.20050323181742@osk.com.ua>
Message-ID: <20050330155502.E16886@delplex.bde.org>
References: <815955888.20050323113529@osk.com.ua>
	<20050323235823.E19701@epsplex.bde.org>
	<1101884216.20050323181742@osk.com.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
cc: freebsd-bugs@freebsd.org
cc: jhb@freebsd.org
Subject: Re: sio interrupt-level buffer overflows

On Wed, 23 Mar 2005, Oleg Tarasov wrote:

> About my panics. They persist and when this server panics it somehow
> overloads my network so it stops functioning until reboot. This is
> very, very bad.
>
> Maybe you could tell me where to write, or you could
> personally tell me what I should do.
>
> Using all my theoretical skills I have come to this data I could
> obtain from my dump:
>
> (kgdb) backtrace
> #0  doadump () at pcpu.h:159
> #1  0xc060b063 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:397
> #2  0xc060b389 in panic (fmt=0xc080321d "spin lock held too long")
>    at /usr/src/sys/kern/kern_shutdown.c:553
> #3  0xc060270c in _mtx_lock_spin (m=0xc08d7800, td=0xc19ca320, opts=0,
>    file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:613
> #4  0xc077c165 in siointr (arg=0xc1ab8800) at /usr/src/sys/dev/sio/sio.c:1710
> #5  0xc0790ead in intr_execute_handlers (isrc=0xc19b8890, iframe=0xd541ac94)
>    at /usr/src/sys/i386/i386/intr_machdep.c:203
> #6  0xc07932be in lapic_handle_intr (frame=
>      {if_vec = 52, if_fs = -717160424, if_es = -1067384816, if_ds = 16,
>       if_edi = -1046699232, if_esi = -1064591424, if_ebp = -717116188,
>       if_ebx = -1046425600, if_edx = -1064566184, if_ecx = 0,
>       if_eax = -1046425600, if_eip = -1067440569, if_cs = 8, if_eflags = 582,
>       if_esp = -1045200000, if_ss = 4})
>    at /usr/src/sys/i386/i386/local_apic.c:490
> #7  0xc078d753 in Xapic_isr1 () at apic_vector.s:110
> #8  0x00000034 in ?? ()
> #9  0xd5410018 in ?? ()
> #10 0xc0610010 in coredump (td=0xc08b9fc0) at vnode_if.h:1244
> #11 0xc05f6f46 in ithread_loop (arg=0xc1981c80)
>    at /usr/src/sys/kern/kern_intr.c:546
> #12 0xc05f6001 in fork_exit (callout=0xc05f6df8 <ithread_loop>,
>    arg=0xc1981c80, frame=0xd541ad48) at /usr/src/sys/kern/kern_fork.c:811
> #13 0xc078d3fc in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209
> ...

I couldn't figure out the problem from this.  Your later mail says that
the problem is caused by ppp not being MPSAFE, at least with sio, so I
won't do much more with this stack trace, but I wonder about some of the
strange entries in it:

#13 - #11 are normal.
#10 is weird.  ithread_loop() shouldn't call coredump().
#8 - #9 seem to be more like stack garbage than module addresses.
#7 is normal, but it looks like someone broke stack traces for interrupts,
    giving the garbage in #8 - #10.
#0 - #6 are normal if the spin lock is already held by the same CPU that
    is handling the interrupt (except this can't happen :-).  I wouldn't
    have thought that broken locking in ppp could cause this.
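
To make the case in #0 - #6 concrete, here is a minimal user-space sketch of
a non-recursive spin lock that is taken again by an interrupt handler on the
CPU that already owns it.  Everything in it (struct toy_spin, toy_lock_spin(),
toy_intr(), TOY_SPIN_LIMIT) is a hypothetical illustration, not the kernel's
mutex code; the "false" return merely stands in for the "spin lock held too
long" panic.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a non-recursive spin mutex. */
struct toy_spin {
    int owner;                  /* CPU id holding the lock, or -1 when free */
};

#define TOY_SPIN_LIMIT 1000L    /* arbitrary bound, for illustration only */

/*
 * Try to take the lock on behalf of `cpu`.  Returns true on success and
 * false when the spin bound is exhausted (the analogue of the panic).
 */
static bool toy_lock_spin(struct toy_spin *m, int cpu)
{
    for (long i = 0; i < TOY_SPIN_LIMIT; i++) {
        if (m->owner == -1) {
            m->owner = cpu;
            return true;
        }
        /* A different CPU could release the lock here; the owner cannot,
         * because it is the one stuck in this loop. */
    }
    return false;
}

static void toy_unlock_spin(struct toy_spin *m)
{
    m->owner = -1;
}

/* Simulated interrupt handler: takes the lock, would do its work, drops it. */
static bool toy_intr(struct toy_spin *m, int cpu)
{
    if (!toy_lock_spin(m, cpu))
        return false;           /* would be the panic in a real kernel */
    toy_unlock_spin(m);
    return true;
}

/* CPU 0 holds the lock and is then "interrupted" on CPU 0. */
static bool toy_interrupted_while_locked(void)
{
    struct toy_spin m = { -1 };
    bool ok;

    toy_lock_spin(&m, 0);       /* lock held by CPU 0 */
    ok = toy_intr(&m, 0);       /* handler on the same CPU spins and gives up */
    toy_unlock_spin(&m);
    return ok;
}
```

toy_interrupted_while_locked() returns false, while toy_intr() on a free lock
succeeds.  The sketch leaves out the step that normally rules this out:
taking a spin mutex is supposed to disable interrupts on the local CPU for
as long as the lock is held, so the handler should never get to run there.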

Bruce