From owner-freebsd-current  Mon Jul 22  0:15:27 2002
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
To: current@freebsd.org
Subject: Is it just me or has -current suddenly got massively unstable?
Date: Mon, 22 Jul 2002 00:15:17 -0700
From: Peter Wemm <peter@wemm.org>
Message-Id: <20020722071517.D0C733924@overcee.wemm.org>

It might be just me because I swapped an ISA 'si' card for a PCI version, but
the problems I've been seeing are pretty spectacular.  I'm regularly seeing
the following panics:

- selwakeup() taking fatal traps.  This always happens while running
postfix/smtpd, presumably during the traditional 'select collision'
window.  The locking looks rather suspect there too.  This killed my box
3 times today alone.

eg:
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc44a01b4
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc027f945
current process         = 4078 (smtpd)
trap number             = 12

#10 0xc025ed8b in panic ()
#11 0xc03758d3 in trap_fatal ()
#12 0xc03755b2 in trap_pfault ()
#13 0xc03750dd in trap ()
#14 0xc027f945 in selwakeup ()
#15 0xc02953f9 in sowakeup ()
#16 0xc0294f60 in soisconnected ()
#17 0xc029a8ad in unp_connect2 ()
#18 0xc029a803 in unp_connect ()
#19 0xc02998ee in uipc_connect ()
#20 0xc0292d8a in soconnect ()
#21 0xc0296c8e in connect ()
#22 0xc0375c11 in syscall ()

This is happening on this line:

1182            if (td == NULL) {
1183                    mtx_unlock(&sellock);
1184                    return;
1185            }
1186 >>>HERE>>> TAILQ_REMOVE(&td->td_selq, sip, si_thrlist);
1187            sip->si_thread = NULL;
1188            mtx_lock_spin(&sched_lock);
1189            if (td->td_wchan == (caddr_t)&selwait) {
1190                    if (td->td_state == TDS_SLP)

All of these panics have been at this identical location; it isn't random.
I briefly went looking and I'm wondering if the locking is adequate here.
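
As an illustration of why a stale pointer at that line would show up as a
write fault, here is a minimal userland sketch of what TAILQ_REMOVE does.
It is not the kernel code: fake_selinfo, fake_thread, selq_count() and
demo_remove_middle() are hypothetical stand-ins, and only the <sys/queue.h>
macros are real.  Unlinking an element stores into its neighbours (or into
the list head), so if td or an adjacent entry had already been freed, the
macro would write to memory that is no longer mapped, which would be
consistent with the "supervisor write, page not present" fault code above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct selinfo: one entry on a thread's list. */
struct fake_selinfo {
	TAILQ_ENTRY(fake_selinfo) si_thrlist;	/* linkage on the owner's list */
	int id;
};

TAILQ_HEAD(fake_selq, fake_selinfo);

/* Hypothetical stand-in for struct thread: owns the list head. */
struct fake_thread {
	struct fake_selq td_selq;
};

/* Count the entries currently linked on td's list. */
static int
selq_count(struct fake_thread *td)
{
	struct fake_selinfo *sip;
	int n = 0;

	TAILQ_FOREACH(sip, &td->td_selq, si_thrlist)
		n++;
	return (n);
}

/* Link three entries, unlink the middle one, return how many remain. */
static int
demo_remove_middle(void)
{
	struct fake_thread td;
	struct fake_selinfo a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };

	TAILQ_INIT(&td.td_selq);
	TAILQ_INSERT_TAIL(&td.td_selq, &a, si_thrlist);
	TAILQ_INSERT_TAIL(&td.td_selq, &b, si_thrlist);
	TAILQ_INSERT_TAIL(&td.td_selq, &c, si_thrlist);

	/*
	 * Removing b performs two stores: one into c (its back pointer)
	 * and one into a (its forward pointer, reached through b's back
	 * pointer).  With a freed neighbour or head, both stores would
	 * land in stale memory.
	 */
	TAILQ_REMOVE(&td.td_selq, &b, si_thrlist);

	assert(TAILQ_FIRST(&td.td_selq) == &a);
	assert(TAILQ_NEXT(&a, si_thrlist) == &c);
	return (selq_count(&td));
}
```

Calling demo_remove_middle() from a small main() exercises the unlink; in
the kernel the same stores happen on td->td_selq, which is why the lifetime
of td and of the selinfo entries has to be covered by whatever lock protects
the list.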

- random compiler segfaults

- vdrop/vrele panics

eg:

panic: vdrop: holdcnt

#2  0xc026190b in panic () at ../../../kern/kern_shutdown.c:493
#3  0xc02ae4bb in vdrop (vp=0x0) at ../../../kern/vfs_subr.c:1986
#4  0xc02a33d9 in cache_zap (ncp=0xc03ce03b) at ../../../kern/vfs_cache.c:241
#5  0xc02a393a in cache_enter (dvp=0xc4196e70, vp=0x0, cnp=0xc5c8c540)
    at ../../../kern/vfs_cache.c:452
#6  0xc03225e9 in ufs_lookup (ap=0xda6d2ac0)
    at ../../../ufs/ufs/ufs_lookup.c:457
#7  0xc0328e58 in ufs_vnoperate (ap=0x0) at ../../../ufs/ufs/ufs_vnops.c:2739
#8  0xc02a3d6c in vfs_cache_lookup (ap=0x0) at vnode_if.h:73
#9  0xc0328e58 in ufs_vnoperate (ap=0x0) at ../../../ufs/ufs/ufs_vnops.c:2739
#10 0xc02a801b in lookup (ndp=0xda6d2c24) at vnode_if.h:48
#11 0xc02a7a2e in namei (ndp=0xda6d2c24) at ../../../kern/vfs_lookup.c:175
#12 0xc02b30d2 in lstat (td=0xc5c8c540, uap=0xda6d2d10)
    at ../../../kern/vfs_syscalls.c:1536
#13 0xc0378be1 in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 0, tf_esi = -1077943328, tf_ebp = -1077943384, tf_isp = -630379148, tf_ebx = -1077943328, tf_edx = -1077943320, tf_ecx = 47, tf_eax = 190, tf_trapno = 12, tf_err = 2, tf_eip = 134629535, tf_cs = 31, tf_eflags = 518, tf_esp = -1077944580, tf_ss = 47})
    at ../../../i386/i386/trap.c:1049

I do not have a -g kernel for this one, sorry.  The vdrop(vp=0x0) traceback
is clearly wrong there, though.  I'm pretty sure that is because of the
missing -g info (gdb knows where the temporary copies are with -g and
dwarf2).

- All sorts of other very strange things today.  I missed a few crashdumps
due to a full disk.  I'm getting panics just trying to extract tarballs or
compile largish programs.

Has anybody else been running into this?  I've had most of it happen today,
except for two or three selwakeup() panics over the last few days. The
really bad stuff seemed to start today.  It might be coincidence that today
I also moved that card around.

ie this:
si0 at iomem 0xd8000-0xdffff irq 12 on isa0
si0: SIHOST2 - no ports found

became this:
si0: <Specialix SX PCI host card> port 0x9400-0x947f mem 0xfc100000-0xfc10ffff,0xfc112000-0xfc11207f irq 9 at device 9.0 on pci0
si0: card: SXPCI, ports: 8, modules: 1, type: 8

Hmm.

Anyway, has anybody else seen this sort of thing today?

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

