Date:      Mon, 22 Jul 2002 00:15:17 -0700
From:      Peter Wemm <peter@wemm.org>
To:        current@freebsd.org
Subject:   Is it just me or has -current suddenly got massively unstable?
Message-ID:  <20020722071517.D0C733924@overcee.wemm.org>

It might be just me because I swapped an ISA 'si' card for a PCI version, but
the problems I've been seeing are pretty spectacular.  I'm regularly seeing
the following panics:

- selwakeup() taking fatal traps, always while running postfix/smtpd.
Presumably this is happening during the traditional 'select collision'
window; the locking looks rather suspect there too.  This killed my box
3 times today alone.

eg:
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc44a01b4
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc027f945
current process         = 4078 (smtpd)
trap number             = 12

#10 0xc025ed8b in panic ()
#11 0xc03758d3 in trap_fatal ()
#12 0xc03755b2 in trap_pfault ()
#13 0xc03750dd in trap ()
#14 0xc027f945 in selwakeup ()
#15 0xc02953f9 in sowakeup ()
#16 0xc0294f60 in soisconnected ()
#17 0xc029a8ad in unp_connect2 ()
#18 0xc029a803 in unp_connect ()
#19 0xc02998ee in uipc_connect ()
#20 0xc0292d8a in soconnect ()
#21 0xc0296c8e in connect ()
#22 0xc0375c11 in syscall ()

This is happening on this line:

1182            if (td == NULL) {
1183                    mtx_unlock(&sellock);
1184                    return;
1185            }
1186 >>>HERE>>> TAILQ_REMOVE(&td->td_selq, sip, si_thrlist);
1187            sip->si_thread = NULL;
1188            mtx_lock_spin(&sched_lock);
1189            if (td->td_wchan == (caddr_t)&selwait) {
1190                    if (td->td_state == TDS_SLP)

All of these panics have been at this identical location - it isn't random.
I briefly went looking and I'm wondering if the locking is adequate here.
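
For anyone wanting to reason about the locking question, here is a minimal
userland sketch of the invariant that a TAILQ_REMOVE like the one above
relies on.  It is not the kernel code: every demo_* name is hypothetical, and
a pthread mutex stands in for sellock.  TAILQ_REMOVE writes through the
element's neighbour pointers, so if the owner pointer and the list membership
can ever change without the same lock held, a stale pointer turns into a bad
write.  The sketch keeps "owner != NULL" and "is on owner's list" in step by
changing both only under one lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for struct thread / struct selinfo. */
struct demo_selinfo;
TAILQ_HEAD(demo_selq, demo_selinfo);

struct demo_thread {
	struct demo_selq selq;		/* selinfos this thread is waiting on */
};

struct demo_selinfo {
	struct demo_thread *owner;	/* non-NULL only while on owner->selq */
	TAILQ_ENTRY(demo_selinfo) link;
};

/* One lock covers both the owner pointer and the list linkage. */
static pthread_mutex_t demo_sellock = PTHREAD_MUTEX_INITIALIZER;

void
demo_thread_init(struct demo_thread *td)
{
	TAILQ_INIT(&td->selq);
}

/* Register interest: set owner and link the entry in one locked step. */
void
demo_record(struct demo_thread *td, struct demo_selinfo *sip)
{
	pthread_mutex_lock(&demo_sellock);
	if (sip->owner == NULL) {
		sip->owner = td;
		TAILQ_INSERT_TAIL(&td->selq, sip, link);
	}
	pthread_mutex_unlock(&demo_sellock);
}

/* Wakeup side: returns 1 if an entry was unlinked, 0 if nobody was waiting. */
int
demo_wakeup(struct demo_selinfo *sip)
{
	struct demo_thread *td;

	pthread_mutex_lock(&demo_sellock);
	td = sip->owner;
	if (td == NULL) {
		pthread_mutex_unlock(&demo_sellock);
		return (0);
	}
	/* Safe only because owner != NULL guarantees sip is on td->selq. */
	TAILQ_REMOVE(&td->selq, sip, link);
	sip->owner = NULL;
	pthread_mutex_unlock(&demo_sellock);
	return (1);
}

/* Waiter side: drop every remaining entry; returns how many were dropped. */
int
demo_clear(struct demo_thread *td)
{
	struct demo_selinfo *sip;
	int n = 0;

	pthread_mutex_lock(&demo_sellock);
	while ((sip = TAILQ_FIRST(&td->selq)) != NULL) {
		assert(sip->owner == td);
		TAILQ_REMOVE(&td->selq, sip, link);
		sip->owner = NULL;
		n++;
	}
	pthread_mutex_unlock(&demo_sellock);
	return (n);
}
```

If any path clears or frees the waiter's list without taking the same lock
and resetting owner, a later demo_wakeup() would dereference stale linkage,
which is the kind of failure a "supervisor write, page not present" trap at a
TAILQ_REMOVE would be consistent with.  Whether that is what is actually
happening in selwakeup() is not established here.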

- random compiler segfaults

- vdrop/vrele panics

eg:

panic: vdrop: holdcnt

#2  0xc026190b in panic () at ../../../kern/kern_shutdown.c:493
#3  0xc02ae4bb in vdrop (vp=0x0) at ../../../kern/vfs_subr.c:1986
#4  0xc02a33d9 in cache_zap (ncp=0xc03ce03b) at ../../../kern/vfs_cache.c:241
#5  0xc02a393a in cache_enter (dvp=0xc4196e70, vp=0x0, cnp=0xc5c8c540)
    at ../../../kern/vfs_cache.c:452
#6  0xc03225e9 in ufs_lookup (ap=0xda6d2ac0)
    at ../../../ufs/ufs/ufs_lookup.c:457
#7  0xc0328e58 in ufs_vnoperate (ap=0x0) at ../../../ufs/ufs/ufs_vnops.c:2739
#8  0xc02a3d6c in vfs_cache_lookup (ap=0x0) at vnode_if.h:73
#9  0xc0328e58 in ufs_vnoperate (ap=0x0) at ../../../ufs/ufs/ufs_vnops.c:2739
#10 0xc02a801b in lookup (ndp=0xda6d2c24) at vnode_if.h:48
#11 0xc02a7a2e in namei (ndp=0xda6d2c24) at ../../../kern/vfs_lookup.c:175
#12 0xc02b30d2 in lstat (td=0xc5c8c540, uap=0xda6d2d10)
    at ../../../kern/vfs_syscalls.c:1536
#13 0xc0378be1 in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 0, tf_esi = -1077943328, tf_ebp = -1077943384, tf_isp = -630379148, tf_ebx = -1077943328, tf_edx = -1077943320, tf_ecx = 47, tf_eax = 190, tf_trapno = 12, tf_err = 2, tf_eip = 134629535, tf_cs = 31, tf_eflags = 518, tf_esp = -1077944580, tf_ss = 47})
    at ../../../i386/i386/trap.c:1049

I do not have a -g kernel for this one, sorry.  The vdrop(vp=0x0) in the
traceback is clearly wrong though; I'm pretty sure that is because of the
missing -g info (gdb knows where the temporary copies are with -g and
dwarf2).

- All sorts of other very strange things today.  I missed a few crashdumps
due to a full disk.  I'm getting panics just trying to extract tarballs or
compile largish programs.

Has anybody else been running into this?  I've had most of it happen today,
except for two or three selwakeup() panics over the last few days. The
really bad stuff seemed to start today.  It might be coincidence that today
I also moved that card around.

ie this:
si0 at iomem 0xd8000-0xdffff irq 12 on isa0
si0: SIHOST2 - no ports found

became this:
si0: <Specialix SX PCI host card> port 0x9400-0x947f mem 0xfc100000-0xfc10ffff,0xfc112000-0xfc11207f irq 9 at device 9.0 on pci0
si0: card: SXPCI, ports: 8, modules: 1, type: 8

Hmm.

Anyway, has anybody else seen this sort of thing today?

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

