From: Peter Wemm
To: current@freebsd.org
Subject: Is it just me or has -current suddenly got massively unstable?
Date: Mon, 22 Jul 2002 00:15:17 -0700
Message-Id: <20020722071517.D0C733924@overcee.wemm.org>

It might be just me, because I swapped an ISA 'si' card for a PCI version,
but the problems I've been seeing are pretty spectacular. I'm regularly
seeing the following panics:

- selwakeup() taking fatal traps (always while running postfix/smtpd;
  presumably this is happening during the traditional 'select collision'
  window, and the locking looks rather suspect there too). This killed my
  box 3 times today alone.
  eg:

    Fatal trap 12: page fault while in kernel mode
    fault virtual address   = 0xc44a01b4
    fault code              = supervisor write, page not present
    instruction pointer     = 0x8:0xc027f945
    current process         = 4078 (smtpd)
    trap number             = 12

    #10 0xc025ed8b in panic ()
    #11 0xc03758d3 in trap_fatal ()
    #12 0xc03755b2 in trap_pfault ()
    #13 0xc03750dd in trap ()
    #14 0xc027f945 in selwakeup ()
    #15 0xc02953f9 in sowakeup ()
    #16 0xc0294f60 in soisconnected ()
    #17 0xc029a8ad in unp_connect2 ()
    #18 0xc029a803 in unp_connect ()
    #19 0xc02998ee in uipc_connect ()
    #20 0xc0292d8a in soconnect ()
    #21 0xc0296c8e in connect ()
    #22 0xc0375c11 in syscall ()

  This is happening on this line:

    1182            if (td == NULL) {
    1183                    mtx_unlock(&sellock);
    1184                    return;
    1185            }
    1186 >>>HERE>>> TAILQ_REMOVE(&td->td_selq, sip, si_thrlist);
    1187            sip->si_thread = NULL;
    1188            mtx_lock_spin(&sched_lock);
    1189            if (td->td_wchan == (caddr_t)&selwait) {
    1190                    if (td->td_state == TDS_SLP)

  All of these panics have been at this identical location; it isn't
  random. I briefly went looking and I'm wondering if the locking is
  adequate here.
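  For anyone not familiar with the pattern at the faulting line, here is a
  minimal userspace sketch of it, not the kernel code. Every name in it
  (sel_entry, waiter, sel_lock, sel_record, sel_clear, sel_wakeup) is a
  hypothetical stand-in, and a pthread mutex stands in for sellock. It only
  illustrates the invariant the TAILQ_REMOVE relies on: the back-pointer
  from the entry to its owner and the owner's queue must change together
  under one lock. It does not reproduce or diagnose the panic.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

struct waiter;

/* Hypothetical stand-in for a selinfo-like record. */
struct sel_entry {
    TAILQ_ENTRY(sel_entry) link;  /* plays the role of si_thrlist */
    struct waiter *owner;         /* plays the role of si_thread */
};

TAILQ_HEAD(sel_queue, sel_entry);

/* Hypothetical stand-in for the waiting thread. */
struct waiter {
    struct sel_queue selq;        /* plays the role of td_selq */
    int woken;
};

/* One lock covers both the queue and every owner back-pointer. */
static pthread_mutex_t sel_lock = PTHREAD_MUTEX_INITIALIZER;

void waiter_init(struct waiter *w)
{
    TAILQ_INIT(&w->selq);
    w->woken = 0;
}

/* Register entry e as being waited on by w. */
void sel_record(struct sel_entry *e, struct waiter *w)
{
    pthread_mutex_lock(&sel_lock);
    e->owner = w;
    TAILQ_INSERT_TAIL(&w->selq, e, link);
    pthread_mutex_unlock(&sel_lock);
}

/* The waiter gives up: detach every entry and clear its back-pointer. */
void sel_clear(struct waiter *w)
{
    struct sel_entry *e;

    pthread_mutex_lock(&sel_lock);
    while ((e = TAILQ_FIRST(&w->selq)) != NULL) {
        TAILQ_REMOVE(&w->selq, e, link);
        e->owner = NULL;
    }
    assert(TAILQ_EMPTY(&w->selq));
    pthread_mutex_unlock(&sel_lock);
}

/* Wake the owner of e, if any. Returns 1 if a waiter was woken, else 0. */
int sel_wakeup(struct sel_entry *e)
{
    struct waiter *w;

    pthread_mutex_lock(&sel_lock);
    w = e->owner;
    if (w == NULL) {
        pthread_mutex_unlock(&sel_lock);
        return 0;
    }
    /* Safe only because owner and selq are never changed without sel_lock. */
    TAILQ_REMOVE(&w->selq, e, link);
    e->owner = NULL;
    w->woken = 1;
    pthread_mutex_unlock(&sel_lock);
    return 1;
}
```

  In this sketch, if any path cleared the queue or freed the waiter without
  taking sel_lock, sel_wakeup() could follow a stale owner pointer and
  write through it in TAILQ_REMOVE. Whether something like that is what
  happens in the kernel here is exactly the open question.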
- random compiler segfaults

- vdrop/vrele panics

  eg:

    panic: vdrop: holdcnt
    #2  0xc026190b in panic () at ../../../kern/kern_shutdown.c:493
    #3  0xc02ae4bb in vdrop (vp=0x0) at ../../../kern/vfs_subr.c:1986
    #4  0xc02a33d9 in cache_zap (ncp=0xc03ce03b) at ../../../kern/vfs_cache.c:241
    #5  0xc02a393a in cache_enter (dvp=0xc4196e70, vp=0x0, cnp=0xc5c8c540)
        at ../../../kern/vfs_cache.c:452
    #6  0xc03225e9 in ufs_lookup (ap=0xda6d2ac0) at ../../../ufs/ufs/ufs_lookup.c:457
    #7  0xc0328e58 in ufs_vnoperate (ap=0x0) at ../../../ufs/ufs/ufs_vnops.c:2739
    #8  0xc02a3d6c in vfs_cache_lookup (ap=0x0) at vnode_if.h:73
    #9  0xc0328e58 in ufs_vnoperate (ap=0x0) at ../../../ufs/ufs/ufs_vnops.c:2739
    #10 0xc02a801b in lookup (ndp=0xda6d2c24) at vnode_if.h:48
    #11 0xc02a7a2e in namei (ndp=0xda6d2c24) at ../../../kern/vfs_lookup.c:175
    #12 0xc02b30d2 in lstat (td=0xc5c8c540, uap=0xda6d2d10)
        at ../../../kern/vfs_syscalls.c:1536
    #13 0xc0378be1 in syscall (frame=
          {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 0, tf_esi = -1077943328,
           tf_ebp = -1077943384, tf_isp = -630379148, tf_ebx = -1077943328,
           tf_edx = -1077943320, tf_ecx = 47, tf_eax = 190, tf_trapno = 12,
           tf_err = 2, tf_eip = 134629535, tf_cs = 31, tf_eflags = 518,
           tf_esp = -1077944580, tf_ss = 47})
        at ../../../i386/i386/trap.c:1049

  I do not have a -g kernel for this one, sorry. The vdrop(vp=0x0)
  traceback is clearly wrong there, though; I'm pretty sure that it is
  because of the missing -g info (gdb knows where the temporary copies are
  with -g and dwarf2).

- All sorts of other very strange things today. I missed a few crashdumps
  due to a full disk. I'm getting panics just trying to extract tarballs
  or compile largish programs.

Has anybody else been running into this? I've had most of it happen today,
except for two or three selwakeup() panics over the last few days. The
really bad stuff seemed to start today.

It might be coincidence that today I also moved that card around.
ie this:

    si0 at iomem 0xd8000-0xdffff irq 12 on isa0
    si0: SIHOST2 - no ports found

became this:

    si0: port 0x9400-0x947f mem 0xfc100000-0xfc10ffff,0xfc112000-0xfc11207f irq 9 at device 9.0 on pci0
    si0: card: SXPCI, ports: 8, modules: 1, type: 8

Hmm. Anyway, has anybody else seen this sort of thing today?

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5