Date: Tue, 23 Jul 2002 00:07:04 -0700
From: Peter Wemm <peter@wemm.org>
To: Yann Berthier <yb@sainte-barbe.org>
Cc: current@freebsd.org, alfred@freebsd.org
Subject: Re: Is it just me or has -current suddenly got massively unstable?
Message-ID: <20020723070704.7B4CB3925@overcee.wemm.org>
In-Reply-To: <20020722101211.GA442@hsc.fr>
Yann Berthier wrote:
> On Mon, 22 Jul 2002, Peter Wemm wrote:
>
> > It might be just me because I swapped an ISA 'si' card for a PCI version, but
> > the problems I've been seeing are pretty spectacular. I'm regularly seeing
> > the following panics:
> >
> > - selwakeup() taking fatal traps (always while running postfix/smtpd,
> >   presumably this is happening during the traditional 'select collision'
> >   window - the locking looks rather suspect there too). This killed my box
> >   3 times today alone.
> >
> > eg:
> > Fatal trap 12: page fault while in kernel mode
> > fault virtual address   = 0xc44a01b4
> > fault code              = supervisor write, page not present
> > instruction pointer     = 0x8:0xc027f945
> > current process         = 4078 (smtpd)
> > trap number             = 12
>
> Same here: 2 panics with a kernel from today while running
> postfix/smtpd.
>
> Sorry, I have no more info to give for now though

Thanks for the independent confirmation. Here's a workaround patch that you
might like to try:

--- kern_thread.c       17 Jul 2002 23:43:55 -0000      1.8
+++ kern_thread.c       22 Jul 2002 23:31:06 -0000
@@ -198,7 +198,7 @@
        thread_zone = uma_zcreate("THREAD", sizeof (struct thread),
            thread_ctor, thread_dtor, thread_init, thread_fini,
-           UMA_ALIGN_CACHE, 0);
+           UMA_ALIGN_CACHE, UMA_ZONE_NOFREE);
 }

 /*

I haven't panicked yet with that change. :-)

For some unknown reason, selwakeup() is dereferencing pointers to threads
that are long gone and whose backing store has been freed. The patch above is
a bandaid, not a solution. It basically prevents threads from ever being freed
back to the general pool, even though everything here supposedly does not need
that (unlike struct proc and socket, for example).

peter@overcee[11:57pm]/home/crash-105# gdb -k kernel.12 vmcore.12
...
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc29b0634
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc0257755
current process         = 1411 (smtpd)
...
(kgdb) l *0xc0257755
0xc0257755 is in selwakeup (../../../kern/sys_generic.c:1186).
1181            }
1182            if (td == NULL) {
1183                    mtx_unlock(&sellock);
1184                    return;
1185            }
1186            TAILQ_REMOVE(&td->td_selq, sip, si_thrlist);
1187            sip->si_thread = NULL;
1188            mtx_lock_spin(&sched_lock);
1189            if (td->td_wchan == (caddr_t)&selwait) {
1190                    if (td->td_state == TDS_SLP)

#5  0xc034c68d in trap (frame=
      {tf_fs = -1069613032, tf_es = 16, tf_ds = -1070006256, tf_edi = 0,
       tf_esi = -1034848204, tf_ebp = -630072692, tf_isp = -630072736,
       tf_ebx = -1030027776, tf_edx = -1030911744, tf_ecx = 1,
       tf_eax = -1030027728, tf_trapno = 12, tf_err = 2,
       tf_eip = -1071286443, tf_cs = 8, tf_eflags = 66118,
       tf_esp = -1069571036, tf_ss = 0})
    at ../../../i386/i386/trap.c:445
#6  0xc0257755 in selwakeup (sip=0xc2517834)
    at ../../../kern/sys_generic.c:1186
#7  0xc026d249 in sowakeup (so=0xc25177d0, sb=0xc251781c)
    at ../../../kern/uipc_socket2.c:300
#8  0xc026cdb0 in soisconnected (so=0xc2750bb8)
    at ../../../kern/uipc_socket2.c:132
#9  0xc02726fd in unp_connect2 (so=0xc30a3190, so2=0xc2750bb8)
    at ../../../kern/uipc_usrreq.c:769
#10 0xc0272653 in unp_connect (so=0xc30a3190, nam=0xc4359d00, td=0xc30a3190)
    at ../../../kern/uipc_usrreq.c:737
#11 0xc027173e in uipc_connect (so=0x0, nam=0x0, td=0xc28d8900)
    at ../../../kern/uipc_usrreq.c:161
#12 0xc026abda in soconnect (so=0xc263c630, nam=0x0, td=0x0)
    at ../../../kern/uipc_socket.c:429
#13 0xc026eade in connect (td=0xc30a3190, uap=0xc2750bb8)
    at ../../../kern/uipc_syscalls.c:441
#14 0xc034d1c1 in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 11, tf_esi = 0,
       tf_ebp = -1077938236, tf_isp = -630071948, tf_ebx = 134708840,
       tf_edx = -1077938342, tf_ecx = 0, tf_eax = 98, tf_trapno = 22,
       tf_err = 2, tf_eip = 671906955, tf_cs = 31, tf_eflags = 663,
       tf_esp = -1077938408, tf_ss = 47})
    at ../../../i386/i386/trap.c:1049

I've checked the page tables, it is indeed unmapped. Also note that this is
in the guts of the unix domain socket code.
:-]

(kgdb)

peter@overcee[11:58pm]/home/crash-110# gdb -k kernel.10 vmcore.10
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc44a01b4
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc027f945
current process         = 4078 (smtpd)
[..]
#13 0xc03750dd in trap ()
#14 0xc027f945 in selwakeup ()
#15 0xc02953f9 in sowakeup ()
#16 0xc0294f60 in soisconnected ()
#17 0xc029a8ad in unp_connect2 ()
#18 0xc029a803 in unp_connect ()
#19 0xc02998ee in uipc_connect ()
#20 0xc0292d8a in soconnect ()
#21 0xc0296c8e in connect ()
#22 0xc0375c11 in syscall ()

Interestingly, the stack trace is identical on both of these that I managed
to capture.

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
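
For readers following along, here is a rough userland sketch of why the
UMA_ZONE_NOFREE workaround described in the message stops the page fault
without fixing the bug. It is not the UMA API and not the kernel's code: the
names toy_zone, toy_thread, toy_alloc, toy_free, toy_in_zone and toy_demo are
all made up for illustration. The assumption is only what the message states,
namely that threads are never freed back to the general pool, so a stale
pointer still lands on mapped memory of the same type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy "zone": slots are recycled through a free list but the
 * backing array is never given back, mimicking a no-free pool. */
#define TOY_SLOTS 4

struct toy_thread {
    int id;                        /* payload; -1 while the slot is free */
    struct toy_thread *next_free;  /* free-list link, used only while free */
};

struct toy_zone {
    struct toy_thread slots[TOY_SLOTS];  /* backing store, never released */
    struct toy_thread *free_list;
};

/* Put every slot on the free list. */
static void toy_zone_init(struct toy_zone *z)
{
    z->free_list = NULL;
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        z->slots[i].id = -1;
        z->slots[i].next_free = z->free_list;
        z->free_list = &z->slots[i];
    }
}

/* Take a slot off the free list, or return NULL if the zone is exhausted. */
static struct toy_thread *toy_alloc(struct toy_zone *z, int id)
{
    struct toy_thread *t = z->free_list;

    if (t == NULL)
        return NULL;
    z->free_list = t->next_free;
    t->next_free = NULL;
    t->id = id;
    return t;
}

/* Return a slot to the zone's own free list; the memory stays mapped. */
static void toy_free(struct toy_zone *z, struct toy_thread *t)
{
    t->id = -1;
    t->next_free = z->free_list;
    z->free_list = t;
}

/* Return 1 if t points at one of the zone's slots, 0 otherwise. */
static int toy_in_zone(const struct toy_zone *z, const struct toy_thread *t)
{
    for (size_t i = 0; i < TOY_SLOTS; i++)
        if (t == &z->slots[i])
            return 1;
    return 0;
}

/* Walk through the stale-pointer scenario; returns 1 if it plays out as
 * described in the comments. */
static int toy_demo(void)
{
    static struct toy_zone zone;
    struct toy_thread *td, *stale, *td2;

    toy_zone_init(&zone);
    td = toy_alloc(&zone, 1411);
    stale = td;              /* a pointer left behind in some other structure */
    toy_free(&zone, td);

    /* The stale pointer still refers to valid storage, so touching it does
     * not fault, but it no longer refers to a live thread. */
    if (!toy_in_zone(&zone, stale) || stale->id != -1)
        return 0;

    /* The slot is recycled: the stale pointer now aliases a different
     * thread, which is why this is a bandaid and not a fix. */
    td2 = toy_alloc(&zone, 4078);
    return td2 == stale && stale->id == 4078;
}
```

Calling toy_demo() from a small main() is enough to try it. The point of the
sketch is the last step: the dangling reference stops crashing, but it
silently ends up pointing at whichever thread reuses the slot, so the
underlying lifetime bug in the select path is still there.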