Date: Thu, 2 Sep 2004 11:59:47 -0400
From: Ken Smith
To: Kris Kennaway
cc: re@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: 5.3-RELEASE TODO
Message-ID: <20040902155947.GA12006@electra.cse.Buffalo.EDU>
References: <200407151424.i6FEOdoq060881@fledge.watson.org> <20040715220447.GA32888@xor.obsecurity.org>
In-Reply-To: <20040715220447.GA32888@xor.obsecurity.org>

On Thu, Jul 15, 2004 at 03:04:47PM -0700, Kris Kennaway wrote:

> These are the bugs I'm currently tracking (those I can remember right
> now, at least)
>
> * SMP is unusable for me because of the following frequent panic
>   (actually a panic and another kernel printf interleaved).
>   Here is the untangled version:
>
> panic: APIC: Previous IPI is stu c k
> p m a
> _ l a z y f i x : s p
> u c p u i d = 0 ;
> n f o r 5 0 0 0 0 0 0 0
> c D e b u g g e r ( " p a n i
>
> jhb says:
>
> > Seems the two CPUs are deadlocked waiting on each other. The first sent a
> > pmap_lazyfixup IPI to the second but the second has interrupts disabled as it
> > is trying to send an IPI as well.
>
> He suggested a patch, but it did not fix the problem.

Was this fixed with the IPI patches done before BETA2?

> * linprocfs
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x8
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0xc04e1870
> stack pointer = 0x10:0xf11e6b50
> frame pointer = 0x10:0xf11e6b6c
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 23938 (mtree)
> kernel: type 12 trap, code=0
> Stopped at pfs_getattr+0x130: movl 0x8(%eax),%eax
> db> trace
> pfs_getattr(f11e6b78,c06fda00,cf397b2c,f11e6b98,d23e8a80) at pfs_getattr+0x130
> vn_stat(cf397b2c,f11e6c80,d23e8a80,0,c5eb0c60) at vn_stat+0x4f
> lstat(c5eb0c60,f11e6d14,2,2,297) at lstat+0x6a
> syscall(2f,2f,2f,805a200,805a248) at syscall+0x217
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (190, FreeBSD ELF32, lstat), eip = 0x280ac664, esp = 0xbfbf7594, ebp = 0xbfbf7620 ---
>
> dosirak# addr2line -e kernel.debug 0xc04e1870
> /usr/src/sys/i386/compile/DOSIRAK/../../../fs/pseudofs/pseudofs_vnops.c:200
>
> [...]
> if (pvd->pvd_pid != NO_PID) {
>         if ((proc = pfind(pvd->pvd_pid)) == NULL)
>                 PFS_RETURN (ENOENT);
> -->     vap->va_uid = proc->p_ucred->cr_ruid;
>
> rwatson has a patch that works around this particular null pointer
> deref, but the underlying cause is not addressed.

A patch to pseudofs_vnops.c was made that checks to make sure what pfind()
returned was "usable". Did that solve this problem?
Looks like that patch went in after you reported this because it's
immediately above line 200 you show above.

> * ULE has lots of problems (poor performance on HTT, unable to disable
>   HTT, incorrect load average reporting on SMP machines, ...). Should
>   be turned off until an active maintainer is found.

re@ is discussing this now, it looks likely we will shift to 4BSD soon.

> * ---
> Fatal trap 12: page fault while in kernel mode
> fault virtual address = 0x104
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0xc058a8cf
> stack pointer = 0x10:0xdcb34cc4
> frame pointer = 0x10:0xdcb34cec
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = resume, IOPL = 0
> current process = 50 (schedcpu)
> trap number = 12
> panic: page fault
>
> syncing disks, buffers remaining... panic: mi_switch: switch in a critical section
>
> addr2line says the panic was in kern/sched_4bsd.c:327
>
> /*
>  * The kse slptimes are not touched in wakeup
>  * because the thread may not HAVE a KSE.
>  */
> if (ke->ke_state == KES_ONRUNQ) {
>         awake = 1;
>         ke->ke_flags &= ~KEF_DIDRUN;
> ---> } else if ((ke->ke_state == KES_THREAD) &&
>             (TD_IS_RUNNING(ke->ke_thread))) {
>         awake = 1;
>
> gdb -k got confused and couldn't make anything out of the backtrace.

The code you quote above hasn't changed recently but a few kse related
fixes have gone in recently if I recall correctly. Is this one still
biting you?

> * Machines with 4GB RAM do not auto-tune kernel memory parameters
>   optimally and easily panic under load with a panic message that does
>   not at least give instructions on what may be wrong and how to fix it.

Work was done on that recently-ish, do you know off hand if that fixed
what you were seeing?

Thanks...

-- 
Ken Smith
- From there to here, from here to      |       kensmith@cse.buffalo.edu
  there, funny things are everywhere.   |
                      - Theodore Geisel |