Date:      Thu, 2 Sep 2004 11:59:47 -0400
From:      Ken Smith <kensmith@cse.Buffalo.EDU>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        current@FreeBSD.org
Subject:   Re: 5.3-RELEASE TODO
Message-ID:  <20040902155947.GA12006@electra.cse.Buffalo.EDU>
In-Reply-To: <20040715220447.GA32888@xor.obsecurity.org>
References:  <200407151424.i6FEOdoq060881@fledge.watson.org> <20040715220447.GA32888@xor.obsecurity.org>

On Thu, Jul 15, 2004 at 03:04:47PM -0700, Kris Kennaway wrote:

> These are the bugs I'm currently tracking (those I can remember right
> now, at least)
> 
> * SMP is unusable for me because of the following frequent panic
> (actually a panic and another kernel printf interleaved).  Here is the
> untangled version:
> 
> panic: APIC: Previous IPI is stu c k
>                                 p m a
>  _ l a z y f i x :   s p
> u c p u i d  =    0 ;
>  n   f o r   5 0 0 0 0 0 0 0
> c D e b u g g e r ( " p a n i
> 
> jhb says:
> 
> > Seems the two CPUs are deadlocked waiting on each other.  The first sent a
> > pmap_lazyfixup IPI to the second but the second has interrupts disabled as it
> > is trying to send an IPI as well.
> 
> He suggested a patch, but it did not fix the problem.

Was this fixed with the IPI patches done before BETA2?

> * linprocfs 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x8
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc04e1870
> stack pointer           = 0x10:0xf11e6b50
> frame pointer           = 0x10:0xf11e6b6c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 23938 (mtree)
> kernel: type 12 trap, code=0
> Stopped at      pfs_getattr+0x130:      movl    0x8(%eax),%eax
> db> trace
> pfs_getattr(f11e6b78,c06fda00,cf397b2c,f11e6b98,d23e8a80) at pfs_getattr+0x130
> vn_stat(cf397b2c,f11e6c80,d23e8a80,0,c5eb0c60) at vn_stat+0x4f
> lstat(c5eb0c60,f11e6d14,2,2,297) at lstat+0x6a
> syscall(2f,2f,2f,805a200,805a248) at syscall+0x217
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (190, FreeBSD ELF32, lstat), eip = 0x280ac664, esp = 0xbfbf7594, ebp = 0xbfbf7620 ---
> 
> dosirak# addr2line -e kernel.debug 0xc04e1870
> /usr/src/sys/i386/compile/DOSIRAK/../../../fs/pseudofs/pseudofs_vnops.c:200
> 
> [...]
>         if (pvd->pvd_pid != NO_PID) {
>                 if ((proc = pfind(pvd->pvd_pid)) == NULL)
>                         PFS_RETURN (ENOENT);
> -->             vap->va_uid = proc->p_ucred->cr_ruid;
> 
> rwatson has a patch that works around this particular null pointer
> deref, but the underlying cause is not addressed.

A patch to pseudofs_vnops.c went in that checks whether what pfind()
returned is "usable".  Did that solve this problem?  It looks like the
patch was committed after your report, because the check now sits
immediately above the line 200 you quote.
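
For anyone following along, here is a rough userland sketch of the kind
of guard I mean.  It is not the committed patch.  The *_sketch types and
functions are hypothetical stand-ins, not the real kernel definitions,
and the field layout is an assumption chosen so that cr_ruid lands at
offset 8 with 4-byte fields.  That would fit the trace, where
"movl 0x8(%eax),%eax" faults at virtual address 0x8, but I have not
confirmed that p_ucred was the NULL pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct ucred / struct proc (assumed layout). */
struct ucred_sketch {
	unsigned cr_ref;	/* offset 0 with 4-byte unsigned */
	unsigned cr_uid;	/* offset 4 */
	unsigned cr_ruid;	/* offset 8 */
};

struct proc_sketch {
	int p_pid;
	struct ucred_sketch *p_ucred;	/* may be NULL in this sketch */
};

/* Hypothetical lookup: linear search of a table, NULL if pid is absent. */
static struct proc_sketch *
pfind_sketch(struct proc_sketch *tab, size_t n, int pid)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].p_pid == pid)
			return (&tab[i]);
	return (NULL);
}

/*
 * Fetch the real uid for pid.  Returns ENOENT both when the process is
 * missing and when it was found but has no credentials to read, rather
 * than dereferencing a NULL pointer.
 */
int
getattr_uid_sketch(struct proc_sketch *tab, size_t n, int pid,
    unsigned *uid_out)
{
	struct proc_sketch *p;

	assert(uid_out != NULL);
	if ((p = pfind_sketch(tab, n, pid)) == NULL)
		return (ENOENT);
	if (p->p_ucred == NULL)		/* found, but not "usable" */
		return (ENOENT);
	*uid_out = p->p_ucred->cr_ruid;
	return (0);
}
```

The sketch leaves out locking entirely.  As Kris notes above, a check
like this only works around the dereference and says nothing about why
the pointer was unusable in the first place.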

> * ULE has lots of problems (poor performance on HTT, unable to disable
> HTT, incorrect load average reporting on SMP machines, ...).  Should
> be turned off until an active maintainer is found.

re@ is discussing this now; it looks likely we will shift to 4BSD soon.

> * ---
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x104
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc058a8cf
> stack pointer           = 0x10:0xdcb34cc4
> frame pointer           = 0x10:0xdcb34cec
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 50 (schedcpu)
> trap number             = 12
> panic: page fault
> 
> syncing disks, buffers remaining... panic: mi_switch: switch in a critical section
> 
> addr2line says the panic was in kern/sched_4bsd.c:327
> 
>                                 /*
>                                  * The kse slptimes are not touched in wakeup
>                                  * because the thread may not HAVE a KSE.
>                                  */
>                                 if (ke->ke_state == KES_ONRUNQ) {
>                                         awake = 1;
>                                         ke->ke_flags &= ~KEF_DIDRUN;
> --->                            } else if ((ke->ke_state == KES_THREAD) &&
>                                     (TD_IS_RUNNING(ke->ke_thread))) {
>                                         awake = 1;
> 
> gdb -k got confused and couldn't make anything out of the backtrace.

The code you quote above hasn't changed recently, but a few KSE-related
fixes have gone in since then, if I recall correctly.  Is this one still
biting you?

> * Machines with 4GB RAM do not auto-tune kernel memory parameters
> optimally and easily panic under load with a panic message that does
> not at least give instructions on what may be wrong and how to fix it.

Work was done on that fairly recently.  Do you know offhand whether it
fixed what you were seeing?

Thanks...

-- 
						Ken Smith
- From there to here, from here to      |       kensmith@cse.buffalo.edu
  there, funny things are everywhere.   |
                      - Theodore Geisel |


