Date: Thu, 18 Aug 2011 22:11:44 +0200
From: Attilio Rao <attilio@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID: <CAJ-FndCaTSoAU2Ycj=WEppzc1RmbQ6ugqiuuyCqUpYZuGXKt_g@mail.gmail.com>
In-Reply-To: <4E4D717F.3090802@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk> <A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk> <4E4380C0.7070908@FreeBSD.org> <EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org> <6A7238AED44542A880B082A40304D940@multiplay.co.uk> <4E4BA21F.6010805@FreeBSD.org> <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk> <4E4BBA7F.30907@FreeBSD.org> <88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> <4E4D717F.3090802@FreeBSD.org>
2011/8/18 Andriy Gapon <avg@freebsd.org>:
> on 17/08/2011 23:21 Andriy Gapon said the following:
>>
>> It seems like everything starts with some kind of a race between
>> terminating processes in a jail and termination of the jail itself.
>> This is where the details are very thin so far.  What we see is that a
>> process (http) is in exit(2) syscall, in exit1() function actually, and
>> past the place where P_WEXIT flag is set and even past the place where
>> p_limit is freed and reset to NULL.  At that place the thread calls
>> prison_proc_free(), which calls prison_deref().  Then, we see that in
>> prison_deref() the thread gets a page fault because of what seems like a
>> NULL pointer dereference.  That's just the start of the problem and its
>> root cause.
>>
>> Then, trap_pfault() gets invoked and, because addresses close to NULL look
>> like userspace addresses, vm_fault/vm_fault_hold gets called, which in its
>> turn goes on to call vm_map_growstack.  First thing that vm_map_growstack
>> does is a call to lim_cur(), but because p_limit is already NULL, that
>> call results in a NULL pointer dereference and a page fault.  Goto the
>> beginning of this paragraph.
>>
>> So we get this recursion of sorts, which only ends when a stack is
>> exhausted and a CPU generates a double-fault.
>
> BTW, does anyone has an idea why the thread in question would "disappear"
> from the kgdb's point of view?
>
> (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
> $3 = 102057
> (kgdb) tid 102057
> invalid tid
>
> info threads also doesn't list the thread.
>
> Is it because the panic happened while the thread was somewhere in exit1()?
> is there an easy way to examine its stack in this case?

Yes, that is likely it. The 'tid' command should look up the tid_to_thread() table (or similar name), which returns NULL. That means the thread has already moved past the point where it was still in the lookup table.
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein