Date: Thu, 18 Aug 2011 12:21:16 +0300 From: Andriy Gapon <avg@FreeBSD.org> To: Steven Hartland <killing@multiplay.co.uk> Cc: freebsd-hackers@FreeBSD.org, freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE Message-ID: <4E4CD98C.1000301@FreeBSD.org> In-Reply-To: <4019027648B5493AAC4B654BD821DE88@multiplay.co.uk> References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> <4019027648B5493AAC4B654BD821DE88@multiplay.co.! uk>
next in thread | previous in thread | raw e-mail | index | archive | help
on 18/08/2011 02:15 Steven Hartland said the following: > ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org> > >> Thanks to the debug that Steven provided and to the help that I received from >> Kostik, I think that now I understand the basic mechanics of this panic, but, >> unfortunately, not the details of its root cause. >> >> It seems like everything starts with some kind of a race between terminating >> processes in a jail and termination of the jail itself. This is where the >> details are very thin so far. What we see is that a process (http) is in >> exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT >> flag is set and even past the place where p_limit is freed and reset to NULL. >> At that place the thread calls prison_proc_free(), which calls prison_deref(). >> Then, we see that in prison_deref() the thread gets a page fault because of what >> seems like a NULL pointer dereference. That's just the start of the problem and >> its root cause. > > Thats interesting, are you using http as an example or is that something thats > been gleaned from the debugging of our output? I ask as there's only one process > running in each of our jails and thats a single java process. It's from the debug data: p_comm = "httpd" I also would like to ask you to revert the last patch that I sent you (with tf_rip comparisons) and try the patch from Kostik instead. Given what we suspect about the problem, can please also try to provoke the problem by e.g. doing frequent jail restarts or something else that supposedly should hit the bug. -- Andriy Gapon
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4E4CD98C.1000301>