Date: Sat, 20 Aug 2011 14:14:15 +0100
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID: <4E55CB4A4F694A7997FEBDF9EADF87F5@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk> <4E4380C0.7070908@FreeBSD.org> <EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org> <6A7238AED44542A880B082A40304D940@multiplay.co.uk> <4E4BA21F.6010805@FreeBSD.org> <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk> <4E4BBA7F.30907@FreeBSD.org> <88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> <4019027648B5493AAC4B654BD821DE88@multiplay.co.uk> <4E4F8631.1070300@FreeBSD.org> <4E4F8821.80108@FreeBSD.org>

----- Original Message -----
From: "Andriy Gapon" <avg@FreeBSD.org>

> BTW, I suspect the following scenario, but I am not able to
> verify it either via testing or in the code:
> - last process in a dying jail exits
> - pr_uref of the jail reaches zero
> - pr_uref of prison0 gets decremented
> - you attach to the jail and resurrect it
> - but pr_uref of prison0 stays decremented
>
> Repeat this enough times and prison0.pr_uref reaches zero.
> To reach zero even sooner just kill enough of non-jailed processes.

Ahh, now that explains all of the panic scenarios we have experienced:

1. A jail stop / start causing the panic, but only after at least a
   few days' worth of uptime. Here we are seeing enough "leak" of
   pr_uref from the restarted jails to decrement prison0.pr_uref to 0,
   even with all the standard unjailed processes still running.

2. A machine reboot, after all jails have been stopped, but after less
   uptime than in #1. In this case we haven't seen enough leakage to
   decrement prison0.pr_uref to 0, given the number of prison0
   processes, but it has been incorrectly decremented. So as soon as
   the reboot kicks in and prison0 processes start exiting,
   prison0.pr_uref gets decremented further and again hits 0 when it
   shouldn't.

Now, if this is the case, we should be able to confirm it with a little
more info:

1. What exactly does pr_uref represent?
2. Can its expected value be calculated from other details of the
   system, i.e. the number of running processes and the number of
   running jails?

If we can calculate the value that prison0.pr_uref should have, then by
examining the machines we have which have been up for a while, we
should be able to confirm whether an incorrect value is present on them
and hence prove this is the case.

Ideally, a little script to run in kgdb to test this would be the best
way to go.

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed.
In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
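
The leak scenario quoted in the message, and the idea of comparing prison0.pr_uref against a value computed from the running system, can be illustrated with a small toy model. This is only a sketch and not kernel code: the `Prison` class and every function name below are hypothetical, and the model assumes that the counter holds one reference per process in the prison plus one per alive child jail, which is exactly the open question the message asks and has not been confirmed in the thread.

```python
class Prison:
    """Toy stand-in for a jail; only the user reference count is modelled."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.nprocs = 0      # processes currently in this prison
        self.uref = 0        # modelled counterpart of pr_uref
        self.children = []
        if parent is not None:
            parent.children.append(self)


def create_jail(parent, name):
    """Create a jail with one process; ASSUMED to take one reference on the parent."""
    jail = Prison(name, parent)
    jail.nprocs = 1
    jail.uref = 1
    parent.uref += 1
    return jail


def proc_exit(pr):
    """A process leaves `pr`; a prison whose count hits zero releases its parent."""
    pr.nprocs -= 1
    pr.uref -= 1
    if pr.uref == 0 and pr.parent is not None:
        pr.parent.uref -= 1


def attach(pr, reacquire_parent):
    """A process attaches to `pr`, resurrecting it if it was dying.

    With reacquire_parent=False this reproduces the suspected bug: the
    parent reference dropped in proc_exit() is never taken again.
    """
    was_dying = pr.uref == 0
    pr.nprocs += 1
    pr.uref += 1
    if was_dying and pr.parent is not None and reacquire_parent:
        pr.parent.uref += 1


def expected_uref(pr):
    """Value the counter should have under the model's assumption."""
    return pr.nprocs + sum(1 for child in pr.children if child.uref > 0)


def simulate(cycles, host_procs, reacquire_parent):
    """Run `cycles` rounds of 'last process exits, then attach again'."""
    prison0 = Prison("prison0")
    prison0.nprocs = prison0.uref = host_procs
    jail = create_jail(prison0, "j1")
    for _ in range(cycles):
        proc_exit(jail)
        attach(jail, reacquire_parent)
    return prison0, jail


if __name__ == "__main__":
    p0, _ = simulate(cycles=10, host_procs=50, reacquire_parent=False)
    print("modelled:", p0.uref, "expected:", expected_uref(p0))
```

In this model each exit-and-resurrect cycle leaves the host counter one lower than `expected_uref()` reports, so the gap between the two grows with the number of jail restarts. A check run under kgdb against a live machine would follow the same comparison, provided the assumption about what the counter holds turns out to be right.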