Date:      Sat, 20 Aug 2011 14:14:15 +0100
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Andriy Gapon" <avg@FreeBSD.org>
Cc:        freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject:   Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID:  <4E55CB4A4F694A7997FEBDF9EADF87F5@multiplay.co.uk>
References:  <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> <4019027648B5493AAC4B654BD821DE88@multiplay.co.uk> <4E4F8631.1070300@FreeBSD.org> <4E4F8821.80108@FreeBSD.org>

----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>

> BTW, I suspect the following scenario, but I am not able to
> verify it either via testing or in the code:
> - last process in a dying jail exits
> - pr_uref of the jail reaches zero
> - pr_uref of prison0 gets decremented
> - you attach to the jail and resurrect it
> - but pr_uref of prison0 stays decremented
> 
> Repeat this enough times and prison0.pr_uref reaches zero.
> To reach zero even sooner just kill enough of non-jailed processes.

Ahh, now that explains all of the panic scenarios we've experienced:
1. A jail stop / start causing the panic, but only after at least a
few days' worth of uptime.

Here, what we're seeing is enough "leak" of pr_uref from the restarted
jails to decrement prison0.pr_uref to 0, even with all the standard
unjailed processes still running.

2. A machine reboot, after all jails have been stopped, but after
less uptime than in #1.

In this case there hasn't been enough leakage to decrement
prison0.pr_uref to 0, given the number of prison0 processes, but
it has been incorrectly decremented, so as soon as the reboot kicks
in and prison0 processes start exiting, prison0.pr_uref gets
further decremented and again hits 0 when it shouldn't.
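
To make the two scenarios above concrete, here is a toy model in
Python. It is not kernel code: the Prison class, the function names
and the counting rule (one reference per unjailed process plus one
for the live jail) are all assumptions made for illustration, and the
"buggy" attach simply encodes the suspected behaviour rather than
anything confirmed in the source.

```python
# Toy model only. Assumes pr_uref behaves like a plain counter and that
# the suspected leak on jail resurrection really exists.

class Prison:
    def __init__(self, name, parent=None, uref=0):
        self.name = name
        self.parent = parent
        self.uref = uref

    def proc_exit(self):
        """A process leaves; on reaching zero, drop the parent's reference."""
        self.uref -= 1
        if self.uref == 0 and self.parent is not None:
            self.parent.uref -= 1

    def attach_buggy(self):
        """Suspected bug: resurrecting a dying jail does not give the
        parent its reference back."""
        self.uref += 1


def run_restarts(unjailed_procs, restarts):
    """Restart one jail `restarts` times; return (prison0, jail)."""
    # Assumed accounting: one reference per unjailed process, plus one
    # for the live child jail.
    prison0 = Prison("prison0", uref=unjailed_procs + 1)
    jail = Prison("jail", parent=prison0, uref=1)
    for _ in range(restarts):
        jail.proc_exit()          # last process exits, prison0 decremented
        if prison0.uref <= 0:     # scenario 1: zero with processes running
            break
        jail.attach_buggy()       # jail resurrected, prison0 not restored
    return prison0, jail


def procs_alive_when_zero(unjailed_procs, restarts):
    """Scenario 2: stop the jail, then reboot; count the unjailed
    processes still running when prison0.uref reaches zero."""
    prison0, jail = run_restarts(unjailed_procs, restarts)
    jail.proc_exit()              # final, ordinary jail stop
    alive = unjailed_procs
    while prison0.uref > 0 and alive > 0:
        prison0.proc_exit()       # an unjailed process exits during reboot
        alive -= 1
    return alive
```

With 50 unjailed processes in this model, 51 restarts drive
prison0.uref to zero outright (scenario 1), while 10 restarts followed
by a reboot leave 10 processes still running when it hits zero
(scenario 2).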

Now if this is the case, we should be able to confirm it with a little
more info.

1. What exactly does pr_uref represent?
2. Can its expected value be calculated from other details of the
system, i.e. the number of running processes and the number of
running jails?

If we can calculate the value that prison0.pr_uref should be, then
by examining the machines we have which have been up for a while,
we should be able to confirm whether an incorrect value is present on
them and hence prove this is the case.

Ideally a little script to run in kgdb to test this would be the
best way to go.
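
This is not the kgdb script itself, just a sketch in Python of the
comparison it would need to make. The formula is purely an assumption
pending an answer to question 1: it guesses that prison0.pr_uref
should equal the number of unjailed processes plus the number of live
child jails, plus any permanent base reference. The function names and
the `base` parameter are hypothetical.

```python
# Sketch under a stated assumption; the real meaning of pr_uref still
# needs confirming before any result from this can be trusted.

def expected_prison0_uref(unjailed_procs, live_child_jails, base=0):
    """Assumed rule: one reference per unjailed process, one per live
    child jail, plus `base` for any permanent reference on prison0."""
    return base + unjailed_procs + live_child_jails


def leaked_refs(observed, unjailed_procs, live_child_jails, base=0):
    """Positive result means prison0.pr_uref is lower than the assumed
    rule predicts, i.e. references appear to have leaked."""
    return expected_prison0_uref(unjailed_procs, live_child_jails, base) - observed
```

For example, with 50 unjailed processes and 3 running jails the
assumed rule gives 53, so an observed value of 43 would suggest 10
leaked references.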

    Regards
    Steve


