From owner-freebsd-jail@FreeBSD.ORG Sat Aug 20 10:10:55 2011 Return-Path: Delivered-To: freebsd-jail@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C3F0106564A; Sat, 20 Aug 2011 10:10:55 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 91EF98FC17; Sat, 20 Aug 2011 10:10:44 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA10423; Sat, 20 Aug 2011 13:10:42 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1QuiVl-000Nrj-VF; Sat, 20 Aug 2011 13:10:41 +0300 Message-ID: <4E4F8821.80108@FreeBSD.org> Date: Sat, 20 Aug 2011 13:10:41 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110819 Thunderbird/6.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> <4019027648B5493AAC4B654BD821DE88@multiplay.co.uk> <4E4F8631.1070300@FreeBSD.org> In-Reply-To: <4E4F8631.1070300@FreeBSD.org> X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-jail@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion about FreeBSD jail\(8\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 10:10:55 -0000 on 20/08/2011 13:02 Andriy Gapon said the following: > on 18/08/2011 02:15 Steven Hartland said the following: >> In a nutshell the jail manager we're using will attempt to resurrect the jail >> from a dieing state in a few specific scenarios. >> >> Here's an exmaple:- >> 1. jail restart requested >> 2. jail is stopped, so the java processes is killed off, but active tcp sessions >> may prevent the timely full shutdown of the jail. >> 3. if an existing jail is detected, i.e. a dieing jail from #2, instead of >> starting a new jail we attach to the old one and exec the new java process. >> 4. if an existing jail isnt detected, i.e. where there where not hanging tcp >> sessions and #2 cleanly shutdown the jail, a new jail is created, attached to >> and the java exec'ed. >> >> The system uses static jailid's so its possible to determine if an existing >> jail for this "service" exists or not. This prevents duplicate services as >> well as making services easy to identify by their jailid. >> >> So what we could be seeing is a race between the jail shutdown and the attach >> of the new process? > > Not a jail expert at all, but a few suggestions... > > First, wouldn't the 'persist' jail option simplify your life a little bit? > > Second, you may want to try to monitor value of prison0.pr_uref variable (e.g. > via kgdb) while executing various scenarios of what you do now. If after > finishing a certain scenario you end up with a value lower than at the start of > scenario, then this is the troublesome one. > Please note that prison0.pr_uref is composed from a number of non-jailed > processes plus a number of top-level jails. So take this into account when > comparing prison0.pr_uref values - it's better to record the initial value when > no jails are started and it's important to keep the number of non-jailed > processes the same (or to account for its changes). BTW, I suspect the following scenario, but I am not able to verify it either via testing or in the code: - last process in a dying jail exits - pr_uref of the jail reaches zero - pr_uref of prison0 gets decremented - you attach to the jail and resurrect it - but pr_uref of prison0 stays decremented Repeat this enough times and prison0.pr_uref reaches zero. To reach zero even sooner just kill enough of non-jailed processes. -- Andriy Gapon