From owner-freebsd-hackers@FreeBSD.ORG Fri Dec 1 11:15:47 2006
Date: Fri, 1 Dec 2006 11:15:46 +0000 (GMT)
From: Robert Watson
To: "Bjoern A. Zeeb"
Cc: freebsd-hackers@freebsd.org, Steven Hartland
Subject: Re: Unable to stop a jail
Message-ID: <20061201111209.M79653@fledge.watson.org>
In-Reply-To: <20061201104809.P91892@maildrop.int.zabbadoz.net>
References: <00c001c71535$7e7d7670$b3db87d4@multiplay.co.uk> <20061201104809.P91892@maildrop.int.zabbadoz.net>

On Fri, 1 Dec 2006, Bjoern A. Zeeb wrote:

> On Fri, 1 Dec 2006, Steven Hartland wrote:
>
>> We've got a jail here which we cant stop with either killall jexec or jkill
>> all return success but jls still reports the jail as running.
>>
>> The machines running several other jails which I cant restart at this time
>> so I ended up starting the jail again jls now reports: jls
>>   JID  IP Address   Hostname  Path
>>     9  10.10.0.5    jail6     /usr/local/jails/jail6
>>     7  10.10.0.5    jail6     /usr/local/jails/jail6
>>     6  10.10.0.4    jail5     /usr/local/jails/jail5
>>     5  10.10.0.39   jail4     /usr/local/jails/jail4
>>     3  10.10.0.6    jail3     /usr/local/jails/jail3
>>     2  10.10.0.8    jail2     /usr/local/jails/jail2
>>     1  10.10.0.7    jail1     /usr/local/jails/jail1
>>
>> Host machine is running FreeBSD-6.1-P10
>>
>> Any ideas some sort of kernel data corruption?
>
> no the jails should really be gone (you should not find any sockets or
> processes for them after some seconds) - at least it should be that way...
>
> See http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/89528

Not all cases of straggling jails are leaks. Does netstat -n show that all
the TIME_WAIT TCP connections in the jail have been GC'd? Because security
state may be used in the network stack for TCP packet transmission and
reception, the ucred remains referenced until the last socket/pcb associated
with it is freed.

I've been wondering if we should add a jail process counter, and hide jails
in jls if the counter is zero (with a -a argument or such to show them).

One idea I've been kicking around is adding a zombie state for jails, in
which some straggling references exist, but (a) there are no processes in
the jail, and (b) no new processes are allowed to enter the jail. The
significance of (b) is that we could vrele() the vnode reference hung off
the jail; there's been at least one report that this vnode reference causes
issues, as the file system it's from can't be unmounted until the last jail
reference evaporates. In essence, this would move to having two reference
counts on the prison: a "strong" reference that has to do with having
process members, and a "weak" reference that has to do with ucreds pointing
at the prison.
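
To make the two-counter idea concrete, here is a minimal userspace sketch
in C. It is only an illustration under stated assumptions: struct
prison_model and the prison_* helpers are hypothetical names invented for
this example, not the kernel's struct prison or any existing FreeBSD API,
and the root_held flag merely stands in for the vnode reference that
vrele() would drop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum prison_state { PRISON_ALIVE, PRISON_ZOMBIE, PRISON_FREED };

struct prison_model {
    int  strong_refs;        /* processes that are members of the jail */
    int  weak_refs;          /* ucreds (e.g. held by sockets/pcbs) pointing here */
    bool root_held;          /* stands in for the jail's root vnode reference */
    enum prison_state state;
};

/* Re-evaluate the state after a reference has been dropped. */
static void prison_update(struct prison_model *p)
{
    if (p->state == PRISON_ALIVE && p->strong_refs == 0) {
        p->state = PRISON_ZOMBIE;
        p->root_held = false;    /* where a kernel would vrele() the vnode */
    }
    if (p->state == PRISON_ZOMBIE && p->weak_refs == 0)
        p->state = PRISON_FREED; /* last straggling reference is gone */
}

/* A new jail starts with one member process and no extra ucred holders. */
static void prison_init(struct prison_model *p)
{
    p->strong_refs = 1;
    p->weak_refs = 0;
    p->root_held = true;
    p->state = PRISON_ALIVE;
}

/* Condition (b): nothing may enter a jail that is no longer alive. */
static bool prison_proc_attach(struct prison_model *p)
{
    if (p->state != PRISON_ALIVE)
        return false;
    p->strong_refs++;
    return true;
}

static void prison_proc_exit(struct prison_model *p)
{
    assert(p->strong_refs > 0);
    p->strong_refs--;
    prison_update(p);
}

static void prison_cred_hold(struct prison_model *p)
{
    assert(p->state != PRISON_FREED);
    p->weak_refs++;
}

static void prison_cred_drop(struct prison_model *p)
{
    assert(p->weak_refs > 0);
    p->weak_refs--;
    prison_update(p);
}

/* What a jls-like listing would show; show_all models the suggested -a. */
static bool prison_visible(const struct prison_model *p, bool show_all)
{
    if (p->state == PRISON_FREED)
        return false;
    return p->state == PRISON_ALIVE || show_all;
}
```

In this model a jail whose last process has exited, but which still has a
socket in TIME_WAIT holding a ucred, sits in PRISON_ZOMBIE: it is hidden
from the default listing, refuses new processes, and no longer pins its
root, yet it is not freed until the final prison_cred_drop().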

Robert N M Watson
Computer Laboratory
University of Cambridge