Message-ID: <40E7D0F4.9080104@saers.com>
Date: Sun, 04 Jul 2004 11:42:12 +0200
From: Niklas Saers
To: freebsd-current@freebsd.org
Subject: Filesystem lock in jailed environment

Hi all,

In our system we have three jail hosts running FreeBSD-CURRENT as of July 2nd. The following problem has recurred since about April 15th.

Each host has four "base" directories and ~250 jail environments. The base contents (userland applications, installed ports, etc.) are mounted into each jail via nullfs, so a jail stores only the data and config files specific to it. Each jail is a webserver with its own IP.
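To make the layout concrete, here is a minimal sketch of how the nullfs mounts for one jail could be generated. The paths /base and /jails, the four directory names, and the jail name www001 are hypothetical placeholders, not our actual configuration, and the script only prints the mount_nullfs commands instead of running them.

```shell
#!/bin/sh
# Hypothetical layout (assumption, not the real one):
#   /base/<dir>          shared content mounted into every jail
#   /jails/<name>/<dir>  mount point inside the jail's root
BASE=/base
JAILROOT=/jails

# Print (do not execute) the nullfs mounts needed for one jail.
jail_mounts() {
    name=$1
    # Four illustrative "base" directories; the real four are not listed here.
    for dir in bin lib usr/bin usr/local; do
        echo "mount_nullfs $BASE/$dir $JAILROOT/$name/$dir"
    done
}

jail_mounts www001
```

With ~250 jails and four base directories each, this comes to roughly 1000 nullfs mounts per host.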
During large filesystem operations, such as a 'cp -R' of the data, tar'ing it, or dumping the filesystems for backup, the servers have a 50% chance of "hanging". From having had top(1) running while the system becomes unresponsive, it seems processes go into an infinite loop waiting for the filesystem. (I once had it hang while running df(1). ;-) ) And indeed, all the running websites keep working until they make filesystem requests that are not cached.

I'm not an experienced kernel debugger, and hitting Ctrl-Alt-Esc only gives me a trace of the keyboard. If this is a good route to take, hints on how to go about it are most welcome.

The extra sysctl settings I've got in /boot/loader.conf are:

  beastie_disable="YES"
  kern.ipc.maxpipekva="104857600"
  kern.maxfiles="65536"
  net.inet.ip.portrange.lowfirst="79"
  net.inet.ip.portrange.reservedhigh="79"

kern.ipc.maxpipekva and kern.maxfiles were raised because the jails together required more than the default values. The values are now about 5 times what is actually used. The last two let users control their own webserver, which binds to port 80.

I've replaced all the hardware. I've rebuilt FreeBSD and the ports regularly. Still the servers go down about once every 24 hours, particularly when I'm asleep. ;-) And I'm fresh out of ideas on how to go about solving this. All suggestions are very much appreciated.

Cheers

  Nik