From: mi+mx@aldan.algebra.com
Organization: Murex N.A.
To: current@FreeBSD.org, Robert Watson
Date: Tue, 13 Jan 2004 16:33:55 -0500
Subject: Re: core-dumping over NFS
Message-Id: <200401131633.55839@misha-mx.virtual-estates.net>

On Mon, 12 Jan 2004, Mikhail Teterin wrote:

=> I've observed the following bad behaviour of -current mostly related to
=> dumping core of a buggy program over the NFS.

=Sounds unfortunate. A quick starting question: Does the behavior change
=at all if the core file does or doesn't already exist?

It always exists already.
The program crashes at the very end of its life, so I did not bother
fixing it for a while.

=Another question: this is a FreeBSD binary, or is it emulated? Could
=you send the output of running 'file' on the binary?

It is a FreeBSD binary, but it is produced using Intel's compiler
(via the lang/icc port).

=> . 5.2-CURRENT (Dec 14) client, Solaris-8 server:
=>	created core file is empty (zero sized).

=> . 5.2-CURRENT (Dec 14) server, RedHat-9 client:
=>	core is created properly, but sometimes the server goes
=>	into a frenzy with the sys-component (bufdaemon) taking
=>	up the entire 100% of the CPU-time (P4 at 2GHz); it only
=>	writes @4Mb/s (~14% of the disk's bandwidth) and the
=>	only cure is to restart the /etc/rc.d/nfsd; trying to,
=>	for example, switch from X11 to a textual console, when
=>	this is happening reliably hangs the machine.

=Er. Ouch. Can you confirm if there's an on-going series of RPCs from
=the client driving the I/O, or if it's just things going nuts on the
=server?

Can't tell right now... But it does not always happen -- usually (say,
90% of the time), the program will just dump core and die.

=Also, what block size is the RedHat client using by default?

=Could you set up a serial console -- if so, do you get any interesting
=messages?

The machine will survive this "storm" if I restart nfs, and there will be
nothing interesting in /var/log/messages.

=> . 5.2-CURRENT (Dec 14) server, 5.2-RC2 (Jan 10) client:
=>	dumps happen normally with rw-mounts, but mounting the
=>	FS read-only (so as to prevent core-dumps) leads to a
=>	panic on the client...
=>
=> The mounts are regular and default (v3?), except for the ``intr'' flag.
=> No rpc.lock or anything...

=Could you provide the panic message and stack trace?

Not right now. The machine is in use by another person...

	-mi