From owner-freebsd-stable@FreeBSD.ORG Wed Apr 4 14:39:51 2007
Return-Path:
X-Original-To: freebsd-stable@FreeBSD.ORG
Delivered-To: freebsd-stable@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id 5BB7416A477
    for ; Wed, 4 Apr 2007 14:39:51 +0000 (UTC)
    (envelope-from chris#@1command.com)
Received: from mail.1command.com (mail.1command.com [216.177.243.35])
    by mx1.freebsd.org (Postfix) with ESMTP id E393413C46E
    for ; Wed, 4 Apr 2007 14:39:50 +0000 (UTC)
    (envelope-from chris#@1command.com)
Received: from mail.1command.com (localhost.1command.com [127.0.0.1])
    by mail.1command.com (8.13.3/8.13.3) with ESMTP id l34Ede2c048722
    for ; Wed, 4 Apr 2007 07:39:47 -0700 (PDT)
    (envelope-from chris#@1command.com)
Received: (from www@localhost)
    by mail.1command.com (8.13.3/8.13.3/Submit) id l34Eddsw048721
    for freebsd-stable@FreeBSD.ORG; Wed, 4 Apr 2007 07:39:39 -0700 (PDT)
    (envelope-from chris#@1command.com)
X-Authentication-Warning: mail.1command.com: www set sender to
    chris#@1command.com using -f
Received: from brickwall.spam-fighters.com
    (brickwall.spam-fighters.com [216.177.243.55])
    by webmail.1command.com (H.R. Communications Messaging System)
    with HTTP; Wed, 04 Apr 2007 07:39:39 -0700
Message-ID: <20070404073939.h9p3mgp2m88kswk8@webmail.1command.com>
X-Priority: 3 (Normal)
Date: Wed, 04 Apr 2007 07:39:39 -0700
From: "Chris H."
To: freebsd-stable@FreeBSD.ORG
References: <200704041427.l34ERGP3037877@lurza.secnetix.de>
In-Reply-To: <200704041427.l34ERGP3037877@lurza.secnetix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
User-Agent: H.R. Communications Internet Messaging System (HCIMS) 4.1
    Professional (not for redistribution) / FreeBSD-5.5
Cc:
Subject: Re: NFS == lock && reboot
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 04 Apr 2007 14:39:51 -0000

Quoting Oliver Fromme:

> Chris H. wrote:
> > Thomas David Rivers wrote:
> > > I have found that if I kill rpc.lockd on the NFS server,
> > > most of the NFS issues I have (including a similar lock-up on
> > > 6.1-RELEASE) go away.
>
> FWIW, I also had problems with running rpc.lockd and
> rpc.statd (no panics, though). If you don't need them
> (i.e. you don't need cross-machine locking), then don't
> use them. Use the -L flag to mount_nfs so at least
> local locking works.
>
> > You don't happen to have any experiences keeping rpc.statd
> > running?
>
> Basically, it doesn't make much sense to run one without
> the other. If you disable rpc.lockd, you can also safely
> disable rpc.statd.
>
> However, I don't think that your actual problem (lock-up
> and panics) is related to rpc.lockd or rpc.statd. It
> rather sounds like something else is wrong with your
> machine. NFS works perfectly fine for me, including
> copying huge files.
>
> You wrote that you had a lot of crashes that accumulated
> many files in lost+found. Well, maybe your filesystem
> was somehow damaged in the process. It is possible to
> damage file systems in a way that can lead to panics, and
> it's not necessarily detected and repaired by fsck.

Indeed. I /too/ considered this. However, I largely dismissed it as a
possibility, as almost all of the files are zero length. The others are
fragments of logs. I'm not /completely/ ruling this out though.
>
> > > > # cp /path/to/approx/10Mb/file /host/path/to/dest/dir/
> > > >
> > > > Fatal double fault
> > > > eis 0x0blah
> > > > eiblah blah0x
> > > > panic double fault
> > > > no dump device defined
>
> You should try to set up a dump device, so you get a kernel
> crash dump next time. The crash dump can be used to find
> out where the crash occurred -- and I bet it's not in the
> NFS code.
>
> See the Handbook for details on how to set up a dump device.
>
> By the way, does the problem also occur when copying the
> file to/from a memory disk, so no physical disk is involved?
> That way you would exclude the disk and the disk driver as
> potential causes. Similarly, try a loopback NFS mount
> (i.e. mount from 127.0.0.1) in order to exclude the network
> interface driver as a potential cause.
>
> If the problem still exists when copying a 10 MB file from
> a memory disk to a memory disk (same or other) via a
> localhost mount on the same machine, then it looks like
> the NFS code might be at fault.
>
> Best regards
> Oliver

All good advice. I'm going to /initially/ take the easy way out
(remove lockd/statd from rc.conf) as a quick experiment. Then I'll
endeavour to investigate further using your suggestions.

Thank you very much for all your time and your thoughtful answer.

--Chris

>
> --
> Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
> Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
> secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht München,
> HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart
>
> FreeBSD services, products and more: http://www.secnetix.de/bsd
>
> "C++ is the only current language making COBOL look good."
> -- Bertrand Meyer
>

--
panic: kernel trap (ignored)
-----------------------------------------------------------------
FreeBSD 5.4-RELEASE-p12 (SMP - 900x2) Tue Mar 7 19:37:23 PST 2006
/////////////////////////////////////////////////////////////////
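
For reference, the two rc.conf changes discussed in this thread (dropping rpc.lockd/rpc.statd, and defining a dump device so the next panic leaves a crash dump) might look roughly like the fragment below. This is only a sketch under assumptions: the variable names are the standard rc.conf(5) knobs, but the swap device path is a placeholder that does not come from the thread, and the Handbook remains the place to check the details for a given release.

```sh
# /etc/rc.conf fragment (sketch; adjust to the actual machine)

# Do not start the NFS locking daemons at boot. Per the thread,
# running one without the other makes little sense, so both go.
rpc_lockd_enable="NO"
rpc_statd_enable="NO"

# Kernel crash dumps: point dumpdev at a swap partition so that the
# next panic can leave a dump for savecore(8) to store in dumpdir.
# ASSUMPTION: /dev/ad0s1b is a hypothetical placeholder device,
# not something mentioned in the thread.
dumpdev="/dev/ad0s1b"
dumpdir="/var/crash"
```

With the locking daemons disabled, the NFS mounts would then use the -L flag to mount_nfs, as suggested above, so that at least local locking keeps working.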