From: Vincent Hoffman
Date: Tue, 10 Jul 2012 08:11:57 +0100
To: "Arno J. Klaassen"
Cc: freebsd-stable@freebsd.org
Subject: Re: nfs-bug when server for 9-Stable becomes client as well ?
Message-ID: <4FFBD5BD.7010905@unsane.co.uk>
References: <4FF7055D.9000507@unsane.co.uk> <4FF76066.1000401@unsane.co.uk>

On 09/07/2012 23:00, Arno J. Klaassen wrote:
> Vincent Hoffman writes:
>
>> On 06/07/2012 18:51, Arno J. Klaassen wrote:
>>> Vincent Hoffman writes:
>>>
>>>> On 06/07/2012 14:19, Arno J. Klaassen wrote:
>>>>> Hello,
>>>>>
>>>>> looks like I discovered a probable bug in the nfs-code, very
>>>>> easy to reproduce in my setup :
>>>>>
>>>>>
>>>>> Machine-1 : Today's 9-stable, exporting /files (ufs) and /z2 (zfs)
>>>>>
>>>>> Machine-2 : 8-stable as of April the 10th exporting /raid1
>>>>>
>>>>> On Machine-1 I mount /raid1 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
>>>>> and start a script on this mount looping something like :
>>>>>
>>>>>   dd if=/dev/random of=BIG bs=1048576 count=${SIZE}
>>>>>   cp -fp BIG BIG2
>>>>>   cmp -x BIG BIG2
>>>>>
>>>>> I let this run for 24 hours (from time to time stressing Machine-1 with
>>>>> other scripts, including provoking heavy swapping), no problem at all.
>>>>>
>>>>> However, then I mount /z2 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
>>>>> on Machine-2, and *immediately* the above loop on Machine-1 fails :
>>>>>
>>>>>   Copying file ...cp: BIG: Permission denied
>>>>>
>>>>> No console messages this time, last time I got
>>>>>
>>>>>   kernel: nfs_getpages: error 13
>>>>>   kernel: vm_fault: pager read error, pid 87803 (cmp)
>>>>>
>>>>> on Machine-1.
>>>>>
>>>>> I repeated this scenario by replacing Machine-2 with a good old
>>>>> 6-4-stable one, same outcome.
>>>>>
>>>>> Please tell me what I could do to nail this down a bit more.
>>>> It's possible (although not definite) that you have hit a mountd bug
>>>> as documented in PRs
>>>>
>>>> kern/131342
>>>> kern/136865
>>> especially kern/131342 looks similar and quite old; funny I never hit
>>> this before, I basically do the same tests since 'ages' on each new box.
>>> Could be that faster network/cpu reveals some race condition; I notice
>>> as well that this server is the first (IIRC) who uses 3 different IRQs
>>> for network interrupts (em(4) Intel(R) PRO/1000).
>> Certainly possible and seems reasonable enough.
> just my $0.02, I glanced at kern/131342, looks like the culprit should be
> something like a 'non-atomic'-operation in-between invalidating old
> /etc/exports and validating new /etc/exports.
> Wonder if just verifying /var/run/mountd.pid is newer than /etc/exports
> and if true just skip that operation would be an acceptable band-aid (if
> I understood correctly, a rewrite of mountd correcting this (amongst
> others) is close to hitting -current (?))
I don't know how close it (nfse) is to hitting -current. It certainly
looked good from my quick look over, but there are a few minor
incompatibilities in the exports syntax, even in compatibility mode, that
seem to be stopping acceptance. (I'm hoping the problem is a little more
complex, but that's all I understand it to be.)
In the meantime I'm testing a second patch from Rick to see if that helps.
>
>>>> I've recently asked on -CURRENT about this and had a patch to try from
>>>> Rick, I'm testing it now but it doesn't seem to fix it for me, just
>>>> improve it, although I'm trying to get enough runs to be a valid sample.
>>>> (see
>>>> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=377627+0+archive/2012/freebsd-current/20120701.freebsd-current
>>>> )
>>>>
>>>> What I did for my production NAS was edit mount.c so it didn't send a
>>>> SIGHUP to mountd as suggested by Rick, as it was easy to do and non
>>>> intrusive.
>>> hmm, this means I should patch each fbsd-client, no? May be easier to
>>> patch mountd to ignore SIGHUP and use some non-standard signal to force
>>> re-init?
>> No, just patch /sbin/mount on the nfs server so it doesn't send the SIGHUP
>> to mountd.
> [In my case] it's the mount on a client which causes the server to fail,
> I don't see how patching /sbin/mount on the nfs server should fix this?
> As I don't remember if it's possible to discriminate a -1 signal sent
> from a process against one sent from terminal, if so, another bandaid,
> one sent from a process could be ignored at all?
Your message above seemed to say that you were running the test on
machine-1 on an export from machine-2. You then mounted an export from
machine-1 on machine-2 (ran the mount command on machine-2, the original
NFS server), which caused the test running on machine-1 to fail, as
machine-2 sent a "permission denied".
If I understood this incorrectly, my guess at your problem could be
completely off track.

Vince
> Merci
>
> Arno
>
>
>> you can manually HUP mountd if needed.
>>> Arno
>>>
>>>
>>>> Vince
>>>>
>>>>> Thanx in advance,
>>>>>
>>>>> Best, Arno
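
For reference, the band-aid Arno floats above (skip the reload unless
/etc/exports is newer than /var/run/mountd.pid) could be sketched roughly
like this in sh. It is only an illustration under assumptions:
exports_newer_than_pidfile is a made-up helper name, not part of mountd or
mount(8), and the script only prints its decision rather than signalling
anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of mountd or mount(8)).
# Returns 0 when the exports file was modified after the pid file,
# returns 1 otherwise, including when either file is missing.
exports_newer_than_pidfile() {
    exports_file=$1
    pid_file=$2
    # Require both files so a missing pid file never counts as "newer".
    if [ ! -e "$exports_file" ] || [ ! -e "$pid_file" ]; then
        return 1
    fi
    [ "$exports_file" -nt "$pid_file" ]
}

# Report what the decision would be on this host; nothing is signalled.
if exports_newer_than_pidfile /etc/exports /var/run/mountd.pid; then
    echo "exports changed since the pid file was written: reload needed"
else
    echo "exports unchanged (or files missing): reload could be skipped"
fi
```

One caveat with this approach: if the pid file is only written when the
daemon starts, /etc/exports stays newer after its first edit, so the check
would only skip reloads while the exports file has been untouched since
startup.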