To: Vincent Hoffman <vince@unsane.co.uk>
From: "Arno J. Klaassen" <arno@heho.snv.jussieu.fr>
References: <wpy5mxxc1f.fsf@heho.snv.jussieu.fr>
	<4FF7055D.9000507@unsane.co.uk> <wp8vewsrqn.fsf@heho.snv.jussieu.fr>
	<4FF76066.1000401@unsane.co.uk>
Date: Tue, 10 Jul 2012 00:00:23 +0200
In-Reply-To: <4FF76066.1000401@unsane.co.uk> (Vincent Hoffman's message of
	"Fri\, 06 Jul 2012 23\:02\:14 +0100")
Message-ID: <wpmx38bnns.fsf@heho.snv.jussieu.fr>
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Miltered: at jchkmail.jussieu.fr with ID 4FFB5495.002 by Joe's j-chkmail
	(http : // j-chkmail dot ensmp dot fr)!
X-j-chkmail-Enveloppe: 4FFB5495.002/134.157.184.22/heho.snv.jussieu.fr/heho.snv.jussieu.fr/<arno@heho.snv.jussieu.fr>
Cc: freebsd-stable@freebsd.org
Subject: Re: nfs-bug when server for 9-Stable becomes client as well ?

Vincent Hoffman <vince@unsane.co.uk> writes:

> On 06/07/2012 18:51, Arno J. Klaassen wrote:
>> Vincent Hoffman <vince@unsane.co.uk> writes:
>>
>>> On 06/07/2012 14:19, Arno J. Klaassen wrote:
>>>> Hello,
>>>>
>>>> looks like I discovered a probable bug in the nfs-code, very
>>>> easy to reproduce in my setup :
>>>>
>>>>
>>>>    Machine-1 : Today's 9-stable, exporting /files (ufs) and /z2 (zfs)
>>>>
>>>>    Machine-2 : 8-stable as of April the 10th exporting /raid1
>>>>
>>>> On Machine-1 I mount /raid1 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
>>>> and start a script on this mount looping something like :
>>>>
>>>>   dd if=/dev/random of=BIG bs=1048576 count=${SIZE}
>>>>   cp -fp BIG BIG2
>>>>   cmp -x BIG BIG2
>>>>
>>>> I let this run for 24 hours (from time to time stressing Machine-1 with
>>>> other scripts, including provoking heavy swapping), no problem at all.
>>>>
>>>> However, then I mount /z2 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
>>>> on Machine-2, and *immediately* the above loop on Machine-1 fails :
>>>>
>>>>   Copying file ...cp: BIG: Permission denied
>>>>
>>>> No console messages this time, last time I got 
>>>>
>>>>   kernel: nfs_getpages: error 13
>>>>   kernel: vm_fault: pager read error, pid 87803 (cmp)
>>>>
>>>> on Machine-1.
>>>>
>>>> I repeated this scenario by replacing Machine-2 with a good old
>>>> 6-4-stable one, same outcome.
>>>>
>>>> Please tell me what I could do to nail this down a bit more.
>>> It's possible (although not definite) that you have hit a mountd bug
>>> as documented in PRs
>>>
>>> kern/131342
>>> kern/136865
>> especially kern/131342 looks similar and quite old; funny I never hit
>> this before, I have basically run the same tests for 'ages' on each new box.
>> Could be that a faster network/cpu reveals some race condition; I notice
>> as well that this server is the first (IIRC) that uses 3 different IRQs
>> for network interrupts (em(4) Intel(R) PRO/1000).
> Certainly possible and seems reasonable enough.
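
For reference, the write/copy/compare loop quoted above can be sketched in sh as below. This is only an illustration, not the script that was actually run: `one_round` is a hypothetical helper name, the size is passed in MiB, and `cmp -s` replaces the quoted `cmp -x` so that only the exit status matters. The distinct return codes show which step failed (the report above failed in cp).

```shell
#!/bin/sh
# one_round SIZE_MIB  (hypothetical helper; run it in a directory on the
# NFS mount). Returns 0 on success, 1 if dd fails, 2 if cp fails,
# 3 if the two files differ.
one_round() {
    # dd writes its statistics and error messages to stderr; both are
    # discarded here, only the exit status is used.
    dd if=/dev/random of=BIG bs=1048576 count="$1" 2>/dev/null || return 1
    cp -fp BIG BIG2 || return 2
    cmp -s BIG BIG2 || return 3
    return 0
}
```

A driver such as `while :; do one_round 1024 || { echo "round failed: $?"; break; }; done` would then stop at the first failing round and print which step it was.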

just my $0.02: I glanced at kern/131342, and it looks like the culprit
should be something like a non-atomic operation in between invalidating
the old /etc/exports and validating the new /etc/exports.
I wonder whether just checking that /var/run/mountd.pid is newer than
/etc/exports, and skipping that operation if it is, would be an
acceptable band-aid (if I understood correctly, a rewrite of mountd
correcting this, amongst other things, is close to hitting -current?).
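
A minimal sketch of that timestamp check in sh, to make the idea concrete. It is not mountd code: `needs_reload` is a hypothetical helper name and both paths are passed as arguments. It relies on the `-nt` (newer than) operator of test(1), which FreeBSD's test supports.

```shell
#!/bin/sh
# needs_reload PIDFILE EXPORTS  (hypothetical helper, not part of mountd)
# Exit status 0: the exports file should be re-read.
# Exit status 1: the pid file is newer than the exports file, skip the reload.
needs_reload() {
    pidfile=$1
    exports=$2
    # Without both files there is nothing to compare; reload to be safe.
    [ -e "$pidfile" ] && [ -e "$exports" ] || return 0
    # -nt is true when the first file's modification time is later.
    if [ "$pidfile" -nt "$exports" ]; then
        return 1
    fi
    # Equal or older pid file: treat the exports file as changed.
    return 0
}
```

For example, `needs_reload /var/run/mountd.pid /etc/exports && echo "would reload"`. One caveat the sketch makes visible: as far as I know the pid file is only written when mountd starts, so once /etc/exports has been edited it stays newer and every later SIGHUP would still cause a reload; the check would probably need a timestamp that is updated on each reload instead.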

>>> I've recently asked on -CURRENT about this and had a patch to try from
>>> Rick. I'm testing it now, but it doesn't seem to fix it for me, just
>>> improve it, although I'm still trying to get enough runs for a valid sample.
>>> (see
>>> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=377627+0+archive/2012/freebsd-current/20120701.freebsd-current
>>> )
>>>
>>> What I did for my production nas was edit mount.c so it didn't send a
>>> SIGHUP to mountd, as suggested by Rick, as it was easy to do and
>>> non-intrusive.
>> hmm, this means I should patch each fbsd-client, no? Maybe easier to
>> patch mountd to ignore SIGHUP and use some non-standard signal to force
>> re-init?
> No, just patch /sbin/mount on the nfs server so it doesn't send the SIGHUP
> to mountd.

[In my case] it's the mount on a client that causes the server to fail,
so I don't see how patching /sbin/mount on the nfs server would fix this.
I don't remember whether it's possible to distinguish a signal 1 (SIGHUP)
sent from a process from one sent from a terminal; if it is, another
band-aid could be to ignore those sent from a process altogether?

Merci

Arno


> you can manually HUP mountd if needed.
>>
>> Arno
>>
>>
>>> Vince
>>>
>>>> Thanx in advance,
>>>>
>>>> Best, Arno

-- 

  Arno J. Klaassen

  SCITO S.A.
  8 rue des Haies
  F-75020 Paris, France
  http://scito.com