From owner-freebsd-questions@FreeBSD.ORG Fri Mar 19 20:29:49 2010
Message-ID: <4BA3DEBC.2000608@comcast.net>
Date: Fri, 19 Mar 2010 16:29:48 -0400
From: Steve Polyack <korvus@comcast.net>
To: John Baldwin
Cc: freebsd-fs@freebsd.org, User Questions, bseklecki@noc.cfi.pgh.pa.us
In-Reply-To: <4BA392B1.4050107@comcast.net>
Subject: Re: FreeBSD NFS client goes into infinite retry loop

On 03/19/10 11:05, Steve Polyack wrote:
> On 03/19/10 09:23, Steve Polyack wrote:
>> On 03/19/10 08:31, John Baldwin wrote:
>>> On Friday 19 March 2010 7:34:23 am Steve Polyack wrote:
>>>> Hi, we use a FreeBSD 8-STABLE (from shortly after release) system as an
>>>> NFS server to provide user home directories, which get mounted across a
>>>> few machines (all 6.3-RELEASE).  For the past few weeks we have been
>>>> running into problems where one particular client goes into an infinite
>>>> loop, repeatedly trying to write the same data, which causes the NFS
>>>> server to return "reply ok 40 write ERROR: Input/output error PRE: POST:".
>>>> This retry loop can generate between 20 Mbps and 500 Mbps of constant
>>>> traffic on our network, depending on the size of the data associated
>>>> with the failed write.
>>>>
>>> Yes, your feeling is correct.  This sort of race is inherent to NFS if
>>> you do not use some sort of locking protocol to resolve it.  The
>>> infinite retries sound like a client-side issue.  Have you been able to
>>> try a newer OS version on a client to see if it still causes the same
>>> behavior?
>>>
>> I can't try a newer FreeBSD version on the client where we are seeing
>> the problems, but I can recreate the problem fairly easily.  Perhaps
>> I'll try it with an 8.0 client.  If I remember correctly, one of the
>> strange things is that it doesn't seem to hit "critical mass" until a
>> few hours after the operation first fails.  I may be wrong, but I'll
>> double-check that when I test against 8.0-RELEASE.
>>
>> I forgot to add this in the first post, but these are all TCP NFSv3
>> mounts.
>>
>> Thanks for the response.
>
> Ok, so I'm still able to trigger what appears to be the same retry loop
> with an 8.0-RELEASE NFSv3 client (going on 1.5 hours now):
>
> client$ cat nfs.sh
> #!/usr/local/bin/bash
> for a in {1..15} ; do
>     sleep 1;
>     echo "$a$a$";
> done
>
> client$ ./nfs.sh > ~/output
>
> Then, on the server, while the above is running:
>
> server$ rm ~/output
>
> What happens is that you will see 3-4 of the same write attempts per
> minute via tcpdump.  Our previous logs show that this is how it starts;
> then, roughly 4 hours later, it begins to spiral out of control, throwing
> out up to 3,000 of the same failed write requests per second.

To anyone who is interested: I did some poking around with DTrace, which
led me to the nfsiod client code.  In src/sys/nfsclient/nfs_nfsiod.c:

	} else {
		if (bp->b_iocmd == BIO_READ)
			(void) nfs_doio(bp->b_vp, bp, bp->b_rcred, NULL);
		else
			(void) nfs_doio(bp->b_vp, bp, bp->b_wcred, NULL);
	}

These two calls to nfs_doio() discard the return codes (which are errors
cascading up from various other NFS write-related functions).  I'm not
entirely familiar with the way nfsiod works, but if nfs_doio() or the
functions it calls are supposed to remove the current async NFS operation
from the queue that nfsiod services when they encounter an error, they are
not doing so.  They simply report the error back to the caller, which in
this case never even looks at the value.

I've tested this by pushing the return code into a new int, errno, and
adding:

	if (errno) {
		NFS_DPF(ASYNCIO,
		    ("nfssvc_iod: iod %d nfs_doio returned errno: %d\n",
		    myiod, errno));
	}

The result is that my repeatable problem case begins logging "nfssvc_iod:
iod 0 nfs_doio returned errno: 5" (errno 5 is EIO, i.e. NFSERR_IO) for each
repetition of the failed write.  The only things triggering this are my
failed writes, and I can also see the nfsiod0 process waking up on each
iteration.

Do we need some kind of "retry x times, then abort" logic within
nfssvc_iod(), or does this belong in the functions it calls, such as
nfs_doio()?  I think it's best to avoid these sorts of infinite loops,
which have the potential to take down the system or overload the network
because of dumb decisions made by unprivileged users.
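
For reference, here is roughly how the instrumentation above fits together
once the return value is actually captured instead of cast to void.  This
is only a sketch; the exact placement of the declaration inside
nfssvc_iod() is my approximation:

	} else {
		int errno;	/* capture the result instead of discarding it */

		if (bp->b_iocmd == BIO_READ)
			errno = nfs_doio(bp->b_vp, bp, bp->b_rcred, NULL);
		else
			errno = nfs_doio(bp->b_vp, bp, bp->b_wcred, NULL);
		if (errno) {
			NFS_DPF(ASYNCIO,
			    ("nfssvc_iod: iod %d nfs_doio returned errno: %d\n",
			    myiod, errno));
		}
	}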
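
As for the "retry x times, then abort" idea, the following is just a plain
userland illustration of the behavior I mean, not kernel code: the names
(MAX_ASYNC_RETRIES, fake_doio) are made up, and where a real retry counter
would live (per buffer, per mount, etc.) is an open question.

	/*
	 * Userland illustration only: retry a failing asynchronous write a
	 * bounded number of times, then give up and surface the error
	 * instead of requeueing the buffer forever.
	 */
	#include <errno.h>
	#include <stdio.h>

	#define MAX_ASYNC_RETRIES 5

	/*
	 * Stand-in for nfs_doio(): always fails with EIO, like a write to a
	 * file the server has already unlinked.
	 */
	static int
	fake_doio(void)
	{
		return (EIO);
	}

	int
	main(void)
	{
		int error = 0;
		int tries;

		for (tries = 1; tries <= MAX_ASYNC_RETRIES; tries++) {
			error = fake_doio();
			if (error == 0)
				break;
			printf("write failed (error %d), attempt %d of %d\n",
			    error, tries, MAX_ASYNC_RETRIES);
		}
		if (error != 0)
			printf("giving up: dropping the dirty buffer with "
			    "error %d instead of retrying forever\n", error);
		return (0);
	}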