From owner-freebsd-questions@FreeBSD.ORG Sat Mar 20 02:41:03 2010
Date: Fri, 19 Mar 2010 22:28:24 -0400
From: Steve Polyack <korvus@comcast.net>
To: Rick Macklem
Cc: freebsd-fs@freebsd.org, bseklecki@noc.cfi.pgh.pa.us, User Questions,
 John Baldwin
Subject: Re: FreeBSD NFS client goes into infinite retry loop
Message-ID: <4BA432C8.4040707@comcast.net>

On 3/19/2010 9:32 PM, Rick Macklem wrote:
>
> On Fri, 19 Mar 2010, Steve Polyack wrote:
>
>> To anyone who is interested: I did some poking around with DTrace,
>> which led me to the nfsiod client code.
>> In src/sys/nfsclient/nfs_nfsiod.c:
>>
>>         } else {
>>                 if (bp->b_iocmd == BIO_READ)
>>                         (void) nfs_doio(bp->b_vp, bp, bp->b_rcred, NULL);
>>                 else
>>                         (void) nfs_doio(bp->b_vp, bp, bp->b_wcred, NULL);
>>         }
>>
> If you look at nfs_doio(), it decides whether or not to mark the buffer
> invalid, based on the return value it gets. Some errors (EINTR, ETIMEDOUT,
> EIO) are not considered fatal, but the others are. (When the async I/O
> daemons call nfs_doio(), they are threads that couldn't care less if
> the underlying I/O op succeeded. The outcome of the I/O operation
> determines what nfs_doio() does with the buffer cache block.)

I was looking at this and noticed the above after my last post.
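As I read it, the decision boils down to something like the following.
This is only a sketch of the logic as described above, not the actual
nfs_doio() code (the real function's buffer-cache handling is more
involved), and the helper name is made up for illustration:

    #include <errno.h>

    /* Possible dispositions for the buffer cache block after a failed write. */
    enum buf_disposition {
            BUF_REDIRTY,            /* keep the dirty data, retry the write later */
            BUF_INVALIDATE          /* throw the block away and record the error  */
    };

    /*
     * Hypothetical helper, not the FreeBSD source: 'error' is assumed to be
     * the non-zero errno that nfs_doio() got back for the write RPC.
     */
    static enum buf_disposition
    async_write_disposition(int error)
    {
            if (error == EINTR || error == ETIMEDOUT || error == EIO)
                    return (BUF_REDIRTY);   /* not considered fatal; retried */
            return (BUF_INVALIDATE);        /* fatal; the write is given up on */
    }

So an EIO coming back from the server keeps landing in the "retry" bucket,
which matches the behavior I describe below.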
>> The result is that my problematic repeatable circumstance begins
>> logging "nfssvc_iod: iod 0 nfs_doio returned errno: 5" (corresponding
>> to NFSERR_INVAL?) for each repetition of the failed write. The only
>> things triggering this are my failed writes. I can also see the
>> nfsiod0 process waking up each iteration.
>>
> Nope, errno 5 is EIO and that's where the problem is. I don't know why
> the server is returning EIO after the file has been deleted on the
> server (I assume you did that when running your little shell script?).

Yes, while running the simple shell script I simply deleted the file on
the NFS server itself.

>> Do we need some kind of "retry x times then abort" logic within
>> nfsiod_iod(), or does this belong in the subsequent functions, such
>> as nfs_doio()? I think it's best to avoid these sorts of infinite
>> loops, which have the potential to take out the system or overload the
>> network due to dumb decisions made by unprivileged users.
>>
> Nope, people don't like data not getting written back to a server when
> it is slow or temporarily network partitioned. The only thing that should
> stop a client from retrying a write back to the server is a fatal error
> from the server that says "this won't ever succeed".
>
> I think we need to figure out whether the server is actually sending the
> EIO (NFS3ERR_IO in Wireshark), or whether it is sending NFS3ERR_STALE and
> the client is somehow munging that into EIO, causing the confusion.

This makes sense. According to Wireshark, the server is indeed
transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE
instead; it sounds more correct than marking it a general I/O error.
Also, the NFS server is serving its share off of a ZFS filesystem, if
that makes any difference. I suppose ZFS could be talking to the NFS
server threads in some mismatched language, but I doubt it.

Thanks for the informative response,
Steve
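P.S. For reference, the two statuses in question map onto local errno
values like this. The numbers are the NFSv3 status codes from RFC 1813;
the helper is purely illustrative and is not taken from the FreeBSD client:

    #include <errno.h>

    /* NFSv3 status values from RFC 1813; the numbers match the local errnos. */
    #define NFS3ERR_IO      5       /* hard I/O error on the server -> EIO    */
    #define NFS3ERR_STALE   70      /* invalid (stale) file handle  -> ESTALE */

    /*
     * Hypothetical mapping helper: it only shows why an NFS3ERR_IO reply
     * surfaces as the retryable EIO discussed above, while NFS3ERR_STALE
     * would surface as ESTALE and stop the retries.
     */
    static int
    nfs3_status_to_errno(int status)
    {
            switch (status) {
            case NFS3ERR_IO:
                    return (EIO);
            case NFS3ERR_STALE:
                    return (ESTALE);
            default:
                    return (status);
            }
    }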