From: Rick Macklem
Date: Fri, 19 Mar 2010 21:32:50 -0400 (EDT)
To: Steve Polyack
Cc: freebsd-fs@freebsd.org, bseklecki@noc.cfi.pgh.pa.us, User Questions
Subject: Re: FreeBSD NFS client goes into infinite retry loop

On Fri, 19 Mar 2010, Steve Polyack wrote:

>
> To anyone who is interested: I did some poking around with DTrace, which led
> me to the nfsiod client code.
> In src/sys/nfsclient/nfs_nfsiod.c:
>     } else {
>         if (bp->b_iocmd == BIO_READ)
>             (void) nfs_doio(bp->b_vp, bp, bp->b_rcred, NULL);
>         else
>             (void) nfs_doio(bp->b_vp, bp, bp->b_wcred, NULL);
>     }
>
If you look at nfs_doio(), it decides whether or not to mark the buffer
invalid based on the return value it gets. Some errors (EINTR, ETIMEDOUT,
EIO) are not considered fatal, but the others are. (When the async I/O
daemons call nfs_doio(), they are threads that couldn't care less whether
the underlying I/O op succeeded. The outcome of the I/O operation
determines what nfs_doio() does with the buffer cache block.)

>
> The result is that my problematic repeatable circumstance begins logging
> "nfssvc_iod: iod 0 nfs_doio returned errno: 5" (corresponding to
> NFSERR_INVAL?) for each repetition of the failed write. The only things
> triggering this are my failed writes. I can also see the nfsiod0 process
> waking up each iteration.
>
Nope, errno 5 is EIO and that's where the problem is. I don't know why the
server is returning EIO after the file has been deleted on the server (I
assume you did that when running your little shell script?).

> Do we need some kind of "retry x times then abort" logic within nfsiod_iod(),
> or does this belong in the subsequent functions, such as nfs_doio()?
> I think it's best to avoid these sorts of infinite loops which have the
> potential to take out the system or overload the network due to dumb
> decisions made by unprivileged users.
>
Nope, people don't like data not getting written back to a server when it
is slow or temporarily network partitioned. The only thing that should stop
a client from retrying a write back to the server is a fatal error from the
server that says "this won't ever succeed".

I think we need to figure out whether the server really is replying with
EIO (NFS3ERR_IO in wireshark), or whether it is sending NFS3ERR_STALE and
the client is somehow munging that into EIO, causing the confusion.

rick
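A minimal C sketch of the retry policy discussed in this thread, assuming the classification described above for nfs_doio(): EINTR, ETIMEDOUT and EIO leave the buffer dirty for another try, and any other error is treated as fatal. The functions write_error_is_fatal(), nfs3_status_to_errno() and simulate_writeback() are hypothetical illustrations, not FreeBSD kernel code; the NFS3ERR_IO (5) and NFS3ERR_STALE (70) values come from the NFSv3 specification (RFC 1813). Under these assumptions, a server that keeps answering NFS3ERR_IO keeps the client retrying, while NFS3ERR_STALE mapped to ESTALE would end the write-back after one attempt.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* NFSv3 status codes (values per RFC 1813); only the ones discussed here. */
enum nfs3_status { NFS3_OK = 0, NFS3ERR_IO = 5, NFS3ERR_STALE = 70 };

/* Hypothetical mapping from an NFSv3 reply status to a local errno. */
static int nfs3_status_to_errno(enum nfs3_status st)
{
    switch (st) {
    case NFS3_OK:
        return 0;
    case NFS3ERR_IO:
        return EIO;
    case NFS3ERR_STALE:
        return ESTALE;
    }
    return EIO; /* assumption: unknown status treated as a generic I/O error */
}

/*
 * Hypothetical classifier mirroring the description of nfs_doio():
 * EINTR, ETIMEDOUT and EIO are non-fatal (keep the buffer, retry later);
 * every other nonzero error means "this won't ever succeed".
 */
static bool write_error_is_fatal(int error)
{
    switch (error) {
    case 0:
    case EINTR:
    case ETIMEDOUT:
    case EIO:
        return false;
    default:
        return true;
    }
}

/*
 * Hypothetical write-back loop driven by a scripted list of server replies.
 * Stops on success or on a fatal error; otherwise keeps retrying until the
 * script runs out (a real client would simply keep going).
 * Returns the number of attempts; *final_error receives the last errno.
 */
static size_t simulate_writeback(const enum nfs3_status *replies, size_t n,
                                 int *final_error)
{
    assert(replies != NULL || n == 0);
    assert(final_error != NULL);

    size_t attempts = 0;
    int error = 0;
    while (attempts < n) {
        error = nfs3_status_to_errno(replies[attempts++]);
        if (error == 0 || write_error_is_fatal(error))
            break; /* written, or the server said it never will be */
    }
    *final_error = error;
    return attempts;
}
```

With a scripted sequence of three NFS3ERR_IO replies, every attempt fails non-fatally and the write is still pending when the script ends, which mirrors the repeated retries reported above; a single NFS3ERR_STALE ends the write-back immediately.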