Date:      Fri, 19 Mar 2010 21:32:50 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Steve Polyack <korvus@comcast.net>
Cc:        freebsd-fs@freebsd.org, bseklecki@noc.cfi.pgh.pa.us, User Questions <freebsd-questions@freebsd.org>
Subject:   Re: FreeBSD NFS client goes into infinite retry loop
Message-ID:  <Pine.GSO.4.63.1003192120470.17841@muncher.cs.uoguelph.ca>
In-Reply-To: <4BA3DEBC.2000608@comcast.net>
References:  <4BA3613F.4070606@comcast.net> <201003190831.00950.jhb@freebsd.org> <4BA37AE9.4060806@comcast.net> <4BA392B1.4050107@comcast.net> <4BA3DEBC.2000608@comcast.net>



On Fri, 19 Mar 2010, Steve Polyack wrote:

>
> To anyone who is interested: I did some poking around with DTrace, which led 
> me to the nfsiod client code.
> In src/sys/nfsclient/nfs_nfsiod.c:
>        } else {
>            if (bp->b_iocmd == BIO_READ)
>                (void) nfs_doio(bp->b_vp, bp, bp->b_rcred, NULL);
>            else
>                (void) nfs_doio(bp->b_vp, bp, bp->b_wcred, NULL);
>        }
>

If you look at nfs_doio(), you'll see it decides whether or not to mark the
buffer invalid based on the return value it gets. Some errors (EINTR,
ETIMEDOUT, EIO) are not considered fatal, but all the others are. (When the
async I/O daemons call nfs_doio(), they are threads that couldn't care less
whether the underlying I/O op succeeded. The outcome of the I/O operation
determines what nfs_doio() does with the buffer cache block.)
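As a rough illustration of that decision, here is a user-space model of the classification described above. It is a sketch, not the FreeBSD source: the names `buf_action` and `nfs_doio_action()` and the enum values are hypothetical, and only the three non-fatal errnos named in this message (EINTR, ETIMEDOUT, EIO) are treated as retryable.

```c
#include <errno.h>

/* Hypothetical outcomes for a buffer cache block after an async write. */
enum buf_action {
    BUF_DONE,        /* write succeeded: block is clean */
    BUF_KEEP_DIRTY,  /* non-fatal error: keep block dirty, retry later */
    BUF_INVALIDATE   /* fatal error: toss the block, report the error */
};

/*
 * Hypothetical model of the choice nfs_doio() makes from the RPC's
 * return value. It only mirrors the rule stated above: EINTR,
 * ETIMEDOUT and EIO are non-fatal, every other error is fatal.
 */
enum buf_action
nfs_doio_action(int error)
{
    switch (error) {
    case 0:
        return BUF_DONE;
    case EINTR:
    case ETIMEDOUT:
    case EIO:
        return BUF_KEEP_DIRTY;
    default:
        return BUF_INVALIDATE;
    }
}
```

Under this model, a server that keeps answering EIO leaves the block dirty indefinitely, so nfsiod keeps retrying it, whereas an ESTALE reply would fall into the fatal branch and the block would be invalidated.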

>
> The result is that my problematic repeatable circumstance begins logging 
> "nfssvc_iod: iod 0 nfs_doio returned errno: 5" (corresponding to 
> NFSERR_INVAL?) for each repetition of the failed write.  The only things 
> triggering this are my failed writes.  I can also see the nfsiod0 process 
> waking up each iteration.
>

Nope, errno 5 is EIO and that's where the problem is. I don't know why
the server is returning EIO after the file has been deleted on the
server (I assume you did that when running your little shell script?).


> Do we need some kind of "retry x times then abort" logic within nfsiod_iod(), 
> or does this belong in the subsequent functions, such as nfs_doio()?  I think 
> it's best to avoid these sorts of infinite loops which have the potential to 
> take out the system or overload the network due to dumb decisions made by 
> unprivileged users.
>
Nope, people don't like data not getting written back to a server when
it is slow or temporarily network partitioned. The only thing that should
stop a client from retrying a write back to the server is a fatal error
from the server that says "this won't ever succeed".
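A minimal sketch of that policy, assuming a hypothetical callback `write_rpc_fn` that issues one WRITE RPC and returns 0 or an errno: there is deliberately no attempt cap; the loop ends only on success or on an error outside the non-fatal set. The `scripted_server` type and `scripted_rpc()` are hypothetical stand-ins for a server, included only so the policy can be exercised; a real client would also back off between retries rather than spin.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: performs one WRITE RPC, returns 0 or an errno. */
typedef int (*write_rpc_fn)(void *ctx);

/* Same non-fatal set as discussed in this thread. */
static int
is_retryable(int error)
{
    return error == EINTR || error == ETIMEDOUT || error == EIO;
}

/*
 * Hypothetical write-back loop with no retry limit: keep going while the
 * error is non-fatal (slow server, network partition), stop on success
 * or on a fatal error. *attempts receives the number of RPCs issued.
 */
int
writeback_until_final(write_rpc_fn rpc, void *ctx, unsigned *attempts)
{
    int error;

    *attempts = 0;
    do {
        error = rpc(ctx);
        (*attempts)++;
    } while (is_retryable(error));
    return error;
}

/* Hypothetical fake server that replays a fixed list of replies. */
struct scripted_server {
    const int *replies;
    size_t n;
    size_t next;
};

int
scripted_rpc(void *ctx)
{
    struct scripted_server *s = ctx;
    int reply = s->replies[s->next];

    /* Advance, but keep repeating the last reply once the script ends. */
    if (s->next + 1 < s->n)
        s->next++;
    return reply;
}
```

Fed {ETIMEDOUT, EIO, 0}, the loop issues three RPCs and returns 0; fed {EIO, ESTALE}, it stops after two with ESTALE. A server that answers EIO forever keeps it looping, which is the behaviour reported in this thread.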

I think we need to figure out whether the server is really replying with
EIO (NFS3ERR_IO in wireshark), or whether it is sending NFS3ERR_STALE and
the client is somehow munging that into EIO, causing the confusion.
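To make the distinction concrete, here is a hypothetical client-side mapping from an NFSv3 reply status to an errno. The status values (NFS3_OK = 0, NFS3ERR_IO = 5, NFS3ERR_INVAL = 22, NFS3ERR_STALE = 70) are from RFC 1813; the function `nfsv3_stat_to_errno()` and its catch-all default are assumptions for illustration, not the FreeBSD client code.

```c
#include <errno.h>
#include <stdint.h>

/* NFSv3 status values from RFC 1813 (only the ones used here). */
#define NFS3_OK        0
#define NFS3ERR_IO     5
#define NFS3ERR_INVAL  22
#define NFS3ERR_STALE  70

/*
 * Hypothetical mapping of an NFSv3 reply status to an errno.
 * NFS3ERR_STALE must come out as ESTALE (fatal for the buffer), not EIO
 * (non-fatal, so retried). The default branch is the kind of catch-all
 * that would munge any status it doesn't list into EIO.
 */
int
nfsv3_stat_to_errno(uint32_t stat)
{
    switch (stat) {
    case NFS3_OK:
        return 0;
    case NFS3ERR_IO:
        return EIO;
    case NFS3ERR_INVAL:
        return EINVAL;
    case NFS3ERR_STALE:
        return ESTALE;
    default:
        return EIO;
    }
}
```

Checking the wire with wireshark tells you which case you are in: if the reply status is 5, the server itself is returning EIO; if it is 70 and the client still logs errno 5, the mapping step on the client is the suspect.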

rick



