Date: Mon, 17 Dec 2001 13:45:18 -0800 From: Conrad Minshall <conrad@apple.com> To: Matthew Dillon <dillon@apollo.backplane.com> Cc: Jordan Hubbard <jkh@winston.freebsd.org>, Peter Wemm <peter@wemm.org>, Mike Smith <msmith@freebsd.org>, hackers@freebsd.org, msmith@mass.dis.org Subject: Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step ) Message-ID: <l03130307b8440fd3aac2@[17.219.180.26]> In-Reply-To: <200112162024.fBGKOSt22277@apollo.backplane.com> References: <58885.1008217148@winston.freebsd.org> <l03130304b8413ae1338b@[17.219.180.26]> <l03130302b841fdae1ebc@[17.219.180.26]>
next in thread | previous in thread | raw e-mail | index | archive | help
At 12:24 PM -0800 12/16/01, Matthew Dillon wrote: > program runs fine in an overnight test. We still have a known issue > with out-of-order operations from nfsiod's that apparently may come > up after a week or so of testing. I asked Jordan to try to track down > the NeXT guy who fixed that one in the old NFS stack. This bug showed up recently here with fsx testing. I seem to have fixed it last week in MacOS X. The diffs were widespread but the idea was simple enough so a few code snippets should suffice: nfs_request gets a new argument (u_int64_t *xidp) and fills it in here: m = nfsm_rpchead(cred, nmp->nm_flag, procnum, auth_type, auth_len, auth_str, verf_len, verf_str, mrest, mrest_len, &mheadend, &xid); if (xidp) *xidp = xid + ((u_int64_t)nfs_xidwrap << 32); nfsm_rpchead bumps nfs_xidwrap when avoiding a zero xid. Callers of nfs_request take the returned xid and pass it via the macros to nfs_loadattrcache, from which the following code is snipped: if (*xidp < np->n_xid) { /* * We have already updated attributes with a response from * a later request. The attributes we have here are probably * stale so we drop them (just return). However, our * out-of-order receipt could be correct - if the requests were * processed out of order at the server. Given the uncertainty * we invalidate our cached attributes. *xidp is zeroed here * to indicate the attributes were dropped - only getattr * cares - it needs to retry the rpc. */ np->n_attrstamp = 0; *xidp = 0; return (0); } Further down in nfs_loadattrcache: np->n_xid = *xidp; Note xids are kept in a 64 bit form so relative comparison won't fail in the unlikely case that xids wrap around zero. Here's the change in nfs_getattr. I don't expect to ever see the panic. avoidfloods = 0; tryagain: nfsstats.rpccnt[NFSPROC_GETATTR]++; nfsm_reqhead(vp, NFSPROC_GETATTR, NFSX_FH(v3)); nfsm_fhtom(vp, v3); nfsm_request(vp, NFSPROC_GETATTR, ap->a_p, ap->a_cred, &xid); if (!error) { nfsm_loadattr(vp, ap->a_vap, &xid); if (!xid) { /* out-of-order rpc - attributes were dropped */ m_freem(mrep); if (avoidfloods++ < 100) goto tryagain; /* * avoidfloods>1 is bizarre. at 100 pull the plug */ panic("nfs_getattr: getattr flood\n"); } -- Conrad Minshall, conrad@apple.com, 408 974-2749 Apple Computer, Mac OS X Core Operating Systems To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?l03130307b8440fd3aac2>