From owner-freebsd-hackers Mon Dec 17 13:46:53 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from mail-out1.apple.com (mail-out1.apple.com [17.254.0.52])
    by hub.freebsd.org (Postfix) with ESMTP
    id DFF3737B405; Mon, 17 Dec 2001 13:46:49 -0800 (PST)
Received: from mailgate2.apple.com (A17-129-100-225.apple.com [17.129.100.225])
    by mail-out1.apple.com (8.11.3/8.11.3) with ESMTP
    id fBHLkmu29351; Mon, 17 Dec 2001 13:46:49 -0800 (PST)
Received: from scv2.apple.com (scv2.apple.com)
    by mailgate2.apple.com (Content Technologies SMTPRS 4.2.1) with ESMTP
    id ; Mon, 17 Dec 2001 13:46:40 -0800
Received: from [17.219.180.26] (minshallidsl1.apple.com [17.219.180.26])
    by scv2.apple.com (8.11.3/8.11.3) with ESMTP
    id fBHLkHB14100; Mon, 17 Dec 2001 13:46:17 -0800 (PST)
X-Sender: conrad@mail.apple.com
Message-Id:
In-Reply-To: <200112162024.fBGKOSt22277@apollo.backplane.com>
References: <58885.1008217148@winston.freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 17 Dec 2001 13:45:18 -0800
To: Matthew Dillon
From: Conrad Minshall
Subject: Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )
Cc: Jordan Hubbard, Peter Wemm, Mike Smith, hackers@freebsd.org, msmith@mass.dis.org
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

At 12:24 PM -0800 12/16/01, Matthew Dillon wrote:
>    program runs fine in an overnight test.  We still have a known issue
>    with out-of-order operations from nfsiod's that apparently may come
>    up after a week or so of testing.  I asked Jordan to try to track down
>    the NeXT guy who fixed that one in the old NFS stack.

This bug showed up recently here with fsx testing.  I seem to have fixed
it last week in MacOS X.
The diffs were widespread but the idea was simple enough so a few code
snippets should suffice:

nfs_request gets a new argument (u_int64_t *xidp) and fills it in here:

        m = nfsm_rpchead(cred, nmp->nm_flag, procnum, auth_type, auth_len,
             auth_str, verf_len, verf_str, mrest, mrest_len, &mheadend, &xid);
        if (xidp)
                *xidp = xid + ((u_int64_t)nfs_xidwrap << 32);

nfsm_rpchead bumps nfs_xidwrap when avoiding a zero xid.

Callers of nfs_request take the returned xid and pass it via the macros to
nfs_loadattrcache, from which the following code is snipped:

        if (*xidp < np->n_xid) {
                /*
                 * We have already updated attributes with a response from
                 * a later request.  The attributes we have here are probably
                 * stale so we drop them (just return).  However, our
                 * out-of-order receipt could be correct - if the requests were
                 * processed out of order at the server.  Given the uncertainty
                 * we invalidate our cached attributes.  *xidp is zeroed here
                 * to indicate the attributes were dropped - only getattr
                 * cares - it needs to retry the rpc.
                 */
                np->n_attrstamp = 0;
                *xidp = 0;
                return (0);
        }

Further down in nfs_loadattrcache:

        np->n_xid = *xidp;

Note xids are kept in a 64 bit form so relative comparison won't fail in the
unlikely case that xids wrap around zero.

Here's the change in nfs_getattr.  I don't expect to ever see the panic.

        avoidfloods = 0;
tryagain:
        nfsstats.rpccnt[NFSPROC_GETATTR]++;
        nfsm_reqhead(vp, NFSPROC_GETATTR, NFSX_FH(v3));
        nfsm_fhtom(vp, v3);
        nfsm_request(vp, NFSPROC_GETATTR, ap->a_p, ap->a_cred, &xid);
        if (!error) {
                nfsm_loadattr(vp, ap->a_vap, &xid);
                if (!xid) { /* out-of-order rpc - attributes were dropped */
                        m_freem(mrep);
                        if (avoidfloods++ < 100)
                                goto tryagain;
                        /*
                         * avoidfloods>1 is bizarre.  at 100 pull the plug
                         */
                        panic("nfs_getattr: getattr flood\n");
                }

--
Conrad Minshall, conrad@apple.com, 408 974-2749
Apple Computer, Mac OS X Core Operating Systems
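
For readers without the kernel tree to hand, the xid ordering idea in the snippets above can be tried out in user space. What follows is only a rough sketch under stated assumptions, not the Mac OS X change itself: struct node, alloc_xid, load_attrs, xid_wrap and next_xid are hypothetical stand-ins for the nfsnode fields, nfsm_rpchead's xid allocation and nfs_loadattrcache, and a single "size" field stands in for the whole attribute set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the nfsnode fields used in the post. */
struct node {
    uint64_t n_xid;       /* extended xid of the newest reply applied */
    long     n_attrstamp; /* nonzero while cached attributes are valid */
    long     size;        /* one cached attribute, for illustration */
};

static uint32_t xid_wrap;  /* assumed analogue of nfs_xidwrap */
static uint32_t next_xid;  /* last 32-bit wire xid handed out */

/*
 * Hand out the next 32-bit wire xid, skipping zero, and return it
 * extended to 64 bits with the wrap count in the upper half.
 */
static uint64_t alloc_xid(void)
{
    if (++next_xid == 0) {   /* wrapped: avoid a zero xid, note the wrap */
        xid_wrap++;
        next_xid = 1;
    }
    return (uint64_t)next_xid + ((uint64_t)xid_wrap << 32);
}

/*
 * Apply attributes from a reply unless a reply to a later request has
 * already been applied.  In that case invalidate the cache and zero
 * *xidp so the caller knows the attributes were dropped.
 */
static void load_attrs(struct node *np, long size, uint64_t *xidp)
{
    assert(np != NULL && xidp != NULL);
    if (*xidp < np->n_xid) {
        np->n_attrstamp = 0;  /* cached attributes can no longer be trusted */
        *xidp = 0;            /* signal "dropped" to the caller */
        return;
    }
    np->size = size;
    np->n_attrstamp = 1;
    np->n_xid = *xidp;
}
```

Allocating two xids and delivering the replies in reverse order should leave the second load_attrs call with a zeroed xid and an invalidated cache, which is the condition the getattr path retries on. Because the wrap count lives in the upper 32 bits, an xid allocated just after the 32-bit counter wraps still compares greater than every earlier one.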