From owner-freebsd-fs@freebsd.org Wed Jul 29 11:35:28 2015
Date: Wed, 29 Jul 2015 07:35:24 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ahmed Kamal
Cc: Graham Allan, Ahmed Kamal via freebsd-fs
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)

Ahmed Kamal wrote:
> hmm, if I understand you correctly, this time_seconds value is the number
> of seconds till the box booted ?
No, it is the number of seconds when the box booted. Once, it was supposed to
be the number of seconds since Jan. 1, 1970, but I don't know if that is still
the case. (For my i386, it is about 1.4 billion when it boots, so I'd guess
they still use Jan. 1, 1970. 3600*24*365*45 = 1419120000. Yea, I didn't bother
with leap years, etc.)
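
If you want to double-check that back-of-the-envelope arithmetic, a tiny
standalone C program (nothing to do with the kernel sources; it just prints
the numbers above plus the current clock) will do it:

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Back-of-the-envelope check of the numbers above: ~45 years worth of
     * seconds (ignoring leap years) and the current seconds-since-1970
     * value, both of which sit well below what a 32-bit unsigned integer
     * can hold. */
    int
    main(void)
    {
        uint64_t rough45yr = 3600ULL * 24 * 365 * 45;   /* 1419120000 */

        printf("rough 45-year estimate: %ju\n", (uintmax_t)rough45yr);
        printf("time(NULL) right now:   %jd\n", (intmax_t)time(NULL));
        printf("UINT32_MAX:             %ju\n", (uintmax_t)UINT32_MAX);
        return (0);
    }
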
Now, I don't know if the clock could somehow set it to a value > 4 billion when
the nfsd starts (it copies time_seconds to nfsrvboottime as it starts up), but
the current clock code is pretty convoluted stuff, so??

rick
ps: From the NFSv4 server's point of view, it only needs a number that is
unique and changes every time the server reboots. As such, using the low order
32 bits of it would be sufficient, even if it exceeds 4 billion. However, the
code incorrectly assumes it won't exceed 4*1024*1024*1024 - 1 unless you apply
the patch. (A small standalone illustration of this mismatch follows the
quoted tcpdump instructions below.)

> If so, I guess this is not really the cause of what we're seeing as the box
> is only up for 8 days
>
> bsd# uptime
> 11:28AM up 8 days, 6:20, 6 users, load averages: 0.94, 0.91, 0.84
>
> The NFS client box's uptime is
> linux# uptime
> 11:31:39 up 8 days, 5:51, 11 users, load average: 87.74, 87.43, 87.35
>
> and yes the huge load is most likely due to this NFS bug
>
> On Wed, Jul 29, 2015 at 1:39 AM, Rick Macklem wrote:
>
> > Ahmed Kamal wrote:
> > > Hi again Rick,
> > >
> > > Seems that I'm still being unlucky with nfs :/ I caught one of the newly
> > > installed RHEL6 boxes having high CPU usage, and bombarding the BSD NFS
> > > box with 10Mbps traffic .. I caught a tcpdump as you mentioned .. You
> > > can download it here:
> > >
> > > https://dl.dropboxusercontent.com/u/51939288/nfs41-high-client-cpu.pcap.bz2
> > >
> > Ok, the packet trace suggests that the NFSv4 server is broken (it is
> > replying with NFS4ERR_STALE_CLIENTID for a recently generated ClientID).
> > Now, I can't be sure, but the only explanation I can come up with is...
> > - For some arches (I only have i386, so I wouldn't have seen this during
> >   testing), time_t is 64 bits (uint64_t).
> >   --> If time_seconds somehow doesn't fit in the low order 32 bits, then
> >       the code would be busted for these arches because nfsrvboottime is
> >       set to time_seconds when the server is started and then there are
> >       comparisons like:
> >           if (nfsrvboottime != clientid.lval[0])
> >                   return (NFSERR_STALECLIENTID);
> >           /* where clientid.lval[0] is a uint32_t */
> > Anyhow, if this is what is happening, the attached simple patch should
> > fix it. (I don't know how time_seconds would exceed 4 billion, but the
> > clock code is pretty convoluted, so I can't say if it can possibly
> > happen?)
> >
> > rick
> > ps: Hmm, on i386 time_seconds ends up at 1438126486, so maybe it can
> > exceed 4*1024*1024*1024 - 1 on amd64?
> >
> > > I didn't restart the client yet .. so if you catch me in the next few
> > > hours and want me to run any diagnostics, let me know. Thanks a lot all
> > > for helping
> > >
> > > On Thu, Jul 23, 2015 at 11:59 PM, Rick Macklem wrote:
> > >
> > > > Ahmed Kamal wrote:
> > > > > Can you please let me know the ultimate packet trace command I'd
> > > > > need to run in case of any nfs4 troubles .. I guess this should be
> > > > > comprehensive even at the expense of a larger output size (which we
> > > > > can trim later).. Thanks a lot for the help!
> > > > >
> > > > tcpdump -s 0 -w <file>.pcap host <client-host>
> > > > (<file> refers to a file name you choose and <client-host> refers to
> > > > the host name of a client generating traffic.)
> > > > --> But you won't be able to allow this to run for long during the
> > > >     storm or the file will be huge.
> > > >
> > > > Then you look at <file>.pcap in wireshark, which knows NFS.
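
To make the 32-bit/64-bit mismatch described in the ps above concrete, here is
a toy standalone program (not the FreeBSD nfsd source; the nfsrvboottime and
lval[0] names are simply borrowed from the quoted code) showing why a boot
time that no longer fits in 32 bits can never compare equal to its truncated
32-bit copy, and how comparing only the low-order 32 bits behaves:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* Pretend boot time that no longer fits in 32 bits (> 4 billion). */
        int64_t  nfsrvboottime = 0x100000001LL;
        /* What ends up stored in the clientid: only the low-order 32 bits. */
        uint32_t lval0 = (uint32_t)nfsrvboottime;

        /* Unpatched style of check: a 64-bit value vs. its 32-bit
         * truncation, so every clientid looks stale. */
        if (nfsrvboottime != lval0)
            printf("unpatched check: NFSERR_STALECLIENTID for everyone\n");

        /* Patched style of check: compare the low-order 32 bits only. */
        if ((uint32_t)nfsrvboottime != lval0)
            printf("patched check: still stale\n");
        else
            printf("patched check: clientid accepted\n");
        return (0);
    }
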
> > > >
> > > > rick
> > > >
> > > > > On Thu, Jul 23, 2015 at 11:53 PM, Rick Macklem wrote:
> > > > >
> > > > > > Graham Allan wrote:
> > > > > > > For our part, the user whose code triggered the pathological
> > > > > > > behaviour on SL5 reran it on SL6 without incident - I still see
> > > > > > > lots of sequence-id errors in the logs, but nothing bad
> > > > > > > happened.
> > > > > > >
> > > > > > > I'd still like to ask them to rerun again on SL5 to see if the
> > > > > > > "accept skipped seqid" patch had any effect, though I think we
> > > > > > > expect not. Maybe it would be nice if I could get set up to
> > > > > > > capture rolling tcpdumps of the nfs traffic before they run
> > > > > > > that though...
> > > > > > >
> > > > > > > Graham
> > > > > > >
> > > > > > > On 7/20/2015 10:26 PM, Ahmed Kamal wrote:
> > > > > > > > Hi folks,
> > > > > > > >
> > > > > > > > I've upgraded a test client to rhel6 today, and I'll keep an
> > > > > > > > eye on it to see what happens.
> > > > > > > >
> > > > > > > > During the process, I made the (I guess mistake) of zfs send |
> > > > > > > > recv to a locally attached usb disk for backup purposes ..
> > > > > > > > long story short, sharenfs property on the received filesystem
> > > > > > > > was causing some nfs/mountd errors in logs .. I wasn't too
> > > > > > > > happy with what I got .. I destroyed the backup datasets and
> > > > > > > > the whole pool eventually .. and then rebooted the whole nas
> > > > > > > > box .. After reboot my logs are still flooded with
> > > > > > > >
> > > > > > > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > > > > > > > Jul 21 05:13:07 nas last message repeated 7536 times
> > > > > > > > Jul 21 05:15:08 nas last message repeated 29664 times
> > > > > > > >
> > > > > > > > Not sure what that means .. or how it can be stopped ..
> > > > > > > > Anyway, will keep you posted on progress.
> > > > > > > >
> > > > > > Oh, I didn't see the part about "reboot" before. Unfortunately, it
> > > > > > sounds like the client isn't recovering after the session is
> > > > > > lost. When the server reboots, the client(s) will get
> > > > > > NFS4ERR_BAD_SESSION errors back because the server reboot has
> > > > > > deleted all sessions. The NFS4ERR_BAD_SESSION should trigger
> > > > > > state recovery on the client.
> > > > > > (It doesn't sound like the clients went into recovery, starting
> > > > > > with a Create_session operation, but without a packet trace, I
> > > > > > can't be sure?)
> > > > > >
> > > > > > rick
> > > > > >
> > > > > > > --
> > > > > > > -------------------------------------------------------------------------
> > > > > > > Graham Allan - gta@umn.edu - allan@physics.umn.edu
> > > > > > > School of Physics and Astronomy - University of Minnesota
> > > > > > > -------------------------------------------------------------------------
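
As for the "nfsrv_cache_session: no session" flood quoted above, a rough
standalone sketch (hypothetical types and names, not the actual FreeBSD code)
of the kind of lookup involved shows why a client that never goes into
recovery with a new Create_session after a server reboot just keeps producing
that message:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical, simplified session record -- not the FreeBSD structures. */
    struct sess {
        uint8_t      id[16];    /* session ID handed out by Create_session */
        struct sess *next;
    };

    /*
     * After a server reboot the session list starts out empty, so every
     * request that still carries a pre-reboot session ID misses here, the
     * server logs "no session" and replies NFS4ERR_BAD_SESSION, and a client
     * that never sends a new Create_session simply retries and generates the
     * repeated log messages.
     */
    static struct sess *
    lookup_session(struct sess *head, const uint8_t id[16])
    {
        for (struct sess *sp = head; sp != NULL; sp = sp->next)
            if (memcmp(sp->id, id, sizeof(sp->id)) == 0)
                return (sp);               /* known session: use it */
        printf("nfsrv_cache_session: no session\n");
        return (NULL);                     /* caller replies NFS4ERR_BAD_SESSION */
    }

    int
    main(void)
    {
        uint8_t stale_id[16] = { 0 };      /* ID from before the reboot */

        /* The empty list models the freshly rebooted server: the lookup
         * always fails until the client creates a new session. */
        lookup_session(NULL, stale_id);
        return (0);
    }
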