From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ahmed Kamal
Cc: freebsd-fs@freebsd.org
Date: Wed, 1 Jul 2015 19:19:45 -0400 (EDT)
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)

Ahmed Kamal wrote:
> Hi all,
>
> I'm a refugee from Linux land. I just set up my first FreeBSD 10.1 ZFS
> box, sharing /home over NFS. Since every home directory is its own ZFS
> dataset, I chose to use NFSv4 to enable recursively sharing/mounting any
> directory under /home (I understand NFSv4 is a must in this scenario!).
>
> I'm able to mount from Linux (RHEL5, latest kernel) successfully. Users
> are working fine. However, every now and then a user screams that his
> session is frozen. Usually the processes are stuck in nfs_wait or rpc_*
> state. I tried using a much newer Linux kernel (3.2); however, it still
> faced the same problem.
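(For context, a minimal sketch of the kind of NFSv4/ZFS export setup
described above. The pool name "tank", the server name "nas", and the
client network used here are assumptions for illustration, not details
taken from the original message:

    # /etc/rc.conf on the FreeBSD 10.1 server
    nfs_server_enable="YES"
    nfsv4_server_enable="YES"
    nfsuserd_enable="YES"

    # /etc/exports: declare /home as the NFSv4 root
    V4: /home -sec=sys -network 192.168.1.0 -mask 255.255.255.0

    # Each home directory is its own ZFS dataset, i.e. its own filesystem,
    # so each one has to be exported as well; setting sharenfs on the
    # parent dataset lets the children inherit it.
    zfs set sharenfs="-network 192.168.1.0 -mask 255.255.255.0" tank/home

    # RHEL5 client: with NFSv4 the mount path is relative to the V4: root,
    # so "nas:/" here refers to /home on the server.
    mount -t nfs4 nas:/ /home

The client can then descend into the per-user datasets under the single
mount, which is the "recursive" behaviour NFSv4 is being used for here.)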
> The errors in Linux log files are mostly:
>
> Jul  1 17:41:47 mammoth kernel: NFS: v4 server nas returned a *bad
> sequence-id error*!
> Jul  1 17:52:32 mammoth kernel: nfs4_reclaim_locks: unhandled error -11.
> Zeroing state
> Jul  1 17:52:32 mammoth kernel: nfs4_reclaim_open_state: Lock reclaim
> failed!
>
> My search led me to a detailed analysis of the issue
> (https://access.redhat.com/solutions/1328073), which you can also read
> here: https://dl.dropboxusercontent.com/u/51939288/nfs4-bad-seq.pdf.
> NetApp confirmed this was a bug for them (I'm wondering if this is still
> in FreeBSD?!)
>
Well, the NetApp NFS server code is proprietary to them and has no
commonality with the FreeBSD code, so it seems unlikely that they will
have the same bug.

> PS: Right before sending this, I saw dmesg on the FreeBSD box advising
> increasing vfs.nfsd.tcphighwater, so I upped that to 64000. I also upped
> the number of nfs server threads (-t) from 10 to 60 (we're roughly 40
> Linux machines).
>
This indicates that the server's DRC (duplicate request cache) has gotten
constipated, and that could cause issues for NFSv4.0. Things to try:
- The above increase of vfs.nfsd.tcphighwater might do the trick.
  --> You can also try decreasing vfs.nfsd.tcpcachetimeo.
- If you can't find values of these that avoid the constipation, you can
  disable the cache for TCP by setting vfs.nfsd.cachetcp to 0.
- Alternatively, go to the Linux client and see if the mount is using
  minorversion 0 or 1. (I think "nfsstat -m" on the client will show
  that.) Then use the minorversion= option to force it to use the other
  minorversion (i.e. if it's 0, force it to 1, or vice versa). Since
  NFSv4.1 doesn't use the DRC, I'd guess these are NFSv4.0 mounts and
  using NFSv4.1 would avoid any DRC-related issues.
(A rough sketch of these settings is appended at the end of this message.)

Good luck with it, rick

> Any advice is most appreciated!
>
> Thanks
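(For concreteness, roughly what the tuning suggested above looks like; the
numbers and names here are illustrative assumptions, not values taken from
the original thread:

    # FreeBSD server: DRC-related sysctls (check current values first)
    sysctl vfs.nfsd.tcphighwater
    sysctl vfs.nfsd.tcpcachetimeo
    sysctl vfs.nfsd.tcphighwater=100000   # raise the highwater mark further
    sysctl vfs.nfsd.tcpcachetimeo=300     # shorten the cache timeout (seconds)
    sysctl vfs.nfsd.cachetcp=0            # last resort: disable the DRC for TCP

    # To make settings persistent, add the same name=value lines to
    # /etc/sysctl.conf. The nfsd thread count is nfsd's -n flag, e.g. in
    # /etc/rc.conf:
    nfs_server_flags="-u -t -n 60"

    # Linux client: check which NFSv4 minor version the mount is using
    nfsstat -m

    # and, if the client kernel supports NFSv4.1, force it at mount time:
    mount -t nfs -o vers=4,minorversion=1 nas:/ /home

Note that NFSv4.1 support depends on the client kernel; older clients such
as the RHEL5 machines mentioned above may only support NFSv4.0, in which
case the server-side sysctls are the available knob.)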