Date: Sat, 20 Feb 2016 17:58:24 -0500 (EST)
From: Rick Macklem
To: lev@FreeBSD.org
Cc: freebsd-fs@freebsd.org
Message-ID: <353969052.570755.1456009104365.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <56C84922.8050803@FreeBSD.org>
References: <56C752CD.4090203@FreeBSD.org> <1022369130.4303814.1455930123897.JavaMail.zimbra@uoguelph.ca> <56C84922.8050803@FreeBSD.org>
Subject: Re: Panic in NFS client on CURRENT

Lev wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On 20.02.2016 04:02, Rick Macklem wrote:
>
> >> Basically, I'm asking if there was a server reboot or nfsd thread
> >> restart or some kind of network partition that would separate
> >> some client(s) from the server. OR Panic occurred during normal
> >> operation.
> There was NO server reboot/restarts. MAYBE, this VM (where client
> runs) lost network connectivity for several seconds, but server itself
> was NOT stopped, restarted or rebooted.
>
Well, the stack trace you put in the PR showed a recovery from an expired lease.
This should only occur when the client is partitioned from the server for more
than a lease duration (120sec on FreeBSD). With a FreeBSD server, even a
120+sec network partitioning won't cause an expired recovery unless a
conflicting open/lock request is made. (A Linux server will reply
NFS4ERR_EXPIRED as soon as the lease has expired without a renewal, and Linux
uses a lease of 60sec, so it is easier to reproduce with a Linux NFSv4 server
if you happen to have one.)
--> So I don't know why it would go into a lease expired recovery.
    (A network partitioning of a few seconds shouldn't do it.)

I think the only way to know what caused this would be to have a packet capture
that started before the problem occurred. (Maybe your network setup is somehow
directing some RPC messages to the wrong place or they're being blocked by some
firewall setup.)

If you have an NFSv4.0 mount you should see a Renew RPC about once per minute
(half a lease duration), which keeps the lease from expiring. For NFSv4.1, it
is an RPC with just a Sequence operation, which should have the same effect.

Reproducing this shouldn't be easy (which is a good thing;-). It has been a
while, but it should take something like:
- Network partition a client from the server for several minutes while it has
  a file open. (It might also need to have a byte range lock on the file, I
  can't remember for sure if just an open is sufficient.)
- Try and open the same file on another client (and get a conflicting byte
  range lock maybe).
--> This should result in a reply to the client of NFS4ERR_EXPIRED.

If you look at a packet trace in wireshark, it is a server reply of
NFS4ERR_EXPIRED that tells the client to go into this recovery cycle.

Unfortunately I am away from home until April, so I don't have access to
wireshark until then. (I will try and reproduce a NFSERR_EXPIRED failure with
the laptops I have with me, but I'm not sure if I can pull it off.)
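
To make the lease timing above concrete, here is a rough sketch of when a
server would be expected to reply NFS4ERR_EXPIRED. The lease durations and the
FreeBSD/Linux difference come from the description above; the function and
variable names are made up for illustration, and this is a simplified model,
not the actual server code.

```python
# Simplified model of the server-side lease behaviour described above.
# Lease durations are the ones quoted in this message; the names
# LEASE_SECS, renew_interval and replies_expired are hypothetical.
LEASE_SECS = {"freebsd": 120, "linux": 60}


def renew_interval(server: str) -> float:
    """Approximate interval between client renewals (half a lease)."""
    return LEASE_SECS[server] / 2


def replies_expired(server: str, partition_secs: float,
                    conflicting_request: bool) -> bool:
    """Return True if the server would be expected to reply NFS4ERR_EXPIRED.

    partition_secs: how long the client could not reach the server.
    conflicting_request: whether another client made a conflicting
    open/lock request while the first client was partitioned.
    """
    lease_lapsed = partition_secs > LEASE_SECS[server]
    if server == "linux":
        # Expires as soon as the lease lapses without a renewal.
        return lease_lapsed
    # FreeBSD server: a lapsed lease alone is not enough; a conflicting
    # open/lock request from another client is also needed.
    return lease_lapsed and conflicting_request
```

In this model, replies_expired("freebsd", 5, False) is False, which is why a
partition of a few seconds shouldn't explain the recovery seen in the PR.
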
Btw, this type of recovery isn't specified by the RFC and can only recover
opens and not byte range locks.

Fyi, the recovery from a server reboot (or reload of nfsd.ko in a FreeBSD
server) is specified by the RFC and can recover opens and locks. It starts with
the server replying NFSERR_STALECLIENTID or NFSERR_STALESTATEID.

Good luck with it, rick
ps: I'll email if I reproduce the NFSERR_EXPIRED and find any problem beyond
    the panic, for which a fix has already been posted.

> - --
> // Lev Serebryakov
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2
>
> iQJ7BAEBCgBmBQJWyEkiXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
> ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRGOTZEMUNBMEI1RjQzMThCNjc0QjMzMEFF
> QUIwM0M1OEJGREM0NzhGAAoJEOqwPFi/3EePfCsP9AuK494J8cUft0MmvAly7yVw
> iF0R0joxwttp9t6qydMjlQfmj6yoX+UACFWWRBZGgGrS8K7PcGSsGFl5s/Bt1ylL
> lw3GDr7GsVNDOhG4ypwsiqI2Wq/PzFhBMUpuUq6A+kdqZVH1ApQFyDKrdWbvDQLx
> 9Dm6vvW/fx6W1PgJp4i2B8zSf4vz7s91JyPMXnN9IQNG/1H9WERudzx/2kp1ws9y
> wYCXVmsidMO9j0DQ4eVVSM2vSfc6VKgyjWhVeHguRXc5F3L5VGuoSXyzCkceC66r
> t+8MDYhrsm00hrkZyTO6s1KcC8OKrgZBr9p0UIM1oMaqo02DyWp7KfM1nDMW9FI6
> IXsLaizPnnf7u+gGI2SllNXMaPvcREAxrnQDHKTdifKkpXrSroYYfJGmxAsRidmY
> 8nwZ1bytGeSHlTYSq1XTJLCWsSoM/o0Vgl+bGXvajWFkFT/GRGb5akWUBZhkzo7n
> TTpm0zrLuSvqWwRvqisoAuKW7QmCF2E0ei0E01TA3DDpF31dLOCApMq4t/UooT5h
> w25dTRpc+WPUEwKXSzZ90kPHmmoRz7dn8y6Oeb681GtqoauMBgVUuWhI7+sobRBy
> gcyIIpPB1Y0vteslzd5JDRUWcDUGg23fqRgax+J+motaNEXus2P6RxZTkq3DmOgO
> qpvv/BwLVn++rVnTNWY=
> =hihT
> -----END PGP SIGNATURE-----
>