To: "freebsd-fs@freebsd.org"
From: Colin Percival
Subject: NFSv4.1 hanging with NFSERR_BADSESSION
Date: Sun, 13 Nov 2016 20:22:33 +0000

Hi all,

I'm trying to get FreeBSD to talk nicely to Amazon Elastic File System, which is some flavour of NFSv4.1. (This is what prompted the earlier discussion about umount and UDP RPCs.)

The latest problem I've run into is that partway through a buildworld I'll get 32 I/O errors along with 'nfsv4 recover err returned 10052' from the kernel, and thereafter anything touching that NFS mount will get stuck in nfsclseq. (And kernels and filesystems being the wonderful things they are, the processes which get stuck this way are completely uninterruptible.)

My guess is that this is a bug along the lines of "there's 32 slots for NFS protocol requests and when an error occurs they're not getting marked as available, resulting in everything piling up waiting for a free slot which will never arrive"... but I don't know NFS or our client code nearly well enough to see where this might be.

Any ideas?

-- 
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid
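
To make the guess above concrete, here is a toy model of the suspected failure mode. It is a sketch of the hypothesis only, not the FreeBSD NFS client code: SlotTable, do_rpc, run and the release_on_error switch are all hypothetical names invented for illustration, and the table size of 32 is taken from the guess rather than from the source. (Error 10052 is NFS4ERR_BADSESSION in the NFSv4.1 specification, which matches the subject line.)

```python
class SlotTable:
    """Toy fixed-size slot table (hypothetical; not the kernel's structure)."""

    def __init__(self, nslots=32):
        self.in_use = [False] * nslots

    def acquire(self):
        """Return a free slot index, or None where a real client would sleep."""
        for i, busy in enumerate(self.in_use):
            if not busy:
                self.in_use[i] = True
                return i
        return None

    def release(self, slot):
        self.in_use[slot] = False


def do_rpc(table, fails, release_on_error):
    """Issue one pretend request and report what happened to it."""
    slot = table.acquire()
    if slot is None:
        return "blocked"          # stands in for waiting forever on a slot
    if fails:
        if release_on_error:      # the suspected bug is skipping this release
            table.release(slot)
        return "error"
    table.release(slot)
    return "ok"


def run(release_on_error, nfail=32):
    """Fail nfail requests in a row, then try one more that would succeed."""
    table = SlotTable(32)
    results = [do_rpc(table, True, release_on_error) for _ in range(nfail)]
    results.append(do_rpc(table, False, release_on_error))
    return results


leaky = run(release_on_error=False)   # error path leaks the slot
fixed = run(release_on_error=True)    # error path frees the slot
print(leaky.count("error"), leaky[-1])
print(fixed.count("error"), fixed[-1])
```

In the leaky variant, 32 failed requests exhaust the table and the next request can never obtain a slot, which would look like 32 I/O errors followed by everything hanging. Whether the real client behaves this way is exactly the open question.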