From owner-freebsd-current@freebsd.org  Mon Jun 18 01:48:07 2018
Return-Path: <owner-freebsd-current@freebsd.org>
Delivered-To: freebsd-current@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id E80BE1011C41
 for <freebsd-current@mailman.ysv.freebsd.org>;
 Mon, 18 Jun 2018 01:48:06 +0000 (UTC) (envelope-from kaduk@mit.edu)
Received: from dmz-mailsec-scanner-6.mit.edu (dmz-mailsec-scanner-6.mit.edu
 [18.7.68.35])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 903ED68930
 for <freebsd-current@freebsd.org>; Mon, 18 Jun 2018 01:48:06 +0000 (UTC)
 (envelope-from kaduk@mit.edu)
X-AuditID: 12074423-e29ff70000003bcd-6f-5b270e24b0c3
Received: from mailhub-auth-4.mit.edu ( [18.7.62.39])
 (using TLS with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by dmz-mailsec-scanner-6.mit.edu (Symantec Messaging Gateway) with SMTP id
 73.04.15309.42E072B5; Sun, 17 Jun 2018 21:43:00 -0400 (EDT)
Received: from outgoing.mit.edu (OUTGOING-AUTH-1.MIT.EDU [18.9.28.11])
 by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id w5I1gx79009117;
 Sun, 17 Jun 2018 21:42:59 -0400
Received: from kduck.kaduk.org (24-107-191-124.dhcp.stls.mo.charter.com
 [24.107.191.124]) (authenticated bits=56)
 (User authenticated as kaduk@ATHENA.MIT.EDU)
 by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id w5I1gtHs001308
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT);
 Sun, 17 Jun 2018 21:42:57 -0400
Date: Sun, 17 Jun 2018 20:42:55 -0500
From: Benjamin Kaduk <kaduk@mit.edu>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>,
 "andreas.nagy@frequentis.com" <andreas.nagy@frequentis.com>
Subject: Re: ESXi NFSv4.1 client id is nasty
Message-ID: <20180618014254.GC64971@kduck.kaduk.org>
References: <YTOPR0101MB0953E687D013E2E97873061ADD720@YTOPR0101MB0953.CANPRD01.PROD.OUTLOOK.COM>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <YTOPR0101MB0953E687D013E2E97873061ADD720@YTOPR0101MB0953.CANPRD01.PROD.OUTLOOK.COM>
User-Agent: Mutt/1.9.1 (2017-09-22)
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.26
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current/>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Jun 2018 01:48:07 -0000

On Sun, Jun 17, 2018 at 12:35:12PM +0000, Rick Macklem wrote:
> Hi,
> 
> Andreas Nagy has been doing a lot of testing of the NFSv4.1 client in ESXi 6.5u1
> (VMware) against the FreeBSD server. I have given him a bunch of hackish patches
> to try and some of them do help. However not all issues are resolved.
> The problem is that these hacks pretty obviously violate the NFSv4.1 RFC (5661).
> (Details on these come later, for those interested in such things.)
> 
> I can think of three ways to deal with this:
> 1 - Just leave the server as is and point people to the issues that should be addressed
>      in the ESXi client.
> 2 - Put the hacks in, but only enable them based on a sysctl not enabled by default.
>      (The main problem with this is when the server also has non-ESXi mounts.)
> 3 - Enable the hacks for ESXi client mounts only, using the implementation ID
>      it presents at mount time in its ExchangeID arguments.
>      - This is my preferred solution, but the RFC says:
>    An example use for implementation identifiers would be diagnostic
>    software that extracts this information in an attempt to identify
>    interoperability problems, performance workload behaviors, or general
>    usage statistics.  Since the intent of having access to this
>    information is for planning or general diagnosis only, the client and
>    server MUST NOT interpret this implementation identity information in
>    a way that affects interoperational behavior of the implementation.
>    The reason is that if clients and servers did such a thing, they
>    might use fewer capabilities of the protocol than the peer can
>    support, or the client and server might refuse to interoperate.

None of these are great options, but adding in behavior
dependency on the implementation ID feels really bad for the
ecosystem, and I would be unhappy if it was enabled by default.
Is it feasible to have one sysctl per workaround, with each sysctl
listing the implementation ID(s) to which that workaround applies?
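Here is a minimal sketch of that per-workaround idea. It assumes each workaround gets its own string tunable holding a comma-separated list of implementation-ID names, which are matched exactly against the nii_name the client sends in its ExchangeID arguments. The function name impl_id_in_list and the list format are hypothetical and are not existing FreeBSD code. An empty list leaves the workaround off, so nothing changes by default.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check for one workaround: "list" is the value of that
 * workaround's tunable (comma-separated names, e.g. what a SYSCTL_STRING
 * would hold), "nii_name" is the implementation name from the client's
 * ExchangeID. Returns true only on an exact match with one list entry.
 */
static bool
impl_id_in_list(const char *list, const char *nii_name)
{
	size_t namelen = strlen(nii_name);
	const char *p = list;

	/* A client that sends no implementation name never matches. */
	if (namelen == 0)
		return (false);

	while (*p != '\0') {
		const char *end = strchr(p, ',');
		size_t toklen = (end != NULL) ? (size_t)(end - p) : strlen(p);

		if (toklen == namelen && strncmp(p, nii_name, toklen) == 0)
			return (true);
		if (end == NULL)
			break;
		p = end + 1;	/* skip the comma, examine the next entry */
	}
	return (false);
}
```

With this shape an administrator opts in to exactly one workaround for exactly the named client implementations, instead of the server keying all of its behavior off the implementation ID.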

-Ben

P.S. I feel like the nfsv4 WG list should probably hear about this
sort of issue, in addition to here.


> Note the "MUST NOT" w.r.t. doing this. Of course, I could argue that, since the
> hacks violate the RFC, then why not enable them in a way that violates the RFC.
> 
> Anyhow, how do others think this should be handled?
> 
> Here's details on the breakage and workarounds for those interested, from looking
> at packet traces in wireshark:
> Fairly benign ones:
> - The client does a ReclaimComplete with one_fs == false and then does a
>   ReclaimComplete with one_fs == true. The server returns
>   NFS4ERR_COMPLETE_ALREADY for the second one, which the ESXi client
>   doesn't like.
>   Workaround: Don't return an error for the one_fs == true case; just treat
>        it the same as "one_fs == false".
>   There is also a case where the client only does the
>   ReclaimComplete with one_fs == true. Since FreeBSD exports a hierarchy of
>   file systems, this doesn't indicate to the server that all reclaims are done.
>   (Other extant clients never do the "one_fs == true" variant of
>   ReclaimComplete.)
>   This case of just doing the "one_fs == true" variant is actually a limitation
>   of the server, which I don't know how to fix. However, the same workaround
>   as listed above gets around it.
> 
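A simplified model of the ReclaimComplete behavior described above, assuming a single per-client "reclaim done" flag. It models only what the note describes: the strict path returns NFS4ERR_COMPLETE_ALREADY on a repeat, and a lone one_fs == true request does not end reclaim. The workaround path treats one_fs == true like one_fs == false and never fails it. The names client_state and reclaim_complete are hypothetical, and the enum values are symbolic rather than the RFC 5661 wire values.

```c
#include <stdbool.h>

/* Symbolic status codes; numeric values are NOT the RFC 5661 wire values. */
enum nfsstat { NFS4_OK, NFS4ERR_COMPLETE_ALREADY };

struct client_state {
	bool reclaim_done;	/* a whole-client ReclaimComplete has been seen */
};

static enum nfsstat
reclaim_complete(struct client_state *cl, bool one_fs, bool workaround)
{
	if (workaround) {
		/* Treat one_fs == true as one_fs == false; never fail it on repeat. */
		if (cl->reclaim_done)
			return (one_fs ? NFS4_OK : NFS4ERR_COMPLETE_ALREADY);
		cl->reclaim_done = true;
		return (NFS4_OK);
	}
	/* Strict path: any repeat is an error. */
	if (cl->reclaim_done)
		return (NFS4ERR_COMPLETE_ALREADY);
	/* Only the whole-client form ends reclaim for an exported hierarchy. */
	if (!one_fs)
		cl->reclaim_done = true;
	return (NFS4_OK);
}
```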
> - The client puts random garbage in the delegate_type argument for
>   Open/ClaimPrevious.
>   Workaround: Since the client sets OPEN4_SHARE_ACCESS_WANT_NO_DELEG, it doesn't
>       want a delegation, so assume OPEN_DELEGATE_NONE or OPEN_DELEGATE_NONE_EXT
>       instead of garbage. (Not sure which of the two values makes it happier.)
> 
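A sketch of the delegate_type workaround, under two assumptions. First, "garbage" means a value outside the open_delegation_type4 range. Second, the constants below match RFC 5661. The helper name is hypothetical. OPEN_DELEGATE_NONE is picked arbitrarily, because the note leaves open whether NONE or NONE_EXT works better with ESXi.

```c
#include <stdint.h>

/* open_delegation_type4 values (assumed to match RFC 5661). */
enum open_delegation_type4 {
	OPEN_DELEGATE_NONE = 0,
	OPEN_DELEGATE_READ = 1,
	OPEN_DELEGATE_WRITE = 2,
	OPEN_DELEGATE_NONE_EXT = 3
};

/* "Want" bits carried in share_access (assumed to match RFC 5661). */
#define OPEN4_SHARE_ACCESS_WANT_DELEG_MASK	0xFF00u
#define OPEN4_SHARE_ACCESS_WANT_NO_DELEG	0x0400u

/*
 * Hypothetical helper for Open/ClaimPrevious: pass through a valid
 * delegate_type; if it is out of range but the client said it wants no
 * delegation, substitute OPEN_DELEGATE_NONE. Returns -1 when the value
 * is garbage and there is no hint, so the caller can reject the Open.
 */
static int
sanitize_claim_prev_deleg(uint32_t share_access, uint32_t delegate_type,
    uint32_t *out)
{
	if (delegate_type <= OPEN_DELEGATE_NONE_EXT) {
		*out = delegate_type;
		return (0);
	}
	if ((share_access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) ==
	    OPEN4_SHARE_ACCESS_WANT_NO_DELEG) {
		*out = OPEN_DELEGATE_NONE;
		return (0);
	}
	return (-1);
}
```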
> Serious ones:
> - The client does an OpenDowngrade with arguments set to OPEN4_SHARE_ACCESS_BOTH
>   and OPEN4_SHARE_DENY_BOTH.
>   Since OpenDowngrade is supposed to decrease share_access and share_deny,
>   the server returns NFS4ERR_INVAL. OpenDowngrade is never supposed to
>   fail with NFS4ERR_SHARE_DENIED because of a conflict with another Open.
>   (A conflict happens when another Open has set an OPEN4_SHARE_DENY value
>   that denies the result of the OpenDowngrade.)
>   I believe this one is done by the client for something it calls a
>   "device lock" and really doesn't like this failing.
>   Workaround: All I can think of is to ignore the check that no new bits are set
>       and reply NFS4_OK when no conflicting Open exists.
>       When there is a conflicting Open, returning NFS4ERR_INVAL seems to be the
>       only option, since NFS4ERR_SHARE_DENIED isn't listed for OpenDowngrade.
> 
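A sketch of the OpenDowngrade check and workaround described above, assuming a simplified open_state that holds only access and deny bits. The share constants are the RFC 5661 values. The function names are hypothetical, and the status codes are symbolic. The strict path rejects any request that adds bits. The workaround accepts such a request unless it conflicts with another Open, and in that case it still returns NFS4ERR_INVAL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPEN4_SHARE_ACCESS_READ		0x1u
#define OPEN4_SHARE_ACCESS_WRITE	0x2u
#define OPEN4_SHARE_ACCESS_BOTH		0x3u
#define OPEN4_SHARE_DENY_NONE		0x0u
#define OPEN4_SHARE_DENY_BOTH		0x3u

enum nfsstat { NFS4_OK, NFS4ERR_INVAL };	/* symbolic only */

struct open_state {
	uint32_t access;	/* OPEN4_SHARE_ACCESS_* bits */
	uint32_t deny;		/* OPEN4_SHARE_DENY_* bits */
};

/* A mode conflicts if it wants what another Open denies, or denies what it holds. */
static bool
share_conflict(uint32_t access, uint32_t deny, const struct open_state *o)
{
	return ((access & o->deny) != 0 || (deny & o->access) != 0);
}

static enum nfsstat
open_downgrade(struct open_state *self, uint32_t access, uint32_t deny,
    const struct open_state *others, size_t nothers, bool workaround)
{
	bool is_subset = (access & ~self->access) == 0 &&
	    (deny & ~self->deny) == 0;

	if (!is_subset) {
		if (!workaround)
			return (NFS4ERR_INVAL);	/* a downgrade must not add bits */
		for (size_t i = 0; i < nothers; i++)
			if (share_conflict(access, deny, &others[i]))
				return (NFS4ERR_INVAL);	/* SHARE_DENIED not allowed here */
	}
	self->access = access;
	self->deny = deny;
	return (NFS4_OK);
}
```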
> - When the server reboots, the client does not serialize ExchangeID/CreateSession.
>   After a server reboot, a client needs to do a serialized set of RPCs:
>   ExchangeID followed by CreateSession to confirm it. The reply to
>   ExchangeID has a sequence number (eir_sequenceid) in it, and the
>   CreateSession needs to have the same value in its csa_sequence argument
>   to confirm the clientid issued by the ExchangeID.
>   The client sends many ExchangeIDs and CreateSessions without waiting, so
>   they fail many times because the sequence number does not match the last ExchangeID.
>   (This might only happen in the trunked case.)
>   Workaround: Nothing that I can think of.
> 
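A simplified model of why unserialized ExchangeID/CreateSession pairs fail. It assumes a hypothetical server policy in which each ExchangeID for a still-unconfirmed client owner issues a new sequence value and replaces the previous one. Only the confirmation step is modeled; the ongoing CreateSession sequencing for an already confirmed client is out of scope. The names are illustrative, and the status codes are symbolic.

```c
#include <stdint.h>

enum nfsstat { NFS4_OK, NFS4ERR_SEQ_MISORDERED };	/* symbolic only */

/* One unconfirmed-client record per client owner (simplified). */
struct unconf_client {
	uint32_t seq;	/* eir_sequenceid returned by the latest ExchangeID */
	int confirmed;
};

/* Hypothetical policy: a new ExchangeID replaces the pending sequence value. */
static uint32_t
exchange_id(struct unconf_client *c)
{
	c->seq++;
	c->confirmed = 0;
	return (c->seq);
}

/* CreateSession confirms the client only if csa_sequence matches. */
static enum nfsstat
create_session(struct unconf_client *c, uint32_t csa_sequence)
{
	if (!c->confirmed && csa_sequence != c->seq)
		return (NFS4ERR_SEQ_MISORDERED);
	c->confirmed = 1;
	return (NFS4_OK);
}
```

If a client sends two ExchangeIDs before either CreateSession, the CreateSession carrying the first reply's sequence value fails. Only the one paired with the latest ExchangeID succeeds, which is why the client has to serialize the pair.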
> - ExchangeID sometimes sends eia_clientowner.co_verifier argument as all zeros.
>   Sometimes the client bogusly fills in the eia_clientowner.co_verifier
>   argument to ExchangeID with all 0s instead of the correct value.
>   This indicates to the server that the client has rebooted (it has not)
>   and results in the server discarding any state for the client and
>   re-initializing the clientid.
>   Workaround: The server can ignore the verifier changing and make the recovery
>       work better. This clearly violates RFC5661 and can only be done for
>       ESXi clients, since ignoring it breaks recovery after a Linux client hard reboot.
> 
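A sketch of the co_verifier decision, assuming a client record that stores the 8-byte verifier from the last accepted ExchangeID. A "generation" counter stands in for discarding the client's opens, locks and delegations. The names are hypothetical. The ignore_verf_change flag is the workaround. It also hides a genuine client reboot, which is exactly why it cannot be applied to Linux clients.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NFS4_VERIFIER_SIZE	8

struct client_rec {
	uint8_t verf[NFS4_VERIFIER_SIZE];	/* co_verifier last accepted */
	unsigned int generation;		/* bumped when state is discarded */
};

/*
 * Returns true if this ExchangeID is treated as a client reboot, in
 * which case the client's state is discarded (modeled by generation++).
 */
static bool
exchange_id_verifier(struct client_rec *cl,
    const uint8_t verf[NFS4_VERIFIER_SIZE], bool ignore_verf_change)
{
	if (memcmp(cl->verf, verf, NFS4_VERIFIER_SIZE) == 0)
		return (false);		/* same client instance: keep state */
	if (ignore_verf_change)
		return (false);		/* workaround: keep state and old verifier */
	memcpy(cl->verf, verf, NFS4_VERIFIER_SIZE);
	cl->generation++;		/* RFC behavior: new instance, drop state */
	return (true);
}
```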
> - The client doesn't seem to handle NFS4ERR_GRACE errors correctly.
>   These occur when any non-reclaim operations are done during the grace
>   period after a server boot.
>   (A client needs to delay a while and then retry the operation, repeating
>    for as long as NFS4ERR_GRACE is received from the server. This client
>    does not do this.)
>   Workaround: Nothing that I can think of.
> 
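A client-side sketch of the NFS4ERR_GRACE handling described above: issue the non-reclaim operation, and while the server answers NFS4ERR_GRACE, wait and retry. The operation and delay hooks are hypothetical callbacks, so the loop can be shown without a real RPC layer. fake_op and no_delay are illustrative stand-ins, and the status codes are symbolic.

```c
#include <stddef.h>

enum nfsstat { NFS4_OK, NFS4ERR_GRACE, NFS4ERR_IO };	/* symbolic only */

typedef enum nfsstat (*nfs_op_fn)(void *arg);	/* issue one RPC */
typedef void (*delay_fn)(void);			/* wait before retrying */

/* Retry op for as long as the server is in its grace period. */
static enum nfsstat
retry_while_grace(nfs_op_fn op, void *arg, delay_fn delay,
    unsigned int *attempts)
{
	enum nfsstat st;
	unsigned int n = 0;

	for (;;) {
		st = op(arg);
		n++;
		if (st != NFS4ERR_GRACE)
			break;
		delay();	/* back off, then resend the same operation */
	}
	if (attempts != NULL)
		*attempts = n;
	return (st);
}

/* Illustrative fake server: NFS4ERR_GRACE for the first *arg calls. */
static enum nfsstat
fake_op(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (NFS4ERR_GRACE);
	}
	return (NFS4_OK);
}

static void
no_delay(void)
{
}
```

With a fake server that stays in grace for three calls, the operation succeeds on the fourth attempt. A real client would sleep in the delay hook rather than spin.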
> Thanks in advance for any comments, rick
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"