From: NAGY Andreas <Andreas.Nagy@frequentis.com>
To: Rick Macklem, 'freebsd-stable@freebsd.org'
Date: Thu, 8 Mar 2018 14:35:03 +0000
Subject: RE: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client
Thank you, it's really great how quickly you adapt the source and make patches for this. I saw so many posts where people did not get NFS41 working with ESXi and FreeBSD, and now I already have it running with your changes.

I have now compiled the kernel with all 4 patches, and it works.

Some problems are still left:

- The "Server returned improper reason for no delegation: 2" warnings are still in the vmkernel.log:

  2018-03-08T11:41:20.290Z cpu0:68011 opID=488969b0)WARNING: NFS41: NFS41ValidateDelegation:608: Server returned improper reason for no delegation: 2

- I can't delete a folder with the VMware host client datastore browser:

  2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.352Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
  2018-03-08T11:34:00.352Z cpu1:67981 opID=f5159ce3)WARNING: UserFile: 2155: hostd-worker: Directory changing too often to perform readdir operation (11 retries), returning busy

- After a reboot of the FreeBSD machine, ESXi does not restore the NFS datastore and logs the following warning (recovery after just disconnecting the links works fine):

  2018-03-08T12:39:44.602Z cpu23:66484)WARNING: NFS41: NFS41_Bug:2361: BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP

So far I have only run some quick benchmarks with ATTO in a Windows VM that has a vmdk on the NFS41 datastore, which is mounted over two 1 Gbit/s links in different subnets. Read is nearly double that of a single connection, while write is only a bit faster. I don't know if write speed could be improved; the share is UFS on a hardware RAID controller with local write speeds of about 500MB/s.
The vmkernel.log at the following link covers mounting the NFS share, attaching a vmdk from the share to a Windows VM, running the ATTO benchmark on it, disconnecting/reconnecting the network, and the BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP problem after the reboot. Around the reboot I also captured a trace on one of the two links (nfs41_trace_before_reboot.pcap and nfs41_trace_after_reboot.pcap).
https://files.fm/u/wvybmdmc

andi

-----Original Message-----
From: Rick Macklem [mailto:rmacklem@uoguelph.ca]
Sent: Thursday, 8 March 2018 03:48
To: NAGY Andreas; 'freebsd-stable@freebsd.org'
Subject: Re: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client

NAGY Andreas wrote:
>attached the trace. If I see it correct it uses FORE_OR_BOTH.
>(bctsa_dir: CDFC4_FORE_OR_BOTH (0x00000003))
Yes.

The scary part is the ExchangeID before the BindConnectiontoSession.
(Normally that is only done at the beginning of a new mount to get a ClientID, followed immediately by a CreateSession. I don't know why it would do this.)

The attached patch might get BindConnectiontoSession to work. I have no way to test it beyond seeing it compile. Hopefully it will apply cleanly.

>The trace is only with the first patch, have not compiled the wantdeleg patches so
>far.
That's fine. I don't think that matters much.

>I think this is related to the BIND_CONN_TO_SESSION; after a disconnect the ESXi
>cannot connect to the NFS also with this warning:
>2018-03-07T16:55:11.227Z cpu21:66484)WARNING: NFS41: NFS41_Bug:2361: BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP
If the attached patch works, you'll find out what it fixes.

>Another thing I noticed today is that it is not possible to delete a folder with the
>ESXi datastorebrowser on the NFS mount. Maybe it is a VMWare bug, but with
>NFS3 it works.
>
>Here the vmkernel.log with only one connection contains mounting, trying to delete a folder and disconnect:
>
>2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)World: 12235: VC opID c55dbe59 maps to vmkernel opID 55bea165
>2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:423: Mount server: 10.0.0.225, port: 2049, path: /, label: nfsds1, security: 1 user: , options:
>2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)StorageApdHandler: 977: APD Handle Created with lock[StorageApd-0x43046e4c6d70]
>2018-03-07T16:46:04.544Z cpu11:66486)NFS41: NFS41ProcessClusterProbeResult:3873: Reclaiming state, cluster 0x43046e4c7ee0 [7]
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3791: Lease time: 120
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3792: Max read xfer size: 0x20000
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3793: Max write xfer size: 0x20000
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3794: Max file size: 0x800000000000
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3795: Max file name: 255
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)WARNING: NFS41: NFS41FSCompleteMount:3800: The max file name size (255) of file system is larger than that of FSS (128)
>2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41FSAPDNotify:5960: Restored connection to the server 10.0.0.225 mount point nfsds1, mounted as 1a7893c8-eec764a7-0000-000000000000 ("/")
>2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:435: nfsds1 mounted successfully
>2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)World: 12235: VC opID c55dbe91 maps to vmkernel opID e47706ec
>2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6

I have no idea whether getting BindConnectiontoSession working will fix this or not.

rick