Date:      Sat, 29 Aug 2020 02:10:45 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
Cc:        Michael Tuexen <tuexen@FreeBSD.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: TFO for NFS
Message-ID:  <QB1PR01MB3364F941467684F1F42FCC0BDD530@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <SN4PR0601MB37284C354A2C2870711948B786520@SN4PR0601MB3728.namprd06.prod.outlook.com>
References:  <SN4PR0601MB372838333EC1C96E7FEAC77186550@SN4PR0601MB3728.namprd06.prod.outlook.com> <QB1PR01MB3364668B0FFD5E3D17E38E3ADD520@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>, <SN4PR0601MB37284C354A2C2870711948B786520@SN4PR0601MB3728.namprd06.prod.outlook.com>

Scheffenegger, Richard wrote:
>I know, NFS TCP sessions are some of the most long-lived sessions in regular use.
Ok, so I'll admit I can't wrap my head around this.
It is way out of my area of expertise (so I've added freebsd-net@ to the cc), but
it seems to me that NFS is about the least appropriate use for TFO.

It seems that, for TFO to be useful, the application needs to be doing frequent
short lived TCP connections and often across WAN/Internet.
NFS mounts do neither of the above.
- They, as we've noted, normally only do a TCP connect at mount time.
- They usually run in low latency LAN environments. (High latency connections
   hammer NFS performance, due to its frequent small RPCs whose replies the
   client must wait for synchronously.)

All you might save is one RTT. Take a look at how many RPCs (each with a RTT)
happen on an active NFS mount.

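To put the one-RTT saving in proportion, here is a minimal Python sketch. It assumes a simplified model that is not from this thread: every TCP handshake and every synchronous RPC costs exactly one RTT, and transfer and server processing time are ignored. The function name and the RPC counts are hypothetical.

```python
def tfo_saving_fraction(rpc_count: int, handshakes: int = 1) -> float:
    """Fraction of all round trips removed if TFO hides every handshake RTT.

    Simplified model (an assumption): one RTT per TCP handshake and one RTT
    per synchronous RPC; nothing else is counted.
    """
    if rpc_count < 0 or handshakes < 0:
        raise ValueError("counts must be non-negative")
    total_rtts = handshakes + rpc_count
    if total_rtts == 0:
        return 0.0
    return handshakes / total_rtts


# Hypothetical workloads: these RPC counts are illustrative, not measured.
for rpcs in (1, 100, 100_000):
    share = tfo_saving_fraction(rpcs)
    print(f"{rpcs:>7} RPCs on one connection: {share:.4%} of round trips saved")
```

Under this model the saving shrinks toward zero as the number of RPCs on the connection grows, which is the point being made about long-lived mounts.
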
>My rationale is two-fold:
>
>First, having a relatively high-profile use of the TFO option in the core OS modules
>will definitely expose that feature to at least some use.
Well, I don't think it is NFS's job to expose a feature that is not useful for it.
(If you were to implement this and benchmarking showed a significant
 improvement in elapsed time to do an NFS mount, then that could be a
different story.)

>Second, in case of a network disconnect (or, something with my company does,
>that would be most comparable to unassigning and reassigning the server IP
>address between different physical ports), while there is IO load, TFO may reduce
>(ever so slightly) the latency impact of the enqueued IOs.
I'm not sure I understand this. NFS always uses port# 2049.
If you are referring to the host IP address, then wouldn't that be handled via
ARP and routing? (Does this require a fresh TCP connection to the same server
IP address?)

>My plan is first to simply enable the socket option - that should result in TFO to
>get negotiated for, but no actual latency improvement, while the traditional
>connect() sequence to set up a TCP session is done, from the client side; the
>server side will not need to change, and can send out initial data right away with
>the syn/ack (at least in theory, if the syn contained a full NFS request that can be
>responded to).
>
>Changing the client to make use of the SYN+data facilities would be a 2nd step.
Well, during an NFS mount, there is first a TCP connection made by
mount_nfs in userspace and it is only used for a single Null RPC.
--> This checks that the server is up and running.
Then mount_nfs does nmount(2), which will create a second TCP connection
which is normally used until unmount.
--> All you save is the RTT for the one first RPC of many.

>Also, I shall make this a configurable, since some network devices may inhibit TFO
>packets, incurring a delay (but that's mostly public internet, not private networks
>where NFS is being used). Ideally with TFO default to on (once it's working
>properly), but able to explicitly disable it for certain mounts.
NFS suffered TSO related bugs for several (> 5) years (and I wouldn't be surprised
if there are still net device drivers broken such that TSO must be disabled to make
NFS work ok on them).

As such, I get very nervous about this kind of thing.

Reliability always trumps performance when it comes to file system work.

Now, if you are interested in improving NFS performance over TCP, that
could be a very interesting project, but I doubt TFO would be relevant.
Especially when you look at long fat pipes (TCP connections with a large
delay * bandwidth), there is probably a lot that could be done.
--> Read-ahead, write-back algorithm changes. Read/Write data size.
       Throttling/congestion avoidance/window sizing in TCP.
       And the list goes on and on...

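For the long fat pipe case, the relevant quantity is the bandwidth-delay product: the number of bytes that must be in flight to keep the path full. The sketch below is a back-of-the-envelope calculation only. The function names and the example figures (1 Gbit/s, 100 ms RTT, 64 KiB in flight) are assumptions chosen for illustration, not measurements from any NFS setup.

```python
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to fill a path (bandwidth * delay)."""
    return bandwidth_bits_per_s * rtt_s / 8.0


def window_limited_bits_per_s(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most window_bytes are in flight per RTT."""
    return window_bytes * 8.0 / rtt_s


# Hypothetical WAN path: 1 Gbit/s with a 100 ms round-trip time.
link_bps, rtt = 1e9, 0.100
print(f"BDP: {bdp_bytes(link_bps, rtt) / 1e6:.1f} MB must be in flight to fill the pipe")

# The same bound applies whether the limit comes from the TCP window or from
# a client that keeps only this much read/write data outstanding at a time.
limit = window_limited_bits_per_s(64 * 1024, rtt)
print(f"64 KiB in flight: at most {limit / 1e6:.2f} Mbit/s")
```

This is why read-ahead depth, read/write data size and TCP window sizing all matter far more on such paths than a single saved handshake RTT.
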
I do hope that NFS over TLS allows more use of NFS across the Internet,
so performance work related to NFS running on WAN/Internet connections
would be a great thing to do. (I'm not conversant with the current TCP stack,
so I'm not the guy to tackle this.)

rick

Richard Scheffenegger


-----Original Message-----
From: Rick Macklem <rmacklem@uoguelph.ca>
Sent: Friday, 28 August 2020 04:35
To: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>; rmacklem@freebsd.org
Cc: Michael Tuexen <tuexen@freebsd.org>
Subject: Re: TFO for NFS

Well, you'll find the soconnect() stuff in sys/rpc/clnt_vc.c.
If you just want to play around with it, have fun.

As for this being useful in practice, that seems unlikely.
When the kernel RPC code uses TCP it establishes one TCP connection at mount
time and uses that connection until unmount unless the connection breaks somehow.
(A server will often disconnect after about 5 minutes of no activity on the
connection. This almost never happens for NFSv4, since the NFSv4 client does an
RPC every 30sec to maintain the lease against the server.)
--> A new TCP connection usually only happens after a
      network partitioning heals.
(There was a bug that caused reconnects during certain cases of signal handling,
but that was fixed about 3 years ago.)

rick

________________________________________
From: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>
Sent: Thursday, August 27, 2020 6:29 PM
To: rmacklem@freebsd.org
Cc: Michael Tuexen
Subject: TFO for NFS

Hi Rick,

I've seen you are very active with the fbsd nfs code, having branched the nfs-over-tls project.

Is anyone else contributing to this project yet?

After some discussion in today's freebsd-transport call with tuexen@, I was
wondering if the TCP Fast Open Option could be added as a proof-of-concept to the
in-kernel RPC handler. It may also be a nice augmentation of nfs-over-tls when
available, to absorb some of the added tls connection setup latency.

Right now, I am quite unfamiliar with all the rpc code, which appears to handle
all the basic plumbing of NFS;

Would you be interested in helping me with advice and reviews, in order to try
and get something around TFO working?

(The reduction in time-to-first-IO by 1 RTT may be helpful in some scenarios, or
when TLS 1.2 instead of 1.3 is in use, where speeding up the tls handshake would
potentially also be a nice property.)

Having said all this, for a client to actually make use of TFO, it is likely that
slight changes / additions need to be made, in order to send out the initial data
(TLS or RPC) right away before any soconnect(), using sendmsg() instead - causing
the socket itself to figure out that tcp can connect and send data at the same time...

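As a rough illustration of "send the initial data instead of connecting first", here is a userspace Python sketch. It is not the FreeBSD kernel RPC code path: it assumes the Linux-style MSG_FASTOPEN flag as exposed by Python's socket module, and the FreeBSD socket API differs in detail. The function name open_and_send is hypothetical, and the sketch falls back to a traditional connect() whenever the fast-open path is unavailable.

```python
import socket


def open_and_send(addr, payload: bytes) -> socket.socket:
    """Open a blocking TCP connection to addr and send payload.

    If the platform exposes MSG_FASTOPEN (an assumption, Linux-specific),
    try sendto() on the unconnected socket so the kernel may carry the data
    in the SYN. Otherwise, or on any error, use connect() plus sendall().
    """
    fastopen_flag = getattr(socket, "MSG_FASTOPEN", None)
    if fastopen_flag is not None:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # Connect and send in one call; without a cached TFO cookie the
            # kernel does a normal handshake and sends the data afterwards.
            sent = sock.sendto(payload, fastopen_flag, addr)
            sock.sendall(payload[sent:])
            return sock
        except OSError:
            sock.close()  # fast-open path unavailable; retry the classic way

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect(addr)
        sock.sendall(payload)
    except OSError:
        sock.close()
        raise
    return sock
```

A caller would use it as open_and_send(("192.0.2.10", 2049), request_bytes), where the address is a documentation-range placeholder and request_bytes stands for an already encoded first request. Whether the data actually travels in the SYN depends on kernel settings, a previously obtained cookie and the network path, none of which this sketch checks.
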
Best regards,

Richard Scheffenegger


