FreeBSD Mail Archives

Date:      Sat, 29 Aug 2020 09:33:52 +0000
From:      "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Michael Tuexen <tuexen@FreeBSD.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   RE: TFO for NFS
Message-ID:  <SN4PR0601MB37281A4BB799FD169DD59CE986530@SN4PR0601MB3728.namprd06.prod.outlook.com>
In-Reply-To: <QB1PR01MB3364F941467684F1F42FCC0BDD530@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
References:  <SN4PR0601MB372838333EC1C96E7FEAC77186550@SN4PR0601MB3728.namprd06.prod.outlook.com> <QB1PR01MB3364668B0FFD5E3D17E38E3ADD520@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>, <SN4PR0601MB37284C354A2C2870711948B786520@SN4PR0601MB3728.namprd06.prod.outlook.com> <QB1PR01MB3364F941467684F1F42FCC0BDD530@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>

Hi Rick,

> It seems that, for TFO to be useful, the application needs to be doing frequent short-lived TCP connections and often across WAN/Internet.
> NFS mounts do neither of the above.
> - They, as we've noted, only normally do a TCP connect at mount time.
>   Usually run on low latency LAN environments. (High latency connections
>   hammer NFS performance, due to its frequent small RPCs that the client
>   must wait for replies to synchronously.)

Standard, run-of-the-mill kernel-NFS-client mounts don't. OTOH, TFO is a transparent feature (on the server side), and requires only slight changes on the client side to be useful. Providing a "worked example" of how to do this (properly?) might inspire other uses later on.

You may have heard of a large corporation mostly known for its databases (and Java). There, the application itself implements a streamlined, lightweight NFS client (dNFS). In this scenario, each DB task / worker very frequently sets up a dedicated NFS session to perform the set of IOs necessary to complete its task, and then stops using this TCP session. There are also a couple more implementations of user-space, streamlined NFS applications, which use many parallel TCP sessions when working against an NFS server, bypassing the in-kernel NFS client.

> (If you were to implement this and benchmarking showed a significant improvement in elapsed time to do an NFS mount, then that could be a different story.)

Let me reach out to the maintainer of one of the applications using a userspace NFS client (an rsync-like app, bypassing the in-kernel NFS client for dramatic bandwidth gains during simple copy jobs, assuming exclusive access for the duration). That could be tweaked more easily to show the behavior of the above-mentioned, widely deployed DB application.

> I'm not sure I understand this. NFS always uses port# 2049.
> If you are referring to the host IP address, then wouldn't that be handled via.

In this scenario, the NFS entity (and state, for v4) is handed off between hosts, while the IP address stays the same. However, TCP state is not handed off, and the client will observe a sudden RST when such a migration happens on the server side; it then has to re-establish the TCP connection and reclaim locks (in the v4 case) before being able to continue.

Anyway, I will try to come up with a proof-of-concept patch and get benchmark data. Let's continue the discussion then.

Best regards,
   Richard

-----Original Message-----
From: Rick Macklem <rmacklem@uoguelph.ca>
Sent: Saturday, 29 August 2020 04:11
To: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>
Cc: Michael Tuexen <tuexen@FreeBSD.org>; freebsd-net@freebsd.org
Subject: Re: TFO for NFS

NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Scheffenegger, Richard wrote:
>I know, NFS TCP sessions are some of the most long-lived sessions in regular use.
Ok, so I'll admit I can't wrap my head around this.
It is way out of my area of expertise (so I've added freebsd-net@ to the cc), but it seems to me that NFS is about the least appropriate use for TFO.

It seems that, for TFO to be useful, the application needs to be doing frequent short-lived TCP connections and often across WAN/Internet.
NFS mounts do neither of the above.
- They, as we've noted, only normally do a TCP connect at mount time.
   Usually run on low latency LAN environments. (High latency connections
   hammer NFS performance, due to its frequent small RPCs that the client
   must wait for replies to synchronously.)

All you might save is one RTT. Take a look at how many RPCs (each with an RTT) happen on an active NFS mount.
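
To put a rough number on the point above, here is a minimal Python sketch. It assumes, purely for illustration, that the handshake costs one RTT and every RPC costs one RTT with nothing pipelined; the function name and the RPC counts are made up, not measurements.

```python
def tfo_saving_fraction(rpc_count):
    """Share of total round-trip wait that skipping the TCP handshake removes.

    Illustrative model only: one RTT for the handshake plus one RTT per RPC,
    with no pipelining and no other delays.
    """
    round_trips_without_tfo = 1 + rpc_count
    return 1 / round_trips_without_tfo


# Hypothetical workloads: a long-lived kernel mount vs. a short-lived session.
for label, rpcs in [("long-lived mount", 100_000), ("short session", 5)]:
    print(f"{label}: {tfo_saving_fraction(rpcs):.4%} of round-trip wait saved")
```

Under this model the saving only becomes visible when a connection carries very few RPCs, which is the short-lived-session case rather than a normal mount.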

>My rationale is two-fold:
>
>First, having a relatively high-profile use of the TFO option in the core OS modules will definitely expose that feature to at least some use.
Well, I don't think it is NFS's job to expose a feature that is not useful for it.
(If you were to implement this and benchmarking showed a significant improvement in elapsed time to do an NFS mount, then that could be a different story.)

>Second, in case of a network disconnect (or something that my company does, which would be most comparable to unassigning and reassigning the server IP address between different physical ports) while there is IO load, TFO may reduce (ever so slightly) the latency impact of the enqueued IOs.
I'm not sure I understand this. NFS always uses port# 2049.
If you are referring to the host IP address, then wouldn't that be handled via ARP and routing? (Does this require a fresh TCP connection to the same server IP address?)

>My plan is first to simply enable the socket option from the client side. That should result in TFO being negotiated, but no actual latency improvement, while the traditional connect() sequence to set up a TCP session is still done. The server side will not need to change, and can send out initial data right away with the SYN/ACK (at least in theory, if the SYN contained a full NFS request that can be responded to).
>
>Changing the client to make use of the SYN+data facilities would be a 2nd step.
Well, during an NFS mount, there is first a TCP connection made by mount_nfs in userspace and it is only used for a single Null RPC.
--> This checks that the server is up and running.
Then mount_nfs does nmount(2), which will create a second TCP connection which is normally used until unmount.
--> All you save is the RTT for the one first RPC of many.

>Also, I shall make this configurable, since some network devices may inhibit TFO packets, incurring a delay (but that's mostly public internet, not private networks where NFS is being used). Ideally with TFO defaulting to on (once it's working properly), but able to be explicitly disabled for certain mounts.
NFS suffered TSO related bugs for several (> 5) years (and I wouldn't be surprised if there are still net device drivers broken such that TSO must be disabled to make NFS work ok on them).

As such, I get very nervous about this kind of thing.

Reliability always trumps performance when it comes to file system work.

Now, if you are interested in improving NFS performance over TCP, that could be a very interesting project, but I doubt TFO would be relevant.
Especially when you look at long fat pipes (TCP connections with a large delay * bandwidth), there is probably a lot that could be done.
--> Read-ahead, write-back algorithm changes. Read/Write data size.
       Throttling/congestion avoidance/window sizing in TCP.
       And the list goes on and on...
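
As a rough illustration of why the delay * bandwidth product matters here, the sketch below computes it for an assumed path and estimates how many read/write transfers would have to be in flight to keep that path full. The link speed, RTT, I/O size and function names are hypothetical examples, not values from any real mount.

```python
def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product in bytes (bits/s * ms -> bytes)."""
    return bandwidth_bps * rtt_ms // 8000


def rpcs_to_fill_pipe(bandwidth_bps, rtt_ms, io_size):
    """Number of io_size-byte transfers in flight needed to cover the BDP."""
    return -(-bdp_bytes(bandwidth_bps, rtt_ms) // io_size)  # ceiling division


# Assumed example path: 1 Gbit/s, 50 ms RTT, 64 KiB read/write size.
print(bdp_bytes(1_000_000_000, 50), "bytes in flight to fill the pipe")
print(rpcs_to_fill_pipe(1_000_000_000, 50, 65536), "concurrent 64 KiB transfers")
```

A client that waits synchronously for each reply keeps only one transfer in flight, which is why read-ahead, write-back and I/O size show up in the list above.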

I do hope that NFS over TLS allows more use of NFS across the Internet, so performance work related to NFS running on WAN/Internet connections would be a great thing to do. (I'm not conversant with the current TCP stack, so I'm not the guy to tackle this.)

rick

Richard Scheffenegger

-----Original Message-----
From: Rick Macklem <rmacklem@uoguelph.ca>
Sent: Friday, 28 August 2020 04:35
To: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>; rmacklem@freebsd.org
Cc: Michael Tuexen <tuexen@freebsd.org>
Subject: Re: TFO for NFS

Well, you'll find the soconnect() stuff in sys/rpc/clnt_vc.c.
If you just want to play around with it, have fun.

As for this being useful in practice, that seems unlikely.
When the kernel RPC code uses TCP it establishes one TCP connection at mount time and uses that connection until unmount unless the connection breaks somehow.
(A server will often disconnect after about 5 minutes of no activity on the connection. This almost never happens for NFSv4, since the NFSv4 client does an RPC every 30 sec to maintain the lease against the server.)
--> A new TCP connection usually only happens after a
      network partitioning heals.
(There was a bug that caused reconnects during certain cases of signal handling, but that was fixed about 3 years ago.)

rick

________________________________________
From: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>
Sent: Thursday, August 27, 2020 6:29 PM
To: rmacklem@freebsd.org
Cc: Michael Tuexen
Subject: TFO for NFS

CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If in doubt, forward suspicious emails to IThelp@uoguelph.ca

Hi Rick,

I've seen you are very active with the fbsd nfs code, having branched the nfs-over-tls project.

Is anyone else contributing to this project yet?

After some discussion in today's freebsd-transport call with tuexen@, I was wondering if the TCP Fast Open option could be added as a proof-of-concept to the in-kernel RPC handler. It may also be a nice augmentation of nfs-over-tls when available, to absorb some of the added TLS connection setup latency.

Right now, I am quite unfamiliar with all the rpc code, which appears to handle all the basic plumbing of NFS.

Would you be interested in helping me with advice and reviews, in order to try and get something around TFO working?

(The reduction in time-to-first-IO by 1 RTT may be helpful in some scenarios, or when TLS 1.2 instead of 1.3 is in use, where speeding up the TLS handshake would potentially also be a nice property.)

Having said all this, for a client to actually make use of TFO, it is likely that slight changes / additions need to be made, in order to send out the initial data (TLS or RPC) right away without a prior soconnect(), using sendmsg() instead - causing the socket itself to figure out that TCP can connect and send data at the same time...
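
For a userspace client, the "send instead of connect" idea might look roughly like the Python sketch below. This is not the in-kernel RPC code path: it uses the userspace socket API instead of soconnect(), and the function name is hypothetical. It assumes Linux exposes MSG_FASTOPEN for sendto(), and that on FreeBSD setting TCP_FASTOPEN on the client socket before sendto() has the same effect; if neither works, it falls back to a plain connect().

```python
import socket


def connect_and_send(addr, payload):
    """Open a TCP connection to addr and send payload, trying TFO first."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sent = 0
    try:
        if hasattr(socket, "MSG_FASTOPEN"):
            # Linux-style: the flag asks sendto() to connect and send in one call.
            sent = sock.sendto(payload, socket.MSG_FASTOPEN, addr)
        elif hasattr(socket, "TCP_FASTOPEN"):
            # FreeBSD-style (assumed): enable the option, then sendto()
            # on the still-unconnected socket.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_FASTOPEN, 1)
            sent = sock.sendto(payload, addr)
        else:
            raise OSError("TFO is not exposed by this platform")
    except OSError:
        # TFO unavailable or refused: fall back to the traditional sequence.
        sock.close()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect(addr)
        sent = 0
    # Send whatever the first call did not accept (everything, after fallback).
    sock.sendall(payload[sent:])
    return sock
```

Whether the payload actually travels in the SYN depends on the local TFO settings, the server, and whether a cookie has already been obtained; the caller sees the same connected socket either way.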

Best regards,

Richard Scheffenegger

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?SN4PR0601MB37281A4BB799FD169DD59CE986530>
