Date: Sat, 29 Aug 2020 02:10:45 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
Cc: Michael Tuexen <tuexen@FreeBSD.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: TFO for NFS
Message-ID: <QB1PR01MB3364F941467684F1F42FCC0BDD530@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <SN4PR0601MB37284C354A2C2870711948B786520@SN4PR0601MB3728.namprd06.prod.outlook.com>
References: <SN4PR0601MB372838333EC1C96E7FEAC77186550@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <QB1PR01MB3364668B0FFD5E3D17E38E3ADD520@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>,
 <SN4PR0601MB37284C354A2C2870711948B786520@SN4PR0601MB3728.namprd06.prod.outlook.com>

Scheffenegger, Richard wrote:
>I know, NFS TCP sessions are some of the most long-lived sessions in regular use.
Ok, so I'll admit I can't wrap my head around this.
It is way out of my area of expertise (so I've added freebsd-net@ to the cc), but
it seems to me that NFS is about the least appropriate use for TFO.

It seems that, for TFO to be useful, the application needs to be doing frequent
short-lived TCP connections, often across a WAN/the Internet.
NFS mounts do neither of the above.
- They, as we've noted, only normally do a TCP connect at mount time.
- They usually run on low latency LAN environments. (High latency connections
  hammer NFS performance, due to its frequent small RPCs that the client
  must wait for replies to synchronously.)

All you might save is one RTT. Take a look at how many RPCs (each with a RTT)
happen on an active NFS mount.

>My rationale is two-fold:
>
>First, having a relatively high-profile use of the TFO option in the core OS modules
>will definitely expose that feature to at least some use.
Well, I don't think it is NFS's job to expose a feature that is not useful for it.
(If you were to implement this and benchmarking showed a significant
improvement in elapsed time to do an NFS mount, then that could be a
different story.)

>Second, in case of a network disconnect (or, something with my company does,
>that would be most comparable to unassigning and reassigning the server IP
>address between different physical ports), while there is IO load, TFO may reduce
>(ever so slightly) the latency impact of the enqueued IOs.
I'm not sure I understand this. NFS always uses port# 2049.
If you are referring to the host IP address, then wouldn't that be handled via
ARP and routing?
(Does this require a fresh TCP connection to the same server
IP address?)

>My plan is first to simply enable the socket option - that should result in TFO to
>get negotiated for, but no actual latency improvement, while the traditional
>connect() sequence to set up a TCP session is done, from the client side; the
>server side will not need to change, and can send out initial data right away with
>the syn/ack (at least in theory, if the syn contained a full NFS request that can be
>responded to).
>
>Changing the client to make use of the SYN+data facilities would be a 2nd step.
Well, during an NFS mount, there is first a TCP connection made by
mount_nfs in userspace and it is only used for a single Null RPC.
--> This checks that the server is up and running.
Then mount_nfs does nmount(2), which will create a second TCP connection
which is normally used until unmount.
--> All you save is the RTT for the one first RPC of many.

>Also, I shall make this configurable, since some network devices may inhibit TFO
>packets, incurring a delay (but that's mostly public internet, not private networks
>where NFS is being used).
>Ideally with TFO default to on (once it's working
>properly), but able to explicitly disable it for certain mounts.
NFS suffered TSO related bugs for several (> 5) years (and I wouldn't be surprised
if there are still net device drivers broken such that TSO must be disabled to make
NFS work ok on them).

As such, I get very nervous about this kind of thing.

Reliability always trumps performance when it comes to file system work.

Now, if you are interested in improving NFS performance over TCP, that
could be a very interesting project, but I doubt TFO would be relevant.
Especially when you look at long fat pipes (TCP connections with a large
delay * bandwidth), there is probably a lot that could be done.
--> Read-ahead, write-back algorithm changes. Read/Write data size.
    Throttling/congestion avoidance/window sizing in TCP.
    And the list goes on and on...

I do hope that NFS over TLS allows more use of NFS across the Internet,
so performance work related to NFS running on WAN/Internet connections
would be a great thing to do. (I'm not conversant with the current TCP stack,
so I'm not the guy to tackle this.)

rick

Richard Scheffenegger


-----Original Message-----
From: Rick Macklem <rmacklem@uoguelph.ca>
Sent: Friday, 28 August 2020 04:35
To: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>; rmacklem@freebsd.org
Cc: Michael Tuexen <tuexen@freebsd.org>
Subject: Re: TFO for NFS

NetApp Security WARNING: This is an external email.
Do not click links or open attachments unless you recognize the sender and know the content is safe.


Well, you'll find the soconnect() stuff in sys/rpc/clnt_vc.c.
If you just want to play around with it, have fun.

As for this being useful in practice, that seems unlikely.
When the kernel RPC code uses TCP it establishes one TCP connection at mount time and uses that connection until unmount unless the connection breaks somehow.
(A server will often disconnect after about 5 minutes of no activity on the connection. This almost never happens for NFSv4, since the NFSv4 client does an RPC every 30sec to maintain the lease against the server.)
--> A new TCP connection usually only happens after a
    network partitioning heals.
(There was a bug that caused reconnects during certain cases of signal handling, but that was fixed about 3 years ago.)

rick

________________________________________
From: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>
Sent: Thursday, August 27, 2020 6:29 PM
To: rmacklem@freebsd.org
Cc: Michael Tuexen
Subject: TFO for NFS

CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If in doubt, forward suspicious emails to IThelp@uoguelph.ca

Hi Rick,

I've seen you are very active with the fbsd nfs code, having branched the nfs-over-tls project.

Is anyone else contributing to this project yet?

After some discussion in today's freebsd-transport call with tuexen@, I was wondering if the TCP Fast Open Option could be added as a proof-of-concept to the in-kernel RPC handler.
It may also be a nice augmentation of nfs-over-tls when available, to absorb some of the added TLS connection setup latency...

Right now, I am quite unfamiliar with all the rpc code, which appears to handle all the basic plumbing of NFS;

Would you be interested in helping me with advice and reviews, in order to try and get something around TFO working?

(The reduction in time-to-first-IO by 1 RTT may be helpful in some scenarios, or when TLS 1.2 instead of 1.3 is in use, where speeding up the TLS handshake would potentially also be a nice property.)


Having said all this, for a client to actually make use of TFO, it is likely that slight changes / additions need to be made, in order to send out the initial data (TLS or RPC) right away before any soconnect(), using sendmsg() instead - causing the socket itself to figure out that TCP can connect and send data at the same time...

Best regards,

Richard Scheffenegger
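
The "send the first data instead of calling connect() first" idea from the original message can be illustrated in userspace. What follows is only a rough sketch under stated assumptions, not the FreeBSD kernel RPC code in sys/rpc/clnt_vc.c: it uses the Linux MSG_FASTOPEN send flag as exposed by Python's socket module (FreeBSD's interface differs and is not shown), and the names send_first_request, _echo_first_request and demo, as well as the payload, are hypothetical. If the flag is missing or the kernel refuses it, the sketch falls back to the traditional connect() followed by send.

```python
import socket
import threading

# Assumption: Linux-style client TFO via the MSG_FASTOPEN send flag.
# Python only defines this constant on platforms that provide it.
MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", None)


def send_first_request(addr, payload):
    """Connect to addr and send payload; return (socket, used_sendto_path)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if MSG_FASTOPEN is not None:
        try:
            # No connect() first: sendto() on the unconnected socket asks the
            # kernel to connect and queue the data in one call. The data only
            # rides on the SYN if the kernel already holds a TFO cookie for
            # this server; otherwise a normal handshake happens first.
            sent = sock.sendto(payload, MSG_FASTOPEN, addr)
            if sent < len(payload):
                sock.sendall(payload[sent:])
            return sock, True
        except OSError:
            # TFO disabled or unsupported: retry on a fresh socket.
            sock.close()
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Traditional path: three-way handshake, then the first request.
    sock.connect(addr)
    sock.sendall(payload)
    return sock, False


def _echo_first_request(listener):
    # Accept connections until one delivers data, then echo it back once.
    while True:
        conn, _ = listener.accept()
        with conn:
            try:
                data = conn.recv(4096)
            except OSError:
                continue
            if data:
                conn.sendall(data)
                return


def demo(payload=b"NULL RPC placeholder"):
    """Run the client against a throwaway loopback echo server."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(4)
    server = threading.Thread(
        target=_echo_first_request, args=(listener,), daemon=True
    )
    server.start()
    sock, used_sendto = send_first_request(listener.getsockname(), payload)
    with sock:
        reply = sock.recv(4096)
    server.join(timeout=5)
    listener.close()
    return reply, used_sendto
```

Calling demo() returns the echoed payload and a flag saying whether the sendto() path was taken. The plain listener here does not enable TFO on the server side, so at best the option is requested, with no latency gain, which matches the "first step" described in the quoted plan.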