Date: Mon, 27 Jul 2020 13:21:27 -0400 From: Andrew Gallatin <gallatin@cs.duke.edu> To: Rick Macklem <rmacklem@uoguelph.ca>, "freebsd-current@FreeBSD.org" <freebsd-current@FreeBSD.org> Cc: "jhb@FreeBSD.org" <jhb@FreeBSD.org>, "gallatin@freebsd.org" <gallatin@freebsd.org>, Gleb Smirnoff <glebius@freebsd.org> Subject: Re: RFC: ktls and krpc using M_EXTPG mbufs Message-ID: <319c92f4-4157-74a3-2bec-8f40e3979261@cs.duke.edu> In-Reply-To: <QB1PR01MB33640E7C88BA0E27A89587DFDD7A0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM> References: <QB1PR01MB33640E7C88BA0E27A89587DFDD7A0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
next in thread | previous in thread | raw e-mail | index | archive | help
On 2020-07-19 19:34, Rick Macklem wrote: > I spent a little time chasing a problem in the nfs-over-tls code, where it > would sometimes end up with corrupted data in the file(s) of a mirrored > pNFS configuration. > > I think the problem was that the code filled the data to be written into > anonymous page M_EXTPG mbufs, then did a m_copym() { copy by > reference } and used the copies for the mirrored writes. > --> In ktls_encrypt(), the encryption was done to the same pages and, > sometimes, the encrypted data got encrypted again during the > sosend() of the other copy. > > Although I haven't reproduced it, a regular kernel write RPC could suffer the > same consequences if the RPC is retried (it keeps an m_copym() copy > of the request in the krpc for an RPC retry). > > At this time, the code in projects/nfs-over-tls works correctly, since it > always fills the data to be written into mbuf clusters, m_copym()s those > and then copies those { real copying using memcpy() } via > mb_mapped_to_unmapped() just before calling sosend(). > --> This works, but it would be nice to avoid the mb_mapped_to_unmapped() > copying for all the data being written via an NFS over TLS connection. > > For the TCP_TLS_MODE_SW case: > --> The NFS code can fill the written data into anonymous pages on M_EXTPG > mbufs. > Then, the ktls_encrypt() could be modified to > allocate a new set of anonymous pages for the destination side of > the encryption (it already does this for the sendfile case) and put those > in a new mbuf list. > --> This would result in new anonymous pages and mbufs being allocated, > but would not do memcpy()s. > After encryption, it would just do a m_freem() on the unencrypted list. > --> For the krpc client case, this call would only decrement the reference > count on the unencrypted list and it could be used for a retry by the krpc > and then be free'd { m_freem() call } after a reply is received. > > If doing this for all the sosend()s of anonymous page M_EXTPG mbufs seems > like unnecessary overhead, the above could be enabled via a setsockopt() > on the socket. > > What do others think of this? Several comments: mb_mapped_to_unmapped() is surprisingly inexpensive. It was less than 5% before I converted iflib to M_NOMAP aware. It seems like NFS should be constructing mbufs like sendfile does, and pointing mbufs at its pages. This would cause the crypto code to allocate a new set of pages upon encryption. > For the hardware offload case: > - Can I assume that the anonymous pages in M_EXTPG mbufs will remain > unchanged? > --> If so, and it won't change to TCP_TLS_MODE_SW, the NFS code could > fill the data to be written into M_EXTPG mbufs safely. > > - And, if so, can I safely use the ktls_session mode field to decide if offload > is happening? > I see the TCP_TXTLS_MODE socket opt which seems to > switch the mode to TCP_TLS_MODE_SW. > When does this happen? Or, can this happen to a session once in use? Yes. The intent is to allow something (TCP stack, smart user daemon) to look at a connection & move it from hardware to software, if it has a lot of TCP re-transmits. Drew
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?319c92f4-4157-74a3-2bec-8f40e3979261>