Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 27 Jul 2020 23:51:22 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Andrew Gallatin <gallatin@cs.duke.edu>, "freebsd-current@FreeBSD.org" <freebsd-current@FreeBSD.org>
Cc:        "jhb@FreeBSD.org" <jhb@FreeBSD.org>, "gallatin@freebsd.org" <gallatin@freebsd.org>, Gleb Smirnoff <glebius@freebsd.org>
Subject:   Re: RFC: ktls and krpc using M_EXTPG mbufs
Message-ID:  <QB1PR01MB336482096431BA9BCC1DAAB2DD720@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <319c92f4-4157-74a3-2bec-8f40e3979261@cs.duke.edu>
References:  <QB1PR01MB33640E7C88BA0E27A89587DFDD7A0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>, <319c92f4-4157-74a3-2bec-8f40e3979261@cs.duke.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
Andrew Gallatin wrote:=0A=
>On 2020-07-19 19:34, Rick Macklem wrote:=0A=
>> I spent a little time chasing a problem in the nfs-over-tls code, where =
it=0A=
>> would sometimes end up with corrupted data in the file(s) of a mirrored=
=0A=
>> pNFS configuration.=0A=
>>=0A=
>> I think the problem was that the code filled the data to be written into=
=0A=
>> anonymous page M_EXTPG mbufs, then did a m_copym() { copy by=0A=
>> reference } and used the copies for the mirrored writes.=0A=
>> --> In ktls_encrypt(), the encryption was done to the same pages and,=0A=
>>         sometimes, the encrypted data got encrypted again during the=0A=
>>         sosend() of the other copy.=0A=
>>=0A=
>> Although I haven't reproduced it, a regular kernel write RPC could suffe=
r the=0A=
>> same consequences if the RPC is retried (it keeps an m_copym() copy=0A=
>> of the request in the krpc for an RPC retry).=0A=
>>=0A=
>> At this time, the code in projects/nfs-over-tls works correctly, since i=
t=0A=
>> always fills the data to be written into mbuf clusters, m_copym()s those=
=0A=
>> and then copies those { real copying using memcpy() } via=0A=
>> mb_mapped_to_unmapped() just before calling sosend().=0A=
>> --> This works, but it would be nice to avoid the mb_mapped_to_unmapped(=
)=0A=
>>        copying for all the data being written via an NFS over TLS connec=
tion.=0A=
>>=0A=
>> For the TCP_TLS_MODE_SW case:=0A=
>> --> The NFS code can fill the written data into anonymous pages on M_EXT=
PG=0A=
>>         mbufs.=0A=
>> Then, the ktls_encrypt() could be modified to=0A=
>> allocate a new set of anonymous pages for the destination side of=0A=
>> the encryption (it already does this for the sendfile case) and put thos=
e=0A=
>> in a new mbuf list.=0A=
>> --> This would result in new anonymous pages and mbufs being allocated,=
=0A=
>>         but would not do memcpy()s.=0A=
>> After encryption, it would just do a m_freem() on the unencrypted list.=
=0A=
>> --> For the krpc client case, this call would only decrement the referen=
ce=0A=
>>        count on the unencrypted list and it could be used for a retry by=
 the krpc=0A=
>>        and then be free'd { m_freem() call } after a reply is received.=
=0A=
>>=0A=
>> If doing this for all the sosend()s of anonymous page M_EXTPG mbufs seem=
s=0A=
>> like unnecessary overhead, the above could be enabled via a setsockopt()=
=0A=
>> on the socket.=0A=
>>=0A=
>> What do others think of this?=0A=
>=0A=
>Several comments:=0A=
>=0A=
>mb_mapped_to_unmapped() is surprisingly inexpensive.  It was less than=0A=
>5% before I converted iflib to M_NOMAP aware.=0A=
Hmm. Just wondering what the 5% refers to?=0A=
5% difference in throughput for a data stream=0A=
5% increase in CPU overheads=0A=
or ???=0A=
=0A=
I do agree that, with multiple cores these days, avoiding the memcpy()s in=
=0A=
the client isn't that big a deal.=0A=
--> This issue is client side only. The NFS server can generate read and re=
addir=0A=
      replies (the only big ones) in anonymous ext_pgs mbufs now.=0A=
=0A=
>It seems like NFS should be constructing mbufs like sendfile does, and=0A=
>pointing mbufs at its pages.  This would cause the crypto code to=0A=
>allocate a new set of pages upon encryption.=0A=
I suppose the ideal would be to use the pages that already hold the data=0A=
in the buffer cache, but I haven't even looked at what it might take to=0A=
do that? (The buffer cache block would have to remain busied until the=0A=
mbuf is free'd or something like that.)=0A=
I kinda plan on looking at this someday...=0A=
=0A=
I suppose I could "pretend" they aren't anonymous pages by not=0A=
setting the EPG_ANON_FLAG, but that still wouldn't be enough to=0A=
fix this problem.=0A=
--> Not only does ktls_encrypt() need to use different pages, it needs=0A=
      to allocate new mbuf(s) for them, so that the unencrypted pages=0A=
      will still be associated with the mbuf list passed in.=0A=
(I don't really see "pretending" the pages aren't anonymous makes much=0A=
 difference?)=0A=
=0A=
>> For the hardware offload case:=0A=
>> - Can I assume that the anonymous pages in M_EXTPG mbufs will remain=0A=
>>     unchanged?=0A=
>> --> If so, and it won't change to TCP_TLS_MODE_SW, the NFS code could=0A=
>>         fill the data to be written into M_EXTPG mbufs safely.=0A=
>>=0A=
>> - And, if so, can I safely use the ktls_session mode field to decide if =
offload=0A=
>>    is happening?=0A=
>>    I see the TCP_TXTLS_MODE socket opt which seems to=0A=
>>    switch the mode to TCP_TLS_MODE_SW.=0A=
>>    When does this happen? Or, can this happen to a session once in use?=
=0A=
>=0A=
>Yes.  The intent is to allow something (TCP stack, smart user daemon) to=
=0A=
>look at a connection & move it from hardware to software, if it has a=0A=
>lot of TCP re-transmits.=0A=
Ok, so I don't think the NFS code should assume the pages will remain=0A=
unencrypted, even if it appears hardware assist is being used, unless the=
=0A=
software case is changed.=0A=
=0A=
As you note, just using mb_mapped_to_unmapped() works pretty well,=0A=
so I don't think this is something critical to do. (I have a non-working=0A=
patch. If I happen to get it working, I'll try and see what performance=0A=
difference I get.)=0A=
=0A=
>Drew=0A=
Thanks for the comments, rick=0A=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?QB1PR01MB336482096431BA9BCC1DAAB2DD720>