From: Rick Macklem <rmacklem@uoguelph.ca>
To: Andrew Gallatin, "freebsd-current@FreeBSD.org"
CC: "jhb@FreeBSD.org", "gallatin@freebsd.org", Gleb Smirnoff
Subject: Re: RFC: ktls and krpc using M_EXTPG mbufs
Date: Mon, 27 Jul 2020 23:51:22 +0000
In-Reply-To: <319c92f4-4157-74a3-2bec-8f40e3979261@cs.duke.edu>
List-Id: Discussions about the use of FreeBSD-current

Andrew Gallatin wrote:
>On 2020-07-19 19:34, Rick Macklem wrote:
>> I spent a little time chasing a problem in the nfs-over-tls code, where it
>> would sometimes end up with corrupted data in the file(s) of a mirrored
>> pNFS configuration.
>>
>> I think the problem was that the code filled the data to be written into
>> anonymous page M_EXTPG mbufs,
>> then did a m_copym() { copy by reference } and used the copies for the
>> mirrored writes.
>> --> In ktls_encrypt(), the encryption was done to the same pages and,
>>     sometimes, the encrypted data got encrypted again during the
>>     sosend() of the other copy.
>>
>> Although I haven't reproduced it, a regular kernel write RPC could suffer the
>> same consequences if the RPC is retried (it keeps an m_copym() copy
>> of the request in the krpc for an RPC retry).
>>
>> At this time, the code in projects/nfs-over-tls works correctly, since it
>> always fills the data to be written into mbuf clusters, m_copym()s those
>> and then copies those { real copying using memcpy() } via
>> mb_mapped_to_unmapped() just before calling sosend().
>> --> This works, but it would be nice to avoid the mb_mapped_to_unmapped()
>>     copying for all the data being written via an NFS over TLS connection.
>>
>> For the TCP_TLS_MODE_SW case:
>> --> The NFS code can fill the written data into anonymous pages on M_EXTPG
>>     mbufs.
>>     Then, the ktls_encrypt() could be modified to
>>     allocate a new set of anonymous pages for the destination side of
>>     the encryption (it already does this for the sendfile case) and put those
>>     in a new mbuf list.
>> --> This would result in new anonymous pages and mbufs being allocated,
>>     but would not do memcpy()s.
>>     After encryption, it would just do a m_freem() on the unencrypted list.
>> --> For the krpc client case, this call would only decrement the reference
>>     count on the unencrypted list and it could be used for a retry by the krpc
>>     and then be free'd { m_freem() call } after a reply is received.
>>
>> If doing this for all the sosend()s of anonymous page M_EXTPG mbufs seems
>> like unnecessary overhead, the above could be enabled via a setsockopt()
>> on the socket.
>>
>> What do others
>> think of this?
>
>Several comments:
>
>mb_mapped_to_unmapped() is surprisingly inexpensive. It was less than
>5% before I converted iflib to M_NOMAP aware.
Hmm. Just wondering what the 5% refers to?
5% difference in throughput for a data stream
5% increase in CPU overheads
or ???

I do agree that, with multiple cores these days, avoiding the memcpy()s in
the client isn't that big a deal.
--> This issue is client side only. The NFS server can generate read and readdir
    replies (the only big ones) in anonymous ext_pgs mbufs now.

>It seems like NFS should be constructing mbufs like sendfile does, and
>pointing mbufs at its pages. This would cause the crypto code to
>allocate a new set of pages upon encryption.
I suppose the ideal would be to use the pages that already hold the data
in the buffer cache, but I haven't even looked at what it might take to
do that. (The buffer cache block would have to remain busied until the
mbuf is free'd or something like that.)
I kinda plan on looking at this someday...

I suppose I could "pretend" they aren't anonymous pages by not
setting EPG_FLAG_ANON, but that still wouldn't be enough to
fix this problem.
--> Not only does ktls_encrypt() need to use different pages, it needs
    to allocate new mbuf(s) for them, so that the unencrypted pages
    will still be associated with the mbuf list passed in.
(I don't really see how "pretending" the pages aren't anonymous makes much
difference.)

>> For the hardware offload case:
>> - Can I assume that the anonymous pages in M_EXTPG mbufs will remain
>>   unchanged?
>> --> If so, and it won't change to TCP_TLS_MODE_SW, the NFS code could
>>     fill the data to be written into M_EXTPG mbufs safely.
>>
>> - And, if so, can I safely use the ktls_session mode field to decide if offload
>>   is happening?
>>   I see the TCP_TXTLS_MODE socket
>>   opt which seems to
>>   switch the mode to TCP_TLS_MODE_SW.
>>   When does this happen? Or, can this happen to a session once in use?
>
>Yes. The intent is to allow something (TCP stack, smart user daemon) to
>look at a connection & move it from hardware to software, if it has a
>lot of TCP re-transmits.
Ok, so I don't think the NFS code should assume the pages will remain
unencrypted, even if it appears hardware assist is being used, unless the
software case is changed.

As you note, just using mb_mapped_to_unmapped() works pretty well,
so I don't think this is something critical to do. (I have a non-working
patch. If I happen to get it working, I'll try and see what performance
difference I get.)

>Drew
Thanks for the comments, rick
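
P.S. For anyone following along, here is a userland toy model of the corruption described above. It is only a sketch, not kernel code: the toy_* names, the fixed 16-byte "page" and the add-13 "cipher" are all made up for illustration and have nothing to do with the real mbuf or ktls APIs. It shows why encrypting in place goes wrong when two by-reference copies share the same pages, and why encrypting into newly allocated pages leaves the original intact for the second send (or a retry).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_LEN 16

/* Hypothetical stand-in for the pages behind an M_EXTPG mbuf. */
struct toy_pages {
	int	refcnt;
	uint8_t	data[TOY_LEN];
};

/* Hypothetical stand-in for an mbuf: only a reference to the pages. */
struct toy_mbuf {
	struct toy_pages *pg;
};

static struct toy_mbuf
toy_alloc(const char *payload)
{
	struct toy_mbuf m;
	size_t i;

	m.pg = calloc(1, sizeof(*m.pg));
	if (m.pg == NULL)
		abort();
	m.pg->refcnt = 1;
	for (i = 0; i < TOY_LEN && payload[i] != '\0'; i++)
		m.pg->data[i] = (uint8_t)payload[i];
	return (m);
}

/* Models a copy by reference: share the pages, bump the count. */
static struct toy_mbuf
toy_copy_by_ref(struct toy_mbuf *m)
{
	struct toy_mbuf c;

	m->pg->refcnt++;
	c.pg = m->pg;
	return (c);
}

static void
toy_free(struct toy_mbuf *m)
{
	if (--m->pg->refcnt == 0)
		free(m->pg);
	m->pg = NULL;
}

/* Placeholder transform, NOT real TLS: applying it twice is visible. */
static void
toy_cipher(const uint8_t *src, uint8_t *dst)
{
	size_t i;

	for (i = 0; i < TOY_LEN; i++)
		dst[i] = (uint8_t)(src[i] + 13);
}

/* "Send" one mbuf, recording what would go out on the wire. */
static void
toy_send(struct toy_mbuf *m, int in_place, uint8_t *wire)
{
	if (in_place) {
		/* Overwrites the shared pages with ciphertext. */
		toy_cipher(m->pg->data, m->pg->data);
		memcpy(wire, m->pg->data, TOY_LEN);
	} else {
		/* New destination pages; the source stays plaintext. */
		struct toy_mbuf enc = toy_alloc("");

		toy_cipher(m->pg->data, enc.pg->data);
		memcpy(wire, enc.pg->data, TOY_LEN);
		toy_free(&enc);
	}
}

/* Returns 1 if both mirrored sends put identical bytes on the wire. */
int
toy_mirrored_sends_match(int in_place)
{
	struct toy_mbuf orig = toy_alloc("write data");
	struct toy_mbuf copy = toy_copy_by_ref(&orig);
	uint8_t wire1[TOY_LEN], wire2[TOY_LEN];

	toy_send(&orig, in_place, wire1);
	toy_send(&copy, in_place, wire2);
	toy_free(&orig);
	toy_free(&copy);
	return (memcmp(wire1, wire2, TOY_LEN) == 0);
}

#ifdef DEMO_MAIN
int
main(void)
{
	assert(toy_mirrored_sends_match(1) == 0);	/* double encryption */
	assert(toy_mirrored_sends_match(0) == 1);	/* both sends agree */
	return (0);
}
#endif
```

Build with -DDEMO_MAIN to run it standalone. In the in-place case the second send encrypts bytes that are already ciphertext, which is the same shape as the mirrored-write corruption; in the new-pages case only a reference count is dropped on the unencrypted data, so it remains usable.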