From: Rick Macklem
To: Konstantin Belousov
CC: Kirk McKusick, "freebsd-current@FreeBSD.org"
Subject: Re: can buffer cache pages be used in ext_pgs mbufs?
Date: Tue, 11 Aug 2020 03:10:39 +0000
References: <202008080443.0784hEfh084650@chez.mckusick.com> <20200808144040.GD2551@kib.kiev.ua> <20200810170956.GL2551@kib.kiev.ua>
In-Reply-To: <20200810170956.GL2551@kib.kiev.ua>

Konstantin Belousov wrote:
>On Mon, Aug 10, 2020 at 12:46:00AM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Fri, Aug 07, 2020 at 09:43:14PM -0700, Kirk McKusick wrote:
>> >> I do not have the answer to your question, but I am copying Kostik
>> >> as if anyone knows the answer, it is probably him.
>> >>
>> >> ~Kirk
>> >>
>> >> =-=-=
>> >I do not know the exact answer, this is why I did not follow up on the
>> >original question on current@. In particular, I have no idea about the
>> >ext_pgs mechanism.
>> >
>> >Still I can point out one semi-obvious aspect of your proposal.
>> >
>> >When the buffer is written (with bwrite()), its pages are sbusied and
>> >the write mappings of them are invalidated. The end effect is that no
>> >modifications to the pages are possible until they are unbusied.
>> >This, together with the lock of the buffer that holds the pages, effectively
>> >stops all writes either through write(2) or by mmapped regions.
>> >
>> >In other words, any access for write to the range of file designated by
>> >the buffer, causes the thread to block until the pages are unbusied and
>> >the buffer is unlocked. Which in described case would mean, until NFS
>> >server responds.
>> >
>> >If this is fine, then ok.
>> For what I am thinking of, I would say that is fine, since the ktls code reads
>> the pages to encrypt/send them, but can use other allocated pages for
>> the encrypted data.
>>
>> >Rick, do you know anything about the vm page lifecycle as mb_ext_pgs ?
>> Well, the anonymous pages (the only ones I've been using so far) are
>> allocated with:
>>     vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ |
>>         VM_ALLOC_NODUMP | VM_ALLOC_WIRED);
>>
>> and then the m_ext_ext_free function (mb_free_mext_pgs()) does:
>>     vm_page_unwire_noq(pg);
>>     vm_page_free(pg);
>> on each of them.
>>
>> m->m_ext_ext_free() is called in tls_encrypt() when it no longer wants the
>> pages, but is normally called via m_free(m), which calls mb_free_extpg(m),
>> although there are a few other places.
>>
>> Since m_ext_ext_free is whatever function you want to make it, I suppose the
>> answer is "until your m_ext.ext_free" function is called.
>>
>> At this time, for ktls, if you are using software encryption, the call to ktls_encrypt(),
>> which is done before passing the mbufs down to TCP is when it is done with the
>> unencrypted data pages.
>> (I suppose there is no absolute guarantee that this
>> happens before the kernel RPC layer times out waiting for an RPC reply, but it
>> is almost inconceivable, since this happens before the RPC request is passed
>> down to TCP.)
>>
>> The case I now think is more problematic is the "hardware assist" case. Although
>> no hardware/driver yet does this afaik, I suspect that the unencrypted data page
>> mbufs could end up stuck in TCP for a long time, in case a retransmit is needed.
>>
>> So, I now think I might need to delay the bufdone() call until the m_ext_ext_free()
>> call has been done for the pages, if they are buffer cache pages?
>> --> Usually I would expect the m_ext_ext_free() call for the mbuf(s) that
>>       hold the data to be written to the server to be done long before
>>       bufdone() would be called for the buffer that is being written,
>>       but there is no guarantee.
>>
>> Am I correct in assuming that the pages for the buffer will remain valid and
>> readable through the direct map until bufdone() is called?
>> If I am correct w.r.t. this, it should work so long as the m_ext_ext_free() calls
>> for the pages happen before the bufdone() call on the bp, I think?
>
>I think there is further complication with non-anonymous pages.
>You want (or perhaps need) the page content to be immutable and not
>changed while you pass pages around and give them to ktls sw or hw
>processing.
>Otherwise it could not pass the TLS authentication if
>page was changed in process.
>
>Similar issue exists when normal buffer writes are scheduled through
>the strategy(), and you can see that bufwrite() does vfs_busy_pages()
>with clear_modify=1, which does two things:
>- sbusy the pages (sbusy pages can get new read-only mappings, but cannot
>  be mapped rw)
>- pmap_remove_write() on the pages to invalidate all current writeable
>  mappings.
>
>This state should be kept until ktls is completely done with the pages.
I am now thinking that this is done exactly as you describe above and
doesn't require any changes.

The change I am planning is below the strategy routine in the function
that does the write RPC.
It currently copies the data from the buffer into mbuf clusters.
After this change, it would put the physical page #s for the buffer in the
mbuf(s) and then wait for them all to be m_ext_ext_free()d before calling
bufdone().
--> The only difference is the wait before the bufdone() call in the RPC layer
      below the strategy routine. (bufdone() is the only call the NFS client
      seems to do below the strategy routine, so I assume it ends the state
      you describe above?)

rick
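
The ordering proposed above (no bufdone() until every buffer page has been
released through its m_ext_ext_free() call) can be modeled in a few lines of
userland C. This is only a sketch under stated assumptions, not kernel code:
every name in it (write_rpc_model, model_start(), model_page_freed(),
model_reply_arrived()) is hypothetical, the buf_completed flag merely stands in
for the bufdone() call, and there is no locking. It also swaps the blocking
wait described in the message for a "whichever event happens last completes
the buffer" check, which gives the same ordering guarantee in this
single-threaded model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; none of these names are FreeBSD kernel APIs. */
struct write_rpc_model {
	size_t	pages_outstanding;	/* buffer pages still referenced by mbufs */
	bool	reply_received;		/* the server has answered the write RPC */
	bool	buf_completed;		/* stands in for the bufdone() call */
};

/* Begin a write RPC whose mbufs reference npages buffer cache pages. */
void
model_start(struct write_rpc_model *w, size_t npages)
{
	w->pages_outstanding = npages;
	w->reply_received = false;
	w->buf_completed = false;
}

/* Complete the buffer only once the reply is in AND no page is referenced. */
void
model_try_complete(struct write_rpc_model *w)
{
	if (w->reply_received && w->pages_outstanding == 0)
		w->buf_completed = true;
}

/* Stands in for the per-page free callback run when an mbuf drops a page. */
void
model_page_freed(struct write_rpc_model *w)
{
	assert(w->pages_outstanding > 0);
	w->pages_outstanding--;
	model_try_complete(w);
}

/* Stands in for the RPC layer receiving the server's reply. */
void
model_reply_arrived(struct write_rpc_model *w)
{
	w->reply_received = true;
	model_try_complete(w);
}
```

In the "hardware assist" case discussed above, the reply could arrive while a
page is still held for a possible retransmit. In this model that corresponds
to calling model_reply_arrived() before the last model_page_freed(), and
buf_completed stays false until that last page is released.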