From: Rick Macklem
To: Konstantin Belousov
CC: Kirk McKusick, "freebsd-current@FreeBSD.org"
Subject: Re: can buffer cache pages be used in ext_pgs mbufs?
Date: Fri, 14 Aug 2020 05:54:23 +0000
In-Reply-To: <20200811175422.GP2551@kib.kiev.ua>

Konstantin Belousov wrote:
>On Tue, Aug 11, 2020 at 03:10:39AM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Mon, Aug 10, 2020 at 12:46:00AM +0000, Rick Macklem wrote:
>> >> Konstantin Belousov wrote:
>> >> >On Fri, Aug 07, 2020 at 09:43:14PM -0700, Kirk McKusick wrote:
>> >> >> I do not have the answer to your question, but I am copying Kostik
>> >> >> as if anyone knows the answer, it is probably him.
>> >> >>
>> >> >> ~Kirk
>> >> >>
>> >> >> =-=-=
>> >> >I do not know the exact answer, this is why I did not follow up on the
>> >> >original question on current@. In particular, I have no idea about the
>> >> >ext_pgs mechanism.
>> >> >
>> >> >Still I can point one semi-obvious aspect of your proposal.
>> >> >
>> >> >When the buffer is written (with bwrite()), its pages are sbusied and
>> >> >the write mappings of them are invalidated.
>> >> >The end effect is that no
>> >> >modifications to the pages are possible until they are unbusied. This,
>> >> >together with the lock of the buffer that holds the pages, effectively
>> >> >stops all writes either through write(2) or by mmaped regions.
>> >> >
>> >> >In other words, any access for write to the range of file designated by
>> >> >the buffer, causes the thread to block until the pages are unbusied and
>> >> >the buffer is unlocked. Which in described case would mean, until NFS
>> >> >server responds.
>> >> >
>> >> >If this is fine, then ok.
>> >> For what I am thinking of, I would say that is fine, since the ktls code reads
>> >> the pages to encrypt/send them, but can use other allocated pages for
>> >> the encrypted data.
>> >>
>> >> >Rick, do you know anything about the vm page lifecycle as mb_ext_pgs?
>> >> Well, the anonymous pages (the only ones I've been using so far) are
>> >> allocated with:
>> >>     vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ |
>> >>         VM_ALLOC_NODUMP | VM_ALLOC_WIRED);
>> >>
>> >> and then the m_ext_ext_free function (mb_free_mext_pgs()) does:
>> >>     vm_page_unwire_noq(pg);
>> >>     vm_page_free(pg);
>> >> on each of them.
>> >>
>> >> m->m_ext_ext_free() is called in tls_encrypt() when it no longer wants the
>> >> pages, but is normally called via m_free(m), which calls mb_free_extpg(m),
>> >> although there are a few other places.
>> >>
>> >> Since m_ext_ext_free is whatever function you want to make it, I suppose the
>> >> answer is "until your m_ext.ext_free" function is called.
>> >>
>> >> At this time, for ktls, if you are using software encryption, the call to ktls_encrypt(),
>> >> which is done before passing the mbufs down to TCP is when it is done with the
>> >> unencrypted data pages.
>> >> (I suppose there is no absolute guarantee that this
>> >> happens before the kernel RPC layer times out waiting for an RPC reply, but it
>> >> is almost inconceivable, since this happens before the RPC request is passed
>> >> down to TCP.)
>> >>
>> >> The case I now think is more problematic is the "hardware assist" case. Although
>> >> no hardware/driver yet does this afaik, I suspect that the unencrypted data page
>> >> mbufs could end up stuck in TCP for a long time, in case a retransmit is needed.
>> >>
>> >> So, I now think I might need to delay the bufdone() call until the m_ext_ext_free()
>> >> call has been done for the pages, if they are buffer cache pages?
>> >> --> Usually I would expect the m_ext_ext_free() call for the mbuf(s) that
>> >>       hold the data to be written to the server to be done long before
>> >>       bufdone() would be called for the buffer that is being written,
>> >>       but there is no guarantee.
>> >>
>> >> Am I correct in assuming that the pages for the buffer will remain valid and
>> >> readable through the direct map until bufdone() is called?
>> >> If I am correct w.r.t. this, it should work so long as the m_ext_ext_free() calls
>> >> for the pages happen before the bufdone() call on the bp, I think?
>> >
>> >I think there is further complication with non-anonymous pages.
>> >You want (or perhaps need) the page content to be immutable and not
>> >changed while you pass pages around and give them for ktls sw or hw
>> >processing.
>> >Otherwise it could not pass the TLS authentication if
>> >page was changed in process.
>> >
>> >Similar issue exists when normal buffer writes are scheduled through
>> >the strategy(), and you can see that bufwrite() does vfs_busy_pages()
>> >with clear_modify=1, which does two things:
>> >- sbusy the pages (sbusy pages can get new read-only mappings, but cannot
>> >  be mapped rw)
>> >- pmap_remove_write() on the pages to invalidate all current writeable
>> >  mappings.
>> >
>> >This state should be kept until ktls is completely done with the pages.
>> I am now thinking that this is done exactly as you describe above and
>> doesn't require any changes.
>>
>> The change I am planning is below the strategy routine in the function
>> that does the write RPC.
>> It currently copies the data from the buffer into mbuf clusters.
>> After this change, it would put the physical page #s for the buffer in the
>> mbuf(s) and then wait for them all to be m_ext_ext_free()d before calling
>> bufdone().
>> --> The only difference is the wait before the bufdone() call in the RPC layer
>>       below the strategy routine. (bufdone() is the only call the NFS client
>>       seems to do below the strategy routine, so I assume it ends the state
>>       you describe above?)
>>
>As far as pages are put into mbuf clusters only after bwrite() that
>did vfs_busy_pages(), and bufdone() is called not earlier than network
>finished with the mbufs, it should be ok.
I've coded it up and, at least for a little testing so far, it seems to work ok.

Thanks for your comments, rick
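
For illustration only, the ordering discussed above (bufdone() must not run
until every page handed to the mbufs has been released and the write RPC has
been answered) can be modelled in userland C roughly as below. This is a
sketch, not the code that was tested: struct wbuf and the wbuf_* functions
are hypothetical names, the "done" flag merely stands in for the bufdone()
call, and wbuf_page_released() stands in for whatever free callback the
mbufs would invoke.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userland model only. struct wbuf and the wbuf_* functions are
 * hypothetical names for illustration, not FreeBSD kernel interfaces.
 */
struct wbuf {
	int  pages_in_mbufs;	/* pages still referenced by in-flight mbufs */
	bool rpc_replied;	/* the write RPC reply has arrived */
	bool done;		/* stand-in for "bufdone() has been called" */
};

/* Hand npages buffer pages to the (modelled) mbufs and start the write. */
static void
wbuf_start_write(struct wbuf *b, int npages)
{
	b->pages_in_mbufs = npages;
	b->rpc_replied = false;
	b->done = false;
}

/* Complete the buffer only when both conditions hold, and only once. */
static void
wbuf_try_done(struct wbuf *b)
{
	if (!b->done && b->rpc_replied && b->pages_in_mbufs == 0)
		b->done = true;
}

/* Stand-in for the per-page free callback run when an mbuf lets go. */
static void
wbuf_page_released(struct wbuf *b)
{
	assert(b->pages_in_mbufs > 0);
	b->pages_in_mbufs--;
	wbuf_try_done(b);
}

/* Stand-in for the RPC layer seeing the server's reply. */
static void
wbuf_rpc_replied(struct wbuf *b)
{
	b->rpc_replied = true;
	wbuf_try_done(b);
}
```

In this model the reply and the page releases may arrive in either order;
the buffer is completed by whichever event happens last. A kernel version
would presumably need a lock or atomic counter around the state and a sleep
or wakeup in place of the flag, which the sketch leaves out.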