From owner-freebsd-current@freebsd.org Tue Aug 11 17:54:31 2020
Date: Tue, 11 Aug 2020 20:54:22 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Rick Macklem
Cc: Kirk McKusick, "freebsd-current@FreeBSD.org"
Subject: Re: can buffer cache pages be used in ext_pgs mbufs?
Message-ID: <20200811175422.GP2551@kib.kiev.ua>
References: <202008080443.0784hEfh084650@chez.mckusick.com>
 <20200808144040.GD2551@kib.kiev.ua>
 <20200810170956.GL2551@kib.kiev.ua>

On Tue, Aug 11, 2020 at 03:10:39AM +0000, Rick Macklem wrote:
> Konstantin Belousov wrote:
> > On Mon, Aug 10, 2020 at 12:46:00AM +0000, Rick Macklem
wrote:
> >> Konstantin Belousov wrote:
> >> > On Fri, Aug 07, 2020 at 09:43:14PM -0700, Kirk McKusick wrote:
> >> >> I do not have the answer to your question, but I am copying Kostik,
> >> >> as, if anyone knows the answer, it is probably him.
> >> >>
> >> >> ~Kirk
> >> >>
> >> >> =-=-=
> >> > I do not know the exact answer, which is why I did not follow up on the
> >> > original question on current@. In particular, I have no idea about the
> >> > ext_pgs mechanism.
> >> >
> >> > Still, I can point out one semi-obvious aspect of your proposal.
> >> >
> >> > When the buffer is written (with bwrite()), its pages are sbusied and
> >> > the write mappings of them are invalidated. The end effect is that no
> >> > modifications to the pages are possible until they are unbusied. This,
> >> > together with the lock of the buffer that holds the pages, effectively
> >> > stops all writes, either through write(2) or through mmapped regions.
> >> >
> >> > In other words, any write access to the range of the file designated by
> >> > the buffer causes the thread to block until the pages are unbusied and
> >> > the buffer is unlocked. Which, in the described case, would mean: until
> >> > the NFS server responds.
> >> >
> >> > If this is fine, then ok.
> >> For what I am thinking of, I would say that is fine, since the ktls code
> >> reads the pages to encrypt/send them, but can use other allocated pages
> >> for the encrypted data.
> >>
> >> > Rick, do you know anything about the vm page lifecycle as mb_ext_pgs?
> >> Well, the anonymous pages (the only ones I've been using so far) are
> >> allocated with:
> >>     vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ |
> >>         VM_ALLOC_NODUMP | VM_ALLOC_WIRED);
> >>
> >> and then the m_ext.ext_free function (mb_free_mext_pgs()) does:
> >>     vm_page_unwire_noq(pg);
> >>     vm_page_free(pg);
> >> on each of them.
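[For readers following along: the lifecycle Rick describes above — the pages
backing an ext_pgs mbuf stay usable exactly until the mbuf's ext_free callback
runs — can be sketched as a userspace toy model. Every name below (toy_page,
toy_mbuf, toy_m_free, ...) is invented for illustration; this is not the real
mbuf(9)/vm KPI.]

```c
#include <assert.h>

/*
 * Toy model: an ext_pgs-style mbuf carries page references plus an
 * ext_free callback; the pages remain valid until that callback runs.
 */
struct toy_page {
	int freed;		/* models vm_page_free() having run */
};

struct toy_mbuf {
	struct toy_page	**pgs;	/* models the attached page array */
	int		npgs;
	void		(*ext_free)(struct toy_mbuf *);
};

/* Models mb_free_mext_pgs(): release every attached page. */
static void
toy_free_mext_pgs(struct toy_mbuf *m)
{
	for (int i = 0; i < m->npgs; i++)
		m->pgs[i]->freed = 1;	/* vm_page_unwire_noq + vm_page_free */
}

/* Models m_free() of an ext_pgs mbuf: pages are valid right up to here. */
static void
toy_m_free(struct toy_mbuf *m)
{
	m->ext_free(m);
}
```

[The only point of the model is the ordering contract: whoever supplies the
pages may reclaim them only once ext_free has been called on every mbuf that
references them.]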
> >>
> >> m->m_ext.ext_free() is called in ktls_encrypt() when it no longer wants
> >> the pages, but is normally called via m_free(m), which calls
> >> mb_free_extpg(m), although there are a few other places.
> >>
> >> Since m_ext.ext_free is whatever function you want to make it, I suppose
> >> the answer is "until your m_ext.ext_free function is called".
> >>
> >> At this time, for ktls, if you are using software encryption, the call to
> >> ktls_encrypt(), which is done before passing the mbufs down to TCP, is
> >> when it is done with the unencrypted data pages. (I suppose there is no
> >> absolute guarantee that this happens before the kernel RPC layer times
> >> out waiting for an RPC reply, but it is almost inconceivable, since this
> >> happens before the RPC request is passed down to TCP.)
> >>
> >> The case I now think is more problematic is the "hardware assist" case.
> >> Although no hardware/driver does this yet, afaik, I suspect that the
> >> unencrypted data page mbufs could end up stuck in TCP for a long time,
> >> in case a retransmit is needed.
> >>
> >> So, I now think I might need to delay the bufdone() call until the
> >> m_ext.ext_free() call has been done for the pages, if they are buffer
> >> cache pages?
> >> --> Usually I would expect the m_ext.ext_free() call for the mbuf(s)
> >>     that hold the data to be written to the server to be done long
> >>     before bufdone() would be called for the buffer that is being
> >>     written, but there is no guarantee.
> >>
> >> Am I correct in assuming that the pages for the buffer will remain valid
> >> and readable through the direct map until bufdone() is called?
> >> If I am correct w.r.t. this, it should work so long as the
> >> m_ext.ext_free() calls for the pages happen before the bufdone() call
> >> on the bp, I think?
> >
> > I think there is a further complication with non-anonymous pages.
> > You want (or perhaps need) the page content to be immutable and not
> > changed while you pass the pages around and hand them to ktls for sw or
> > hw processing. Otherwise it could not pass the TLS authentication if a
> > page was changed in the process.
> >
> > A similar issue exists when normal buffer writes are scheduled through
> > strategy(), and you can see that bufwrite() does vfs_busy_pages() with
> > clear_modify=1, which does two things:
> > - sbusy the pages (sbusied pages can get new read-only mappings, but
> >   cannot be mapped rw)
> > - pmap_remove_write() on the pages to invalidate all current writeable
> >   mappings.
> >
> > This state should be kept until ktls is completely done with the pages.
> I am now thinking that this is done exactly as you describe above and
> doesn't require any changes.
>
> The change I am planning is below the strategy routine, in the function
> that does the write RPC.
> It currently copies the data from the buffer into mbuf clusters.
> After this change, it would put the physical page #s for the buffer in the
> mbuf(s) and then wait for them all to be m_ext.ext_free()d before calling
> bufdone().
> --> The only difference is the wait before the bufdone() call in the RPC
>     layer below the strategy routine. (bufdone() is the only call the NFS
>     client seems to do below the strategy routine, so I assume it ends the
>     state you describe above?)
As long as the pages are put into the mbufs only after bwrite() has done
vfs_busy_pages(), and bufdone() is called not earlier than the network has
finished with the mbufs, it should be ok.
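[The ordering agreed on above — bufdone() must not run until the network layer
has called ext_free on every mbuf referencing the buffer's pages — amounts to
a reference count that the RPC path drains and waits on. A hypothetical
userspace sketch of that synchronization, using a pthread mutex/condvar in
place of the kernel's sleep/wakeup primitives, might look like this; the names
and the scheme are illustrative, not actual FreeBSD code.]

```c
#include <assert.h>
#include <pthread.h>

/*
 * Model of "delay bufdone() until all ext_free calls have happened":
 * each ext_pgs mbuf built from the buffer's pages holds one reference,
 * and the RPC layer sleeps until the count drains to zero.
 */
struct model_buf {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	int		ext_refs;	/* mbufs still pointing at our pages */
	int		done;		/* set once "bufdone" is safe */
};

static void
model_buf_init(struct model_buf *bp, int nmbufs)
{
	pthread_mutex_init(&bp->lock, NULL);
	pthread_cond_init(&bp->cv, NULL);
	bp->ext_refs = nmbufs;
	bp->done = 0;
}

/* Stand-in for m_ext.ext_free: one mbuf released the buffer's pages. */
static void
model_ext_free(struct model_buf *bp)
{
	pthread_mutex_lock(&bp->lock);
	if (--bp->ext_refs == 0)
		pthread_cond_broadcast(&bp->cv);	/* wakeup() analogue */
	pthread_mutex_unlock(&bp->lock);
}

/* RPC layer below strategy: block until every ext_free fired, then bufdone. */
static void
model_wait_and_bufdone(struct model_buf *bp)
{
	pthread_mutex_lock(&bp->lock);
	while (bp->ext_refs > 0)
		pthread_cond_wait(&bp->cv, &bp->lock);	/* msleep() analogue */
	bp->done = 1;	/* no mbuf references the pages; bufdone() is safe */
	pthread_mutex_unlock(&bp->lock);
}
```

[In the kernel this would more likely be an atomic count with a wakeup on the
last release, or piggyback on the buffer lock; the only property the model
demonstrates is the ordering: the last ext_free strictly precedes bufdone(),
so the sbusy/vfs_busy_pages() state stays in effect while ktls or the NIC can
still read the pages.]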