From owner-freebsd-arch@freebsd.org Sat Nov 14 19:43:08 2020 Return-Path: Delivered-To: freebsd-arch@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 6BB01464483 for ; Sat, 14 Nov 2020 19:43:08 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4CYQjm13GYz3sHR; Sat, 14 Nov 2020 19:43:07 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.16.1/8.16.1) with ESMTPS id 0AEJh09g078499 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Sat, 14 Nov 2020 21:43:03 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua 0AEJh09g078499 Received: (from kostik@localhost) by tom.home (8.16.1/8.16.1/Submit) id 0AEJh0TG078498; Sat, 14 Nov 2020 21:43:00 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 14 Nov 2020 21:43:00 +0200 From: Konstantin Belousov To: Scott Long Cc: Alexander Motin , "freebsd-arch@freebsd.org" Subject: Re: MAXPHYS bump for FreeBSD 13 Message-ID: References: <634E35E9-9E2B-40CE-9C70-BB130BD9D614@samsco.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <634E35E9-9E2B-40CE-9C70-BB130BD9D614@samsco.org> X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on tom.home X-Rspamd-Queue-Id: 4CYQjm13GYz3sHR X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 Nov 2020 19:43:08 -0000 On Sat, Nov 14, 2020 at 11:48:34AM -0700, Scott Long wrote: > > > > On Nov 14, 2020, at 11:37 AM, Konstantin Belousov wrote: > > > > On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote: > >> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote: > >>> To put the specific numbers, for struct buf it means increase by 1792 > >>> bytes. For bio it does not, because it does not embed vm_page_t[] into > >>> the structure. > >>> > >>> Worse, typical struct buf addend for excess vm_page pointers is going > >>> to be unused, because normal size of the UFS block is 32K. It is > >>> going to be only used by clusters and physbufs. > >>> > >>> So I object against bumping this value without reworking buffers > >>> handling of b_pages[]. Most straightforward approach is stop using > >>> MAXPHYS to size this array, and use external array for clusters. > >>> Pbufs can embed large array. > >> > >> I am not very familiar with struct buf usage, so I'd appreciate some > >> help there. > >> > >> Quickly looking on pbuf, it seems trivial to allocate external b_pages > >> array of any size in pbuf_init, that should easily satisfy all of pbuf > >> descendants. Cluster and vnode/swap pagers code are pbuf descendants > >> also. Vnode pager I guess may only need replacement for > >> nitems(bp->b_pages) in few places. > > I planned to look at making MAXPHYS a tunable. > > > > You are right, we would need: > > 1. move b_pages to the end of struct buf and declaring it as flexible. > > This would make KBI worse because struct buf depends on some debugging > > options, and than b_pages offset depends on config. > > > > Another option could be to change b_pages to pointer, if we are fine with > > one more indirection. But in my plan, real array is always allocated past > > struct buf, so flexible array is more correct even. > > > > I like this, and I was in the middle of writing up an email that described it. > There could be multiple malloc types or UMA zones of different sizes, > depending on the intended i/o size, or just a runtime change to the size of > a single allocation size. I do not think we need new/many zones. Queued (getnewbuf()) bufs come from buf_zone, and pbufs are allocated from pbuf_zone. That should be fixed alloc size, with small b_pages[] for buf_zone, and large (MAXPHYS) for pbuf. Everything else, if any, would need to pre-calculate malloc size. > > > 2. Preallocating both normal bufs and pbufs together with the arrays. > > > > 3. I considered adding B_SMALLPAGES flag to b_flags and use it to indicate > > that buffer has 'small' b_pages. All buffers rotated through getnewbuf()/ > > buf_alloc() should have it set. > > > > This would work nicely with a variable sized allocator, yes. > > > 4. There could be some places which either malloc() or allocate struct buf > > on stack (I tend to believe that I converted all later places to formed). > > They would need to get special handling. > > > > I couldn’t find any places that allocated a buf on the stack or embedded it > into another structure. As I said, I did a pass to eliminate stack allocations for bufs. As result, for instance flushbufqueues() mallocs struct buf, but it does not use b_pages[] of the allocated sentinel. > > > md(4) uses pbufs. > > > > 4. My larger concern is, in fact, cam and drivers. > > > > Can you describe your concern? My opinion is that during this work all uses of MAXPHYS should be reviewed, and there are a lot of drivers that reference the constant. From the past experience, I expect some evil ingenious (ab)use. Same for bufs, but apart from application of pbufs in cam_periph.c, I do not think drivers have much use of it. > > >> > >> Could you or somebody help with vfs/ffs code, where I suppose the > >> smaller page lists are used? > > Do you plan to work on this ? I can help, sure. > > > > Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same as > > DFLPHYS), a tunable, in the scope of this work. > > Sounds great, thank you for looking at it. > > Scott >