From owner-freebsd-arch@FreeBSD.ORG Mon Nov 30 22:13:23 2009
From: Scott Long <scottl@samsco.org>
To: John Baldwin
Cc: Attilio Rao, arch@freebsd.org
Subject: Re: sglist(9)
Date: Mon, 30 Nov 2009 15:13:19 -0700
Message-Id: <02A7F7FF-EBBC-40F3-8EBB-BFD4E5BE5391@samsco.org>
In-Reply-To: <200911301427.23166.jhb@freebsd.org>

On Nov 30, 2009, at 12:27 PM, John Baldwin wrote:

> On Sunday 29 November 2009 6:27:49 pm Scott Long wrote:
>>> On Wednesday 20 May 2009 2:49:30 pm Jeff Roberson wrote:
>>>> On Tue, 19 May 2009, John Baldwin wrote:
>>>>
>>>> 2) I worry that if all users do sglist_count() followed by a
>>>> dynamic allocation and then an _append() they will be very
>>>> expensive.  pmap_kextract() is much more expensive than it may
>>>> first seem to be.  Do you have a user of count already?
>>>
>>> The only one that does now is sglist_build() and nothing currently
>>> uses that.
>>
>> Kinda silly to have it then.  I also don't see the point of it; if
>> the point of the sglist object is to avoid VA mappings, then why
>> start with a VA mapping?  But that aside, Jeff is correct,
>> sglist_build() is horribly inefficient.
>
> It actually does get used by the nvidia driver, but so far in what I
> have done in my jhb_bio branch I have tried several different
> approaches, which is why the API is as verbose as it is.
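To put that cost in concrete terms, the pattern under discussion is
the two-pass one below.  This is only a sketch against the functions
in the posted patch; the helper name is mine and error handling is
trimmed:

#include <sys/param.h>
#include <sys/malloc.h>
#include <sys/sglist.h>

/*
 * Illustrative only: map_kva_buffer() is a made-up name, not part of
 * the patch.  sglist_count() does one pmap_kextract() walk over the
 * buffer just to size the list; sglist_append() then repeats the
 * same walk to fill it in.  sglist_build() performs both walks plus
 * a malloc() internally, which is the inefficiency called out above.
 */
static struct sglist *
map_kva_buffer(void *buf, size_t len)
{
        struct sglist *sg;
        int nsegs;

        nsegs = sglist_count(buf, len);         /* first page walk */
        sg = sglist_alloc(nsegs, M_WAITOK);
        if (sglist_append(sg, buf, len) != 0) { /* second, identical walk */
                sglist_free(sg);
                return (NULL);
        }
        return (sg);
}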
>>> VOP_GET/PUTPAGES would not need to do this since they could simply
>>> append the physical addresses extracted directly from vm_page_t's,
>>> for example.  I'm not sure this will be used very much now, as
>>> originally I thought I would be changing all storage drivers to do
>>> all DMA operations using sglists, and this sort of thing would
>>> have been used for non-bio requests like firmware commands;
>>> however, as expounded on below, it actually appears better to
>>> still treat bio's separately from non-bio requests for bus_dma so
>>> that the non-bio requests can continue to use
>>> bus_dmamap_load_buffer() as they do now.
>>
>> I completely disagree with your approach to busdma, but I'll get
>> into that later.  What I really don't understand is, why have yet
>> another page description format?  Whether the I/O is disk buffers
>> being pushed by a pager to a disk controller, or texture buffers
>> being pushed to video hardware, they already have vm objects
>> associated with them, no?  Why translate to an intermediate format?
>> I understand that you're creating another vm object format to deal
>> directly with the needs of nvidia, but that's really a one-off case
>> right now.  What about when we want the pagedaemon to push unmapped
>> i/o?  Will it have to spend cycles translating its objects to
>> sglists?  This is really a larger question that I'm not prepared to
>> answer, but would like to discuss.
>
> Textures do not already have objects associated, no.  However, I did
> not design sglist with Nvidia in mind.  I hacked on it in
> conjunction with phk@, gibbs@, and jeff@ to work on unmapped bio
> support.  That was the only motivation for sglist(9).  Only the
> OBJT_SG bits were specific to Nvidia, and that was added as an
> afterthought because sglist(9) already existed in the jhb_bio
> branch.
>
> If you look at GETPAGES/PUTPAGES, they already deal in terms of
> vm_page_t's, not VM objects, and vm_page_t's already provide a
> linear-time way of fetching the physical address (m->phys_addr), so
> generating an sglist for GETPAGES and PUTPAGES will be very cheap.
> One of the original proposals from phk@ was actually to pass around
> arrays of vm_page_t's to describe I/O buffers.  When Poul, Peter,
> and I talked about it, we figured we had a choice between passing
> either the physical address or the vm_page_t.  However, not all
> physical addresses have vm_page_t's, and it was deemed that GEOM's
> and disk drivers did not need any properties of the vm_page_t aside
> from the physical address.  For those reasons, sglist(9) uses
> physical addresses.
>
>>>> In general I think this is a good idea.  It'd be nice to work on
>>>> replacing the buf layer's implementation with something like this
>>>> that could be used directly by drivers.  Have you considered a
>>>> busdma operation to load from a sglist?
>>>
>>> So in regards to the bus_dma stuff, I did work on this a while ago
>>> in my jhb_bio branch.  I do have a bus_dmamap_load_sglist() and I
>>> had planned on using that in storage drivers directly.  However, I
>>> ended up circling back to preferring a bus_dmamap_load_bio() and
>>> adding a new 'bio_start' field to 'struct bio' that is an offset
>>> into an attached sglist.
>>
>> I strongly disagree with forcing busdma to have intimate knowledge
>> of bio's.  All of the needed information can be stored in sglist
>> headers.
>
> The alternative is to teach every disk driver to handle the
> difference, as well as every GEOM module.  Not only that, but it
> doesn't provide any easy transition path, since you can't change any
> of the top-level code to use an unmapped bio until all the lower
> levels have been converted to handle both ways.  Jeff had originally
> proposed having a bus_dmamap_load_bio(), and I tried to not use it
> but just have a bus_dmamap_load_sglist() instead; but when I started
> looking at the extra work that would have to be duplicated in every
> driver to handle both types of bios, I concluded
> bus_dmamap_load_bio() would actually be a lot simpler.
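Before I respond, for concreteness, here is roughly the shape being
debated as I read the jhb_bio bits.  Only sglist_append_phys() and
the bus_dmamap_load_bio() name come from your patches; the helper
names are mine, and I'm assuming a bus_dmamap_load_bio() signature
parallel to bus_dmamap_load():

#include <sys/param.h>
#include <sys/bio.h>
#include <sys/malloc.h>
#include <sys/sglist.h>
#include <machine/bus.h>
#include <vm/vm.h>
#include <vm/vm_page.h>

/*
 * GETPAGES/PUTPAGES side: build an sglist straight from vm_page_t's.
 * No KVA mapping and no pmap_kextract(); each segment comes from
 * m->phys_addr via VM_PAGE_TO_PHYS().
 */
static struct sglist *
pages_to_sglist(vm_page_t *ma, int npages)
{
        struct sglist *sg;
        int i;

        sg = sglist_alloc(npages, M_WAITOK);
        for (i = 0; i < npages; i++) {
                if (sglist_append_phys(sg, VM_PAGE_TO_PHYS(ma[i]),
                    PAGE_SIZE) != 0) {
                        sglist_free(sg);
                        return (NULL);
                }
        }
        return (sg);
}

/*
 * Driver side: with a load-from-bio primitive the driver does not
 * care whether bp carries a mapped data buffer or an attached
 * sglist; busdma hides the difference.  (Signature assumed.)
 */
static int
driver_start_io(bus_dma_tag_t tag, bus_dmamap_t map, struct bio *bp,
    bus_dmamap_callback_t *cb, void *arg)
{
        return (bus_dmamap_load_bio(tag, map, bp, cb, arg,
            BUS_DMA_NOWAIT));
}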
You completely missed the part of my email where I talk about not
having to update drivers for these new APIs.

In any case, I still respectfully disagree with your approach to
busdma and bio handling, and ask that you let Attilio and me work on
our prototype.  Once that's done, we can stop talking in
hypotheticals.

Scott