Date:      Mon, 30 Nov 2009 15:13:19 -0700
From:      Scott Long <scottl@samsco.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Attilio Rao <attilio@freebsd.org>, arch@freebsd.org
Subject:   Re: sglist(9)
Message-ID:  <02A7F7FF-EBBC-40F3-8EBB-BFD4E5BE5391@samsco.org>
In-Reply-To: <200911301427.23166.jhb@freebsd.org>
References:  <200905191458.50764.jhb@freebsd.org> <3bbf2fe10911291429k54b4b7cfw9e40aefeca597307@mail.gmail.com> <66707B0F-D0AB-49DB-802F-13146F488E1A@samsco.org> <200911301427.23166.jhb@freebsd.org>


On Nov 30, 2009, at 12:27 PM, John Baldwin wrote:

> On Sunday 29 November 2009 6:27:49 pm Scott Long wrote:
>>> On Wednesday 20 May 2009 2:49:30 pm Jeff Roberson wrote:
>>>> On Tue, 19 May 2009, John Baldwin wrote:
>>>>
>>>> 2) I worry that if all users do sglist_count() followed by a
>>>> dynamic allocation and then an _append(), they will be very
>>>> expensive.  pmap_kextract() is much more expensive than it may
>>>> first seem to be.  Do you have a user of count already?
>>>
>>> The only one that does now is sglist_build() and nothing currently
>>> uses that.
>>
>> Kinda silly to have it then.  I also don't see the point of it; if
>> the point of the sglist object is to avoid VA mappings, then why
>> start with a VA mapping?  But that aside, Jeff is correct:
>> sglist_build() is horribly inefficient.
>
> It actually does get used by the nvidia driver, but so far in my
> jhb_bio branch I have tried several different approaches, which is
> why the API is as verbose as it is.
>
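
A minimal sketch of the two-pass pattern Jeff describes, assuming the
sglist(9) interface discussed in this thread (describe_buf() is a
hypothetical helper; sglist_build() performs essentially this sequence
internally):

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <sys/sglist.h>

    /*
     * Two walks over the same VA range: sglist_count() translates
     * every page with pmap_kextract() just to size the list, then
     * sglist_append() translates every page again to fill it in.
     */
    static struct sglist *
    describe_buf(void *buf, size_t len)
    {
            struct sglist *sg;

            sg = sglist_alloc(sglist_count(buf, len), M_WAITOK);
            if (sglist_append(sg, buf, len) != 0) {
                    sglist_free(sg);
                    return (NULL);
            }
            return (sg);
    }
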
>>> VOP_GET/PUTPAGES would not need to do this since they could simply
>>> append the physical addresses extracted directly from vm_page_t's,
>>> for example.  I'm not sure this will be used very much now.
>>> Originally I thought I would be changing all storage drivers to do
>>> all DMA operations using sglists, and this sort of thing would have
>>> been used for non-bio requests like firmware commands; however, as
>>> expounded on below, it actually appears better to still treat bio's
>>> separately from non-bio requests for bus_dma so that the non-bio
>>> requests can continue to use bus_dmamap_load_buffer() as they do
>>> now.
>>>
>>
>> I completely disagree with your approach to busdma, but I'll get
>> into that later.  What I really don't understand is, why have yet
>> another page description format?  Whether the I/O is disk buffers
>> being pushed by a pager to a disk controller, or texture buffers
>> being pushed to video hardware, they already have vm objects
>> associated with them, no?  Why translate to an intermediate format?
>> I understand that you're creating another vm object format to deal
>> directly with the needs of nvidia, but that's really a one-off case
>> right now.  What about when we want the pagedaemon to push unmapped
>> i/o?  Will it have to spend cycles translating its objects to
>> sglists?  This is really a larger question that I'm not prepared to
>> answer, but would like to discuss.
>
> Textures do not already have objects associated, no.  However, I did
> not design sglist with Nvidia in mind.  I hacked on it in conjunction
> with phk@, gibbs@, and jeff@ to work on unmapped bio support.  That
> was the only motivation for sglist(9).  Only the OBJT_SG bits were
> specific to Nvidia, and those were added as an afterthought because
> sglist(9) already existed in the jhb_bio branch.
>
> If you look at GETPAGES/PUTPAGES, they already deal in terms of
> vm_page_t's, not VM objects, and a vm_page_t already provides a
> constant-time way of fetching the physical address (m->phys_addr),
> so generating an sglist for GETPAGES and PUTPAGES will be very cheap.
> One of the original proposals from phk@ was actually to pass around
> arrays of vm_page_t's to describe I/O buffers.  When Poul, Peter,
> and I talked about it, we figured we had a choice between passing
> either the physical address or the vm_page_t.  However, not all
> physical addresses have vm_page_t's, and it was deemed that GEOM
> modules and disk drivers did not need any properties of the
> vm_page_t aside from the physical address.  For those reasons,
> sglist(9) uses physical addresses.
>
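
A sketch of that cheap path, assuming the sglist(9) interface
(pages_to_sglist() is a hypothetical helper; sglist_append_phys()
merges physically contiguous pages into a single segment):

    #include <sys/param.h>
    #include <sys/sglist.h>
    #include <vm/vm.h>
    #include <vm/vm_page.h>

    /*
     * No pmap_kextract() at all: each physical address comes straight
     * from the vm_page_t, as in a GETPAGES/PUTPAGES page array.
     */
    static int
    pages_to_sglist(struct sglist *sg, vm_page_t *ma, int npages)
    {
            int error, i;

            for (i = 0; i < npages; i++) {
                    error = sglist_append_phys(sg,
                        VM_PAGE_TO_PHYS(ma[i]), PAGE_SIZE);
                    if (error != 0)
                            return (error);
            }
            return (0);
    }
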
>>>> In general I think this is a good idea.  It'd be nice to work on
>>>> replacing the buf layer's implementation with something like this
>>>> that could be used directly by drivers.  Have you considered a
>>>> busdma operation to load from an sglist?
>>>
>>> So in regards to the bus_dma stuff, I did work on this a while ago
>>> in my jhb_bio branch.  I do have a bus_dmamap_load_sglist() and I
>>> had planned on using that in storage drivers directly.  However, I
>>> ended up circling back to preferring a bus_dmamap_load_bio() and
>>> adding a new 'bio_start' field to 'struct bio' that is an offset
>>> into an attached sglist.
>>
>> I strongly disagree with forcing busdma to have intimate knowledge
>> of bio's.  All of the needed information can be stored in sglist
>> headers.
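
For reference, the header in question is small; a sketch assuming the
layout of sys/sglist.h from this period:

    #include <sys/types.h>

    /* One run of physically contiguous bytes. */
    struct sgentry {
            vm_paddr_t      ss_paddr;       /* physical address */
            size_t          ss_len;         /* length in bytes */
    };

    /* The refcounted header that could carry any extra state. */
    struct sglist {
            struct sgentry  *sg_segs;       /* segment array */
            int             sg_refs;        /* reference count */
            u_short         sg_nseg;        /* segments in use */
            u_short         sg_maxseg;      /* allocated capacity */
    };
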
>
> The alternative is to teach every disk driver, as well as every GEOM
> module, to handle the difference.  Not only that, but it doesn't
> provide any easy transition path, since you can't change any of the
> top-level code to use an unmapped bio until all the lower levels
> have been converted to handle both ways.  Jeff had originally
> proposed having a bus_dmamap_load_bio(), and I tried to avoid it and
> just have a bus_dmamap_load_sglist() instead; but when I started
> looking at the extra work that would have to be duplicated in every
> driver to handle both types of bios, I concluded that
> bus_dmamap_load_bio() would actually be a lot simpler.
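
A sketch of the driver side under that proposal (the xxx_* names are
hypothetical, and the bus_dmamap_load_bio() signature is assumed to
mirror bus_dmamap_load()):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/errno.h>
    #include <sys/bio.h>
    #include <sys/bus.h>
    #include <machine/bus.h>

    struct xxx_softc {
            bus_dma_tag_t   dma_tag;
            bus_dmamap_t    dma_map;
    };

    static void     xxx_dma_cb(void *arg, bus_dma_segment_t *segs,
                        int nseg, int error);

    /*
     * The driver never looks at whether the bio carries mapped KVA
     * or an attached sglist; busdma hides the difference behind a
     * single load call.
     */
    static void
    xxx_start(struct xxx_softc *sc, struct bio *bp)
    {
            int error;

            error = bus_dmamap_load_bio(sc->dma_tag, sc->dma_map, bp,
                xxx_dma_cb, sc, 0);
            if (error == EINPROGRESS)
                    return;         /* xxx_dma_cb() runs later */
            if (error != 0)
                    biofinish(bp, NULL, error);
    }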

You completely missed the part of my email where I talk about not  
having to update drivers for these new APIs.

In any case, I still respectfully disagree with your approach to
busdma and bio handling, and ask that you let Attilio and me work on
our prototype.  Once that's done, we can stop talking in hypotheticals.

Scott



