Date:      Tue, 23 Feb 2010 01:16:30 -0700 (MST)
From:      Warner Losh <imp@bsdimp.com>
To:        rajatjain@juniper.net
Cc:        freebsd-ia32@FreeBSD.org, freebsd-new-bus@FreeBSD.org, freebsd-ppc@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject:   Re: Strategy for PCI resource management (for supporting hot-plug)
Message-ID:  <20100223.011630.74715282.imp@bsdimp.com>
In-Reply-To: <8506939B503B404A84BBB12293FC45F606B88C39@emailbng3.jnpr.net>
References:  <8506939B503B404A84BBB12293FC45F606B88C39@emailbng3.jnpr.net>

From: Rajat Jain <rajatjain@juniper.net>
Subject: Strategy for PCI resource management (for supporting hot-plug)
Date: Tue, 23 Feb 2010 12:46:40 +0530

> 
> Hi,
> 
> I'm trying to add PCI-E hotplug support to FreeBSD. As a first step
> for PCI-E hotplug support, I'm trying to decide on a resource
> management / allocation strategy for the PCI memory / IO and the bus
> numbers. Can you please comment on the following approach that I am
> considering for resource allocation:
> 
> PROBLEM STATEMENT:
> ------------------
> Given a memory range [A->B], IO range [C->D], and limited (256) bus
> numbers, enumerate the PCI tree of a system, leaving enough "holes" in
> between to allow addition of future devices.
> 
> PROPOSED STRATEGY:
> ------------------
> 1) When booting, start enumerating in a depth-first-search order. While
> enumerating, always keep track of:
> 
>  * The next bus number (x) that can be allocated
> 
>  * The next memory space pointer (A + y), starting at which allocation
>    can be done ("y" is the memory already allocated).
> 
>  * The next IO space pointer (C + z), starting at which allocation
>    can be done ("z" is the IO space already allocated).
> 
> Keep incrementing the above as the resources are allocated.

IO space and memory space are bus addresses, which may have a mapping
to another domain.
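
The three cursors from step 1 can be sketched as below (Python for
brevity; all names are illustrative, not kernel APIs, and the alignment
rule assumes power-of-two BAR sizes per the PCI spec):

```python
class AllocState:
    """Tracks the next free bus number, memory address, and I/O port."""
    def __init__(self, mem_base, io_base):
        self.next_bus = 1         # x: next bus number to hand out
        self.next_mem = mem_base  # A + y
        self.next_io = io_base    # C + z

    def alloc_mem(self, size):
        # PCI BARs are naturally aligned: round the cursor up to a
        # multiple of size (size is a power of two).
        base = (self.next_mem + size - 1) & ~(size - 1)
        self.next_mem = base + size
        return base

st = AllocState(mem_base=0x8000_0000, io_base=0x1000)
bar = st.alloc_mem(0x10000)   # a 64 KiB BAR
```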

> 2) Allocate bus numbers sequentially while traversing down from root to
> a leaf node (end point). When going down traversing a bridge:
> 
>  * Allocate the next available bus number (x) to the secondary bus of 
>    bridge.
> 
>  * Temporarily mark the subordinate bus number as 0xFF (to allow
>    discovery of the maximum bus number).
> 
>  * Temporarily assign all the remaining available memory space
>    [(A+y) -> B] to the bridge. Ditto for IO space.

I'm not sure this is wise.
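
Step 2's temporary bridge programming amounts to something like the
following sketch. The offsets are the standard PCI type-1 header ones
(secondary bus 0x19, subordinate bus 0x1A); cfg_write and the state
object are stand-ins for the real config-space accessors:

```python
SECBUS, SUBBUS = 0x19, 0x1A

class State:
    next_bus = 1   # next available bus number (x)

def descend_bridge(cfg_write, st):
    sec = st.next_bus           # give the bridge's secondary bus the next number
    st.next_bus += 1
    cfg_write(SECBUS, sec)
    cfg_write(SUBBUS, 0xFF)     # temporarily open-ended; fixed up on the way back
    return sec

# Record writes in a dict to stand in for real config-space access.
regs = {}
st = State()
sec = descend_bridge(lambda off, val: regs.__setitem__(off, val), st)
```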

> 3) When a leaf node (End point) is reached, allocate the memory / IO
> resource requested by the device, and increment the pointers. 

Keep in mind that devices may not have drivers attached to them at bus
enumeration time.  With hot-plug devices, you might not even know all
the devices that are there or could be there.

> 4) While passing a bridge in the upward direction, tweak the bridge
> registers such that its resources are ONLY ENOUGH to address the needs
> of all the PCI tree below it, and if it has its own internal memory
> mapped registers, some memory for it as well.

How does one deal with adding a device that has a bridge on it?  I
think that the "only enough" part is likely going to lead to problems,
as you'll need to move other resources if a new device arrives here.

> The above is the standard depth-first algorithm for resource allocation.
> Here is the addition to support hot-plug:

The above won't quite work for CardBus :)  But that's a hot-plug
device...

> At each bridge that supports hot-plug, in addition to the resources that
> would have normally been allocated to this bridge, additionally
> pre-allocate and assign to bridge (in anticipation of any new devices
> that may be added later):

In addition, or in total?  If it were the total, you could allocate
memory or IO space ranges in a more deterministic way when you have to
deal with booting with or without a device present.

> a) "RSRVE_NUM_BUS" number of busses, to cater to any bridges or PCI
>    trees present on the device plugged in.

This one might make sense, but if we have multiple levels then you'll
run out.  If you reserve X additional busses at the root and there are
4 additional bridges below it, then you can only reserve (X-4)/4 at
each level.

> b) "RSRVE_MEM" amount of memory space, to cater to all the PCI devices
>    that may be attached later on.
> 
> c) "RSRVE_IO" amount of IO space, to cater to all PCI devices that may
>    be attached later on.

Similar comments apply here.

> Please note that the above RSRVE* are constants defining the amount of
> resources to be set aside for /below each HOT-PLUGGABLE bridge; their
> values may be tweaked via a compile time option or via a sysctl. 
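
The hot-plug addition described above can be sketched as padding the
normally computed needs at each hot-pluggable bridge (the RSRVE_* names
follow the text; the values here are placeholders a developer would
tune, not recommendations):

```python
RSRVE_NUM_BUS = 4        # extra bus numbers per hot-pluggable bridge
RSRVE_MEM = 32 << 20     # 32 MiB of extra memory space
RSRVE_IO  = 4 << 12      # 16 KiB of extra I/O space

def bridge_needs(child_busses, child_mem, child_io, hotplug):
    # Normal bridges get exactly what their children require; hot-plug
    # capable bridges are padded with the fixed reservations.
    if hotplug:
        return (child_busses + RSRVE_NUM_BUS,
                child_mem + RSRVE_MEM,
                child_io + RSRVE_IO)
    return (child_busses, child_mem, child_io)
```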
> 
> FEW COMMENTS
> ------------
>  
> 1) The strategy is fairly generic and tweakable, since it does not
> waste a lot of resources (the developer needs to pick a smart value
> for how much resources to reserve at each hot-pluggable slot):
>
>    * The reservations shall be done only for hot-pluggable bridges
> 
>    * The developer can tweak the values (or even disable the feature)
>      for how much resources shall be allocated for each hot-pluggable
>      bridge.

I'd like to understand the details of this better, especially when you
have multiple layers where devices that have bridges are hot-plugged
into the system.

For example, there's a CardBus to PCI bridge, which has 3 PCI slots
behind it.  These slots may have, say, a quad ethernet card which has
a PCI bridge to allow the 4 PCI NICs behind it.  Now while this
example may be dated, newer PCI-e also allows for it...

> 2) One point of debate is what happens if there are too many resource
> demands in the system (too many devices, or the developer configures
> too many resources to be allocated for each hot-pluggable device).
> E.g. consider that during enumeration we find that all the resources
> are already allocated, while there are more devices that need
> resources. Do we simply not enumerate them? Etc...

How is this different from normal resource failure?  And how will you
know at initial enumeration what devices will be plugged in?

> Overall, how does the above look?

In general, it looks fairly good.  I'm just worried about the multiple
layer case :)

Warner

> Thanks & Best Regards,
> 
> Rajat Jain
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
> 


