Date: Tue, 23 Feb 2010 01:16:30 -0700 (MST)
From: Warner Losh <imp@bsdimp.com>
To: rajatjain@juniper.net
Cc: freebsd-ia32@FreeBSD.org, freebsd-new-bus@FreeBSD.org, freebsd-ppc@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: Strategy for PCI resource management (for supporting hot-plug)
Message-ID: <20100223.011630.74715282.imp@bsdimp.com>
In-Reply-To: <8506939B503B404A84BBB12293FC45F606B88C39@emailbng3.jnpr.net>
References: <8506939B503B404A84BBB12293FC45F606B88C39@emailbng3.jnpr.net>
From: Rajat Jain <rajatjain@juniper.net>
Subject: Strategy for PCI resource management (for supporting hot-plug)
Date: Tue, 23 Feb 2010 12:46:40 +0530

> Hi,
>
> I'm trying to add PCI-E hotplug support to FreeBSD. As a first step
> for the PCI-E hotplug support, I'm trying to decide on a resource
> management / allocation strategy for the PCI memory / IO and the bus
> numbers. Can you please comment on the following approach that I am
> considering for resource allocation:
>
> PROBLEM STATEMENT:
> ------------------
> Given a memory range [A->B], IO range [C->D], and limited (256) bus
> numbers, enumerate the PCI tree of a system, leaving enough "holes" in
> between to allow addition of future devices.
>
> PROPOSED STRATEGY:
> ------------------
> 1) When booting, start enumerating in depth-first-search order. During
> enumeration, always keep track of:
>
> * The next bus number (x) that can be allocated.
>
> * The next memory space pointer (A + y) from which allocation can be
>   done ("y" is the memory already allocated).
>
> * The next IO space pointer (C + z) from which allocation can be done
>   ("z" is the IO space already allocated).
>
> Keep incrementing the above as the resources are allocated.

IO space and memory space are bus addresses, which may have a mapping to
another domain.

> 2) Allocate bus numbers sequentially while traversing down from the root
> to a leaf node (end point). When going down through a bridge:
>
> * Allocate the next available bus number (x) to the secondary bus of
>   the bridge.
>
> * Temporarily set the subordinate bus number of the bridge to 0xFF (to
>   allow discovery of the maximum number of buses).
>
> * Temporarily assign all the remaining available memory space to the
>   bridge [(A+y) -> B]. Ditto for IO space.

I'm sure this is wise.

> 3) When a leaf node (end point) is reached, allocate the memory / IO
> resources requested by the device, and increment the pointers.
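A minimal sketch of the depth-first pass in steps 1-3, including the trimming of bridge windows on the way back up that step 4 describes, may make the bookkeeping concrete. Everything here (struct pnode, struct cursor, enumerate(), take(), close_window()) is hypothetical: it models the tree in memory rather than touching config space and is not FreeBSD's pci(4)/pcib(4) code. It assumes BARs are naturally aligned powers of two and that bridge memory and I/O windows have the usual PCI-to-PCI bridge granularity of 1 MB and 4 KB.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a PCI tree, used only to illustrate the proposal;
 * these are not FreeBSD's pci(4)/pcib(4) data structures.
 */
#define MAX_CHILDREN	8
#define MEM_WIN		(1ULL << 20)	/* bridge memory window granularity */
#define IO_WIN		(1ULL << 12)	/* bridge I/O window granularity */

struct pnode {
	int is_bridge;
	uint64_t bar_mem, bar_io;	/* endpoint BAR sizes (power of two), 0 = none */
	struct pnode *child[MAX_CHILDREN];
	int nchild;
	int sec_bus, sub_bus;		/* bridges: secondary/subordinate bus */
	uint64_t mem_base, mem_limit;	/* bridge window or endpoint BAR */
	uint64_t io_base, io_limit;
};

/* Step 1 cursors: next bus x, A + y and C + z, plus the range ends B and D. */
struct cursor {
	int next_bus;
	uint64_t next_mem, mem_end;
	uint64_t next_io, io_end;
};

static uint64_t
align_up(uint64_t v, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);	/* power of two only */
	return ((v + a - 1) & ~(a - 1));
}

/* Step 3: place one naturally aligned BAR and advance the cursor. */
static int
take(uint64_t size, uint64_t *next, uint64_t end, uint64_t *base,
    uint64_t *limit)
{
	if (size == 0)
		return (0);
	*base = align_up(*next, size);
	*limit = *base + size - 1;
	if (*limit > end)
		return (-1);		/* out of space (point 2 of the mail) */
	*next = *limit + 1;
	return (0);
}

/* Step 4: trim a window to what was used below it (assumes base > 0). */
static void
close_window(uint64_t *next, uint64_t saved, uint64_t base,
    uint64_t *limit, uint64_t gran)
{
	if (*next == base) {		/* nothing below: leave it closed */
		*next = saved;		/* and give back the alignment padding */
		*limit = base - 1;	/* limit < base means "disabled" */
		return;
	}
	*next = align_up(*next, gran);
	*limit = *next - 1;
}

/* Depth-first pass over steps 1-4; returns -1 if a range runs out. */
static int
enumerate(struct pnode *n, struct cursor *c)
{
	if (!n->is_bridge) {
		if (take(n->bar_mem, &c->next_mem, c->mem_end, &n->mem_base,
		    &n->mem_limit) != 0 ||
		    take(n->bar_io, &c->next_io, c->io_end, &n->io_base,
		    &n->io_limit) != 0)
			return (-1);
		return (0);
	}
	uint64_t mem0 = c->next_mem, io0 = c->next_io;

	n->sec_bus = c->next_bus++;	/* step 2: next free bus number */
	n->sub_bus = 0xff;		/* step 2: temporary, reach all buses below */
	/* Step 2: temporarily hand the bridge everything that is left. */
	n->mem_base = c->next_mem = align_up(c->next_mem, MEM_WIN);
	n->io_base = c->next_io = align_up(c->next_io, IO_WIN);
	n->mem_limit = c->mem_end;
	n->io_limit = c->io_end;
	for (int i = 0; i < n->nchild; i++)
		if (enumerate(n->child[i], c) != 0)
			return (-1);
	n->sub_bus = c->next_bus - 1;	/* last bus number used below */
	close_window(&c->next_mem, mem0, n->mem_base, &n->mem_limit, MEM_WIN);
	close_window(&c->next_io, io0, n->io_base, &n->io_limit, IO_WIN);
	return (0);
}
```

Calling enumerate() on a top-level bridge with a cursor starting at bus 1 numbers buses in DFS order and leaves each bridge with windows that are "only enough" for its subtree, which is exactly the property the reply below questions for hot-plug.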
Keep in mind that devices may not have drivers allocated to them at bus
enumeration time. With hot-plug devices, you might not even know all the
devices that are there or could be there.

> 4) While passing a bridge in the upward direction, tweak the bridge
> registers such that its resources are ONLY ENOUGH to address the needs
> of all the PCI tree below it, and if it has its own internal memory
> mapped registers, some memory for it as well.

How does one deal with adding a device that has a bridge on it? I think
that the "only enough" part is likely going to lead to problems, as you'll
need to move other resources if a new device arrives here.

> The above is the standard depth-first algorithm for resource allocation.
> Here is the addition to support hot-plug:

The above won't quite work for cardbus :) But that's a hot-plug device...

> At each bridge that supports hot-plug, in addition to the resources that
> would have normally been allocated to this bridge, additionally
> pre-allocate and assign to the bridge (in anticipation of any new devices
> that may be added later):

In addition, or total? If it were total, you could more easily allocate
memory or IO space ranges in a more deterministic way when you have to
deal with booting with or without a device that's present.

> a) "RSRVE_NUM_BUS" number of busses, to cater to any bridges or PCI trees
> present on the device plugged in.

This one might make sense, but if we have multiple levels then you'll run
out. If you allocate X additional busses at the root and there are 4
additional bridges below it, each of them can only get (X-4)/4 at the
next level.

> b) "RSRVE_MEM" amount of memory space, to cater to all the PCI devices
> that may be attached later on.
>
> c) "RSRVE_IO" amount of IO space, to cater to all PCI devices that may
> be attached later on.

Similar comments apply here.
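The "in addition, or total?" question can be made concrete with a small sketch of how a hot-pluggable bridge's window size might be computed under each policy. The constant values, the rsrv_policy enum and hp_window() are hypothetical placeholders (the proposal leaves the RSRVE* values to a compile-time option or sysctl); the result is rounded up to the window granularity, 1 MB for memory or 1 for bus numbers.

```c
#include <stdint.h>

/*
 * Hypothetical values for the proposal's constants; the mail leaves them
 * to a compile-time option or sysctl, so these numbers are placeholders.
 */
#define RSRVE_NUM_BUS	8
#define RSRVE_MEM	(64ULL << 20)	/* 64 MB */
#define RSRVE_IO	(4ULL << 10)	/* 4 KB */

enum rsrv_policy {
	RSRV_ADDITIONAL,	/* proposal: what is present now + reserve */
	RSRV_TOTAL		/* reply's alternative: reserve is the total */
};

static uint64_t
round_up(uint64_t v, uint64_t gran)
{
	return ((v + gran - 1) / gran * gran);
}

/*
 * Size of a hot-pluggable bridge's window (or bus range, with gran = 1)
 * given what the devices found below it at boot actually need.
 */
static uint64_t
hp_window(uint64_t needed, uint64_t reserve, uint64_t gran,
    enum rsrv_policy p)
{
	uint64_t want;

	if (p == RSRV_ADDITIONAL)
		want = needed + reserve;
	else
		want = needed > reserve ? needed : reserve;
	return (round_up(want, gran));
}
```

With a 1.5 MB card present at boot, RSRV_ADDITIONAL yields a 66 MB window and 64 MB without it, so everything allocated after that bridge shifts depending on what was plugged in. RSRV_TOTAL yields 64 MB either way (as long as the card fits), which is the more deterministic layout the reply describes.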
> Please note that the above RSRVE* are constants defining the amount of
> resources to be set aside for/below each HOT-PLUGGABLE bridge; their
> values may be tweaked via a compile-time option or via a sysctl.
>
> FEW COMMENTS
> ------------
>
> 1) The strategy is fairly generic and tweakable since it does not waste
> a lot of resources (the developer needs to pick a smart value for how
> much to reserve at each hot-pluggable slot):
>
> * The reservations shall be done only for hot-pluggable bridges.
>
> * The developer can tweak (or even disable) the amount of resources
>   allocated for each hot-pluggable bridge.

I'd like to understand the details of this better, especially when you
have multiple layers where devices that have bridges are hot-plugged into
the system. For example, there's a cardbus to pci bridge, which has 3 PCI
slots behind it. These slots may have, say, a quad ethernet card which
has a pci bridge to allow the 4 pci nics behind it. Now, while this
example may be dated, newer pci-e also allows for it...

> 2) One point of debate is what happens if there are too many resource
> demands in the system (too many devices, or the developer configures too
> many resources to be allocated for each hot-pluggable device). For
> example, consider that during enumeration we find that all the resources
> are already allocated, while there are more devices that need resources.
> Do we simply not enumerate them? Etc...

How is this different from normal resource failure? And how will you know
at initial enumeration what devices will be plugged in?

> Overall, how does the above look?

In general, it looks fairly good. I'm just worried about the multiple
layer case :)

Warner

> Thanks & Best Regards,
>
> Rajat Jain
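The multiple-layer worry follows from the (X-4)/4 arithmetic in the reply, and a short sketch shows how quickly a fixed bus reservation shrinks with depth. spare_buses_at() is a hypothetical helper, and it assumes every level has the same number of bridges and that each bridge uses one bus number for its own secondary bus before the remainder is split evenly.

```c
/*
 * The (X-4)/4 arithmetic generalized: `budget` spare bus numbers sit at
 * the top and every level below has `fanout` bridges.  Each child bridge
 * uses one number for its own secondary bus and the rest is split evenly.
 * Returns the spare numbers left for each bridge at `depth`, or -1 if a
 * level cannot even number its bridges.
 */
static int
spare_buses_at(int budget, int fanout, int depth)
{
	for (int d = 0; d < depth; d++) {
		if (budget < fanout)
			return (-1);
		budget = (budget - fanout) / fanout;
	}
	return (budget);
}
```

With X = 32 spare numbers and four bridges per level, each first-level bridge is left with 7 and each second-level bridge with 0, so a third layer (for example, a bridged quad-NIC card in a slot behind a cardbus-to-PCI bridge) cannot be numbered at all.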