Date: Thu, 15 Dec 2016 21:25:05 -0800
From: Warner Losh <imp@bsdimp.com>
To: Justin Hibbits <chmeeedalf@gmail.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>, FreeBSD Arch <freebsd-arch@freebsd.org>
Subject: Re: Order of device suspend/resume
Message-ID: <CANCZdfp_Hx7AXT1N5zjjwMxBMvRM5F_VdkJo%2B=XmUhXj_zxhNA@mail.gmail.com>
In-Reply-To: <CDAA6577-C325-4691-9317-8CB0CE30959D@gmail.com>
References: <20161215114033.r33nt3fqhnfi7hqw@dhcp-3-221.uk.xensource.com> <7469755.xT5lfhErkd@ralph.baldwin.cx> <CDAA6577-C325-4691-9317-8CB0CE30959D@gmail.com>

On Thu, Dec 15, 2016 at 8:34 PM, Justin Hibbits <chmeeedalf@gmail.com> wrote:
>
> On Dec 15, 2016, at 3:38 PM, John Baldwin wrote:
>
>> On Thursday, December 15, 2016 11:40:33 AM Roger Pau Monné wrote:
>>>
>>> Hello,
>>>
>>> I'm currently dealing with a bug in the Xen suspend/resume sequence, and
>>> I've found that lacking a way to order device priority during
>>> suspend/resume is proving quite harmful for Xen (and maybe other systems
>>> too). The current suspend/resume code simply scans the root bus, and
>>> suspends/resumes every device based on the order they are attached to
>>> their parents. The problem here is that there's no way to tell that some
>>> devices should be resumed before others; for example, the event
>>> timers/time counters/uarts should definitely be resumed before other
>>> devices, but that seems to happen mostly by chance.
>>>
>>> Currently most time-related devices are attached directly to the nexus,
>>> which means they will get resumed first, but for example the uart is
>>> currently attached to the pci bus IIRC, which means it gets resumed quite
>>> late. On Xen systems, this is even worse. The Xen PV bus (which contains
>>> all Xen-related devices) is attached last (because it tends to pick up
>>> unused memory regions for its own usage), and this bus also contains the
>>> PV timecounter, which should be resumed _before_ other devices, or else
>>> timecounting will be completely screwed and things can get stuck in
>>> indefinitely long loops (due to the fact that the timecounter is
>>> implemented based on the uptime of the host, and that changes from host
>>> to host).
>>>
>>> In order to solve this I could add a hack to the Xen resume process
>>> (which is already different from the ACPI one), but this looks gross. I
>>> could also attach the Xen PV timer to the nexus directly (as it was done
>>> before), but I also prefer to keep all Xen-related devices in the same
>>> bus for coherency. The last option would be to add some kind of
>>> suspend/resume priorities to the devices, and do more than one
>>> suspend/resume pass. This is more complex and requires more changes, so
>>> I would like to know if it would be helpful for other systems, or if
>>> someone has already attempted to do it.
>>
>>
>> I think Justin Hibbits had some patches to make use of the boot-time
>> new-bus passes for suspend and resume, which I think would help with
>> this. You suspend things in the reverse order of boot, and resume
>> operates in the same order as boot.
>>
>> --
>> John Baldwin
>
>
> John is right. I have a (somewhat abandoned due to time and focus) branch,
> https://svnweb.freebsd.org/base/projects/pmac_pmu/ which has the necessary
> code working mostly on PowerPC. The diff can be found at
> https://reviews.freebsd.org/D203 too.

Cool. Does it have a mechanism similar to the attach code that lets you
run again at each pass?

Warner
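
As a rough illustration of the ordering John describes (resume in boot-pass
order, suspend in the reverse), here is a minimal, self-contained C sketch.
It is not the code from the pmac_pmu branch or D203: every sk_ name, the pass
values, and the idea of sorting a flat device array are assumptions made up
for illustration, and they only loosely echo the new-bus pass levels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pass levels; lower values come up earlier at boot. */
enum sk_pass {
	SK_PASS_BUS = 10,
	SK_PASS_TIMER = 20,
	SK_PASS_DEFAULT = 100
};

/* Hypothetical device record, not the kernel's device_t. */
struct sk_dev {
	const char *name;
	enum sk_pass pass;	/* pass the device attached in */
	int attach_order;	/* position in attach order, for tie-breaks */
};

/* Resume order: ascending pass, then attach order within a pass. */
static int
sk_cmp_resume(const void *a, const void *b)
{
	const struct sk_dev *x = a;
	const struct sk_dev *y = b;

	if (x->pass != y->pass)
		return (x->pass < y->pass ? -1 : 1);
	return ((x->attach_order > y->attach_order) -
	    (x->attach_order < y->attach_order));
}

/* Suspend order is the exact reverse of resume order. */
static int
sk_cmp_suspend(const void *a, const void *b)
{
	return (sk_cmp_resume(b, a));
}

/* Sort devs[] in place into the order they would be resumed. */
void
sk_order_for_resume(struct sk_dev *devs, size_t n)
{
	assert(devs != NULL || n == 0);
	if (n > 1)
		qsort(devs, n, sizeof(devs[0]), sk_cmp_resume);
}

/* Sort devs[] in place into the order they would be suspended. */
void
sk_order_for_suspend(struct sk_dev *devs, size_t n)
{
	assert(devs != NULL || n == 0);
	if (n > 1)
		qsort(devs, n, sizeof(devs[0]), sk_cmp_suspend);
}
```

With a made-up list such as pci0 (bus pass), uart0 (default pass), xenpv0
(bus pass) and a Xen timer (timer pass), the resume ordering puts both buses
first, then the timer, then the uart, regardless of where the Xen PV bus sat
in attach order; the suspend ordering is the mirror image.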