Date: Thu, 22 Dec 2016 14:37:04 -0500
From: Justin Hibbits <chmeeedalf@gmail.com>
To: Warner Losh <imp@bsdimp.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>, FreeBSD Arch <freebsd-arch@freebsd.org>
Subject: Re: Order of device suspend/resume
Message-ID: <6C1FBD30-8301-4C6D-8C8B-653C6C096A93@gmail.com>
In-Reply-To: <CANCZdfp_Hx7AXT1N5zjjwMxBMvRM5F_VdkJo%2B=XmUhXj_zxhNA@mail.gmail.com>
References: <20161215114033.r33nt3fqhnfi7hqw@dhcp-3-221.uk.xensource.com>
 <7469755.xT5lfhErkd@ralph.baldwin.cx>
 <CDAA6577-C325-4691-9317-8CB0CE30959D@gmail.com>
 <CANCZdfp_Hx7AXT1N5zjjwMxBMvRM5F_VdkJo%2B=XmUhXj_zxhNA@mail.gmail.com>
On Dec 16, 2016, at 12:25 AM, Warner Losh wrote:

> On Thu, Dec 15, 2016 at 8:34 PM, Justin Hibbits
> <chmeeedalf@gmail.com> wrote:
>>
>> On Dec 15, 2016, at 3:38 PM, John Baldwin wrote:
>>
>>> On Thursday, December 15, 2016 11:40:33 AM Roger Pau Monné wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm currently dealing with a bug in the Xen suspend/resume
>>>> sequence, and I've found that lacking a way to order device
>>>> priority during suspend/resume is proving quite harmful for Xen
>>>> (and maybe other systems too). The current suspend/resume code
>>>> simply scans the root bus, and suspends/resumes every device
>>>> based on the order they are attached to their parents. The
>>>> problem here is that there's no way to tell that some devices
>>>> should be resumed before others, for example the event
>>>> timers/time counters/uarts should definitely be resume before
>>>> other devices, but that's seems to happens mostly out of chance.
>>>>
>>>> Currently most time related devices are attached directly to the
>>>> nexus, which means they will get resumed first, but for example
>>>> the uart is currently attached to the pci bus IIRC, which means
>>>> it gets resumed quite late. On Xen systems, this is even worse.
>>>> The Xen PV bus (that contains all Xen-related devices) is
>>>> attached the last one (because it tends to pick up unused memory
>>>> regions for it's own usage) and this bus also contains the PV
>>>> timecounter which should be resumed _before_ other devices, or
>>>> else timecounting will be completely screwed and things can get
>>>> stuck in indefinitely long loops (due to the fact that the
>>>> timecounter is implemented based on the uptime of the host, and
>>>> that changes from host-to-host).
>>>>
>>>> In order to solve this I could add a hack to the Xen resume
>>>> process (which is already different from the ACPI one), but this
>>>> looks gross. I could also attach the Xen PV timer to the nexus
>>>> directly (as it was done before), but I also prefer to keep all
>>>> Xen-related devices in the same bus for coherency. Last option
>>>> would be to add some kind of suspend/resume priorities to the
>>>> devices, and do more than one suspend/resume pass. This is more
>>>> complex and requires more changes, so I would like to know if it
>>>> would be helpful for other systems, or if someone has already
>>>> attempted to do it.
>>>
>>>
>>> I think Justin Hibbits had some patches to make use of the
>>> boot-time new-bus passes for suspend and resume which I think
>>> would help with this. You suspend things in the reverse order of
>>> boot and resume operates in the same order as boot.
>>>
>>> --
>>> John Baldwin
>>
>>
>> John is right. I have a (somewhat abandoned due to time and focus)
>> branch, https://svnweb.freebsd.org/base/projects/pmac_pmu/ which
>> has the necessary code working mostly on PowerPC. The diff can be
>> found at https://reviews.freebsd.org/D203 too.
>
> Cool. Does it have a mechanism similar to the attach code that lets
> you run again at each pass?
>
> Warner

Not exactly. The code will call the BUS_SUSPEND_CHILD() as it rolls
back the pass levels, and stop on errors. The meat is in a rewrite of
bus_generic_suspend() in that review.

- Justin
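
Editorial sketch of the ordering discussed above: suspend walks the pass levels from highest to lowest and stops at the first error, while resume walks them in boot order. This is a standalone userland illustration, not the code from D203 or from the pmac_pmu branch. The pass names, the struct dev layout, the order_log array, and the functions suspend_all() and resume_all() are all hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pass levels; lower values attach (and resume) earlier. */
enum pass { PASS_BUS = 0, PASS_TIMER = 1, PASS_DEFAULT = 2, PASS_COUNT = 3 };

/* Hypothetical device record, used only for this illustration. */
struct dev {
    const char *name;
    enum pass   pass;       /* pass level at which the device attached */
    int         fail;       /* nonzero: simulate a suspend error */
    int         suspended;  /* nonzero while the device is suspended */
};

/* Log of the order in which devices were suspended or resumed. */
#define LOG_MAX 32
static const char *order_log[LOG_MAX];
static size_t order_len;

static void
log_dev(const struct dev *d)
{
    assert(order_len < LOG_MAX);
    order_log[order_len++] = d->name;
}

/*
 * Suspend from the highest pass down to the lowest, i.e. the reverse of
 * boot order. Within a pass, devices are visited in reverse array order.
 * Stops at the first error and returns -1; recovery is left out here.
 */
static int
suspend_all(struct dev *devs, size_t n)
{
    for (int p = PASS_COUNT - 1; p >= 0; p--) {
        for (size_t i = n; i-- > 0; ) {
            if ((int)devs[i].pass != p)
                continue;
            if (devs[i].fail)
                return -1;
            devs[i].suspended = 1;
            log_dev(&devs[i]);
        }
    }
    return 0;
}

/* Resume in boot order: lowest pass first, forward array order. */
static void
resume_all(struct dev *devs, size_t n)
{
    for (int p = 0; p < PASS_COUNT; p++) {
        for (size_t i = 0; i < n; i++) {
            if ((int)devs[i].pass != p || !devs[i].suspended)
                continue;
            devs[i].suspended = 0;
            log_dev(&devs[i]);
        }
    }
}
```

With a timer device placed at PASS_TIMER and a uart at PASS_DEFAULT, the uart is suspended before the timer and the timer is resumed before the uart, whatever order the two appear in the array.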