Date:      Thu, 15 Dec 2016 21:25:05 -0800
From:      Warner Losh <imp@bsdimp.com>
To:        Justin Hibbits <chmeeedalf@gmail.com>
Cc:        Roger Pau Monné <roger.pau@citrix.com>,  FreeBSD Arch <freebsd-arch@freebsd.org>
Subject:   Re: Order of device suspend/resume
Message-ID:  <CANCZdfp_Hx7AXT1N5zjjwMxBMvRM5F_VdkJo%2B=XmUhXj_zxhNA@mail.gmail.com>
In-Reply-To: <CDAA6577-C325-4691-9317-8CB0CE30959D@gmail.com>
References:  <20161215114033.r33nt3fqhnfi7hqw@dhcp-3-221.uk.xensource.com> <7469755.xT5lfhErkd@ralph.baldwin.cx> <CDAA6577-C325-4691-9317-8CB0CE30959D@gmail.com>

On Thu, Dec 15, 2016 at 8:34 PM, Justin Hibbits <chmeeedalf@gmail.com> wrote:
>
> On Dec 15, 2016, at 3:38 PM, John Baldwin wrote:
>
>> On Thursday, December 15, 2016 11:40:33 AM Roger Pau Monné wrote:
>>>
>>> Hello,
>>>
>>> I'm currently dealing with a bug in the Xen suspend/resume sequence, and
>>> I've found that lacking a way to order device priority during
>>> suspend/resume is proving quite harmful for Xen (and maybe other systems
>>> too). The current suspend/resume code simply scans the root bus, and
>>> suspends/resumes every device based on the order they are attached to
>>> their parents. The problem here is that there's no way to tell that some
>>> devices should be resumed before others, for example the event
>>> timers/time counters/uarts should definitely be resumed before other
>>> devices, but that seems to happen mostly by chance.
>>>
>>> Currently most time-related devices are attached directly to the nexus,
>>> which means they will get resumed first, but for example the uart is
>>> currently attached to the pci bus IIRC, which means it gets resumed quite
>>> late. On Xen systems, this is even worse. The Xen PV bus (that contains
>>> all Xen-related devices) is attached last (because it tends to pick up
>>> unused memory regions for its own usage) and this bus also contains the
>>> PV timecounter, which should be resumed _before_ other devices, or else
>>> timecounting will be completely screwed and things can get stuck in
>>> indefinitely long loops (due to the fact that the timecounter is
>>> implemented based on the uptime of the host, and that changes from
>>> host to host).
>>>
>>> In order to solve this I could add a hack to the Xen resume process
>>> (which is already different from the ACPI one), but this looks gross. I
>>> could also attach the Xen PV timer to the nexus directly (as it was done
>>> before), but I also prefer to keep all Xen-related devices in the same
>>> bus for coherency. The last option would be to add some kind of
>>> suspend/resume priorities to the devices, and do more than one
>>> suspend/resume pass. This is more complex and requires more changes, so I
>>> would like to know if it would be helpful for other systems, or if
>>> someone has already attempted to do it.
>>
>>
>> I think Justin Hibbits had some patches to make use of the boot-time
>> new-bus passes for suspend and resume which I think would help with this.
>> You suspend things in the reverse order of boot and resume operates in the
>> same order as boot.
>>
>> --
>> John Baldwin
>
>
> John is right.  I have a (somewhat abandoned due to time and focus) branch,
> https://svnweb.freebsd.org/base/projects/pmac_pmu/ which has the necessary
> code working mostly on PowerPC.  The diff can be found at
> https://reviews.freebsd.org/D203 too.

Cool. Does it have a mechanism similar to the attach code that lets
you run again at each pass?

Warner
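
For anyone following the thread, here is a minimal sketch of the ordering
John describes: suspend in the reverse of boot order, resume in boot order,
one pass level at a time. It is not code from the pmac_pmu branch or from
D203. The pass enum, struct dev, struct oplog, suspend_all() and
resume_all() are all hypothetical names, only loosely modelled on the
relative order of the new-bus BUS_PASS_* levels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pass levels; lower values attach (and resume) earlier. */
enum pass { PASS_BUS = 0, PASS_TIMER = 1, PASS_DEFAULT = 2, PASS_COUNT };

/* Hypothetical device record, listed in attach order within an array. */
struct dev {
    const char *name;
    enum pass   pass;       /* pass at which the device attached at boot */
    int         suspended;  /* non-zero while the device is suspended */
};

/* Records the order in which devices were suspended or resumed. */
#define LOG_MAX 32
struct oplog {
    const char *names[LOG_MAX];
    size_t      n;
};

static void
log_op(struct oplog *log, const char *name)
{
    assert(log->n < LOG_MAX);
    log->names[log->n++] = name;
}

/* Suspend: latest pass first, and reverse attach order within a pass. */
static void
suspend_all(struct dev *devs, size_t ndevs, struct oplog *log)
{
    for (int p = PASS_COUNT - 1; p >= 0; p--) {
        for (size_t i = ndevs; i-- > 0; ) {
            if ((int)devs[i].pass == p && !devs[i].suspended) {
                devs[i].suspended = 1;
                log_op(log, devs[i].name);
            }
        }
    }
}

/* Resume: earliest pass first, and attach order within a pass. */
static void
resume_all(struct dev *devs, size_t ndevs, struct oplog *log)
{
    for (int p = 0; p < PASS_COUNT; p++) {
        for (size_t i = 0; i < ndevs; i++) {
            if ((int)devs[i].pass == p && devs[i].suspended) {
                devs[i].suspended = 0;
                log_op(log, devs[i].name);
            }
        }
    }
}
```

With a uart at PASS_DEFAULT, a pci bus at PASS_BUS and a PV timer at
PASS_TIMER, the timer is resumed before the uart regardless of where each
one sits in the attach list, which is the property Roger is asking for.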


