From: Roger Pau Monné
Date: Thu, 15 Dec 2016 11:40:33 +0000
Subject: Order of device suspend/resume
Message-ID: <20161215114033.r33nt3fqhnfi7hqw@dhcp-3-221.uk.xensource.com>
List-Id: Discussion related to FreeBSD architecture

Hello,

I'm currently dealing with a bug in the Xen suspend/resume sequence, and I've found that the lack of a way to order devices by priority during suspend/resume is proving quite harmful for Xen (and maybe other systems too).

The current suspend/resume code simply scans the root bus and suspends/resumes every device in the order in which it was attached to its parent.
The problem here is that there's no way to say that some devices should be resumed before others. For example, the event timers, timecounters and UARTs should definitely be resumed before other devices, but that seems to happen mostly by chance. Currently most time-related devices are attached directly to the nexus, which means they get resumed first, but the uart, for example, is attached to the PCI bus IIRC, which means it gets resumed quite late.

On Xen systems this is even worse. The Xen PV bus (which contains all Xen-related devices) is attached last (because it tends to pick up unused memory regions for its own use), and this bus also contains the PV timecounter, which should be resumed _before_ other devices. Otherwise timecounting will be completely screwed and things can get stuck in indefinitely long loops (because the timecounter is implemented on top of the uptime of the host, and that changes from host to host).

To solve this I could add a hack to the Xen resume process (which is already different from the ACPI one), but that looks gross. I could also attach the Xen PV timer to the nexus directly (as was done before), but I'd prefer to keep all Xen-related devices on the same bus for coherency. The last option would be to add some kind of suspend/resume priorities to the devices, and do more than one suspend/resume pass. This is more complex and requires more changes, so I would like to know whether it would be helpful for other systems, or whether someone has already attempted it.

Thanks, Roger.
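
To make the priority idea a bit more concrete, here is a toy model in Python. It is only a sketch and not newbus code: the priority names, the `Device` class and the example tree are all made up for illustration, and it assumes that a parent bus always has to be resumed before any of its children. It compares the single attach-order pass with one pass per priority level.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical priority levels: lower values resume earlier and suspend later.
PRIO_TIMER = 0
PRIO_CONSOLE = 1
PRIO_DEFAULT = 2


@dataclass
class Device:
    name: str
    priority: int = PRIO_DEFAULT
    children: List["Device"] = field(default_factory=list)


def walk(dev, ancestors=()):
    """Yield (device, ancestors) depth-first, children in attach order."""
    yield dev, ancestors
    for child in dev.children:
        yield from walk(child, ancestors + (dev,))


def resume_attach_order(root):
    """Single pass: the resume order is simply the attach order."""
    return [dev.name for dev, _ in walk(root)]


def resume_by_priority(root):
    """One pass per priority level, attach order within each pass.

    Before a device is resumed, any ancestor bus that is still
    suspended is resumed first (assumption of this sketch).
    """
    entries = list(walk(root))
    order, done = [], set()
    for prio in sorted({dev.priority for dev, _ in entries}):
        for dev, ancestors in entries:
            if dev.priority != prio:
                continue
            for d in (*ancestors, dev):
                if d.name not in done:
                    done.add(d.name)
                    order.append(d.name)
    return order


def suspend_by_priority(root):
    """Suspend in the exact reverse of the priority resume order."""
    return list(reversed(resume_by_priority(root)))


# Made-up tree: the Xen PV bus is attached last, as described above.
root = Device("nexus", children=[
    Device("pci", children=[
        Device("uart", PRIO_CONSOLE),
        Device("nic"),
    ]),
    Device("xenpv", children=[
        Device("xen_timer", PRIO_TIMER),
        Device("xen_disk"),
    ]),
])

print(resume_attach_order(root))
print(resume_by_priority(root))
print(suspend_by_priority(root))
```

In attach order the PV timer is resumed second to last, after the NIC. With priority passes the order becomes nexus, xenpv, xen_timer, pci, uart, nic, xen_disk, so the timer and the console come up before ordinary devices while all Xen devices stay on the same bus.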