Date: Wed, 14 Dec 2016 16:47:05 +0000 From: Roger Pau =?iso-8859-1?Q?Monn=E9?= <royger@freebsd.org> To: Dimitry Andric <dim@FreeBSD.org> Cc: Colin Percival <cperciva@FreeBSD.org>, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r310013 - head/sys/dev/xen/blkfront Message-ID: <20161214164704.grn5gvsm3u6difs2@dhcp-3-221.uk.xensource.com> In-Reply-To: <A5CE6877-FDB2-4B07-A611-29FAD91FC9F2@FreeBSD.org> References: <201612130654.uBD6sDtF073358@repo.freebsd.org> <A5CE6877-FDB2-4B07-A611-29FAD91FC9F2@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Dec 13, 2016 at 09:28:21PM +0100, Dimitry Andric wrote: > On 13 Dec 2016, at 07:54, Colin Percival <cperciva@FreeBSD.org> wrote: > > > > Author: cperciva > > Date: Tue Dec 13 06:54:13 2016 > > New Revision: 310013 > > URL: https://svnweb.freebsd.org/changeset/base/310013 > > > > Log: > > Check that blkfront devices have a non-zero number of sectors and a > > non-zero sector size. Such a device would be a virtual disk of zero > > bytes; clearly not useful, and not something we should try to attach. > > > > As a fortuitous side effect, checking that these values are non-zero > > here results in them not *becoming* zero later on the function. This > > odd behaviour began with r309124 (clang 3.9.0) but is challenging to > > debug; making any changes to this function whatsoever seems to affect > > the llvm optimizer behaviour enough to make the unexpected zeroing of > > the sector_size variable cease. > > I've been having some fun debugging a kernel under Xen, and what I found > is that sector_size changes from 512 to 0 because of an xs_gather() call > in xbd_connect(). The function starts as follows: > > 1222 static void > 1223 xbd_connect(struct xbd_softc *sc) > 1224 { > 1225 device_t dev = sc->xbd_dev; > 1226 unsigned long sectors, sector_size, phys_sector_size; > 1227 unsigned int binfo; > 1228 int err, feature_barrier, feature_flush; > ... > 1237 err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), > 1238 "sectors", "%lu", §ors, > 1239 "info", "%u", &binfo, > 1240 "sector-size", "%lu", §or_size, > 1241 NULL); > > After this first call to xs_gather(), sectors is 44040322 (in my case of > a ~21G disk), binfo is zero, and sector_size is 512. During execution > of the following few lines, sector_size stays 512, until just after the > fourth call to xs_gather(): > > 1259 err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), > 1260 "feature-flush-cache", "%lu", &feature_flush, > 1261 NULL); > > and then it becomes zero! This is because the %lu format scans a 64 bit > unsigned integer, while the feature_flush variable is a 32 bit signed > integer, thus overwriting an adjacent location on the stack, which > happens to contain sector_size: in the assembly output, sector_size is > at -88(%rbp), while feature_flush is at -92(%rbp). > > There is another instance of such an incorrect format, a few lines > before this, but it doesn't seem to scribble over important data (or we > didn't panic because of it, yet): > > 1253 err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), > 1254 "feature-barrier", "%lu", &feature_barrier, > 1255 NULL); > > Here, feature_barrier is a 32 bit signed integer, while %lu again scans > a 64 bit unsigned integer. > > In short, I think the following change would also fix the problem: > > Index: sys/dev/xen/blkfront/blkfront.c > =================================================================== > --- sys/dev/xen/blkfront/blkfront.c (revision 309817) > +++ sys/dev/xen/blkfront/blkfront.c (working copy) > @@ -1251,13 +1251,13 @@ xbd_connect(struct xbd_softc *sc) > if (err || phys_sector_size <= sector_size) > phys_sector_size = 0; > err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), > - "feature-barrier", "%lu", &feature_barrier, > + "feature-barrier", "%d", &feature_barrier, > NULL); > if (err == 0 && feature_barrier != 0) > sc->xbd_flags |= XBDF_BARRIER; > > err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), > - "feature-flush-cache", "%lu", &feature_flush, > + "feature-flush-cache", "%d", &feature_flush, > NULL); > if (err == 0 && feature_flush != 0) > sc->xbd_flags |= XBDF_FLUSH; > > However, I have some difficulty getting a custom-built kernel booting on > my very rudimentary Xen setup. I am using a snapshot raw image, with no > idea how to insert a kernel into it. > > E.g, can you please try this out, instead of the zero-check fix? I am > very curious whether that fixes the panics. :) Yes, this indeed fixes the panic, I've reverted Colin's patch and applied yours, and everything seems fine. Please go ahead and commit it. xs_gather should really go away and we should use something that we can sanitize at compile time using __scanflike & friends. Thanks, Roger.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20161214164704.grn5gvsm3u6difs2>