Date: Tue, 13 Dec 2016 21:28:21 +0100 From: Dimitry Andric <dim@FreeBSD.org> To: Colin Percival <cperciva@FreeBSD.org> Cc: src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org, =?utf-8?Q?Roger_Pau_Monn=C3=A9?= <royger@freebsd.org> Subject: Re: svn commit: r310013 - head/sys/dev/xen/blkfront Message-ID: <A5CE6877-FDB2-4B07-A611-29FAD91FC9F2@FreeBSD.org> In-Reply-To: <201612130654.uBD6sDtF073358@repo.freebsd.org> References: <201612130654.uBD6sDtF073358@repo.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--Apple-Mail=_CE0A3362-B11A-4518-AE43-BF90CD47FFBB Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii On 13 Dec 2016, at 07:54, Colin Percival <cperciva@FreeBSD.org> wrote: > > Author: cperciva > Date: Tue Dec 13 06:54:13 2016 > New Revision: 310013 > URL: https://svnweb.freebsd.org/changeset/base/310013 > > Log: > Check that blkfront devices have a non-zero number of sectors and a > non-zero sector size. Such a device would be a virtual disk of zero > bytes; clearly not useful, and not something we should try to attach. > > As a fortuitous side effect, checking that these values are non-zero > here results in them not *becoming* zero later on the function. This > odd behaviour began with r309124 (clang 3.9.0) but is challenging to > debug; making any changes to this function whatsoever seems to affect > the llvm optimizer behaviour enough to make the unexpected zeroing of > the sector_size variable cease. I've been having some fun debugging a kernel under Xen, and what I found is that sector_size changes from 512 to 0 because of an xs_gather() call in xbd_connect(). The function starts as follows: 1222 static void 1223 xbd_connect(struct xbd_softc *sc) 1224 { 1225 device_t dev = sc->xbd_dev; 1226 unsigned long sectors, sector_size, phys_sector_size; 1227 unsigned int binfo; 1228 int err, feature_barrier, feature_flush; ... 1237 err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), 1238 "sectors", "%lu", §ors, 1239 "info", "%u", &binfo, 1240 "sector-size", "%lu", §or_size, 1241 NULL); After this first call to xs_gather(), sectors is 44040322 (in my case of a ~21G disk), binfo is zero, and sector_size is 512. During execution of the following few lines, sector_size stays 512, until just after the fourth call to xs_gather(): 1259 err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), 1260 "feature-flush-cache", "%lu", &feature_flush, 1261 NULL); and then it becomes zero! This is because the %lu format scans a 64 bit unsigned integer, while the feature_flush variable is a 32 bit signed integer, thus overwriting an adjacent location on the stack, which happens to contain sector_size: in the assembly output, sector_size is at -88(%rbp), while feature_flush is at -92(%rbp). There is another instance of such an incorrect format, a few lines before this, but it doesn't seem to scribble over important data (or we didn't panic because of it, yet): 1253 err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), 1254 "feature-barrier", "%lu", &feature_barrier, 1255 NULL); Here, feature_barrier is a 32 bit signed integer, while %lu again scans a 64 bit unsigned integer. In short, I think the following change would also fix the problem: Index: sys/dev/xen/blkfront/blkfront.c =================================================================== --- sys/dev/xen/blkfront/blkfront.c (revision 309817) +++ sys/dev/xen/blkfront/blkfront.c (working copy) @@ -1251,13 +1251,13 @@ xbd_connect(struct xbd_softc *sc) if (err || phys_sector_size <= sector_size) phys_sector_size = 0; err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), - "feature-barrier", "%lu", &feature_barrier, + "feature-barrier", "%d", &feature_barrier, NULL); if (err == 0 && feature_barrier != 0) sc->xbd_flags |= XBDF_BARRIER; err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev), - "feature-flush-cache", "%lu", &feature_flush, + "feature-flush-cache", "%d", &feature_flush, NULL); if (err == 0 && feature_flush != 0) sc->xbd_flags |= XBDF_FLUSH; However, I have some difficulty getting a custom-built kernel booting on my very rudimentary Xen setup. I am using a snapshot raw image, with no idea how to insert a kernel into it. E.g, can you please try this out, instead of the zero-check fix? I am very curious whether that fixes the panics. :) -Dimitry P.S.: We really need AddressSanitizer support in the kernel... --Apple-Mail=_CE0A3362-B11A-4518-AE43-BF90CD47FFBB Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.30 iEYEARECAAYFAlhQWfIACgkQsF6jCi4glqNjoACg8Bm/ggGf+Pbrm9wIuajXxm6Y 2qoAoOl3Kob3ZGUUyfZaLS6y/lZgx27I =RooA -----END PGP SIGNATURE----- --Apple-Mail=_CE0A3362-B11A-4518-AE43-BF90CD47FFBB--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?A5CE6877-FDB2-4B07-A611-29FAD91FC9F2>