Date:      Sun, 21 Oct 2012 09:33:22 -0700
From:      David Wolfskill <david@catwhisker.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        stable@freebsd.org
Subject:   Re: stable/9 @r241776 panic: REDZONE: Buffer underflow detected...
Message-ID:  <20121021163322.GB1730@albert.catwhisker.org>
In-Reply-To: <20121021121356.GJ35915@deviant.kiev.zoral.com.ua>
References:  <20121020141019.GW1817@albert.catwhisker.org> <20121021121356.GJ35915@deviant.kiev.zoral.com.ua>

On Sun, Oct 21, 2012 at 03:13:56PM +0300, Konstantin Belousov wrote:
> On Sat, Oct 20, 2012 at 07:10:19AM -0700, David Wolfskill wrote:
> > This seems ... fairly weird to me.
> >
> > Yesterday, I built & booted:
> >
> > FreeBSD g1-227.catwhisker.org 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #274 241726M: Fri Oct 19 05:40:05 PDT 2012     root@g1-227.catwhisker.org:/usr/obj/usr/src/sys/CANARY  i386
> >
> > and used the machine all day; nothing unusual (including various
> > reboots, e.g. when I disembarked the train for the final leg of my
> > commute home and powered the laptop off).
> >
> > This morning, I built:
> >
> > FreeBSD g1-227.catwhisker.org 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #275 241776M: Sat Oct 20 04:34:45 PDT 2012     root@g1-227.catwhisker.org:/usr/obj/usr/src/sys/CANARY  i386
> >
> > and on first reboot, I got a panic.
> >
> > After a bit of experimentation, it appears that I get a panic @r241776
> > if I attempt a normal boot into multi-user mode, but if I first boot to
> > single-user mode, then exit single-user mode, it comes up without a
> > problem.
> >
> > I don't have a serial console, so I started to write down some of the
> > panic information, but my patience ran a bit short.  Here's what I
> > recorded (warning: hand-transcribed -- twice!):
> >
> > ...
> > Starting devd.
> > REDZONE: Buffer underflow detected.  1 byte corrupted before 0xced40080 (4294966796 bytes allocated).
> > Allocation backtrace:
> > #0 0xc0ceac8f at redzone_setup+0xcf
> > #1 0xc0a5d5c9 at malloc+0x1d9
> > ...[about 20 more such lines I didn't record]...
> >
> > > bt
> > Tracing pid 901 tid 100106 td 0xd2b99000
> > kdb_enter(...)
> > panic(...)
> > free(...)
> > devread(ce8c2d00,f7274c0c,0,c0b1e4f0,d279e380,...) at devread+0x1a6
> > giant_read(...) at giant_read+0x87
> > devfs_read(...) at devfs_read+0xc6
> > dofileread(...) at dofileread+0x99
> > sys_read(...) at sys_read+0x98
> > syscall(f7274d08) at syscall+0x387
> >
> > Within the bounds described above, this appears to be quite reproducible
> > -- on my laptop.  My build machine (updated in parallel, at the same
> > GRNs) does not exhibit the panic.
> >
> > I was unable to get a crash dump; I have
> >
> > dumpdev="AUTO"
> >
> > in /etc/rc.conf, and the panic was occurring well after swap was
> > enabled.  (Yes, I know I have swap over-allocated.  I plan to do
> > something about it at some point.)
> >
> > I've attached a copy of dmesg.boot.
> >
> > Anyone else seeing this?  Any ideas how to diagnose it?
>
> devread is the method of devctl(4) which passes devd notifications from
> the kernel to userland (to devd, specifically). There were no changes to
> devctl(4) for quite a time.
>=20
> The corruption is, most likely, in some unrelated piece of code. Could
> you try to bisect the stable to catch the offender ? The bisect is not
> guaranteed to work, obviously, since the random corruption effects are
> unpredictable.

[Lack of trimming is deliberate, in this case, as I found a reversion
that appears to address the issue, and I wanted folks looking at this to
have the bulk of the symptoms readily at hand. -- dhw]

The range of GRNs in question is 241726 - 241776, only 5 of which apply
to stable/9.  Here's a list, with the affected files listed:

241742
	sys/dev/sound/pci/hda/hdaa_patches.c
241749
	sys/cam/cam_queue.c
241762
	sys/dev/tws/tws.c
	sys/dev/tws/tws.h
	sys/dev/tws/tws_cam.c
	sys/dev/tws/tws_hdm.h
	sys/dev/tws/tws_user.c
241767
	usr.bin/make/var.c
241769
	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c


I had actually tried reverting 241742 yesterday, to no effect.  I don't
use ZFS, and I have a pretty hard time understanding how 241767 would
break one machine and leave 4 others unscathed.  (Yes, I completed my
weekly updates, as well, by now.)  I don't have tws(4) devices --
certainly not on the laptop.

So I tried reverting 241749 ... and I failed to reproduce the problem.

Well, one boot out of one, at least.  I'll try a few more reality
checks, and report back if a correction is in order.  But (for now, at
least), it looks to me as if 241749 is presenting a problem on this
laptop.
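
For anyone who wants to repeat the bisect suggested above over a longer
range, here is a minimal sketch of the search.  The is_bad callback is
hypothetical: it stands in for "build and boot a kernel at that revision
and see whether it panics".  The search assumes that every revision after
the first bad one is also bad, which random corruption may not honor.

```python
# Candidate revisions taken from the list above (the ones on stable/9).
CANDIDATES = [241742, 241749, 241762, 241767, 241769]


def first_bad(revisions, is_bad):
    """Return the earliest revision for which is_bad(rev) is true.

    revisions must be sorted in ascending order.  is_bad is a
    hypothetical callback supplied by the caller; it is not part of
    any existing tool.  Returns None if even the newest revision is
    good.
    """
    lo, hi = 0, len(revisions) - 1
    if not is_bad(revisions[hi]):
        return None
    # Invariant: revisions[hi] is known bad; everything below lo is good.
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid + 1
    return revisions[lo]
```

With only five candidates, reverting them one at a time (as I did) is
about as quick; the search only pays off on a longer range.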

For folks investigating, I attached a dmesg.boot to the initial post in
the thread; I'll be happy to provide more information, should it be
requested (& specified).

Peace,
david
-- 
David H. Wolfskill				david@catwhisker.org
Taliban: Evil men with guns afraid of truth from a 14-year old girl.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.



