Date: Sun, 3 Nov 2013 08:51:57 -0800
From: Jason Evans <jasone@freebsd.org>
To: Diane Bruce <db@db.net>
Cc: Tim Kientzle <tim@kientzle.com>, freebsd-arm@FreeBSD.org, Ian Lepore <ian@FreeBSD.org>, Howard Su <howard0su@gmail.com>
Subject: Re: sshd crash
Message-ID: <2F2E1775-A459-4D0F-A464-F41B8A7EAB9B@freebsd.org>
In-Reply-To: <20131102153953.GA39106@night.db.net>
References: <CAAvnz_rj43Ww6=mMfnp2u5TA2pWb20vWOqyAtuK08wgzy0dH6A@mail.gmail.com>
 <1383313834.31172.65.camel@revolution.hippie.lan>
 <CAHNYxxMMF_GJv10drYuQFO%2Bav%2BTdp8OBvJfFZObEZ=tgaBovSA@mail.gmail.com>
 <1383328423.31172.92.camel@revolution.hippie.lan>
 <CAHNYxxNiuKP8wfTaZuL%2BBXiLcYA9eU3LBb-659ZBYr-WBSmZeQ@mail.gmail.com>
 <1383343354.31172.102.camel@revolution.hippie.lan>
 <EB18203F-C516-4917-9AA4-DBA6E66DAAB6@kientzle.com>
 <1383399220.31172.116.camel@revolution.hippie.lan>
 <20131102153953.GA39106@night.db.net>

On Nov 2, 2013, at 8:39 AM, Diane Bruce <db@db.net> wrote:

> On Sat, Nov 02, 2013 at 07:33:40AM -0600, Ian Lepore wrote:
>>
>> I'm not sure it's a mundane stray-write either. The routine that's
>> asserting is checking to see if the contents of a page are all-zero
>> because a jemalloc internal flag is set that says it should be. I had
>> the routine print the non-zero data it found, and it looks like this:
>>
>> not-zero at 0 0x20c99000 = 0x20800a00
>> not-zero at 1 0x20c99004 = 0x00000001
>> not-zero at 2 0x20c99008 = 0x0000002f
>> not-zero at 3 0x20c9900c = 0xffffffff
>> not-zero at 4 0x20c99010 = 0x00007fff
>> not-zero at 5 0x20c99014 = 0x00000003
>> not-zero at 96 0x20c99180 = 0x5a5a5a5a
>> not-zero at 97 0x20c99184 = 0x5a5a5a5a
>> not-zero at 98 0x20c99188 = 0x5a5a5a5a
>>
>> The 0x5a continues to the end of the page. So jemalloc has metadata
>> that says it thinks the page is all-zeroes, and the page is a mix of
>> data and some zeroes and the 5a junk-fill byte. It seems more like the
>> metadata is in error somehow. (Maybe a stray write hit the metadata.)

This looks to me like the sort of thing that would happen if the chunk
page map were corrupted. This could happen due to a double free,
freeing an interior pointer of a multi-page allocation, or a variety of
more complicated errors. The page is filled with 0x5a bytes, yet
jemalloc thinks the page should contain 0x00 bytes, and that implies
that the chunk page table claims this is the first use of the page since
it was mapped.

Does this problem reproduce on amd64? If so, I'll dig in and figure out
if jemalloc is to blame. If not on amd64, given enough hand holding re:
hardware acquisition and configuration I can probably be convinced to
set up an ARM system.

Thanks,
Jason
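
For readers following the thread, the kind of check described in the quoted text (scan a page that metadata claims is zeroed and report every non-zero word) can be modelled roughly as below. This is only an illustrative Python sketch, not jemalloc's code and not the debug routine Ian used: the names find_nonzero_words and report, the 4096-byte page size, and the little-endian 32-bit word layout are all assumptions made for the example.

```python
import struct

PAGE_SIZE = 4096   # assumed page size, for illustration only
JUNK_BYTE = 0x5A   # the junk-fill byte seen in the thread


def find_nonzero_words(page: bytes, base_addr: int = 0):
    """Return (index, address, value) for every non-zero 32-bit word.

    Hypothetical helper: treats the page as little-endian 32-bit words.
    """
    if len(page) % 4 != 0:
        raise ValueError("page length must be a multiple of 4 bytes")
    count = len(page) // 4
    words = struct.unpack(f"<{count}I", page)
    return [(i, base_addr + 4 * i, w) for i, w in enumerate(words) if w != 0]


def report(page: bytes, base_addr: int = 0, limit: int = 3) -> None:
    """Print the first few offending words in the style quoted above."""
    for i, addr, value in find_nonzero_words(page, base_addr)[:limit]:
        print(f"not-zero at {i} 0x{addr:08x} = 0x{value:08x}")


# Model a page that the metadata would claim is zeroed, but which was
# actually junk-filled from word 96 to the end of the page.
page = bytearray(PAGE_SIZE)
page[96 * 4:] = bytes([JUNK_BYTE]) * (PAGE_SIZE - 96 * 4)
hits = find_nonzero_words(bytes(page), base_addr=0x20C99000)
report(bytes(page), base_addr=0x20C99000)
```

A genuinely fresh page yields an empty list from find_nonzero_words; any hits mean that either the page contents or the metadata describing them is wrong, which is the distinction the thread is trying to draw.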