From owner-freebsd-arm@FreeBSD.ORG  Sun Nov  3 16:57:21 2013
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\))
Subject: Re: sshd crash
From: Jason Evans <jasone@freebsd.org>
In-Reply-To: <20131102153953.GA39106@night.db.net>
Date: Sun, 3 Nov 2013 08:51:57 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <2F2E1775-A459-4D0F-A464-F41B8A7EAB9B@freebsd.org>
References: <CAAvnz_rj43Ww6=mMfnp2u5TA2pWb20vWOqyAtuK08wgzy0dH6A@mail.gmail.com>
 <1383313834.31172.65.camel@revolution.hippie.lan>
 <CAHNYxxMMF_GJv10drYuQFO+av+Tdp8OBvJfFZObEZ=tgaBovSA@mail.gmail.com>
 <1383328423.31172.92.camel@revolution.hippie.lan>
 <CAHNYxxNiuKP8wfTaZuL+BXiLcYA9eU3LBb-659ZBYr-WBSmZeQ@mail.gmail.com>
 <1383343354.31172.102.camel@revolution.hippie.lan>
 <EB18203F-C516-4917-9AA4-DBA6E66DAAB6@kientzle.com>
 <1383399220.31172.116.camel@revolution.hippie.lan>
 <20131102153953.GA39106@night.db.net>
To: Diane Bruce <db@db.net>
X-Mailer: Apple Mail (2.1508)
Cc: Tim Kientzle <tim@kientzle.com>, freebsd-arm@FreeBSD.org,
 Ian Lepore <ian@FreeBSD.org>, Howard Su <howard0su@gmail.com>

On Nov 2, 2013, at 8:39 AM, Diane Bruce <db@db.net> wrote:
> On Sat, Nov 02, 2013 at 07:33:40AM -0600, Ian Lepore wrote:
>>
>> I'm not sure it's a mundane stray-write either.  The routine that's
>> asserting is checking to see if the contents of a page are all-zero
>> because a jemalloc internal flag is set that says it should be.  I had
>> the routine print the non-zero data it found, and it looks like this:
>>
>> not-zero at 0 0x20c99000 = 0x20800a00
>> not-zero at 1 0x20c99004 = 0x00000001
>> not-zero at 2 0x20c99008 = 0x0000002f
>> not-zero at 3 0x20c9900c = 0xffffffff
>> not-zero at 4 0x20c99010 = 0x00007fff
>> not-zero at 5 0x20c99014 = 0x00000003
>> not-zero at 96 0x20c99180 = 0x5a5a5a5a
>> not-zero at 97 0x20c99184 = 0x5a5a5a5a
>> not-zero at 98 0x20c99188 = 0x5a5a5a5a
>>
>> The 0x5a continues to the end of the page.  So jemalloc has metadata
>> that says it thinks the page is all-zeroes, and the page is a mix of
>> data and some zeroes and the 5a junk-fill byte.  It seems more like the
>> metadata is in error somehow.  (Maybe a stray write hit the metadata.)
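
For anyone who wants to reproduce that kind of dump, the check Ian
describes can be sketched roughly as follows.  This is only an
illustration, not jemalloc's actual code: the helper name
report_nonzero_words and the 4096-byte page size are assumptions made
for the example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE	4096	/* assumed page size for this sketch */
#define PAGE_WORDS	(PAGE_SIZE / sizeof(uint32_t))

/*
 * Hypothetical helper (not part of jemalloc): walk a page as 32-bit
 * words, print every non-zero word with its index and address, and
 * return the number of non-zero words found.
 */
static size_t
report_nonzero_words(const void *page)
{
	const uint32_t *p = page;
	size_t i, count = 0;

	assert(page != NULL);
	for (i = 0; i < PAGE_WORDS; i++) {
		if (p[i] != 0) {
			printf("not-zero at %zu %p = 0x%08" PRIx32 "\n",
			    i, (const void *)&p[i], p[i]);
			count++;
		}
	}
	return (count);
}
```

A return value of 0 means the page really is all-zero; anything else
means the "this page is zeroed" claim and the page contents disagree.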

This looks to me like the sort of thing that would happen if the chunk
page map were corrupted.  This could happen due to a double free,
freeing an interior pointer of a multi-page allocation, or a variety of
more complicated errors.  The page is filled with 0x5a bytes, yet
jemalloc thinks the page should contain 0x00 bytes, and that implies
that the chunk page map claims this is the first use of the page since
it was mapped.
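
To make the failure mode concrete, here is a toy model of per-page
metadata disagreeing with page contents.  It is a sketch under stated
assumptions, not how jemalloc lays out its chunk page map: struct
toy_chunk, its zeroed flags, and the toy_* functions are all
hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE	4096	/* assumed page size */
#define TOY_NPAGES	4	/* pages per toy chunk */
#define TOY_JUNK	0x5a	/* junk-fill byte written on free */

/* Hypothetical stand-in for a chunk plus its per-page metadata. */
struct toy_chunk {
	bool		zeroed[TOY_NPAGES];	/* "page is all-zero" claim */
	unsigned char	mem[TOY_NPAGES][TOY_PAGE_SIZE];
};

/* Freshly mapped memory: every page is zero and flagged as such. */
static void
toy_chunk_init(struct toy_chunk *c)
{
	size_t i;

	memset(c->mem, 0, sizeof(c->mem));
	for (i = 0; i < TOY_NPAGES; i++)
		c->zeroed[i] = true;
}

/* Free with junk fill: the page is dirtied, so the claim is dropped. */
static void
toy_free_page(struct toy_chunk *c, size_t i)
{
	memset(c->mem[i], TOY_JUNK, TOY_PAGE_SIZE);
	c->zeroed[i] = false;
}

/* Return true if the metadata claim and the page contents agree. */
static bool
toy_page_consistent(const struct toy_chunk *c, size_t i)
{
	size_t j;

	if (!c->zeroed[i])
		return (true);	/* no claim made, nothing to verify */
	for (j = 0; j < TOY_PAGE_SIZE; j++) {
		if (c->mem[i][j] != 0)
			return (false);
	}
	return (true);
}
```

In this model, anything that sets zeroed[i] back to true after
toy_free_page() has run (a stray write, or bookkeeping confused by a
double free) makes toy_page_consistent() fail on a page full of 0x5a,
which is the same shape as the assertion being reported.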

Does this problem reproduce on amd64?  If so, I'll dig in and figure out
if jemalloc is to blame.  If not on amd64, given enough hand holding re:
hardware acquisition and configuration I can probably be convinced to
set up an ARM system.

Thanks,
Jason