From: Jason Evans
Date: Sun, 3 Nov 2013 08:51:57 -0800
Subject: Re: sshd crash
To: Diane Bruce
Cc: Tim Kientzle, freebsd-arm@FreeBSD.org, Ian Lepore, Howard Su
Message-Id: <2F2E1775-A459-4D0F-A464-F41B8A7EAB9B@freebsd.org>
In-Reply-To: <20131102153953.GA39106@night.db.net>

On Nov 2, 2013, at 8:39 AM, Diane Bruce wrote:
> On Sat, Nov 02, 2013 at 07:33:40AM -0600, Ian Lepore wrote:
>>
>> I'm not sure it's a mundane stray-write either. The routine that's
>> asserting is checking to see if the contents of a page are all-zero
>> because a jemalloc internal flag is set that says it should be. I had
>> the routine print the non-zero data it found, and it looks like this:
>>
>> not-zero at 0 0x20c99000 = 0x20800a00
>> not-zero at 1 0x20c99004 = 0x00000001
>> not-zero at 2 0x20c99008 = 0x0000002f
>> not-zero at 3 0x20c9900c = 0xffffffff
>> not-zero at 4 0x20c99010 = 0x00007fff
>> not-zero at 5 0x20c99014 = 0x00000003
>> not-zero at 96 0x20c99180 = 0x5a5a5a5a
>> not-zero at 97 0x20c99184 = 0x5a5a5a5a
>> not-zero at 98 0x20c99188 = 0x5a5a5a5a
>>
>> The 0x5a continues to the end of the page. So jemalloc has metadata
>> that says it thinks the page is all-zeroes, and the page is a mix of
>> data and some zeroes and the 5a junk-fill byte. It seems more like the
>> metadata is in error somehow. (Maybe a stray write hit the metadata.)

This looks to me like the sort of thing that would happen if the chunk
page map were corrupted. This could happen due to a double free,
freeing an interior pointer of a multi-page allocation, or a variety of
more complicated errors. The page is filled with 0x5a bytes, yet
jemalloc thinks the page should contain 0x00 bytes, and that implies
that the chunk page table claims this is the first use of the page since
it was mapped.

Does this problem reproduce on amd64? If so, I'll dig in and figure out
if jemalloc is to blame. If not on amd64, given enough hand holding re:
hardware acquisition and configuration I can probably be convinced to
set up an ARM system.

Thanks,
Jason
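
For anyone who wants to reproduce the diagnostic output quoted above, here is a minimal sketch of that kind of check. It is not jemalloc's code: `report_nonzero_words` is a hypothetical helper, and it assumes the page is scanned as 32-bit words, which matches the quoted output (index 96 lands at offset 0x180).

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of jemalloc): scan a page as 32-bit
 * words and report every word that is not zero, in the same
 * "not-zero at <index> <address> = <value>" style quoted in the thread.
 * Pass out == NULL to count without printing.
 * Returns the number of non-zero words found.
 */
static size_t
report_nonzero_words(const uint32_t *page, size_t page_size, FILE *out)
{
	size_t nwords = page_size / sizeof(uint32_t);
	size_t nonzero = 0;

	assert(page != NULL);

	for (size_t i = 0; i < nwords; i++) {
		if (page[i] == 0)
			continue;
		if (out != NULL) {
			fprintf(out, "not-zero at %zu %p = 0x%08" PRIx32 "\n",
			    i, (const void *)&page[i], page[i]);
		}
		nonzero++;
	}
	return nonzero;
}
```

Calling it with `stderr` on a page that the allocator's metadata claims is zeroed gives a listing like Ian's. A return value of 0 means the page really is all-zero. A page that is mostly 0x5a5a5a5a has been junk-filled at some point, so it cannot be on its first use since being mapped.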