Subject: Re: sshd crash
From: Ian Lepore
To: Jason Evans
Date: Sun, 03 Nov 2013 11:06:18 -0700
Message-ID: <1383501978.31172.127.camel@revolution.hippie.lan>
In-Reply-To: <2F2E1775-A459-4D0F-A464-F41B8A7EAB9B@freebsd.org>
Cc: Tim Kientzle, freebsd-arm@FreeBSD.org, Howard Su

On Sun, 2013-11-03 at 08:51 -0800, Jason Evans wrote:
> On Nov 2, 2013, at 8:39 AM, Diane Bruce wrote:
> > On Sat, Nov 02, 2013 at 07:33:40AM -0600, Ian Lepore wrote:
> >>
> >> I'm not sure it's a mundane stray-write either. The routine that's
> >> asserting is checking to see if the contents of a page are all-zero
> >> because a jemalloc internal flag is set that says it should be. I had
> >> the routine print the non-zero data it found, and it looks like this:
> >>
> >> not-zero at 0 0x20c99000 = 0x20800a00
> >> not-zero at 1 0x20c99004 = 0x00000001
> >> not-zero at 2 0x20c99008 = 0x0000002f
> >> not-zero at 3 0x20c9900c = 0xffffffff
> >> not-zero at 4 0x20c99010 = 0x00007fff
> >> not-zero at 5 0x20c99014 = 0x00000003
> >> not-zero at 96 0x20c99180 = 0x5a5a5a5a
> >> not-zero at 97 0x20c99184 = 0x5a5a5a5a
> >> not-zero at 98 0x20c99188 = 0x5a5a5a5a
> >>
> >> The 0x5a continues to the end of the page. So jemalloc has metadata
> >> that says it thinks the page is all-zeroes, and the page is a mix of
> >> data and some zeroes and the 5a junk-fill byte. It seems more like the
> >> metadata is in error somehow. (Maybe a stray write hit the metadata.)
>
> This looks to me like the sort of thing that would happen if the chunk
> page map were corrupted. This could happen due to a double free,
> freeing an interior pointer of a multi-page allocation, or a variety of
> more complicated errors. The page is filled with 0x5a bytes, yet
> jemalloc thinks the page should contain 0x00 bytes, and that implies
> that the chunk page table claims this is the first use of the page
> since it was mapped.
>
> Does this problem reproduce on amd64?
> If so, I'll dig in and figure out if jemalloc is to blame. If not on
> amd64, given enough hand holding re: hardware acquisition and
> configuration I can probably be convinced to set up an ARM system.
>

FWIW, when I re-examined that data yesterday I noticed that the 0x5a
doesn't continue to the end of the page; it continues until word 328,
and then the rest of the page is zeroes. I assume that's still
consistent with a double free or other such usage errors.

An interesting part of this problem is that the changeset that
introduced it is the one that makes the malloc-related symbols in libc
weak references to the jemalloc implementation. Diane sees some
evidence in gdb that there is a non-jemalloc implementation of malloc
present in the process. I wonder if we've got something like a mix of
statically and dynamically linked code, and thus two mallocs somehow?
Would allocating a block from one malloc implementation and then
freeing it to the other be consistent with the asserted data above?

I think if this happened on x86 we'd be hearing from a LOT of folks
about it. I wonder if it reproduces in an ARM emulation environment? I
don't know anything about using emulation, but others here do.

-- 
Ian
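For readers following along, the kind of check discussed in this thread (scan a page that allocator metadata marks as zeroed, and print every non-zero word) can be sketched in C. This is a minimal sketch under stated assumptions, not jemalloc's actual code: the function names report_nonzero_words, count_junk_words and assert_page_zeroed, and the 4 KiB page size, are hypothetical and chosen for illustration. The only detail taken from the thread is the 0x5a junk-fill byte and the "not-zero at <index> <address> = <value>" output shape of the quoted dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed page size for this sketch (4 KiB); not taken from jemalloc. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_NWORDS (SKETCH_PAGE_SIZE / sizeof(uint32_t))

/* Junk-fill pattern: the 0x5a byte repeated across a 32-bit word. */
#define SKETCH_JUNK_WORD 0x5a5a5a5au

/*
 * Hypothetical helper: scan one page as 32-bit words and count the words
 * that are not zero. If `out` is non-NULL, each one is printed as
 * "not-zero at <index> <address> = <value>", like the dump in this thread.
 */
static size_t report_nonzero_words(const uint32_t *page, FILE *out)
{
    size_t nonzero = 0;

    for (size_t i = 0; i < SKETCH_NWORDS; i++) {
        if (page[i] != 0) {
            if (out != NULL)
                fprintf(out, "not-zero at %zu %p = 0x%08x\n", i,
                    (const void *)&page[i], (unsigned)page[i]);
            nonzero++;
        }
    }
    return nonzero;
}

/* Hypothetical helper: count words that still hold the junk-fill pattern. */
static size_t count_junk_words(const uint32_t *page)
{
    size_t junk = 0;

    for (size_t i = 0; i < SKETCH_NWORDS; i++)
        if (page[i] == SKETCH_JUNK_WORD)
            junk++;
    return junk;
}

/*
 * Hypothetical strict check: print every offending word on stderr, then
 * abort (via assert) if a page that is supposed to be zeroed is not.
 */
static void assert_page_zeroed(const uint32_t *page)
{
    size_t nonzero = report_nonzero_words(page, stderr);

    assert(nonzero == 0 && "page marked as zeroed holds non-zero data");
    (void)nonzero; /* avoid an unused-variable warning under NDEBUG */
}
```

Assuming a page shaped like the one described above (six data words, zeroes, then 0x5a5a5a5a in words 96 through 327, zeroes after that), report_nonzero_words would return 238, count_junk_words would return 232, and assert_page_zeroed would abort, much as the real assertion did. Keeping the reporting separate from the abort makes it possible to log the page contents before deciding whether to stop.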