From owner-freebsd-current@freebsd.org Wed Mar 15 01:19:00 2017
Subject: Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
From: Mark Millard
In-Reply-To: <20170314234437.GA25820@cicely7.cicely.de>
Date: Tue, 14 Mar 2017 18:18:56 -0700
Cc: Andrew Turner, freebsd-arm, FreeBSD Current, FreeBSD-STABLE Mailing List
Message-Id: <830B8C57-C9B0-4902-94D2-A8E7F1CB9ADB@dsl-only.net>
References: <01735A68-FED6-4E63-964F-0820FE5C446C@dsl-only.net> <16B3D614-62E1-4E58-B409-8DB9DBB35BCB@dsl-only.net> <5BEAFC6C-DA80-4D7B-AB55-977E585D1ACC@dsl-only.net> <20170314234437.GA25820@cicely7.cicely.de>
To: ticso@cicely.de
List-Id: Discussions about the use of FreeBSD-current

On 2017-Mar-14, at 4:44 PM, Bernd Walter wrote:

> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>> [test_check() between the fork and the wait/sleep prevents the
>> failure from occurring. Even a small access to the memory at
>> that stage prevents the failure. Details follow.]
>
> Maybe a stupid question, since you might have written it somewhere.
> What medium do you swap to?
> I've seen broken firmware on microSD cards doing silent data
> corruption for some access patterns.

The root filesystem is on a USB SSD on a powered hub.

Only the kernel is from the microSD card.

I have several examples of the USB SSD model and have never
observed such problems in any other context.

The original issue that started this investigation has been
reported by several people on the lists:

Failed assertion: "tsd_booted"

on arm64 specifically, no other contexts so far as I know.

Earlier I had discovered that:

A) I could use a swap-in to cause the messages from instances
of sh or su that had swapped out earlier.

B) The core dumps showed that a large memory region containing
the global tsd_booted had all turned to be zero bytes.
The assert is just exposing one of those zeros.
(tsd_booted is from jemalloc that is in a .so that is loaded.)

This prompted me to look for simpler contexts involving swapping
that also show memory corruption.

So far I've only managed to produce corrupted memory when fork
and later swapping are both involved. Being a shared library
global is not a requirement for the problem, although such
contexts can have an issue. I've not made a simpler example of
that yet, although I tried.

I have not explored vfork, rfork, or any other alternatives.

So far all failure examples end up with zeroed memory when the
memory does not match the original pattern from before the fork.
At least that is what the core dumps show for all examples that
I've looked at.

See bugzilla 217138 and 217239. In some respects this example is
more analogous to the 217239 context as I remember.

My tests on amd64, armv6 (really -mcpu=cortex-a7 so armv7), and
powerpc64 have never produced any problems, including never
getting the failed assertion. Only arm64. (But I have access to
only one arm64 system, a Pine64+ 2GB.)

Prior to this I tracked down a different arm64 problem to the
fork_trampoline code (for the child process) modifying a system
register but in a place allowing interrupts that could also
change the value. Andrew Turner fixed that one at the time.

For this fork/swapping kind of issue I'm not sure that I'll be
able to do more than provide the simpler example and the steps
that I used. My isolating the internal stage(s) and specific
problem(s) at the code level of detail does not seem likely.

But whatever is found needs to be able to explain the contrast
with an access after the fork but before the swap preventing the
failing behavior. So what I've got so far hopefully does provide
some hints to someone.

===
Mark Millard
markmi at dsl-only.net
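
For anyone reading this message without the earlier ones in the thread, the shape of test described above (fill memory with a pattern, fork, let time pass so a swap-out can happen, then compare against the pattern) can be sketched roughly as below. This is not the ~110 line program from the subject line. It is a minimal sketch, and the names fill_pattern, count_mismatches, fork_and_check, and PATTERN_BYTE are all made up for illustration. It also assumes that the swap-out is forced by something external (memory pressure from another process) during the delay; nothing here causes swapping by itself. The touch_before_delay flag only mimics the "small access after the fork" idea, and which process did that access in the real test is not something this sketch claims to know.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical non-zero pattern, so zeroed memory is detectable. */
#define PATTERN_BYTE 0xA5u

/* Fill the region with the known pattern before the fork. */
void fill_pattern(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, PATTERN_BYTE, len);
}

/* Count bytes that no longer match the pattern (zeroed bytes count). */
size_t count_mismatches(const unsigned char *buf, size_t len)
{
    size_t bad = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != PATTERN_BYTE)
            bad++;
    }
    return bad;
}

/*
 * Returns 0 if parent and child both still see the pattern,
 * 1 if either one sees a mismatch, -1 on a setup error.
 *
 * Assumption: any swap-out has to be provoked externally while the
 * child sleeps for delay_seconds; this function does not do that.
 */
int fork_and_check(size_t len, unsigned int delay_seconds,
                   int touch_before_delay)
{
    unsigned char *buf = malloc(len);
    pid_t pid;
    int status = 0;
    int parent_bad, child_bad;

    if (buf == NULL || len == 0) {
        free(buf);
        return -1;
    }
    fill_pattern(buf, len);

    pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }

    /* Optional small read after the fork; runs in both processes. */
    if (touch_before_delay) {
        volatile unsigned char sink = buf[0];
        (void)sink;
    }

    if (pid == 0) {
        /* Child: wait, then report whether the pattern survived. */
        sleep(delay_seconds);
        _exit(count_mismatches(buf, len) == 0 ? 0 : 1);
    }

    /* Parent: wait for the child, then check its own copy as well. */
    if (waitpid(pid, &status, 0) < 0) {
        free(buf);
        return -1;
    }
    parent_bad = count_mismatches(buf, len) != 0;
    child_bad = !WIFEXITED(status) || WEXITSTATUS(status) != 0;
    free(buf);
    return (parent_bad || child_bad) ? 1 : 0;
}
```

Calling fork_and_check() from a small main() with a region of a few hundred MiB and a delay of a minute or so, while another process applies memory pressure, would be one way to try it. On a system without the problem it should return 0 with or without the touch.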