From owner-freebsd-stable@freebsd.org Mon Mar 13 12:06:51 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8D90CCCC83C for ; Mon, 13 Mar 2017 12:06:51 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 7AF7F131C for ; Mon, 13 Mar 2017 12:06:51 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: by mailman.ysv.freebsd.org (Postfix) id 77485CCC83A; Mon, 13 Mar 2017 12:06:51 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 76EC6CCC838 for ; Mon, 13 Mar 2017 12:06:51 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from constantine.ingresso.co.uk (unknown [IPv6:2a02:b90:3002:411::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 47C20131B for ; Mon, 13 Mar 2017 12:06:50 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from dilbert.london-internal.ingresso.co.uk ([10.64.50.6] helo=dilbert.ingresso.co.uk) by constantine.ingresso.co.uk with esmtps (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.88 (FreeBSD)) (envelope-from ) id 1cnOkS-000EBX-L3 for stable@freebsd.org; Mon, 13 Mar 2017 12:06:48 +0000 Received: from petefrench by dilbert.ingresso.co.uk with local (Exim 4.88 (FreeBSD)) (envelope-from ) id 1cnOkS-0000oL-Ia for stable@freebsd.org; Mon, 13 Mar 2017 12:06:48 +0000 To: stable@freebsd.org Subject: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0 Message-Id: From: Pete French Date: Mon, 13 Mar 2017 12:06:48 +0000 X-BeenThere: 
freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Mar 2017 12:06:51 -0000

I have a number of machines in Azure, all booting from ZFS and, until the weekend, running 10.3 perfectly happily.

I started upgrading these to 11. The first went fine, the second would not boot. Looking at the boot diagnostics it is having problems finding the root pool to mount. I see this in the diagnostic output:

storvsc0: on vmbus0
Solaris: NOTICE: Cannot find the pool label for 'rpool'
Mounting from zfs:rpool/ROOT/default failed with error 5.
Root mount waiting for: storvsc
(probe0:blkvsc0:0:storvsc1: 0:0): on vmbus0 storvsc scsi_status = 2
(da0:blkvsc0:0:0:0): UNMAPPED
(probe1:blkvsc1:0:1:0): storvsc scsi_status = 2
hvheartbeat0: on vmbus0
da0 at blkvsc0 bus 0 scbus2 target 0 lun 0

As you can see, the drive da0 only appears after it has tried, and failed, to mount the root pool.

Normally I would just stick in a big 'vfs.mountroot.timeout' but that variable does not appear to exist under 11 - or at least it doesn't show up in sysctl.

I have one machine which boots fine. I can take the drive of this machine, clone it, and attach it to a new VM, and that VM fails to boot! I am now a bit scared to reboot that virtual machine in case it doesn't come back.

Can anyone offer any suggestions? Just being able to delay the mount might be enough if there is a variable which can do that. I do rather need to get these machines back online....

thanks,

-pete.
From owner-freebsd-stable@freebsd.org Mon Mar 13 12:31:53 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C3F30D05760 for ; Mon, 13 Mar 2017 12:31:53 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id B17BB1870 for ; Mon, 13 Mar 2017 12:31:53 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: by mailman.ysv.freebsd.org (Postfix) id B0DD0D0575F; Mon, 13 Mar 2017 12:31:53 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B08D9D0575E for ; Mon, 13 Mar 2017 12:31:53 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from constantine.ingresso.co.uk (unknown [IPv6:2a02:b90:3002:411::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 70DF6186D for ; Mon, 13 Mar 2017 12:31:53 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from dilbert.london-internal.ingresso.co.uk ([10.64.50.6] helo=dilbert.ingresso.co.uk) by constantine.ingresso.co.uk with esmtps (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.88 (FreeBSD)) (envelope-from ) id 1cnP8h-000EXc-Jx; Mon, 13 Mar 2017 12:31:51 +0000 Received: from petefrench by dilbert.ingresso.co.uk with local (Exim 4.88 (FreeBSD)) (envelope-from ) id 1cnP8h-0000sD-I5; Mon, 13 Mar 2017 12:31:51 +0000 To: petefrench@ingresso.co.uk, stable@freebsd.org Subject: Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0 In-Reply-To: Message-Id: From: Pete French Date: Mon, 13 Mar 2017 12:31:51 +0000 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 
2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Mar 2017 12:31:53 -0000 One extra datapoint - the machines which do not fail are the small DS1_v2 instances. These seem to boot fine, but if I move to the DS2 size then the problem shows up. -pete. From owner-freebsd-stable@freebsd.org Mon Mar 13 13:26:03 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A0EAED083EA for ; Mon, 13 Mar 2017 13:26:03 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 8CBF312AC for ; Mon, 13 Mar 2017 13:26:03 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: by mailman.ysv.freebsd.org (Postfix) id 891E4D083E9; Mon, 13 Mar 2017 13:26:03 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 88C61D083E8 for ; Mon, 13 Mar 2017 13:26:03 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay03.ispgateway.de (smtprelay03.ispgateway.de [80.67.31.26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 51E6F12AB for ; Mon, 13 Mar 2017 13:26:02 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from [78.35.165.189] (helo=fabiankeil.de) by smtprelay03.ispgateway.de with esmtpsa (TLSv1.2:AES256-GCM-SHA384:256) (Exim 4.84) (envelope-from ) id 1cnPpy-0001mX-9y; Mon, 13 Mar 2017 14:16:34 +0100 Date: Mon, 13 Mar 2017 14:14:40 +0100 From: Fabian Keil To: Pete French Cc: stable@freebsd.org Subject: Re: moutnroot failing on 
zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0 Message-ID: <20170313141440.7ecf5bc5@fabiankeil.de> In-Reply-To: References: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; boundary="Sig_/XdzvgM+tl/bMTg=SyLhYF3C"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Mar 2017 13:26:03 -0000 --Sig_/XdzvgM+tl/bMTg=SyLhYF3C Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable

Pete French wrote:

> I have a number of machines in Azure, all booting from ZFS and, until
> the weekend, running 10.3 perfectly happily.
>
> I started upgrading these to 11. The first went fine, the second would
> not boot. Looking at the boot diagnostics it is having problems finding
> the root pool to mount. I see this in the diagnostic output:
>
> storvsc0: on vmbus0
> Solaris: NOTICE: Cannot find the pool label for 'rpool'
> Mounting from zfs:rpool/ROOT/default failed with error 5.
> Root mount waiting for: storvsc
> (probe0:blkvsc0:0:storvsc1: 0: Interface>0): on vmbus0 storvsc scsi_status = 2
> (da0:blkvsc0:0:0:0): UNMAPPED
> (probe1:blkvsc1:0:1:0): storvsc scsi_status = 2
> hvheartbeat0: on vmbus0
> da0 at blkvsc0 bus 0 scbus2 target 0 lun 0
>
> As you can see, the drive da0 only appears after it has tried, and
> failed, to mount the root pool.
>
> Normally I would just stick in a big 'vfs.mountroot.timeout' but that
> variable does not appear to exist under 11 - or at least it doesn't
> show up in sysctl.

The variable still exists but is ignored when using ZFS.

It's a known issue. You could try this patch:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208882#c3

Manually specifying the root pool should work around the issue.
sysctl(8) does not show the variable as it's only a tunable. This is unrelated to the update. Fabian --Sig_/XdzvgM+tl/bMTg=SyLhYF3C Content-Type: application/pgp-signature Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQTKUNd6H/m3+ByGULIFiohV/3dUnQUCWMabQAAKCRAFiohV/3dU nd/LAKCBAjofhEtrU10g5Szt/rBjCu4MFQCgl63d/r9PNgIX2E4fPi9Dfqg5S0E= =c6iF -----END PGP SIGNATURE----- --Sig_/XdzvgM+tl/bMTg=SyLhYF3C-- From owner-freebsd-stable@freebsd.org Mon Mar 13 13:35:30 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 40AC2D0892E for ; Mon, 13 Mar 2017 13:35:30 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 2DD9718EB for ; Mon, 13 Mar 2017 13:35:30 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: by mailman.ysv.freebsd.org (Postfix) id 2D3A8D0892D; Mon, 13 Mar 2017 13:35:30 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2CDCFD0892C for ; Mon, 13 Mar 2017 13:35:30 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from constantine.ingresso.co.uk (unknown [IPv6:2a02:b90:3002:411::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F138818EA for ; Mon, 13 Mar 2017 13:35:29 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from dilbert.london-internal.ingresso.co.uk ([10.64.50.6] helo=dilbert.ingresso.co.uk) by constantine.ingresso.co.uk with esmtps (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.88 (FreeBSD)) (envelope-from ) id 1cnQ8F-000FTa-PD; Mon, 13 Mar 2017 13:35:27 +0000 Received: 
from petefrench by dilbert.ingresso.co.uk with local (Exim 4.88 (FreeBSD)) (envelope-from ) id 1cnQ8F-0001AI-Nd; Mon, 13 Mar 2017 13:35:27 +0000 To: freebsd-listen@fabiankeil.de, petefrench@ingresso.co.uk Subject: Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0 Cc: stable@freebsd.org In-Reply-To: <20170313141440.7ecf5bc5@fabiankeil.de> Message-Id: From: Pete French Date: Mon, 13 Mar 2017 13:35:27 +0000 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Mar 2017 13:35:30 -0000

> The variable still exists but is ignored when using ZFS.
>
> It's a known issue. You could try this patch:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208882#c3

Ah, OK, thanks...

> Manually specifying the root pool should work around the issue.

Interesting, I didn't think of trying that. Mainly because it appears to have set the variable correctly, or at least it has the output:

Trying to mount root from zfs:rpool/ROOT/default

But I shall try that. As soon as I can get one of them up and running again :-(

I now have zero machines booting! My diagnosis that it was due to the size of the VM was wrong, and I rebooted them on the smallest size thinking it would be fine. It wasn't. *sigh*

Thank you!
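
For reference, a minimal sketch of how the two loader tunables discussed in this thread might be set in /boot/loader.conf. The dataset name is copied from the boot output quoted earlier; the 30-second value is only an illustrative assumption. This also assumes that "manually specifying the root pool" refers to vfs.root.mountfrom, and, as noted in the reply quoted above, the timeout is reportedly ignored for ZFS roots unless the patch from bug 208882 is applied:

```
# /boot/loader.conf (sketch only; values are assumptions, adjust per system)

# Name the root dataset explicitly instead of relying on auto-detection.
vfs.root.mountfrom="zfs:rpool/ROOT/default"

# Seconds to wait for the root device to appear (illustrative value).
vfs.mountroot.timeout="30"
```

Both are loader tunables rather than run-time sysctls, which fits the observation that vfs.mountroot.timeout does not show up in sysctl(8) output.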
From owner-freebsd-stable@freebsd.org Mon Mar 13 14:43:11 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 52295D0AEA0 for ; Mon, 13 Mar 2017 14:43:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3BEAD1FAD for ; Mon, 13 Mar 2017 14:43:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id v2DEhBkr009530 for ; Mon, 13 Mar 2017 14:43:11 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-stable@FreeBSD.org Subject: [Bug 213903] Kernel crashes from turnstile_broadcast (/usr/src/sys/kern/subr_turnstile.c:837) Date: Mon, 13 Mar 2017 14:43:11 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: CURRENT X-Bugzilla-Keywords: crash X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: peixoto.cassiano@gmail.com X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: 
Mon, 13 Mar 2017 14:43:11 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903

Cassiano Peixoto changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |freebsd-stable@FreeBSD.org,
                   |                            |peixoto.cassiano@gmail.com

--- Comment #11 from Cassiano Peixoto ---
Guys,

I'm having the same issue here on FreeBSD 10.3-STABLE. I'm using an Atom C2758 as well. It began after the 10.3 update. It's a very serious issue because many production servers are crashing.

Can someone take a look, please?

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-stable@freebsd.org Mon Mar 13 14:51:57 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 28234D0A1ED for ; Mon, 13 Mar 2017 14:51:57 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 17C8C14DE for ; Mon, 13 Mar 2017 14:51:57 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id v2DEpuTg028318 for ; Mon, 13 Mar 2017 14:51:56 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-stable@FreeBSD.org Subject: [Bug 213903] Kernel crashes from turnstile_broadcast (/usr/src/sys/kern/subr_turnstile.c:837) Date: Mon, 13 Mar 2017 14:51:57 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: CURRENT X-Bugzilla-Keywords: crash X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: 
franco@opnsense.org X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Mar 2017 14:51:57 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903

--- Comment #12 from Franco Fichtner ---
r301157 was backported to 10-STABLE, but 10.3 is unaffected. There is no 10.3-STABLE. Which one did you mean?

From our experience r301157 is the bad commit as the panics have disappeared in our latest OPNsense version which reverted the rwlock bits of this particular patch.
-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-stable@freebsd.org Mon Mar 13 17:16:58 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0D823D0AFAE for ; Mon, 13 Mar 2017 17:16:58 +0000 (UTC) (envelope-from cordeiro@cert.br) Received: from woq.cert.br (woq.cert.br [IPv6:2001:12ff:0:7000::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9C6091FEB for ; Mon, 13 Mar 2017 17:16:57 +0000 (UTC) (envelope-from cordeiro@cert.br) Received: from luinil.cert.br (luinil.cert.br [IPv6:2001:12ff:0:7001::67]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by woq.cert.br (Postfix) with ESMTPS id 7138741ABD2; Mon, 13 Mar 2017 14:16:53 -0300 (BRT) Received: from luinil.cert.br (luinil.cert.br [IPv6:2001:12ff:0:7001::67]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: cordeiro) by luinil.cert.br (Postfix) with ESMTPSA id BC50EC1D2D7; Mon, 13 Mar 2017 14:16:45 -0300 (BRT) From: Luiz Eduardo Roncato Cordeiro To: freebsd-stable@freebsd.org, misc Subject: Re: [misc] Acessos ao CERT.br: 2017-03-06 a 2017-03-12 Date: Mon, 13 Mar 2017 14:16:29 -0300 Message-ID: <4811266.hiosAuiACX@cert.br> Organization: CERT.br/NIC.br In-Reply-To: <201703131412.02178.chicofig@cert.br> References: <201703131412.02178.chicofig@cert.br> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" User-Agent: NONE X-Virus-Scanned: clamav-milter 0.99.2 at luinil.cert.br X-Virus-Status: Clean X-URL: http://www.cert.br/ X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Mar 2017 17:16:58 -0000 On 13-03-2017 14:12:00 Francisco Jose Candeias Figueiredo wrote: > > ### > ### Access Denied > ### > > #Status Date Time Door Card Name,Dep > Card Rejected 2017-03-10 16:57:46 NU--06P-02-CERT 146 Welinton Lima,HELP DESK > > > ### > ### Access Granted > ### > > [empty] > > ### > ### Access Granted (CERT'ers) > ### > > #Status Date Time Door Card Name,Dep > Card Admitted 2017-03-06 07:54:45 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-06 08:09:56 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-06 08:15:53 NU--06P-02-CERT 190 Marcus Giraldi,CERT > Card Admitted 2017-03-06 08:17:48 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-06 08:21:04 NU--06P-02-CERT 190 Marcus Giraldi,CERT > Card Admitted 2017-03-06 08:21:16 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-06 08:50:49 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-06 08:57:16 NU--06P-02-CERT 190 Marcus Giraldi,CERT > Card Admitted 2017-03-06 09:49:45 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-06 09:52:09 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-06 09:56:06 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-06 09:57:32 NU--06P-02-CERT 139 Cristine Hoepers,CERT > Card Admitted 2017-03-06 10:00:16 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-06 10:10:52 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-06 10:11:53 NU--06P-02-CERT 139 Cristine Hoepers,CERT > Card Admitted 2017-03-06 10:12:18 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-06 10:13:14 NU--06P-02-CERT 186 Klaus Jessen,CERT > Card Admitted 2017-03-06 10:13:48 NU--06P-02-CERT 178 Lucimara Desidera,CERT > Card Admitted 2017-03-06 10:37:14 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-06 
10:49:53 NU--06P-02-CERT 190 Marcus Giraldi,CERT > Card Admitted 2017-03-06 10:50:33 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-06 11:03:57 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-06 11:31:19 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-06 12:01:13 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-06 12:18:26 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-06 12:29:38 NU--06P-02-CERT 186 Klaus Jessen,CERT > Card Admitted 2017-03-06 12:58:19 NU--06P-02-CERT 178 Lucimara Desidera,CERT > Card Admitted 2017-03-06 13:00:37 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-06 13:14:56 NU--06P-02-CERT 190 Marcus Giraldi,CERT > Card Admitted 2017-03-06 13:17:59 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-06 13:19:16 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-06 13:25:37 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-06 13:28:05 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-06 14:05:23 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-06 14:13:28 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-06 14:19:43 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-06 14:22:19 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-06 14:42:44 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-06 14:43:31 NU--06P-02-CERT 190 Marcus Giraldi,CERT > Card Admitted 2017-03-06 14:51:23 NU--06P-02-CERT 139 Cristine Hoepers,CERT > Card Admitted 2017-03-06 14:58:17 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-06 15:00:49 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-06 15:03:58 NU--06P-02-CERT 178 Lucimara Desidera,CERT > Card Admitted 2017-03-06 15:06:59 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-06 15:26:05 NU--06P-02-CERT 139 Cristine Hoepers,CERT > Card 
Admitted 2017-03-06 15:28:55 NU--06P-02-CERT 190 Marcus Giraldi,CERT > Card Admitted 2017-03-06 15:42:14 NU--06P-02-CERT 190 Marcus Giraldi,CERT > Card Admitted 2017-03-06 16:01:51 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-06 16:07:46 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-06 16:13:01 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-06 16:13:13 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-06 16:13:34 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-06 16:15:21 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-06 16:20:27 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-06 17:11:52 NU--06P-02-CERT 186 Klaus Jessen,CERT > Card Admitted 2017-03-06 17:20:37 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-06 18:07:19 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-06 18:43:30 NU--06P-02-CERT 176 Joao Ceron,CERT > > Card Admitted 2017-03-07 07:33:09 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-07 07:52:05 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-07 08:14:59 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-07 08:17:42 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-07 08:53:28 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-07 08:59:05 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-07 09:01:25 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-07 09:19:06 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-07 09:23:00 NU--06P-02-CERT 139 Cristine Hoepers,CERT > Card Admitted 2017-03-07 09:25:10 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-07 09:32:22 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-07 09:34:06 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-07 09:36:32 NU--06P-02-CERT 
139 Cristine Hoepers,CERT > Card Admitted 2017-03-07 13:10:25 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-07 13:15:26 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-07 13:27:20 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > It wasn't an error, it was 'open door' day. :-) > Card Admitted 2017-03-08 07:35:56 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-08 13:49:44 NU--06P-02-CERT 178 Lucimara Desidera,CERT > Card Admitted 2017-03-08 20:29:31 NU--06P-02-CERT 139 Cristine Hoepers,CERT > > Card Admitted 2017-03-09 07:30:04 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-09 07:41:06 NU--06P-02-CERT 181 Miriam Costa,CERT > Card Admitted 2017-03-09 07:50:55 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-09 08:51:18 NU--06P-02-CERT 139 Cristine Hoepers,CERT > Card Admitted 2017-03-09 13:50:34 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-09 13:53:09 NU--06P-02-CERT 178 Lucimara Desidera,CERT > Card Admitted 2017-03-09 13:56:21 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-09 13:59:33 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-09 14:01:15 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-09 14:08:31 NU--06P-02-CERT 181 Miriam Costa,CERT > Card Admitted 2017-03-09 14:22:09 NU--06P-02-CERT 181 Miriam Costa,CERT > Card Admitted 2017-03-09 14:29:40 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-09 14:31:14 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-09 14:40:04 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-09 14:43:01 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-09 19:36:34 NU--06P-02-CERT 178 Lucimara Desidera,CERT > > Card Admitted 2017-03-10 07:30:11 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-10 07:36:06 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-10 07:42:21 NU--06P-02-CERT 234 
Renato Medeiros Junior,CERT > Card Admitted 2017-03-10 07:59:58 NU--06P-02-CERT 181 Miriam Costa,CERT > Card Admitted 2017-03-10 08:12:03 NU--06P-02-CERT 181 Miriam Costa,CERT > Card Admitted 2017-03-10 08:18:12 NU--06P-02-CERT 181 Miriam Costa,CERT > Card Admitted 2017-03-10 08:44:14 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-10 08:55:04 NU--06P-02-CERT 181 Miriam Costa,CERT > Card Admitted 2017-03-10 08:56:20 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-10 08:56:42 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-10 09:28:43 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-10 09:46:55 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-10 09:49:17 NU--06P-02-CERT 181 Miriam Costa,CERT > Card Admitted 2017-03-10 10:01:05 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-10 10:03:14 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-10 10:03:58 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-10 10:34:50 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-10 10:40:57 NU--06P-02-CERT 178 Lucimara Desidera,CERT > Card Admitted 2017-03-10 10:41:58 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-10 11:06:45 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-10 11:12:24 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-10 11:16:12 NU--06P-02-CERT 181 Miriam Costa,CERT > Card Admitted 2017-03-10 11:18:05 NU--06P-02-CERT 178 Lucimara Desidera,CERT > Card Admitted 2017-03-10 11:31:26 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-10 11:44:44 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-10 12:06:03 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-10 12:10:07 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-10 12:23:15 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-10 12:24:31 NU--06P-02-CERT 
179 Marcelo Chaves,CERT > Card Admitted 2017-03-10 12:50:11 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-10 13:22:15 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-10 13:28:03 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-10 13:29:21 NU--06P-02-CERT 179 Marcelo Chaves,CERT > Card Admitted 2017-03-10 13:46:15 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-10 14:10:44 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-10 14:20:43 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-10 14:23:23 NU--06P-02-CERT 178 Lucimara Desidera,CERT > Card Admitted 2017-03-10 14:28:27 NU--06P-02-CERT 181 Miriam Costa,CERT > Card Admitted 2017-03-10 14:36:45 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-10 14:39:48 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-10 14:46:18 NU--06P-02-CERT 178 Lucimara Desidera,CERT > Card Admitted 2017-03-10 14:55:12 NU--06P-02-CERT 182 Luiz Cordeiro,CERT > Card Admitted 2017-03-10 15:00:46 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-10 15:05:40 NU--06P-02-CERT 178 Lucimara Desidera,CERT > Card Admitted 2017-03-10 15:31:40 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-10 15:48:49 NU--06P-02-CERT 181 Miriam Costa,CERT > Card Admitted 2017-03-10 16:20:31 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-10 16:23:59 NU--06P-02-CERT 178 Lucimara Desidera,CERT > Card Admitted 2017-03-10 16:40:11 NU--06P-02-CERT 234 Renato Medeiros Junior,CERT > Card Admitted 2017-03-10 16:42:11 NU--06P-02-CERT 176 Joao Ceron,CERT > Card Admitted 2017-03-10 16:54:25 NU--06P-02-CERT 167 Francisco Figueiredo,CERT > Card Admitted 2017-03-10 17:34:19 NU--06P-02-CERT 178 Lucimara Desidera,CERT > Card Admitted 2017-03-10 17:44:08 NU--06P-02-CERT 152 Dionathan Nakamura,CERT > Card Admitted 2017-03-10 18:13:53 NU--06P-02-CERT 176 Joao Ceron,CERT > > Card Admitted 
2017-03-11 18:44:16 NU--06P-02-CERT 178 Lucimara Desidera,CERT > > _______________________________________________ > misc mailing list > misc@listas.cert.br > https://listas.cert.br/mailman/listinfo/misc From owner-freebsd-stable@freebsd.org Mon Mar 13 18:09:48 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4DAD7D0ADC9; Mon, 13 Mar 2017 18:09:48 +0000 (UTC) (envelope-from cordeiro@cert.br) Received: from woq.cert.br (woq.cert.br [IPv6:2001:12ff:0:7000::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0C4951718; Mon, 13 Mar 2017 18:09:48 +0000 (UTC) (envelope-from cordeiro@cert.br) Received: from luinil.cert.br (luinil.cert.br [IPv6:2001:12ff:0:7001::67]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by woq.cert.br (Postfix) with ESMTPS id 2FAD441AA68; Mon, 13 Mar 2017 15:09:46 -0300 (BRT) Received: from luinil.cert.br (luinil.cert.br [IPv6:2001:12ff:0:7001::67]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: cordeiro) by luinil.cert.br (Postfix) with ESMTPSA id F0F34C1D2E9; Mon, 13 Mar 2017 15:09:33 -0300 (BRT) From: Luiz Eduardo Roncato Cordeiro To: freebsd-stable@freebsd.org Subject: Re: [misc] Acessos ao CERT.br: 2017-03-06 a 2017-03-12 Date: Mon, 13 Mar 2017 15:09:32 -0300 Message-ID: <2477599.VO9nOTsx37@cert.br> Organization: CERT.br/NIC.br In-Reply-To: <4811266.hiosAuiACX@cert.br> References: <201703131412.02178.chicofig@cert.br> <4811266.hiosAuiACX@cert.br> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" User-Agent: NONE X-Virus-Scanned: clamav-milter 0.99.2 at luinil.cert.br X-Virus-Status: Clean X-URL: http://www.cert.br/ 
X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Mar 2017 18:09:48 -0000 Hello, I've sent an email with the subject above by mistake to FreeBSD Stable list, I'd like to ask the list administrator to delete the whole email thread from this list and archive. I'm sorry, Cordeiro From owner-freebsd-stable@freebsd.org Mon Mar 13 18:53:08 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E4CA3D0A0C9 for ; Mon, 13 Mar 2017 18:53:08 +0000 (UTC) (envelope-from stephane@dupille.org) Received: from mail.nospam.fr.eu.org (saloon.dalton-brothers.org [IPv6:2001:bc8:3ac3::beef:1]) by mx1.freebsd.org (Postfix) with ESMTP id B020D16D9 for ; Mon, 13 Mar 2017 18:53:08 +0000 (UTC) (envelope-from stephane@dupille.org) Received: from [192.168.1.25] (LStLambert-658-1-7-84.w193-248.abo.wanadoo.fr [193.248.42.84]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mail.nospam.fr.eu.org (Postfix) with ESMTPSA id 4E633141C; Mon, 13 Mar 2017 18:53:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dupille.org; s=default; t=1489431180; bh=0UTK4K9tQBic9pPZy/8axeRKFCJCJ04hDuoAjRn/1lQ=; h=Subject:From:In-Reply-To:Date:Cc:References:To; b=D/+2Bm9OqXM3gTjc4urQ97xda2Wr7WIrow0zVLcT1JQx7MgK3nQWljn4i74FnZvKS RgkSfhGG59aqqsb9fOQqWEQZXOoeRNp/kxcfRKPBYCxl6ZSd31K4spYapJsNWD21YJ iccJG/PoaOQamtko3Hycv/nSvO0pFo+wZP7Ey65c= Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: [misc] Acessos ao CERT.br: 2017-03-06 a 2017-03-12 From: =?utf-8?Q?St=C3=A9phane_Dupille?= In-Reply-To: <2477599.VO9nOTsx37@cert.br> Date: Mon, 13 Mar 2017 19:53:02 +0100 Cc: 
freebsd-stable@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <201703131412.02178.chicofig@cert.br> <4811266.hiosAuiACX@cert.br> <2477599.VO9nOTsx37@cert.br> To: Luiz Eduardo Roncato Cordeiro X-Mailer: Apple Mail (2.3124) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,SHORTCIRCUIT shortcircuit=ham autolearn=disabled version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on spamd X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Mar 2017 18:53:09 -0000 > On 13 March 2017 at 19:09, Luiz Eduardo Roncato Cordeiro wrote: > > Hello, Hello, > I've sent an email with the subject above by mistake > to FreeBSD Stable list, I'd like to ask the list > administrator to delete the whole email thread from > this list and archive. Unfortunately, as your email has been forwarded to everyone’s mailbox, it is impossible to delete it. Hope there’s no secret inside.
From owner-freebsd-stable@freebsd.org Mon Mar 13 19:07:33 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1BDB6D0A4A9 for ; Mon, 13 Mar 2017 19:07:33 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id EBC461E84 for ; Mon, 13 Mar 2017 19:07:32 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id E8628D0A4A8; Mon, 13 Mar 2017 19:07:32 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E65E3D0A4A7 for ; Mon, 13 Mar 2017 19:07:32 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: from mail-wr0-x231.google.com (mail-wr0-x231.google.com [IPv6:2a00:1450:400c:c0c::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6B51D1E83 for ; Mon, 13 Mar 2017 19:07:32 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: by mail-wr0-x231.google.com with SMTP id u108so109963295wrb.3 for ; Mon, 13 Mar 2017 12:07:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:mail-followup-to :references:mime-version:content-disposition:in-reply-to:user-agent; bh=9traDHfJLg46HXA2SqAONCHCoFalcEYoMYizA7Mm5Hg=; b=ryPTFoYgZTIL552aOA4AqKff9L9gLUBMDv12HKjbA5R9nEAaF1hjSVvKtquWsmrDjg AR/jxnXz/VoRFS8Rh9YSD0BsezVx46CwE3P9X+6KCQlQcQq7CMUoL6MtvMHx2aE5LA3X qJIJQo+Ct3Hghs2mk5hXV+xGKdFhaSlIsVl+qiynqAqn/USjQBBsVf5FhXvlsQ7pG3f1 cMgPly/0GwBWMqg5hA5z1p2FzPKeY8lfv9jmMQCgOpctJ0KYya+cUXTXjob9sRYt5u3w 
Sh97IsYx21bow0zPN6XVCIjO5q757eV146hq2KmLA16WQJDSKF6580ulLYz7bbNMRcjY pzfw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :mail-followup-to:references:mime-version:content-disposition :in-reply-to:user-agent; bh=9traDHfJLg46HXA2SqAONCHCoFalcEYoMYizA7Mm5Hg=; b=JnMFiKE/3nw++rhQkMgSr1bxuVeOu3oyuOAe4/8zv7N9cRix9WnK3wrY8eCsf91ral PvivN1oFgUljxHTNii62bbffAcJ+8fhQayLyepEBVZxOXVZwrEKTPvDwKPnc8WexE3Ib NksCO6SQPq2x1UKHdUCRk2pW4K186Lu5E4ppCQrFW8r2K4TqbBR8PUxwS5c6CmTQDQqr bA6pgBTA44qsICoi10w8E0wb0NFYC0lKLaFUynxmAJVx1UN7HFWswcEA3tmXmAbwCLLC csa1g3lk8UskNDuubHyQrWxlOv+dyetUeQfpMvbqIxgrXudnacJORf6vESJq3pdlkdLA AjsQ== X-Gm-Message-State: AMke39k4KM8YAZ+D5Suq5QJoBKV7/dMpsEg4Yj2aE4YNHE2zOYdjsowndOdTaL/sMEiXsQ== X-Received: by 10.223.155.17 with SMTP id b17mr28163749wrc.181.1489432050903; Mon, 13 Mar 2017 12:07:30 -0700 (PDT) Received: from brick (cpc92310-cmbg19-2-0-cust934.5-4.cable.virginm.net. [82.9.227.167]) by smtp.gmail.com with ESMTPSA id k43sm26081062wrk.42.2017.03.13.12.07.30 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 13 Mar 2017 12:07:30 -0700 (PDT) Sender: =?UTF-8?Q?Edward_Tomasz_Napiera=C5=82a?= Date: Mon, 13 Mar 2017 19:07:28 +0000 From: Edward Tomasz =?utf-8?Q?Napiera=C5=82a?= To: Pete French Cc: stable@freebsd.org Subject: Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0 Message-ID: <20170313190728.GA2967@brick> Mail-Followup-To: Pete French , stable@freebsd.org References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.8.0 (2017-02-23) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Mar 2017 19:07:33 -0000 On 0313T1206, Pete 
French wrote: > I have a number of machines in Azure, all booting from ZFS and, until > the weekend, running 10.3 perfectly happily. > > I started upgrading these to 11. The first went fine, the second would > not boot. Looking at the boot diagnistics it is having problems finding the > root pool to mount. I see this is the diagnostic output: > > storvsc0: on vmbus0 > Solaris: NOTICE: Cannot find the pool label for 'rpool' > Mounting from zfs:rpool/ROOT/default failed with error 5. > Root mount waiting for: storvsc > (probe0:blkvsc0:0:storvsc1: 0:0): on vmbus0 > storvsc scsi_status = 2 > (da0:blkvsc0:0:0:0): UNMAPPED > (probe1:blkvsc1:0:1:0): storvsc scsi_status = 2 > hvheartbeat0: on vmbus0 > da0 at blkvsc0 bus 0 scbus2 target 0 lun 0 > > As you can see, the drive da0 only appears after it has tried, and failed, > to mount the root pool. Are you sure the above transcript is right? There are three reasons I'm asking. First, you'll see the "Root mount waiting" message, which means the root mount code is, well, waiting for storvsc, exactly as expected. Second - there is no "Trying to mount root". But most of all - for some reason the "Mounting failed" is shown _before_ the "Root mount waiting", and I have no idea how this could ever happen. 
From owner-freebsd-stable@freebsd.org Mon Mar 13 19:34:47 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 47DBED0AD6B for ; Mon, 13 Mar 2017 19:34:47 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 376521F9C for ; Mon, 13 Mar 2017 19:34:47 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id v2DJYkK9024853 for ; Mon, 13 Mar 2017 19:34:47 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-stable@FreeBSD.org Subject: [Bug 213903] Kernel crashes from turnstile_broadcast (/usr/src/sys/kern/subr_turnstile.c:837) Date: Mon, 13 Mar 2017 19:34:46 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: CURRENT X-Bugzilla-Keywords: crash X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: peixoto.cassiano@gmail.com X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: 
Mon, 13 Mar 2017 19:34:47 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903 --- Comment #13 from Cassiano Peixoto --- (In reply to Franco Fichtner from comment #12) Hi Franco, I don't know exactly which svn version I'm using, because when I run uname -a it doesn't show me. But anyway, I updated my FreeBSD 10.3 on February 6th. Does that make sense to you? How can I revert this commit? Thanks. -- You are receiving this mail because: You are on the CC list for the bug. From owner-freebsd-stable@freebsd.org Mon Mar 13 19:40:46 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3E1ECD0AF55 for ; Mon, 13 Mar 2017 19:40:46 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2D31D1201 for ; Mon, 13 Mar 2017 19:40:46 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id v2DJejkr032527 for ; Mon, 13 Mar 2017 19:40:46 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-stable@FreeBSD.org Subject: [Bug 213903] Kernel crashes from turnstile_broadcast (/usr/src/sys/kern/subr_turnstile.c:837) Date: Mon, 13 Mar 2017 19:40:46 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: CURRENT X-Bugzilla-Keywords: crash X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: franco@opnsense.org X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org
X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Mar 2017 19:40:46 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903 --- Comment #14 from Franco Fichtner --- Hi Cassiano, What's your output of uname -v? Can you make sure to include a backtrace here from ddb? Type "bt" at the prompt when the panic happens. It may be related but not the same code path. Cheers, Franco -- You are receiving this mail because: You are on the CC list for the bug. From owner-freebsd-stable@freebsd.org Mon Mar 13 19:50:40 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3969AD0A57B for ; Mon, 13 Mar 2017 19:50:40 +0000 (UTC) (envelope-from peter@pean.org) Received: from system.jails.se (system.jails.se [IPv6:2001:470:6c08::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CCC051B8D for ; Mon, 13 Mar 2017 19:50:39 +0000 (UTC) (envelope-from peter@pean.org) Received: from system.jails.se (system.jails.se [172.31.20.14]) by system.jails.se (Postfix) with SMTP id A694950D9FC for ; Mon, 13 Mar 2017 20:50:36 +0100 (CET) Received: from [IPv6:2001:470:de59::63] (unknown [IPv6:2001:470:de59::63]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by system.jails.se (Postfix) with ESMTPSA id 2579350D9FA for ; Mon, 13 Mar
2017 20:50:36 +0100 (CET) From: =?utf-8?Q?Peter_Ankerst=C3=A5l?= Content-Type: multipart/signed; boundary="Apple-Mail=_48888FF0-0E19-4C9F-962A-578FEF9C40B8"; protocol="application/pkcs7-signature"; micalg=sha1 Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Problem with snmp_wlan Message-Id: <225732EE-1565-46A7-9281-B62173D7EC62@pean.org> Date: Mon, 13 Mar 2017 20:50:32 +0100 To: FreeBSD Stable X-Mailer: Apple Mail (2.3259) X-DSPAM-Result: Innocent X-DSPAM-Processed: Mon Mar 13 20:50:36 2017 X-DSPAM-Confidence: 1.0000 X-DSPAM-Probability: 0.0023 X-DSPAM-Signature: 58c6f80c19031162117088 X-DSPAM-Factors: 27, A+general+failure, 0.40000, wlanIfaceRegDomain+#+#+etsi+3, 0.40000, wlanIfaceRegDomain, 0.40000, wlanIfaceRegDomain, 0.40000, could, 0.40000, 1765+iface+#+get+param, 0.40000, failure+#+#+#+BEGEMOT, 0.40000, Reason+genError+#+general+failure, 0.40000, 47, 0.40000, Error+in+#+#+genError, 0.40000, wlanIfaceRegDomain+#+and, 0.40000, get+param+ioctl+57, 0.40000, argument+#+#+what+could, 0.40000, failed, 0.40000, BEGEMOT+#+#+#+get, 0.40000, When+I+try+to, 0.40000, BEGEMOT+#+MIB, 0.40000, BEGEMOT+#+MIB, 0.40000, ioctl+57, 0.40000, serverside+#+#+#+47, 0.40000, general+#+occured+Failed, 0.40000, do+a+#+of+the, 0.40000, etsi+3+#+#+packet, 0.40000, INTEGER+#+#+Error+in, 0.40000, 13+#+47+25+gw, 0.40000, I+get+the+#+on, 0.40000, Anyone+know+what+could+be, 0.40000 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Mar 2017 19:50:40 -0000 --Apple-Mail=_48888FF0-0E19-4C9F-962A-578FEF9C40B8 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii When I try to do a snmpwalk of the BEGEMOT-WIRELESS-MIB I get the following on the client side: BEGEMOT-WIRELESS-MIB::wlanIfaceRegDomain."wlan2" = INTEGER: etsi(3) Error in packet.
Reason: (genError) A general failure occured Failed object: BEGEMOT-WIRELESS-MIB::wlanIfaceRegDomain."wlan2" and on the serverside: Mar 13 20:47:25 gw snmpd[1765]: iface wlan0 - get param: ioctl(57) failed: Invalid argument Anyone know what could be the problem? --Apple-Mail=_48888FF0-0E19-4C9F-962A-578FEF9C40B8-- From owner-freebsd-stable@freebsd.org Mon Mar 13 20:12:41 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 83C10D0AF77 for ; Mon, 13 Mar 2017 20:12:41 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 68D711BDC for ; Mon, 13 Mar 2017 20:12:41 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id v2DKCdpg023363 for ; Mon, 13 Mar 2017 20:12:41 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-stable@FreeBSD.org Subject: [Bug 213903] Kernel crashes from turnstile_broadcast (/usr/src/sys/kern/subr_turnstile.c:837) Date: Mon, 13 Mar 2017 20:12:40 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: CURRENT X-Bugzilla-Keywords: crash X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: peixoto.cassiano@gmail.com X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Mar 2017 20:12:41 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903 --- Comment #15 from Cassiano Peixoto --- (In reply to Franco Fichtner from comment #14) Hi Franco, Here it is: FreeBSD 10.3-STABLE #4: Mon Feb 6 09:29:52 BRST 2017 root@bgp.server.us:/usr/obj/usr/src/sys/GENERIC My debug output is below: # kgdb kernel.debug /var/crash/vmcore.last GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"...
Unread portion of the kernel message buffer: kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode cpuid = 7; apic id = 0e fault virtual address = 0x30 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80b2b4fa stack pointer = 0x28:0xfffffe0237a4f450 frame pointer = 0x28:0xfffffe0237a4f480 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = resume, IOPL = 0 current process = 79530 (sh) trap number = 12 panic: page fault cpuid = 7 KDB: stack backtrace: #0 0xffffffff80b16230 at kdb_backtrace+0x60 #1 0xffffffff80ad7036 at vpanic+0x126 #2 0xffffffff80ad6f03 at panic+0x43 #3 0xffffffff80f810cd at trap_fatal+0x35d #4 0xffffffff80f813e8 at trap_pfault+0x308 #5 0xffffffff80f80a2a at trap+0x47a #6 0xffffffff80f661dc at calltrap+0x8 #7 0xffffffff80ad4d80 at __rw_wunlock_hard+0x90 #8 0xffffffff80dffd9a at vm_map_delete+0x33a #9 0xffffffff80e01b47 at vm_map_remove+0x47 #10 0xffffffff80a96759 at exec_new_vmspace+0x1e9 #11 0xffffffff80a73284 at exec_elf64_imgact+0xa44 #12 0xffffffff80a94ec4 at kern_execve+0x7d4 #13 0xffffffff80a9438c at sys_execve+0x4c #14 0xffffffff80f81b00 at amd64_syscall+0x450 #15 0xffffffff80f664cb at Xfast_syscall+0xfb Uptime: 19h0m34s Dumping 1063 out of 8149 MB: (CTRL-C to abort) ..2%..11%..22%..31%..41%..52%..61%..71%..82%..91% Reading symbols from /boot/kernel.off/coretemp.ko.symbols...done. Loaded symbols for /boot/kernel.off/coretemp.ko.symbols Reading symbols from /boot/modules/plcm.ko...done. Loaded symbols for /boot/modules/plcm.ko #0 doadump (textdump=) at pcpu.h:219 219 pcpu.h: No such file or directory. in pcpu.h (kgdb) list *0xffffffff80b2b4fa 0xffffffff80b2b4fa is in turnstile_broadcast (/usr/src/sys/kern/subr_turnstile.c:838). 833 834 /* 835 * Transfer the blocked list to the pending list.
836 */ 837 mtx_lock_spin(&td_contested_lock); 838 TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], td_lockq); 839 mtx_unlock_spin(&td_contested_lock); 840 841 /* 842 * Give a turnstile to each thread. The last thread gets Current language: auto; currently minimal (kgdb) bt #0 doadump (textdump=) at pcpu.h:219 #1 0xffffffff80ad6c53 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486 #2 0xffffffff80ad7075 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:889 #3 0xffffffff80ad6f03 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:818 #4 0xffffffff80f810cd in trap_fatal (frame=, eva=) at /usr/src/sys/amd64/amd64/trap.c:858 #5 0xffffffff80f813e8 in trap_pfault (frame=0xfffffe0237a4f3a0, usermode=) at /usr/src/sys/amd64/amd64/trap.c:681 #6 0xffffffff80f80a2a in trap (frame=0xfffffe0237a4f3a0) at /usr/src/sys/amd64/amd64/trap.c:447 #7 0xffffffff80f661dc in calltrap () at /usr/src/sys/amd64/amd64/exception.S:238 #8 0xffffffff80b2b4fa in turnstile_broadcast (ts=0x0, queue=1) at /usr/src/sys/kern/subr_turnstile.c:838 #9 0xffffffff80ad4d80 in __rw_wunlock_hard (c=0xfffff8013a3b5318, tid=1, file=0xfffff80009947001 "8?\201????", line=1) at /usr/src/sys/kern/kern_rwlock.c:1027 #10 0xffffffff80dffd9a in vm_map_delete (map=0xfffff8000c22b8c0, start=, end=140737488355328) at /usr/src/sys/vm/vm_map.c:2911 #11 0xffffffff80e01b47 in vm_map_remove (map=0xfffff8000c22b8c0, start=140737488355328, end=1) at /usr/src/sys/vm/vm_map.c:3028 #12 0xffffffff80a96759 in exec_new_vmspace (imgp=0xfffffe0237a4f868, sv=0xffffffff819858e8) at /usr/src/sys/kern/kern_exec.c:1084 #13 0xffffffff80a73284 in exec_elf64_imgact (imgp=0xfffffe0237a4f868) at /usr/src/sys/kern/imgact_elf.c:881 #14 0xffffffff80a94ec4 in kern_execve (td=0xfffff80009947000, args=0xfffffe0237a4fa78, mac_p=) at /usr/src/sys/kern/kern_exec.c:606 #15 0xffffffff80a9438c in sys_execve (td=0xfffff80009947000, uap=) at
/usr/src/sys/kern/kern_exec.c:222 #16 0xffffffff80f81b00 in amd64_syscall (td=0xfffff80009947000, traced=0) at subr_syscall.c:141 #17 0xffffffff80f664cb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:398 #18 0x0000000800d7a97a in ?? () Previous frame inner to this frame (corrupt stack?) Thanks for your help. -- You are receiving this mail because: You are on the CC list for the bug. From owner-freebsd-stable@freebsd.org Tue Mar 14 04:03:42 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 86691D091C2 for ; Tue, 14 Mar 2017 04:03:42 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6D8F71D18 for ; Tue, 14 Mar 2017 04:03:42 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id v2E43fM4036712 for ; Tue, 14 Mar 2017 04:03:42 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-stable@FreeBSD.org Subject: [Bug 213903] Kernel crashes from turnstile_broadcast (/usr/src/sys/kern/subr_turnstile.c:837) Date: Tue, 14 Mar 2017 04:03:41 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: CURRENT X-Bugzilla-Keywords: crash X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: mjg@FreeBSD.org X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type:
 text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 14 Mar 2017 04:03:42 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903

--- Comment #16 from Mateusz Guzik ---
Can you please reproduce with
https://people.freebsd.org/~mjg/patches/rwlock-debug-10.diff applied on top?

E.g. like this:
cd /usr/src
fetch https://people.freebsd.org/~mjg/patches/rwlock-debug-10.diff
patch -p1 < rwlock-debug-10.diff

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-stable@freebsd.org Tue Mar 14 06:52:50 2017
Return-Path:
Delivered-To: freebsd-stable@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E3AECD0BE0F for ; Tue, 14 Mar 2017 06:52:50 +0000 (UTC) (envelope-from markmi@dsl-only.net)
Received: from asp.reflexion.net (outbound-mail-211-180.reflexion.net [208.70.211.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A4E3C1D6A for ; Tue, 14 Mar 2017 06:52:50 +0000 (UTC) (envelope-from markmi@dsl-only.net)
Received: (qmail 5999 invoked from network); 14 Mar 2017 06:53:32 -0000
Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 14 Mar 2017 06:53:32 -0000
Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.30.2) with SMTP; Tue, 14 Mar 2017 02:52:43 -0400 (EDT)
Received: (qmail 18463 invoked from network); 14 Mar 2017 06:52:43 -0000
Received: from unknown (HELO iron2.pdx.net)
 (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 14 Mar 2017 06:52:43 -0000
Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 80BBBEC8662; Mon, 13 Mar 2017 23:52:42 -0700 (PDT)
From: Mark Millard
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\))
Subject: amd64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context)
Message-Id: <01735A68-FED6-4E63-964F-0820FE5C446C@dsl-only.net>
Date: Mon, 13 Mar 2017 23:52:41 -0700
Cc: Andrew Turner
To: freebsd-arm , FreeBSD Current , FreeBSD-STABLE Mailing List
X-Mailer: Apple Mail (2.3259)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 14 Mar 2017 06:52:51 -0000

I'm still at a loss about how to figure out what stages are messed
up. (Memory coherency? Some memory not swapped out? Bad data swapped
out? Wrong data swapped in?)

But at least I've found a much smaller/simpler example that demonstrates
a problem in my Pine64+ 2GB context.

The Pine64+ 2GB is the only amd64 context that I have access to.

The following program fails its check for data
having its expected byte pattern in dynamically
allocated memory after a fork/swap-out/swap-in
sequence.

I'll note that the program sleeps for 60s after
forking to give time to do something else to
cause the parent and child processes to swap
out (RES=0 as seen in top).

Note the source code line:

   // test_check(); // Adding this line prevents failure.

It seems that accessing the region contents before forking
and swapping avoids the problem. But there is a problem
if the region was only written to before the fork/swap.

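For the "do something else" step during the 60s sleep, one option is a second process that allocates a large block and writes to every page of it, so that the sleeping processes get pushed out to swap. The following is only a minimal sketch of that idea, not something from the original report: `dirty_pages` is a hypothetical helper name, and the 4096-byte fallback page size is an assumption used only if sysconf() does not report one.

```c
#include <stddef.h>   // for size_t, NULL
#include <stdlib.h>   // for malloc(.), free(.)
#include <unistd.h>   // for sysconf(.)

// Hypothetical helper (not part of swap_testing.c): allocate `bytes` bytes
// and write one byte in every page so that each page becomes resident and
// dirty. Returns the number of pages written, or 0 if nothing was written.
// On success *out holds the block; the caller releases it with free().
size_t dirty_pages(size_t bytes, unsigned char **out)
{
    long reported = sysconf(_SC_PAGESIZE);
    size_t page = (0 < reported) ? (size_t)reported : 4096u; // assumed fallback
    volatile unsigned char *block = malloc(bytes);
    size_t written = 0u;

    *out = NULL;
    if (!block) return 0u;

    // One store per page is enough to force the page to be backed.
    for (size_t off = 0u; off < bytes; off += page) {
        block[off] = (unsigned char)(off / page);
        written++;
    }

    *out = (unsigned char *)block;
    return written;
}
```

Called with a size close to physical RAM and held until top shows RES=0 for the test processes, this would play roughly the same role as an external memory-stress tool; how much memory is needed depends on the machine.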
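Another point is that the size of the region matters: <= 14K Bytes
fails and > 14K Bytes works, for as much as I have tested.


# more swap_testing.c
// swap_testing.c

// Built via (c++ was clang++ 4.0 in my case):
//
// cc -g -std=c11 -Wpedantic swap_testing.c
// -O0 and -O2 also gets the problem.

#include <unistd.h>     // for fork(), sleep(.)
#include <sys/types.h>  // for pid_t
#include <sys/wait.h>   // for wait(.)

extern void test_setup(void); // Sets up the memory byte pattern.
extern void test_check(void); // Tests the memory byte pattern.

int main(void)
{
    test_setup();
    // test_check(); // Adding this line prevents failure.

    pid_t pid = fork();
    int wait_status = 0;

    if (0<pid) { wait(&wait_status); }

    if (-1!=wait_status && 0<=pid)
    {
        if (0==pid)
        {
            sleep(60);

            // During this manually force this process to
            // swap out. I use something like:

            // stress -m 1 --vm-bytes 1800M

            // in another shell and ^C'ing it after top
            // shows the swapped status desired. 1800M
            // just happened to work on the Pine64+ 2GB
            // that I was using.
        }

        test_check();
    }
}

// The memory and test code follows.

#include <stdbool.h>    // for bool, true, false
#include <stddef.h>     // for size_t, NULL
#include <stdlib.h>     // for malloc(.), free(.)

#include <signal.h>     // for raise(.), SIGABRT

#define region_size (14u*1024u)
    // Bad dyn_region pattern, parent and child
    // processes:
    //   256u, 4u*1024u, 8u*1024u, 9u*1024u,
    //   12u*1024u, 14u*1024u

    // Works:
    //   14u*1024u+1u, 15u*1024u, 16u*1024u,
    //   32u*1024u, 256u*1024u*1024u

typedef volatile unsigned char value_type;

struct region_struct { value_type array[region_size]; };
typedef struct region_struct region;

static region            gbl_region;
static region * volatile dyn_region = NULL;

static value_type value(size_t v) { return (value_type)v; }

void test_setup(void) {
    dyn_region = malloc(sizeof(region));
    if (!dyn_region) raise(SIGABRT);

    for(size_t i=0u; i<region_size; i++) {
        (*dyn_region).array[i] = gbl_region.array[i] = value(i);
    }
}

static volatile bool gbl_failed = false; // Until potentially disproved
static volatile size_t gbl_pos = 0u;

static volatile bool dyn_failed = false; // Until potentially disproved
static volatile size_t dyn_pos = 0u;

void test_check(void) {
    while (!gbl_failed && gbl_pos<region_size) {
        gbl_failed = (value(gbl_pos) != gbl_region.array[gbl_pos]);
        gbl_pos++;
    }

    while (!dyn_failed && dyn_pos<region_size) {
        dyn_failed = (value(dyn_pos) != (*dyn_region).array[dyn_pos]);
        // Note: When the memory pattern fails this case is that
        //       records the failure.
        dyn_pos++;
    }

    if (gbl_failed) raise(SIGABRT);
    if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call.
                                    // when it fails (both parent and child processes).
}


Other details from lldb (not using -O2 so things are
simpler, not presented in the order examined):

# lldb a.out -c /var/crash/a.out.11575.core
(lldb) target create "a.out" --core "/var/crash/a.out.11575.core"
Core file '/var/crash/a.out.11575.core' (aarch64) was loaded.
(lldb) bt
* thread #1, name = 'a.out', stop reason = signal SIGABRT
  * frame #0: 0x0000000040113d38 libc.so.7`_thr_kill + 8
    frame #1: libc.so.7`__raise(s=6) at raise.c:52
    frame #2: a.out`test_check at swap_testing.c:103
    frame #3: a.out`main at swap_testing.c:42
    frame #4: 0x0000000000020184 a.out`__start + 364
    frame #5: ld-elf.so.1`.rtld_start at rtld_start.S:41

(lldb) up 2
frame #2: a.out`test_check at swap_testing.c:103
   100      }
   101
   102      if (gbl_failed) raise(SIGABRT);
-> 103      if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call.
   104                                      // when it fails (both parent and child processes).
   105  }

(lldb) print dyn_pos
(size_t) $0 = 2

(That is one after the failure position.)


(lldb) print dyn_region
(region *volatile) $3 = 0x0000000040616000

(lldb) print *dyn_region
(region) $1 = {
  array = {
    [0] = '\0'
    [1] = '\0'
    [2] = '\0'
    . . . (all '\0' bytes) . . .
    [251] = '\0'
    [252] = '\0'
    [253] = '\0'
    [254] = '\0'
    [255] = '\0'
    ...
  }
}

(lldb) print gbl_region
(region) $2 = {
  array = {
    [0] = '\0'
    [1] = '\x01'
    [2] = '\x02'
    . . .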
    [251] = '\xfb'
    [252] = '\xfc'
    [253] = '\xfd'
    [254] = '\xfe'
    [255] = '\xff'
    ...
  }
}

(lldb) disass -n main
a.out`main:
    0x2022c <+0>:   sub    sp, sp, #0x30             ; =0x30
    0x20230 <+4>:   stp    x29, x30, [sp, #0x20]
    0x20234 <+8>:   add    x29, sp, #0x20            ; =0x20
    0x20238 <+12>:  stur   wzr, [x29, #-0x4]
    0x2023c <+16>:  bl     0x202b0                   ; test_setup at swap_testing.c:74
    0x20240 <+20>:  bl     0x20580                   ; symbol stub for: fork
    0x20244 <+24>:  mov    w8, wzr
    0x20248 <+28>:  stur   w0, [x29, #-0x8]
    0x2024c <+32>:  stur   wzr, [x29, #-0xc]
    0x20250 <+36>:  ldur   w0, [x29, #-0x8]
    0x20254 <+40>:  cmp    w8, w0
    0x20258 <+44>:  b.ge   0x20268                   ; <+60> at swap_testing.c
    0x2025c <+48>:  sub    x0, x29, #0xc             ; =0xc
    0x20260 <+52>:  bl     0x20590                   ; symbol stub for: wait
    0x20264 <+56>:  str    w0, [sp, #0x10]
    0x20268 <+60>:  mov    w8, #-0x1
    0x2026c <+64>:  ldur   w9, [x29, #-0xc]
    0x20270 <+68>:  cmp    w8, w9
    0x20274 <+72>:  b.eq   0x202a0                   ; <+116> at swap_testing.c:44
    0x20278 <+76>:  mov    w8, wzr
    0x2027c <+80>:  ldur   w9, [x29, #-0x8]
    0x20280 <+84>:  cmp    w8, w9
    0x20284 <+88>:  b.gt   0x202a0                   ; <+116> at swap_testing.c:44
    0x20288 <+92>:  ldur   w8, [x29, #-0x8]
    0x2028c <+96>:  cbnz   w8, 0x2029c               ; <+112> at swap_testing.c:42
    0x20290 <+100>: orr    w0, wzr, #0x3c
    0x20294 <+104>: bl     0x205a0                   ; symbol stub for: sleep
    0x20298 <+108>: str    w0, [sp, #0xc]
    0x2029c <+112>: bl     0x20348                   ; test_check at swap_testing.c:89
    0x202a0 <+116>: ldur   w0, [x29, #-0x4]
    0x202a4 <+120>: ldp    x29, x30, [sp, #0x20]
    0x202a8 <+124>: add    sp, sp, #0x30             ; =0x30
    0x202ac <+128>: ret

(lldb) disass -n value
a.out`value:
    0x204cc <+0>:  sub    sp, sp, #0x10             ; =0x10
    0x204d0 <+4>:  str    x0, [sp, #0x8]
    0x204d4 <+8>:  ldrb   w8, [sp, #0x8]
    0x204d8 <+12>: mov    w1, w8
    0x204dc <+16>: mov    w0, w8
    0x204e0 <+20>: str    w1, [sp, #0x4]
    0x204e4 <+24>: add    sp, sp, #0x10             ; =0x10
    0x204e8 <+28>: ret

(lldb) disass -n test_setup
a.out`test_setup:
    0x202b0 <+0>:   sub    sp, sp, #0x20             ; =0x20
    0x202b4 <+4>:   stp    x29, x30, [sp, #0x10]
    0x202b8 <+8>:   add    x29, sp, #0x10            ; =0x10
    0x202bc <+12>:  orr    x0, xzr, #0x3800
    0x202c0 <+16>:  bl     0x205b0                   ; symbol stub for: malloc
    0x202c4 <+20>:  adrp   x30, 48
    0x202c8 <+24>:  add    x30, x30, #0x0            ; =0x0
    0x202cc <+28>:  str    x0, [x30]
    0x202d0 <+32>:  ldr    x0, [x30]
    0x202d4 <+36>:  cbnz   x0, 0x202e4               ; <+52> at swap_testing.c:78
    0x202d8 <+40>:  orr    w0, wzr, #0x6
    0x202dc <+44>:  bl     0x205c0                   ; symbol stub for: raise
    0x202e0 <+48>:  str    w0, [sp, #0x4]
    0x202e4 <+52>:  str    xzr, [sp, #0x8]
    0x202e8 <+56>:  orr    x8, xzr, #0x3800
    0x202ec <+60>:  ldr    x9, [sp, #0x8]
    0x202f0 <+64>:  cmp    x9, x8
    0x202f4 <+68>:  b.hs   0x2033c                   ; <+140> at swap_testing.c:81
    0x202f8 <+72>:  ldr    x0, [sp, #0x8]
    0x202fc <+76>:  bl     0x204cc                   ; value at swap_testing.c:72
    0x20300 <+80>:  adrp   x30, 48
    0x20304 <+84>:  add    x30, x30, #0x0            ; =0x0
    0x20308 <+88>:  adrp   x8, 48
    0x2030c <+92>:  add    x8, x8, #0x8              ; =0x8
    0x20310 <+96>:  ldr    x9, [sp, #0x8]
    0x20314 <+100>: add    x8, x8, x9
    0x20318 <+104>: strb   w0, [x8]
    0x2031c <+108>: ldr    x8, [x30]
    0x20320 <+112>: ldr    x9, [sp, #0x8]
    0x20324 <+116>: add    x8, x8, x9
    0x20328 <+120>: strb   w0, [x8]
    0x2032c <+124>: ldr    x8, [sp, #0x8]
    0x20330 <+128>: add    x8, x8, #0x1              ; =0x1
    0x20334 <+132>: str    x8, [sp, #0x8]
    0x20338 <+136>: b      0x202e8                   ; <+56> at swap_testing.c
    0x2033c <+140>: ldp    x29, x30, [sp, #0x10]
    0x20340 <+144>: add    sp, sp, #0x20             ; =0x20
    0x20344 <+148>: ret

(lldb) disass -n test_check
a.out`test_check:
    0x20348 <+0>:   sub    sp, sp, #0x20             ; =0x20
    0x2034c <+4>:   stp    x29, x30, [sp, #0x10]
    0x20350 <+8>:   add    x29, sp, #0x10            ; =0x10
    0x20354 <+12>:  b      0x20358                   ; <+16> at swap_testing.c
    0x20358 <+16>:  mov    w8, wzr
    0x2035c <+20>:  adrp   x9, 51
    0x20360 <+24>:  add    x9, x9, #0x808            ; =0x808
    0x20364 <+28>:  ldrb   w10, [x9]
    0x20368 <+32>:  stur   w8, [x29, #-0x4]
    0x2036c <+36>:  tbnz   w10, #0x0, 0x2038c        ; <+68> at swap_testing.c
    0x20370 <+40>:  orr    x8, xzr, #0x3800
    0x20374 <+44>:  adrp   x9, 51
    0x20378 <+48>:  add    x9, x9, #0x810            ; =0x810
    0x2037c <+52>:  ldr    x9, [x9]
    0x20380 <+56>:  cmp    x9, x8
    0x20384 <+60>:  cset   w10, lo
    0x20388 <+64>:  stur   w10, [x29, #-0x4]
    0x2038c <+68>:  ldur   w8, [x29, #-0x4]
    0x20390 <+72>:  tbz    w8, #0x0, 0x203ec         ; <+164> at swap_testing.c:95
    0x20394 <+76>:  adrp   x8, 51
    0x20398 <+80>:  add    x8, x8, #0x810            ; =0x810
    0x2039c <+84>:  ldr    x0, [x8]
    0x203a0 <+88>:  bl     0x204cc                   ; value at swap_testing.c:72
    0x203a4 <+92>:  adrp   x8, 51
    0x203a8 <+96>:  add    x8, x8, #0x810            ; =0x810
    0x203ac <+100>: adrp   x30, 51
    0x203b0 <+104>: add    x30, x30, #0x808          ; =0x808
    0x203b4 <+108>: adrp   x9, 48
    0x203b8 <+112>: add    x9, x9, #0x8              ; =0x8
    0x203bc <+116>: uxtb   w0, w0
    0x203c0 <+120>: ldr    x10, [x8]
    0x203c4 <+124>: add    x9, x9, x10
    0x203c8 <+128>: ldrb   w11, [x9]
    0x203cc <+132>: cmp    w0, w11
    0x203d0 <+136>: cset   w11, ne
    0x203d4 <+140>: and    w11, w11, #0x1
    0x203d8 <+144>: strb   w11, [x30]
    0x203dc <+148>: ldr    x9, [x8]
    0x203e0 <+152>: add    x9, x9, #0x1              ; =0x1
    0x203e4 <+156>: str    x9, [x8]
    0x203e8 <+160>: b      0x20358                   ; <+16> at swap_testing.c
    0x203ec <+164>: b      0x203f0                   ; <+168> at swap_testing.c
    0x203f0 <+168>: mov    w8, wzr
    0x203f4 <+172>: adrp   x9, 51
    0x203f8 <+176>: add    x9, x9, #0x818            ; =0x818
    0x203fc <+180>: ldrb   w10, [x9]
    0x20400 <+184>: str    w8, [sp, #0x8]
    0x20404 <+188>: tbnz   w10, #0x0, 0x20424        ; <+220> at swap_testing.c
    0x20408 <+192>: orr    x8, xzr, #0x3800
    0x2040c <+196>: adrp   x9, 51
    0x20410 <+200>: add    x9, x9, #0x820            ; =0x820
    0x20414 <+204>: ldr    x9, [x9]
    0x20418 <+208>: cmp    x9, x8
    0x2041c <+212>: cset   w10, lo
    0x20420 <+216>: str    w10, [sp, #0x8]
    0x20424 <+220>: ldr    w8, [sp, #0x8]
    0x20428 <+224>: tbz    w8, #0x0, 0x20488         ; <+320> at swap_testing.c
    0x2042c <+228>: adrp   x8, 51
    0x20430 <+232>: add    x8, x8, #0x820            ; =0x820
    0x20434 <+236>: ldr    x0, [x8]
    0x20438 <+240>: bl     0x204cc                   ; value at swap_testing.c:72
    0x2043c <+244>: adrp   x8, 51
    0x20440 <+248>: add    x8, x8, #0x820            ; =0x820
    0x20444 <+252>: adrp   x30, 51
    0x20448 <+256>: add    x30, x30, #0x818          ; =0x818
    0x2044c <+260>: adrp   x9, 48
    0x20450 <+264>: add    x9, x9, #0x0              ; =0x0
    0x20454 <+268>: uxtb   w0, w0
    0x20458 <+272>: ldr    x9, [x9]
    0x2045c <+276>: ldr    x10, [x8]
    0x20460 <+280>: add    x9, x9, x10
    0x20464 <+284>: ldrb   w11, [x9]
    0x20468 <+288>: cmp    w0, w11
    0x2046c <+292>: cset   w11, ne
    0x20470 <+296>: and    w11, w11, #0x1
    0x20474 <+300>: strb   w11, [x30]
    0x20478 <+304>: ldr    x9, [x8]
    0x2047c <+308>: add    x9, x9, #0x1              ; =0x1
    0x20480 <+312>: str    x9, [x8]
    0x20484 <+316>: b      0x203f0                   ; <+168> at swap_testing.c
    0x20488 <+320>: adrp   x8, 51
    0x2048c <+324>: add    x8, x8, #0x808            ; =0x808
    0x20490 <+328>: ldrb   w9, [x8]
    0x20494 <+332>: tbz    w9, #0x0, 0x204a4         ; <+348> at swap_testing.c
    0x20498 <+336>: orr    w0, wzr, #0x6
    0x2049c <+340>: bl     0x205c0                   ; symbol stub for: raise
    0x204a0 <+344>: str    w0, [sp, #0x4]
    0x204a4 <+348>: adrp   x8, 51
    0x204a8 <+352>: add    x8, x8, #0x818            ; =0x818
    0x204ac <+356>: ldrb   w9, [x8]
    0x204b0 <+360>: tbz    w9, #0x0, 0x204c0         ; <+376> at swap_testing.c:105
    0x204b4 <+364>: orr    w0, wzr, #0x6
    0x204b8 <+368>: bl     0x205c0                   ; symbol stub for: raise
->  0x204bc <+372>: str    w0, [sp]
    0x204c0 <+376>: ldp    x29, x30, [sp, #0x10]
    0x204c4 <+380>: add    sp, sp, #0x20             ; =0x20
    0x204c8 <+384>: ret

# uname -apKU
FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r314638M arm64 aarch64 1200023 1200023

buildworld buildkernel did not have MALLOC_PRODUCTION= defined. The kernel is a
non-debug kernel. (Previous to these experiments my other corruption examples
were not caught by a debug kernel. I'm not hopeful that this simpler context
would be either.)

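The lldb output above shows the failed dyn_region as all '\0' bytes with dyn_pos at 2, i.e. the first mismatch is at index 1, since index 0 legitimately holds 0. To get that kind of summary without a debugger, the pass/fail check could be complemented by something like the sketch below. `first_mismatch` and `count_zero` are hypothetical helper names, not part of swap_testing.c; they assume the same pattern that test_setup writes (byte i holds (unsigned char)i).

```c
#include <stddef.h>   // for size_t

// Hypothetical helpers, not part of swap_testing.c.

// Index of the first byte that differs from the pattern written by
// test_setup (byte i holds (unsigned char)i), or n if all n bytes match.
size_t first_mismatch(const volatile unsigned char *buf, size_t n)
{
    for (size_t i = 0u; i < n; i++) {
        if (buf[i] != (unsigned char)i) return i;
    }
    return n;
}

// Number of zero bytes in buf[0..n). An intact pattern has one zero byte
// per 256 bytes; a region that came back entirely zero-filled has n.
size_t count_zero(const volatile unsigned char *buf, size_t n)
{
    size_t zeros = 0u;
    for (size_t i = 0u; i < n; i++) {
        if (buf[i] == 0u) zeros++;
    }
    return zeros;
}
```

Printing both values for (*dyn_region).array before raising SIGABRT would distinguish a fully zeroed region from a partially damaged one, which may help narrow down which stage goes wrong.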
=3D=3D=3D Mark Millard markmi at dsl-only.net From owner-freebsd-stable@freebsd.org Tue Mar 14 07:02:36 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AA95BD0B364 for ; Tue, 14 Mar 2017 07:02:36 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-179.reflexion.net [208.70.211.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 694A2185D for ; Tue, 14 Mar 2017 07:02:36 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 2658 invoked from network); 14 Mar 2017 07:02:34 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 14 Mar 2017 07:02:34 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.30.2) with SMTP; Tue, 14 Mar 2017 03:02:34 -0400 (EDT) Received: (qmail 31434 invoked from network); 14 Mar 2017 07:02:34 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 14 Mar 2017 07:02:34 -0000 Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id B63C0EC892D; Tue, 14 Mar 2017 00:02:33 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: amd64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) From: Mark Millard In-Reply-To: <01735A68-FED6-4E63-964F-0820FE5C446C@dsl-only.net> Date: Tue, 14 Mar 2017 00:02:33 -0700 Cc: Andrew Turner Content-Transfer-Encoding: quoted-printable Message-Id: References: <01735A68-FED6-4E63-964F-0820FE5C446C@dsl-only.net> To: freebsd-arm , FreeBSD Current , FreeBSD-STABLE Mailing List 
X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Mar 2017 07:02:36 -0000 On 2017-Mar-13, at 11:52 PM, Mark Millard wrote: > I'm still at a loss about how to figure out what stages are messed > up. (Memory coherency? Some memory not swapped out? Bad data swapped > out? Wrong data swapped in?) >=20 > But at least I've found a much smaller/simpler example to demonstrate > some problem with in my Pine64+_ 2GB context. >=20 > The Pine64+ 2GB is the only amd64 context that I have access to. Someday I'll learn to type arm64 the first time instead of amd64. > The following program fails its check for data > having its expected byte pattern in dynamically > allocated memory after a fork/swap-out/swap-in > sequence. >=20 > I'll note that the program sleeps for 60s after > forking to give time to do something else to > cause the parent and child processes to swap > out (RES=3D0 as seen in top). >=20 > Note the source code line: >=20 > // test_check(); // Adding this line prevents failure. >=20 > It seem that accessing the region contents before forking > and swapping avoids the problem. But there is a problem > if the region was only written-to before the fork/swap. >=20 > Another point is the size of the region matters: <=3D 14K Bytes > fails and > 14K Bytes works for as much has I have tested. >=20 >=20 > # more swap_testing.c > // swap_testing.c >=20 > // Built via (c++ was clang++ 4.0 in my case): > // > // cc -g -std=3Dc11 -Wpedantic swap_testing.c > // -O0 and -O2 also gets the problem. >=20 > #include // for fork(), sleep(.) > #include // for pid_t > #include // for wait(.) >=20 > extern void test_setup(void); // Sets up the memory byte pattern. > extern void test_check(void); // Tests the memory byte pattern. 
>=20 > int main(void) > { > test_setup(); > // test_check(); // Adding this line prevents failure. >=20 > pid_t pid =3D fork(); > int wait_status =3D 0;; >=20 > if (0=20 > if (-1!=3Dwait_status && 0<=3Dpid) > { > if (0=3D=3Dpid) > { > sleep(60); >=20 > // During this manually force this process to > // swap out. I use something like: >=20 > // stress -m 1 --vm-bytes 1800M >=20 > // in another shell and ^C'ing it after top > // shows the swapped status desired. 1800M > // just happened to work on the Pine64+ 2GB > // that I was using. > } >=20 > test_check(); > } > } >=20 > // The memory and test code follows. >=20 > #include // for bool, true, false > #include // for size_t, NULL > #include // for malloc(.), free(.) >=20 > #include // for raise(.), SIGABRT >=20 > #define region_size (14u*1024u) > // Bad dyn_region pattern, parent and child > // processes: > // 256u, 4u*1024u, 8u*1024u, 9u*1024u, > // 12u*1024u, 14u*1024u >=20 > // Works: > // 14u*1024u+1u, 15u*1024u, 16u*1024u, > // 32u*1024u, 256u*1024u*1024u >=20 > typedef volatile unsigned char value_type; >=20 > struct region_struct { value_type array[region_size]; }; > typedef struct region_struct region; >=20 > static region gbl_region; > static region * volatile dyn_region =3D NULL; >=20 > static value_type value(size_t v) { return (value_type)v; } >=20 > void test_setup(void) { > dyn_region =3D malloc(sizeof(region)); > if (!dyn_region) raise(SIGABRT); >=20 > for(size_t i=3D0u; i (*dyn_region).array[i] =3D gbl_region.array[i] =3D value(i); > } > } >=20 > static volatile bool gbl_failed =3D false; // Until potentially = disproved > static volatile size_t gbl_pos =3D 0u; >=20 > static volatile bool dyn_failed =3D false; // Until potentially = disproved > static volatile size_t dyn_pos =3D 0u; >=20 > void test_check(void) { > while (!gbl_failed && gbl_pos gbl_failed =3D (value(gbl_pos) !=3D gbl_region.array[gbl_pos]); > gbl_pos++; > } >=20 > while (!dyn_failed && dyn_pos dyn_failed =3D (value(dyn_pos) !=3D = 
(*dyn_region).array[dyn_pos]); > // Note: When the memory pattern fails this case is that > // records the failure. > dyn_pos++; > } >=20 > if (gbl_failed) raise(SIGABRT); > if (dyn_failed) raise(SIGABRT); // lldb reports this line for the = __raise call. > // when it fails (both parent and = child processes). > } >=20 >=20 > Other details from lldb (not using -O2 so things are > simpler, not presented in the order examined): >=20 > # lldb a.out -c /var/crash/a.out.11575.core > (lldb) target create "a.out" --core "/var/crash/a.out.11575.core" > Core file '/var/crash/a.out.11575.core' (aarch64) was loaded. > (lldb) bt > * thread #1, name =3D 'a.out', stop reason =3D signal SIGABRT > * frame #0: 0x0000000040113d38 libc.so.7`_thr_kill + 8 > frame #1: libc.so.7`__raise(s=3D6) at raise.c:52 > frame #2: a.out`test_check at swap_testing.c:103 > frame #3: a.out`main at swap_testing.c:42 > frame #4: 0x0000000000020184 a.out`__start + 364 > frame #5: ld-elf.so.1`.rtld_start at rtld_start.S:41 >=20 > (lldb) up 2 > frame #2: a.out`test_check at swap_testing.c:103 > 100 } > 101 =09 > 102 if (gbl_failed) raise(SIGABRT); > -> 103 if (dyn_failed) raise(SIGABRT); // lldb reports this = line for the __raise call. > 104 // when it fails = (both parent and child processes). > 105 } >=20 > (lldb) print dyn_pos > (size_t) $0 =3D 2 >=20 > (That is one after the failure position.) >=20 >=20 > (lldb) print dyn_region > (region *volatile) $3 =3D 0x0000000040616000 >=20 > (lldb) print *dyn_region > (region) $1 =3D { > array =3D { > [0] =3D '\0' > [1] =3D '\0' > [2] =3D '\0' > . . . (all '\0' bytes) . . . > [251] =3D '\0' > [252] =3D '\0' > [253] =3D '\0' > [254] =3D '\0' > [255] =3D '\0' > ... > } > } >=20 > (lldb) print gbl_region > (region) $2 =3D { > array =3D { > [0] =3D '\0' > [1] =3D '\x01' > [2] =3D '\x02' > . . . > [251] =3D '\xfb' > [252] =3D '\xfc' > [253] =3D '\xfd' > [254] =3D '\xfe' > [255] =3D '\xff' > ... 
> } > } >=20 > (lldb) disass -n main > a.out`main: > 0x2022c <+0>: sub sp, sp, #0x30 ; =3D0x30=20 > 0x20230 <+4>: stp x29, x30, [sp, #0x20] > 0x20234 <+8>: add x29, sp, #0x20 ; =3D0x20=20 > 0x20238 <+12>: stur wzr, [x29, #-0x4] > 0x2023c <+16>: bl 0x202b0 ; test_setup at = swap_testing.c:74 > 0x20240 <+20>: bl 0x20580 ; symbol stub for: = fork > 0x20244 <+24>: mov w8, wzr > 0x20248 <+28>: stur w0, [x29, #-0x8] > 0x2024c <+32>: stur wzr, [x29, #-0xc] > 0x20250 <+36>: ldur w0, [x29, #-0x8] > 0x20254 <+40>: cmp w8, w0 > 0x20258 <+44>: b.ge 0x20268 ; <+60> at = swap_testing.c > 0x2025c <+48>: sub x0, x29, #0xc ; =3D0xc=20 > 0x20260 <+52>: bl 0x20590 ; symbol stub for: = wait > 0x20264 <+56>: str w0, [sp, #0x10] > 0x20268 <+60>: mov w8, #-0x1 > 0x2026c <+64>: ldur w9, [x29, #-0xc] > 0x20270 <+68>: cmp w8, w9 > 0x20274 <+72>: b.eq 0x202a0 ; <+116> at = swap_testing.c:44 > 0x20278 <+76>: mov w8, wzr > 0x2027c <+80>: ldur w9, [x29, #-0x8] > 0x20280 <+84>: cmp w8, w9 > 0x20284 <+88>: b.gt 0x202a0 ; <+116> at = swap_testing.c:44 > 0x20288 <+92>: ldur w8, [x29, #-0x8] > 0x2028c <+96>: cbnz w8, 0x2029c ; <+112> at = swap_testing.c:42 > 0x20290 <+100>: orr w0, wzr, #0x3c > 0x20294 <+104>: bl 0x205a0 ; symbol stub for: = sleep > 0x20298 <+108>: str w0, [sp, #0xc] > 0x2029c <+112>: bl 0x20348 ; test_check at = swap_testing.c:89 > 0x202a0 <+116>: ldur w0, [x29, #-0x4] > 0x202a4 <+120>: ldp x29, x30, [sp, #0x20] > 0x202a8 <+124>: add sp, sp, #0x30 ; =3D0x30=20 > 0x202ac <+128>: ret =20 >=20 > (lldb) disass -n value > a.out`value: > 0x204cc <+0>: sub sp, sp, #0x10 ; =3D0x10=20 > 0x204d0 <+4>: str x0, [sp, #0x8] > 0x204d4 <+8>: ldrb w8, [sp, #0x8] > 0x204d8 <+12>: mov w1, w8 > 0x204dc <+16>: mov w0, w8 > 0x204e0 <+20>: str w1, [sp, #0x4] > 0x204e4 <+24>: add sp, sp, #0x10 ; =3D0x10=20 > 0x204e8 <+28>: ret =20 >=20 > (lldb) disass -n test_setup > a.out`test_setup: > 0x202b0 <+0>: sub sp, sp, #0x20 ; =3D0x20=20 > 0x202b4 <+4>: stp x29, x30, [sp, #0x10] > 0x202b8 <+8>: add x29, sp, 
#0x10 ; =3D0x10=20 > 0x202bc <+12>: orr x0, xzr, #0x3800 > 0x202c0 <+16>: bl 0x205b0 ; symbol stub for: = malloc > 0x202c4 <+20>: adrp x30, 48 > 0x202c8 <+24>: add x30, x30, #0x0 ; =3D0x0=20 > 0x202cc <+28>: str x0, [x30] > 0x202d0 <+32>: ldr x0, [x30] > 0x202d4 <+36>: cbnz x0, 0x202e4 ; <+52> at = swap_testing.c:78 > 0x202d8 <+40>: orr w0, wzr, #0x6 > 0x202dc <+44>: bl 0x205c0 ; symbol stub for: = raise > 0x202e0 <+48>: str w0, [sp, #0x4] > 0x202e4 <+52>: str xzr, [sp, #0x8] > 0x202e8 <+56>: orr x8, xzr, #0x3800 > 0x202ec <+60>: ldr x9, [sp, #0x8] > 0x202f0 <+64>: cmp x9, x8 > 0x202f4 <+68>: b.hs 0x2033c ; <+140> at = swap_testing.c:81 > 0x202f8 <+72>: ldr x0, [sp, #0x8] > 0x202fc <+76>: bl 0x204cc ; value at = swap_testing.c:72 > 0x20300 <+80>: adrp x30, 48 > 0x20304 <+84>: add x30, x30, #0x0 ; =3D0x0=20 > 0x20308 <+88>: adrp x8, 48 > 0x2030c <+92>: add x8, x8, #0x8 ; =3D0x8=20 > 0x20310 <+96>: ldr x9, [sp, #0x8] > 0x20314 <+100>: add x8, x8, x9 > 0x20318 <+104>: strb w0, [x8] > 0x2031c <+108>: ldr x8, [x30] > 0x20320 <+112>: ldr x9, [sp, #0x8] > 0x20324 <+116>: add x8, x8, x9 > 0x20328 <+120>: strb w0, [x8] > 0x2032c <+124>: ldr x8, [sp, #0x8] > 0x20330 <+128>: add x8, x8, #0x1 ; =3D0x1=20 > 0x20334 <+132>: str x8, [sp, #0x8] > 0x20338 <+136>: b 0x202e8 ; <+56> at = swap_testing.c > 0x2033c <+140>: ldp x29, x30, [sp, #0x10] > 0x20340 <+144>: add sp, sp, #0x20 ; =3D0x20=20 > 0x20344 <+148>: ret =20 >=20 > (lldb) disass -n test_check > a.out`test_check: > 0x20348 <+0>: sub sp, sp, #0x20 ; =3D0x20=20 > 0x2034c <+4>: stp x29, x30, [sp, #0x10] > 0x20350 <+8>: add x29, sp, #0x10 ; =3D0x10=20 > 0x20354 <+12>: b 0x20358 ; <+16> at = swap_testing.c > 0x20358 <+16>: mov w8, wzr > 0x2035c <+20>: adrp x9, 51 > 0x20360 <+24>: add x9, x9, #0x808 ; =3D0x808=20 > 0x20364 <+28>: ldrb w10, [x9] > 0x20368 <+32>: stur w8, [x29, #-0x4] > 0x2036c <+36>: tbnz w10, #0x0, 0x2038c ; <+68> at = swap_testing.c > 0x20370 <+40>: orr x8, xzr, #0x3800 > 0x20374 <+44>: adrp x9, 51 > 0x20378 
<+48>: add x9, x9, #0x810 ; =3D0x810=20 > 0x2037c <+52>: ldr x9, [x9] > 0x20380 <+56>: cmp x9, x8 > 0x20384 <+60>: cset w10, lo > 0x20388 <+64>: stur w10, [x29, #-0x4] > 0x2038c <+68>: ldur w8, [x29, #-0x4] > 0x20390 <+72>: tbz w8, #0x0, 0x203ec ; <+164> at = swap_testing.c:95 > 0x20394 <+76>: adrp x8, 51 > 0x20398 <+80>: add x8, x8, #0x810 ; =3D0x810=20 > 0x2039c <+84>: ldr x0, [x8] > 0x203a0 <+88>: bl 0x204cc ; value at = swap_testing.c:72 > 0x203a4 <+92>: adrp x8, 51 > 0x203a8 <+96>: add x8, x8, #0x810 ; =3D0x810=20 > 0x203ac <+100>: adrp x30, 51 > 0x203b0 <+104>: add x30, x30, #0x808 ; =3D0x808=20 > 0x203b4 <+108>: adrp x9, 48 > 0x203b8 <+112>: add x9, x9, #0x8 ; =3D0x8=20 > 0x203bc <+116>: uxtb w0, w0 > 0x203c0 <+120>: ldr x10, [x8] > 0x203c4 <+124>: add x9, x9, x10 > 0x203c8 <+128>: ldrb w11, [x9] > 0x203cc <+132>: cmp w0, w11 > 0x203d0 <+136>: cset w11, ne > 0x203d4 <+140>: and w11, w11, #0x1 > 0x203d8 <+144>: strb w11, [x30] > 0x203dc <+148>: ldr x9, [x8] > 0x203e0 <+152>: add x9, x9, #0x1 ; =3D0x1=20 > 0x203e4 <+156>: str x9, [x8] > 0x203e8 <+160>: b 0x20358 ; <+16> at = swap_testing.c > 0x203ec <+164>: b 0x203f0 ; <+168> at = swap_testing.c > 0x203f0 <+168>: mov w8, wzr > 0x203f4 <+172>: adrp x9, 51 > 0x203f8 <+176>: add x9, x9, #0x818 ; =3D0x818=20 > 0x203fc <+180>: ldrb w10, [x9] > 0x20400 <+184>: str w8, [sp, #0x8] > 0x20404 <+188>: tbnz w10, #0x0, 0x20424 ; <+220> at = swap_testing.c > 0x20408 <+192>: orr x8, xzr, #0x3800 > 0x2040c <+196>: adrp x9, 51 > 0x20410 <+200>: add x9, x9, #0x820 ; =3D0x820=20 > 0x20414 <+204>: ldr x9, [x9] > 0x20418 <+208>: cmp x9, x8 > 0x2041c <+212>: cset w10, lo > 0x20420 <+216>: str w10, [sp, #0x8] > 0x20424 <+220>: ldr w8, [sp, #0x8] > 0x20428 <+224>: tbz w8, #0x0, 0x20488 ; <+320> at = swap_testing.c > 0x2042c <+228>: adrp x8, 51 > 0x20430 <+232>: add x8, x8, #0x820 ; =3D0x820=20 > 0x20434 <+236>: ldr x0, [x8] > 0x20438 <+240>: bl 0x204cc ; value at = swap_testing.c:72 > 0x2043c <+244>: adrp x8, 51 > 0x20440 <+248>: add 
x8, x8, #0x820 ; =3D0x820=20 > 0x20444 <+252>: adrp x30, 51 > 0x20448 <+256>: add x30, x30, #0x818 ; =3D0x818=20 > 0x2044c <+260>: adrp x9, 48 > 0x20450 <+264>: add x9, x9, #0x0 ; =3D0x0=20 > 0x20454 <+268>: uxtb w0, w0 > 0x20458 <+272>: ldr x9, [x9] > 0x2045c <+276>: ldr x10, [x8] > 0x20460 <+280>: add x9, x9, x10 > 0x20464 <+284>: ldrb w11, [x9] > 0x20468 <+288>: cmp w0, w11 > 0x2046c <+292>: cset w11, ne > 0x20470 <+296>: and w11, w11, #0x1 > 0x20474 <+300>: strb w11, [x30] > 0x20478 <+304>: ldr x9, [x8] > 0x2047c <+308>: add x9, x9, #0x1 ; =3D0x1=20 > 0x20480 <+312>: str x9, [x8] > 0x20484 <+316>: b 0x203f0 ; <+168> at = swap_testing.c > 0x20488 <+320>: adrp x8, 51 > 0x2048c <+324>: add x8, x8, #0x808 ; =3D0x808=20 > 0x20490 <+328>: ldrb w9, [x8] > 0x20494 <+332>: tbz w9, #0x0, 0x204a4 ; <+348> at = swap_testing.c > 0x20498 <+336>: orr w0, wzr, #0x6 > 0x2049c <+340>: bl 0x205c0 ; symbol stub for: = raise > 0x204a0 <+344>: str w0, [sp, #0x4] > 0x204a4 <+348>: adrp x8, 51 > 0x204a8 <+352>: add x8, x8, #0x818 ; =3D0x818=20 > 0x204ac <+356>: ldrb w9, [x8] > 0x204b0 <+360>: tbz w9, #0x0, 0x204c0 ; <+376> at = swap_testing.c:105 > 0x204b4 <+364>: orr w0, wzr, #0x6 > 0x204b8 <+368>: bl 0x205c0 ; symbol stub for: = raise > -> 0x204bc <+372>: str w0, [sp] > 0x204c0 <+376>: ldp x29, x30, [sp, #0x10] > 0x204c4 <+380>: add sp, sp, #0x20 ; =3D0x20=20 > 0x204c8 <+384>: ret =20 >=20 > # uname -apKU > FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r314638M arm64 = aarch64 1200023 1200023 >=20 > buildworld buildlkernel did not have MALLOC_PRODUCTION=3D defined. The = kernel is a > non-debug kernel. (Previous to these experiments my other corruption = examples > were not caught by a debug kernel. I'm not hopeful that this simpler = context > would either.) 
=3D=3D=3D Mark Millard markmi at dsl-only.net From owner-freebsd-stable@freebsd.org Tue Mar 14 08:04:58 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 76608D0BE77 for ; Tue, 14 Mar 2017 08:04:58 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-180.reflexion.net [208.70.211.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3B89110E5 for ; Tue, 14 Mar 2017 08:04:58 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 11682 invoked from network); 14 Mar 2017 07:59:06 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 14 Mar 2017 07:59:06 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.30.2) with SMTP; Tue, 14 Mar 2017 03:58:17 -0400 (EDT) Received: (qmail 2793 invoked from network); 14 Mar 2017 07:58:17 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 14 Mar 2017 07:58:17 -0000 Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 72316EC8123; Tue, 14 Mar 2017 00:58:16 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: amd64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) From: Mark Millard In-Reply-To: Date: Tue, 14 Mar 2017 00:58:15 -0700 Cc: Andrew Turner Content-Transfer-Encoding: quoted-printable Message-Id: <16B3D614-62E1-4E58-B409-8DB9DBB35BCB@dsl-only.net> References: <01735A68-FED6-4E63-964F-0820FE5C446C@dsl-only.net> To: freebsd-arm , FreeBSD Current , FreeBSD-STABLE Mailing List 
X-Mailer: Apple Mail (2.3259)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 14 Mar 2017 08:04:58 -0000

[Another correction, I'm afraid, about alternative program variations
this time.]

On 2017-Mar-13, at 11:52 PM, Mark Millard wrote:

> I'm still at a loss about how to figure out what stages are messed
> up. (Memory coherency? Some memory not swapped out? Bad data swapped
> out? Wrong data swapped in?)
>
> But at least I've found a much smaller/simpler example to demonstrate
> some problem with in my Pine64+ 2GB context.
>
> The Pine64+ 2GB is the only amd64 context that I have access to.

Someday I'll learn to type arm64 the first time instead of amd64.

> The following program fails its check for data
> having its expected byte pattern in dynamically
> allocated memory after a fork/swap-out/swap-in
> sequence.
>
> I'll note that the program sleeps for 60s after
> forking to give time to do something else to
> cause the parent and child processes to swap
> out (RES=0 as seen in top).

The following about the extra test_check() was
wrong.

> Note the source code line:
>
>   // test_check(); // Adding this line prevents failure.
>
> It seems that accessing the region contents before forking
> and swapping avoids the problem. But there is a problem
> if the region was only written-to before the fork/swap.

This was because I'd carelessly moved some loop variables to
globals in a way that depended on the initialization of the
globals, and the extra call changed those values. I've noted
the code adjustments below (3 lines). I get the failures with
them as well.

> Another point is the size of the region matters: <= 14K Bytes
> fails and > 14K Bytes works for as much as I have tested.
>
>
> # more swap_testing.c
> // swap_testing.c
>
> // Built via (c++ was clang++ 4.0 in my case):
> //
> // cc -g -std=c11 -Wpedantic swap_testing.c
> // -O0 and -O2 also gets the problem.
>
> #include <unistd.h>     // for fork(), sleep(.)
> #include <sys/types.h>  // for pid_t
> #include <sys/wait.h>   // for wait(.)
>
> extern void test_setup(void); // Sets up the memory byte pattern.
> extern void test_check(void); // Tests the memory byte pattern.
>
> int main(void)
> {
>     test_setup();
      test_check(); // This test passes.
>
>     pid_t pid = fork();
>     int wait_status = 0;
>
>     if (0<pid) { wait(&wait_status); }
>
>     if (-1!=wait_status && 0<=pid)
>     {
>         if (0==pid)
>         {
>             sleep(60);
>
>             // During this sleep, manually force this process to
>             // swap out. I use something like:
>
>             // stress -m 1 --vm-bytes 1800M
>
>             // in another shell and ^C'ing it after top
>             // shows the swapped status desired. 1800M
>             // just happened to work on the Pine64+ 2GB
>             // that I was using.
>         }
>
>         test_check();
>     }
> }
>
> // The memory and test code follows.
>
> #include <stdbool.h>    // for bool, true, false
> #include <stddef.h>     // for size_t, NULL
> #include <stdlib.h>     // for malloc(.), free(.)
>
> #include <signal.h>     // for raise(.), SIGABRT
>
> #define region_size (14u*1024u)
>     // Bad dyn_region pattern, parent and child
>     // processes:
>     //   256u, 4u*1024u, 8u*1024u, 9u*1024u,
>     //   12u*1024u, 14u*1024u
>
>     // Works:
>     //   14u*1024u+1u, 15u*1024u, 16u*1024u,
>     //   32u*1024u, 256u*1024u*1024u
>
> typedef volatile unsigned char value_type;
>
> struct region_struct { value_type array[region_size]; };
> typedef struct region_struct region;
>
> static region gbl_region;
> static region * volatile dyn_region = NULL;
>
> static value_type value(size_t v) { return (value_type)v; }
>
> void test_setup(void) {
>     dyn_region = malloc(sizeof(region));
>     if (!dyn_region) raise(SIGABRT);
>
>     for(size_t i=0u; i<region_size; i++) {
>         (*dyn_region).array[i] = gbl_region.array[i] = value(i);
>     }
> }
>
> static volatile bool gbl_failed = false; // Until potentially disproved
> static volatile size_t gbl_pos = 0u;
>
> static volatile bool dyn_failed = false; // Until potentially disproved
> static volatile size_t dyn_pos = 0u;
>
> void test_check(void) {
      gbl_pos = 0u;
>     while (!gbl_failed && gbl_pos<region_size) {
>         gbl_failed = (value(gbl_pos) != gbl_region.array[gbl_pos]);
>         gbl_pos++;
>     }
>
      dyn_pos = 0u;
>     while (!dyn_failed && dyn_pos<region_size) {
>         dyn_failed = (value(dyn_pos) != (*dyn_region).array[dyn_pos]);
>         // Note: When the memory pattern fails, this is the case that
>         // records the failure.
>         dyn_pos++;
>     }
>
>     if (gbl_failed) raise(SIGABRT);
>     if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call.
>                                     // when it fails (both parent and child processes).
> }

I'm not bothering to redo the details below for the line number
variations.

> Other details from lldb (not using -O2 so things are
> simpler, not presented in the order examined):
>
> # lldb a.out -c /var/crash/a.out.11575.core
> (lldb) target create "a.out" --core "/var/crash/a.out.11575.core"
> Core file '/var/crash/a.out.11575.core' (aarch64) was loaded.
> (lldb) bt
> * thread #1, name = 'a.out', stop reason = signal SIGABRT
>   * frame #0: 0x0000000040113d38 libc.so.7`_thr_kill + 8
>     frame #1: libc.so.7`__raise(s=6) at raise.c:52
>     frame #2: a.out`test_check at swap_testing.c:103
>     frame #3: a.out`main at swap_testing.c:42
>     frame #4: 0x0000000000020184 a.out`__start + 364
>     frame #5: ld-elf.so.1`.rtld_start at rtld_start.S:41
>
> (lldb) up 2
> frame #2: a.out`test_check at swap_testing.c:103
>    100      }
>    101
>    102      if (gbl_failed) raise(SIGABRT);
> -> 103      if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call.
>    104                                      // when it fails (both parent and child processes).
>    105  }
>
> (lldb) print dyn_pos
> (size_t) $0 = 2
>
> (That is one after the failure position.)
>
>
> (lldb) print dyn_region
> (region *volatile) $3 = 0x0000000040616000
>
> (lldb) print *dyn_region
> (region) $1 = {
>   array = {
>     [0] = '\0'
>     [1] = '\0'
>     [2] = '\0'
> . . . (all '\0' bytes) . . .
>     [251] = '\0'
>     [252] = '\0'
>     [253] = '\0'
>     [254] = '\0'
>     [255] = '\0'
>     ...
>   }
> }
>
> (lldb) print gbl_region
> (region) $2 = {
>   array = {
>     [0] = '\0'
>     [1] = '\x01'
>     [2] = '\x02'
> . . .
>     [251] = '\xfb'
>     [252] = '\xfc'
>     [253] = '\xfd'
>     [254] = '\xfe'
>     [255] = '\xff'
>     ...
>   }
> }
>
> (lldb) disass -n main
> a.out`main:
>     0x2022c <+0>: sub sp, sp, #0x30 ; =0x30
>     0x20230 <+4>: stp x29, x30, [sp, #0x20]
>     0x20234 <+8>: add x29, sp, #0x20 ; =0x20
>     0x20238 <+12>: stur wzr, [x29, #-0x4]
>     0x2023c <+16>: bl 0x202b0 ; test_setup at swap_testing.c:74
>     0x20240 <+20>: bl 0x20580 ; symbol stub for: fork
>     0x20244 <+24>: mov w8, wzr
>     0x20248 <+28>: stur w0, [x29, #-0x8]
>     0x2024c <+32>: stur wzr, [x29, #-0xc]
>     0x20250 <+36>: ldur w0, [x29, #-0x8]
>     0x20254 <+40>: cmp w8, w0
>     0x20258 <+44>: b.ge 0x20268 ; <+60> at swap_testing.c
>     0x2025c <+48>: sub x0, x29, #0xc ; =0xc
>     0x20260 <+52>: bl 0x20590 ; symbol stub for: wait
>     0x20264 <+56>: str w0, [sp, #0x10]
>     0x20268 <+60>: mov w8, #-0x1
>     0x2026c <+64>: ldur w9, [x29, #-0xc]
>     0x20270 <+68>: cmp w8, w9
>     0x20274 <+72>: b.eq 0x202a0 ; <+116> at swap_testing.c:44
>     0x20278 <+76>: mov w8, wzr
>     0x2027c <+80>: ldur w9, [x29, #-0x8]
>     0x20280 <+84>: cmp w8, w9
>     0x20284 <+88>: b.gt 0x202a0 ; <+116> at swap_testing.c:44
>     0x20288 <+92>: ldur w8, [x29, #-0x8]
>     0x2028c <+96>: cbnz w8, 0x2029c ; <+112> at swap_testing.c:42
>     0x20290 <+100>: orr w0, wzr, #0x3c
>     0x20294 <+104>: bl 0x205a0 ; symbol stub for: sleep
>     0x20298 <+108>: str w0, [sp, #0xc]
>     0x2029c <+112>: bl 0x20348 ; test_check at swap_testing.c:89
>     0x202a0 <+116>: ldur w0, [x29, #-0x4]
>     0x202a4 <+120>: ldp x29, x30, [sp, #0x20]
>     0x202a8 <+124>: add sp, sp, #0x30 ; =0x30
>     0x202ac <+128>: ret
>
> (lldb) disass -n value
> a.out`value:
>     0x204cc <+0>: sub sp, sp, #0x10 ; =0x10
>     0x204d0 <+4>: str x0, [sp, #0x8]
>     0x204d4 <+8>: ldrb w8, [sp, #0x8]
>     0x204d8 <+12>: mov w1, w8
>     0x204dc <+16>: mov w0, w8
>     0x204e0 <+20>: str w1, [sp, #0x4]
>     0x204e4 <+24>: add sp, sp, #0x10 ; =0x10
>     0x204e8 <+28>: ret
>
> (lldb) disass -n test_setup
> a.out`test_setup:
>     0x202b0 <+0>: sub sp, sp, #0x20 ; =0x20
>     0x202b4 <+4>: stp x29, x30, [sp, #0x10]
>     0x202b8 <+8>: add x29, sp, #0x10 ; =0x10
>     0x202bc <+12>: orr x0, xzr, #0x3800
>     0x202c0 <+16>: bl 0x205b0 ; symbol stub for: malloc
>     0x202c4 <+20>: adrp x30, 48
>     0x202c8 <+24>: add x30, x30, #0x0 ; =0x0
>     0x202cc <+28>: str x0, [x30]
>     0x202d0 <+32>: ldr x0, [x30]
>     0x202d4 <+36>: cbnz x0, 0x202e4 ; <+52> at swap_testing.c:78
>     0x202d8 <+40>: orr w0, wzr, #0x6
>     0x202dc <+44>: bl 0x205c0 ; symbol stub for: raise
>     0x202e0 <+48>: str w0, [sp, #0x4]
>     0x202e4 <+52>: str xzr, [sp, #0x8]
>     0x202e8 <+56>: orr x8, xzr, #0x3800
>     0x202ec <+60>: ldr x9, [sp, #0x8]
>     0x202f0 <+64>: cmp x9, x8
>     0x202f4 <+68>: b.hs 0x2033c ; <+140> at swap_testing.c:81
>     0x202f8 <+72>: ldr x0, [sp, #0x8]
>     0x202fc <+76>: bl 0x204cc ; value at swap_testing.c:72
>     0x20300 <+80>: adrp x30, 48
>     0x20304 <+84>: add x30, x30, #0x0 ; =0x0
>     0x20308 <+88>: adrp x8, 48
>     0x2030c <+92>: add x8, x8, #0x8 ; =0x8
>     0x20310 <+96>: ldr x9, [sp, #0x8]
>     0x20314 <+100>: add x8, x8, x9
>     0x20318 <+104>: strb w0, [x8]
>     0x2031c <+108>: ldr x8, [x30]
>     0x20320 <+112>: ldr x9, [sp, #0x8]
>     0x20324 <+116>: add x8, x8, x9
>     0x20328 <+120>: strb w0, [x8]
>     0x2032c <+124>: ldr x8, [sp, #0x8]
>     0x20330 <+128>: add x8, x8, #0x1 ; =0x1
>     0x20334 <+132>: str x8, [sp, #0x8]
>     0x20338 <+136>: b 0x202e8 ; <+56> at swap_testing.c
>     0x2033c <+140>: ldp x29, x30, [sp, #0x10]
>     0x20340 <+144>: add sp, sp, #0x20 ; =0x20
>     0x20344 <+148>: ret
>
> (lldb) disass -n test_check
> a.out`test_check:
>     0x20348 <+0>: sub sp, sp, #0x20 ; =0x20
>     0x2034c <+4>: stp x29, x30, [sp, #0x10]
>     0x20350 <+8>: add x29, sp, #0x10 ; =0x10
>     0x20354 <+12>: b 0x20358 ; <+16> at swap_testing.c
>     0x20358 <+16>: mov w8, wzr
>     0x2035c <+20>: adrp x9, 51
>     0x20360 <+24>: add x9, x9, #0x808 ; =0x808
>     0x20364 <+28>: ldrb w10, [x9]
>     0x20368 <+32>: stur w8, [x29, #-0x4]
>     0x2036c <+36>: tbnz w10, #0x0, 0x2038c ; <+68> at swap_testing.c
>     0x20370 <+40>: orr x8, xzr, #0x3800
>     0x20374 <+44>: adrp x9, 51
>     0x20378 <+48>: add x9, x9, #0x810 ; =0x810
>     0x2037c <+52>: ldr x9, [x9]
>     0x20380 <+56>: cmp x9, x8
>     0x20384 <+60>: cset w10, lo
>     0x20388 <+64>: stur w10, [x29, #-0x4]
>     0x2038c <+68>: ldur w8, [x29, #-0x4]
>     0x20390 <+72>: tbz w8, #0x0, 0x203ec ; <+164> at swap_testing.c:95
>     0x20394 <+76>: adrp x8, 51
>     0x20398 <+80>: add x8, x8, #0x810 ; =0x810
>     0x2039c <+84>: ldr x0, [x8]
>     0x203a0 <+88>: bl 0x204cc ; value at swap_testing.c:72
>     0x203a4 <+92>: adrp x8, 51
>     0x203a8 <+96>: add x8, x8, #0x810 ; =0x810
>     0x203ac <+100>: adrp x30, 51
>     0x203b0 <+104>: add x30, x30, #0x808 ; =0x808
>     0x203b4 <+108>: adrp x9, 48
>     0x203b8 <+112>: add x9, x9, #0x8 ; =0x8
>     0x203bc <+116>: uxtb w0, w0
>     0x203c0 <+120>: ldr x10, [x8]
>     0x203c4 <+124>: add x9, x9, x10
>     0x203c8 <+128>: ldrb w11, [x9]
>     0x203cc <+132>: cmp w0, w11
>     0x203d0 <+136>: cset w11, ne
>     0x203d4 <+140>: and w11, w11, #0x1
>     0x203d8 <+144>: strb w11, [x30]
>     0x203dc <+148>: ldr x9, [x8]
>     0x203e0 <+152>: add x9, x9, #0x1 ; =0x1
>     0x203e4 <+156>: str x9, [x8]
>     0x203e8 <+160>: b 0x20358 ; <+16> at swap_testing.c
>     0x203ec <+164>: b 0x203f0 ; <+168> at swap_testing.c
>     0x203f0 <+168>: mov w8, wzr
>     0x203f4 <+172>: adrp x9, 51
>     0x203f8 <+176>: add x9, x9, #0x818 ; =0x818
>     0x203fc <+180>: ldrb w10, [x9]
>     0x20400 <+184>: str w8, [sp, #0x8]
>     0x20404 <+188>: tbnz w10, #0x0, 0x20424 ; <+220> at swap_testing.c
>     0x20408 <+192>: orr x8, xzr, #0x3800
>     0x2040c <+196>: adrp x9, 51
>     0x20410 <+200>: add x9, x9, #0x820 ; =0x820
>     0x20414 <+204>: ldr x9, [x9]
>     0x20418 <+208>: cmp x9, x8
>     0x2041c <+212>: cset w10, lo
>     0x20420 <+216>: str w10, [sp, #0x8]
>     0x20424 <+220>: ldr w8, [sp, #0x8]
>     0x20428 <+224>: tbz w8, #0x0, 0x20488 ; <+320> at swap_testing.c
>     0x2042c <+228>: adrp x8, 51
>     0x20430 <+232>: add x8, x8, #0x820 ; =0x820
>     0x20434 <+236>: ldr x0, [x8]
>     0x20438 <+240>: bl 0x204cc ; value at swap_testing.c:72
>     0x2043c <+244>: adrp x8, 51
>     0x20440 <+248>: add x8, x8, #0x820 ; =0x820
>     0x20444 <+252>: adrp x30, 51
>     0x20448 <+256>: add x30, x30, #0x818 ; =0x818
>     0x2044c <+260>: adrp x9, 48
>     0x20450 <+264>: add x9, x9, #0x0 ; =0x0
>     0x20454 <+268>: uxtb w0, w0
>     0x20458 <+272>: ldr x9, [x9]
>     0x2045c <+276>: ldr x10, [x8]
>     0x20460 <+280>: add x9, x9, x10
>     0x20464 <+284>: ldrb w11, [x9]
>     0x20468 <+288>: cmp w0, w11
>     0x2046c <+292>: cset w11, ne
>     0x20470 <+296>: and w11, w11, #0x1
>     0x20474 <+300>: strb w11, [x30]
>     0x20478 <+304>: ldr x9, [x8]
>     0x2047c <+308>: add x9, x9, #0x1 ; =0x1
>     0x20480 <+312>: str x9, [x8]
>     0x20484 <+316>: b 0x203f0 ; <+168> at swap_testing.c
>     0x20488 <+320>: adrp x8, 51
>     0x2048c <+324>: add x8, x8, #0x808 ; =0x808
>     0x20490 <+328>: ldrb w9, [x8]
>     0x20494 <+332>: tbz w9, #0x0, 0x204a4 ; <+348> at swap_testing.c
>     0x20498 <+336>: orr w0, wzr, #0x6
>     0x2049c <+340>: bl 0x205c0 ; symbol stub for: raise
>     0x204a0 <+344>: str w0, [sp, #0x4]
>     0x204a4 <+348>: adrp x8, 51
>     0x204a8 <+352>: add x8, x8, #0x818 ; =0x818
>     0x204ac <+356>: ldrb w9, [x8]
>     0x204b0 <+360>: tbz w9, #0x0, 0x204c0 ; <+376> at swap_testing.c:105
>     0x204b4 <+364>: orr w0, wzr, #0x6
>     0x204b8 <+368>: bl 0x205c0 ; symbol stub for: raise
> ->  0x204bc <+372>: str w0, [sp]
>     0x204c0 <+376>: ldp x29, x30, [sp, #0x10]
>     0x204c4 <+380>: add sp, sp, #0x20 ; =0x20
>     0x204c8 <+384>: ret
>
> # uname -apKU
> FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r314638M arm64 aarch64 1200023 1200023
>
> buildworld buildkernel did not have MALLOC_PRODUCTION= defined. The kernel is a
> non-debug kernel. (Previous to these experiments my other corruption examples
> were not caught by a debug kernel. I'm not hopeful that this simpler context
> would either.)
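
The lldb values quoted above fit together as follows: the damaged dyn_region is all '\0' bytes, value(0) is also 0, so index 0 still matches and the first mismatch is at index 1; the loop then increments dyn_pos once more, which gives the reported dyn_pos of 2. A minimal sketch of that check as a pure function, for illustration only. The names pattern_byte, fill_pattern and first_mismatch are hypothetical and not part of swap_testing.c, and the sketch does not reproduce the fork/swap failure itself:

```c
#include <stddef.h>  // for size_t

// Hypothetical helpers for illustration; not part of swap_testing.c.

// Same pattern as value(i) in the posted program: the low byte of the index.
unsigned char pattern_byte(size_t i) { return (unsigned char)i; }

// Fill buf[0..n) with the expected byte pattern.
void fill_pattern(volatile unsigned char *buf, size_t n) {
    for (size_t i = 0u; i < n; i++) buf[i] = pattern_byte(i);
}

// Return the index of the first byte that differs from the pattern,
// or n if the whole buffer matches.
size_t first_mismatch(const volatile unsigned char *buf, size_t n) {
    for (size_t i = 0u; i < n; i++) {
        if (buf[i] != pattern_byte(i)) return i;
    }
    return n;
}
```

Called on a zero-filled 14 KiB buffer, first_mismatch returns 1, matching the "one after the failure position" remark for dyn_pos = 2; on a buffer prepared with fill_pattern it returns the buffer length.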
===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-stable@freebsd.org Tue Mar 14 09:43:28 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 75C68D0C283 for ; Tue, 14 Mar 2017 09:43:28 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 6340FAB0 for ; Tue, 14 Mar 2017 09:43:28 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: by mailman.ysv.freebsd.org (Postfix) id 62AF9D0C282; Tue, 14 Mar 2017 09:43:28 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 625DAD0C281 for ; Tue, 14 Mar 2017 09:43:28 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from constantine.ingresso.co.uk (unknown [IPv6:2a02:b90:3002:411::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 335C5AAF; Tue, 14 Mar 2017 09:43:28 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from dilbert.london-internal.ingresso.co.uk ([10.64.50.6] helo=dilbert.ingresso.co.uk) by constantine.ingresso.co.uk with esmtps (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.88 (FreeBSD)) (envelope-from ) id 1cnizF-0005Yz-5e; Tue, 14 Mar 2017 09:43:25 +0000 Received: from petefrench by dilbert.ingresso.co.uk with local (Exim 4.88 (FreeBSD)) (envelope-from ) id 1cnizF-00032Z-45; Tue, 14 Mar 2017 09:43:25 +0000 To: petefrench@ingresso.co.uk, trasz@FreeBSD.org Subject: Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0 Cc: stable@freebsd.org In-Reply-To: <20170313190728.GA2967@brick> Message-Id: From: Pete French Date: Tue,
14 Mar 2017 09:43:25 +0000 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Mar 2017 09:43:28 -0000

> Are you sure the above transcript is right? There are three reasons
> I'm asking. First, you'll see the "Root mount waiting" message,
> which means the root mount code is, well, waiting for storvsc, exactly
> as expected. Second - there is no "Trying to mount root". But most
> of all - for some reason the "Mounting failed" is shown _before_ the
> "Root mount waiting", and I have no idea how this could ever happen.

OK, that's interesting, and kind of worrying! I believe it's correct - I
have put the full transcript up here so you can see all of it:

https://www.twisted.org.uk/~pete/914893a3-249e-4a91-851c-f467fc185eec.txt

I am assuming that Azure's capturing of the output is correct....

-pete.

From owner-freebsd-stable@freebsd.org Tue Mar 14 11:54:16 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 11D8DD0CAA0 for ; Tue, 14 Mar 2017 11:54:16 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 016CE1CDA for ; Tue, 14 Mar 2017 11:54:16 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id v2EBsEUT003522 for ; Tue, 14 Mar 2017 11:54:15 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-stable@FreeBSD.org Subject: [Bug 213903] Kernel
crashes from turnstile_broadcast (/usr/src/sys/kern/subr_turnstile.c:837) Date: Tue, 14 Mar 2017 11:54:14 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: CURRENT X-Bugzilla-Keywords: crash X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: peixoto.cassiano@gmail.com X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Mar 2017 11:54:16 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903

--- Comment #17 from Cassiano Peixoto ---
(In reply to Mateusz Guzik from comment #16)

Hi Mateusz,

Sorry, but I can't try this patch; I had to roll back to the old kernel to
avoid crashes. It's a production server and I can't let it go down. :(

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-stable@freebsd.org Tue Mar 14 18:07:27 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2F70AD0C1A0 for ; Tue, 14 Mar 2017 18:07:27 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-175.reflexion.net [208.70.211.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E47D61B3 for ; Tue, 14 Mar 2017 18:07:26 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 23324 invoked from network); 14 Mar 2017 18:07:20 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 14 Mar 2017 18:07:20 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.30.2) with SMTP; Tue, 14 Mar 2017 14:07:20 -0400 (EDT) Received: (qmail 742 invoked from network); 14 Mar 2017 18:07:20 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 14 Mar 2017 18:07:20 -0000 Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id AEAB4EC8534; Tue, 14 Mar 2017 11:07:19 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
From: Mark Millard In-Reply-To: <16B3D614-62E1-4E58-B409-8DB9DBB35BCB@dsl-only.net> Date: Tue, 14 Mar 2017 11:07:19 -0700 Cc: Andrew Turner Content-Transfer-Encoding: quoted-printable Message-Id: <5BEAFC6C-DA80-4D7B-AB55-977E585D1ACC@dsl-only.net> References: <01735A68-FED6-4E63-964F-0820FE5C446C@dsl-only.net> <16B3D614-62E1-4E58-B409-8DB9DBB35BCB@dsl-only.net> To: freebsd-arm , FreeBSD Current , FreeBSD-STABLE Mailing List X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Mar 2017 18:07:27 -0000

[This is just a correction to the subject-line text to say arm64
instead of amd64.]

On 2017-Mar-14, at 12:58 AM, Mark Millard wrote:

[Another correction I'm afraid -- about alternative program variations
this time.]

On 2017-Mar-13, at 11:52 PM, Mark Millard wrote:

> I'm still at a loss about how to figure out what stages are messed
> up. (Memory coherency? Some memory not swapped out? Bad data swapped
> out? Wrong data swapped in?)
>
> But at least I've found a much smaller/simpler example to demonstrate
> some problem in my Pine64+ 2GB context.
>
> The Pine64+ 2GB is the only amd64 context that I have access to.

Someday I'll learn to type arm64 the first time instead of amd64.

> The following program fails its check for data
> having its expected byte pattern in dynamically
> allocated memory after a fork/swap-out/swap-in
> sequence.
>
> I'll note that the program sleeps for 60s after
> forking to give time to do something else to
> cause the parent and child processes to swap
> out (RES=0 as seen in top).

The following about the extra test_check() was wrong.

> Note the source code line:
>
> // test_check(); // Adding this line prevents failure.
===
Mark Millard
markmi at dsl-only.net
_______________________________________________
freebsd-arm@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org"
From owner-freebsd-stable@freebsd.org Tue Mar 14 22:29:02 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2BF4CD0D80D for ; Tue, 14 Mar 2017 22:29:02 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-181.reflexion.net [208.70.211.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E1A8A161B for ; Tue, 14 Mar 2017 22:29:01 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 567 invoked from network); 14 Mar 2017 22:28:55 -0000 Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 14 Mar 2017 22:28:55 -0000 Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v8.30.2) with SMTP; Tue, 14 Mar 2017 18:28:55 -0400 (EDT) Received: (qmail 26886 invoked from network); 14 Mar 2017 22:28:55 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 14 Mar 2017 22:28:55 -0000 Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 4CAC6EC8159; Tue, 14 Mar 2017 15:28:54 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
Date: Tue, 14 Mar 2017 15:28:53 -0700 References: <01735A68-FED6-4E63-964F-0820FE5C446C@dsl-only.net> <16B3D614-62E1-4E58-B409-8DB9DBB35BCB@dsl-only.net> <5BEAFC6C-DA80-4D7B-AB55-977E585D1ACC@dsl-only.net> To: Andrew Turner , freebsd-arm , FreeBSD Current , FreeBSD-STABLE Mailing List In-Reply-To: <5BEAFC6C-DA80-4D7B-AB55-977E585D1ACC@dsl-only.net> Message-Id: X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Mar 2017 22:29:02 -0000

[test_check() between the fork and the wait/sleep prevents the failure from occurring. Even a small access to the memory at that stage prevents the failure. Details follow.]

On 2017-Mar-14, at 11:07 AM, Mark Millard wrote:

> [This is just a correction to the subject-line text to say arm64
> instead of amd64.]
>
> On 2017-Mar-14, at 12:58 AM, Mark Millard wrote:
>
> [Another correction I'm afraid --about alternative program variations
> this time.]
>
> On 2017-Mar-13, at 11:52 PM, Mark Millard wrote:
>
>> I'm still at a loss about how to figure out what stages are messed
>> up. (Memory coherency? Some memory not swapped out? Bad data swapped
>> out? Wrong data swapped in?)
>>
>> But at least I've found a much smaller/simpler example to demonstrate
>> some problem with in my Pine64+ 2GB context.
>>
>> The Pine64+ 2GB is the only amd64 context that I have access to.
>
> Someday I'll learn to type arm64 the first time instead of amd64.
>
>> The following program fails its check for data
>> having its expected byte pattern in dynamically
>> allocated memory after a fork/swap-out/swap-in
>> sequence.
>>
>> I'll note that the program sleeps for 60s after
>> forking to give time to do something else to
>> cause the parent and child processes to swap
>> out (RES=0 as seen in top).
>
> The following about the extra test_check() was
> wrong.
>
>> Note the source code line:
>>
>> // test_check(); // Adding this line prevents failure.
>>
>> It seems that accessing the region contents before forking
>> and swapping avoids the problem. But there is a problem
>> if the region was only written-to before the fork/swap.

There is one place where adding a test_check() call prevents the problem at every stage: between the fork and the later wait/sleep code:

int main(void) {
    test_setup();
    test_check(); // Before fork() [passes]

    pid_t pid = fork();
    int wait_status = 0;

    // test_check(); // After fork(); before wait/sleep.
                     // If used it prevents failure later!

    if (0
Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5BCF7D0DBC7; Tue, 14 Mar 2017 23:47:20 +0000 (UTC) (envelope-from ticso@cicely7.cicely.de) Received: from raven.bwct.de (raven.bwct.de [195.149.99.3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "raven.bwct.de", Issuer "raven.bwct.de" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id E5DC81CE9; Tue, 14 Mar 2017 23:47:19 +0000 (UTC) (envelope-from ticso@cicely7.cicely.de) Received: from mail.cicely.de ([10.1.1.37]) by raven.bwct.de (8.15.2/8.15.2) with ESMTPS id v2ENihnU040775 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL); Wed, 15 Mar 2017 00:44:43 +0100 (CET) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (cicely7.cicely.de [10.1.1.9]) by mail.cicely.de (8.14.5/8.14.4) with ESMTP id v2ENifKN051896 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 15 Mar 2017 00:44:41 +0100 (CET) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (localhost [127.0.0.1]) by cicely7.cicely.de (8.15.2/8.15.2) with ESMTP id v2ENiecG025847; Wed, 15 Mar
2017 00:44:40 +0100 (CET) (envelope-from ticso@cicely7.cicely.de) Received: (from ticso@localhost) by cicely7.cicely.de (8.15.2/8.15.2/Submit) id v2ENicbl025846; Wed, 15 Mar 2017 00:44:38 +0100 (CET) (envelope-from ticso) Date: Wed, 15 Mar 2017 00:44:38 +0100 From: Bernd Walter To: Mark Millard Cc: Andrew Turner , freebsd-arm , FreeBSD Current , FreeBSD-STABLE Mailing List Subject: Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!] Message-ID: <20170314234437.GA25820@cicely7.cicely.de> Reply-To: ticso@cicely.de References: <01735A68-FED6-4E63-964F-0820FE5C446C@dsl-only.net> <16B3D614-62E1-4E58-B409-8DB9DBB35BCB@dsl-only.net> <5BEAFC6C-DA80-4D7B-AB55-977E585D1ACC@dsl-only.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Operating-System: FreeBSD cicely7.cicely.de 10.2-RELEASE amd64 User-Agent: Mutt/1.5.11 X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1, BAYES_00=-1.9, RP_MATCHES_RCVD=-1.507 autolearn=ham version=3.3.0 X-Spam-Checker-Version: SpamAssassin 3.3.0 (2010-01-18) on spamd.cicely.de X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Mar 2017 23:47:20 -0000 On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote: > [test_check() between the fork and the wait/sleep prevents the > failure from occurring. Even a small access to the memory at > that stage prevents the failure. Details follow.] Maybe a stupid question, since you might have written it somewhere. What medium do you swap to? I've seen broken firmware on microSD cards doing silent data corruption for some access patterns. -- B.Walter http://www.bwct.de Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm. 
From owner-freebsd-stable@freebsd.org Wed Mar 15 01:19:00 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2A956D0A1CF for ; Wed, 15 Mar 2017 01:19:00 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-180.reflexion.net [208.70.211.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CECFF117B for ; Wed, 15 Mar 2017 01:18:59 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 18151 invoked from network); 15 Mar 2017 01:18:58 -0000 Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 15 Mar 2017 01:18:58 -0000 Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v8.30.2) with SMTP; Tue, 14 Mar 2017 21:18:58 -0400 (EDT) Received: (qmail 13043 invoked from network); 15 Mar 2017 01:18:58 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 15 Mar 2017 01:18:58 -0000 Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 40CE3EC7652; Tue, 14 Mar 2017 18:18:57 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!] 
From: Mark Millard In-Reply-To: <20170314234437.GA25820@cicely7.cicely.de> Date: Tue, 14 Mar 2017 18:18:56 -0700 Cc: Andrew Turner , freebsd-arm , FreeBSD Current , FreeBSD-STABLE Mailing List Content-Transfer-Encoding: 7bit Message-Id: <830B8C57-C9B0-4902-94D2-A8E7F1CB9ADB@dsl-only.net> References: <01735A68-FED6-4E63-964F-0820FE5C446C@dsl-only.net> <16B3D614-62E1-4E58-B409-8DB9DBB35BCB@dsl-only.net> <5BEAFC6C-DA80-4D7B-AB55-977E585D1ACC@dsl-only.net> <20170314234437.GA25820@cicely7.cicely.de> To: ticso@cicely.de X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Mar 2017 01:19:00 -0000 On 2017-Mar-14, at 4:44 PM, Bernd Walter wrote: > On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote: >> [test_check() between the fork and the wait/sleep prevents the >> failure from occurring. Even a small access to the memory at >> that stage prevents the failure. Details follow.] > > Maybe a stupid question, since you might have written it somewhere. > What medium do you swap to? > I've seen broken firmware on microSD cards doing silent data > corruption for some access patterns. The root filesystem is on a USB SSD on a powered hub. Only the kernel is from the microSD card. I have several examples of the USB SSD model and have never observed such problems in any other context. The original issue that started this investigation has been reported by several people on the lists: Failed assertion: "tsd_booted" on arm64 specifically, no other contexts so far as I know. Earlier I had discovered that: A) I could use a swap-in to cause the messages from instances of sh or su that had swapped out earlier. B) The core dumps showed that a large memory region containing the global tsd_booted had all turned to be zero bytes. 
The assert is just exposing one of those zeros. (tsd_booted is from jemalloc that is in a .so that is loaded.)

This prompted me to look for simpler contexts involving swapping that also show memory corruption. So far I've only managed to produce corrupted memory when fork and later swapping are both involved. Being a shared library global is not a requirement for the problem, although such contexts can have an issue. I've not made a simpler example of that yet, although I tried. I have not explored vfork, rfork, or any other alternatives.

So far all failure examples end up with zeroed memory when the memory does not match the original pattern from before the fork. At least that is what the core dumps show for all examples that I've looked at.

See bugzilla 217138 and 217239. In some respects this example is more analogous to the 217239 context as I remember.

My tests on amd64, armv6 (really -mcpu=cortex-a7 so armv7), and powerpc64 have never produced any problems, including never getting the failed assertion. Only arm64. (But I've access to only one arm64 system, a Pine64+ 2GB.)

Prior to this I tracked down a different arm64 problem to the fork_trampoline code (for the child process) modifying a system register but in a place allowing interrupts that could also change the value. Andrew Turner fixed that one at the time.

For this fork/swapping kind of issue I'm not sure that I'll be able to do more than provide the simpler example and the steps that I used. My isolating the internal stage(s) and specific problem(s) at the code level of detail does not seem likely.

But whatever is found needs to be able to explain the contrast with an access after the fork but before the swap preventing the failing behavior. So what I've got so far hopefully does provide some hints to someone.
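
The failing sequence described in this message (fill dynamically allocated memory with a known pattern, fork, idle long enough for both processes to be swapped out, then re-check) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the ~110 line program from the thread: REGION_SIZE, run_fork_swap_check() and its idle_seconds parameter are hypothetical names chosen for illustration, both processes simply sleep here, and swap-out still has to be provoked by other means during the idle window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical size, chosen only so the region spans several pages. */
#define REGION_SIZE ((size_t)(14u * 1024u))

static unsigned char *dyn_region; /* dynamically allocated test region */

/* Pattern byte that is never zero, so zeroed memory is always detectable. */
static unsigned char value(size_t i)
{
    return (unsigned char)((i & 0xFEu) | 0x1u);
}

/* Allocate the region and write the pattern into every byte. */
static void test_setup(void)
{
    dyn_region = malloc(REGION_SIZE);
    assert(dyn_region != NULL);
    for (size_t i = 0; i < REGION_SIZE; i++)
        dyn_region[i] = value(i);
}

/* Count the bytes that no longer hold the expected pattern. */
static size_t test_check(void)
{
    size_t bad = 0;
    for (size_t i = 0; i < REGION_SIZE; i++)
        if (dyn_region[i] != value(i))
            bad++;
    return bad;
}

/*
 * Fork, idle for idle_seconds in both processes (the window in which
 * swap-out has to be provoked by other means), then re-check the pattern.
 * Returns 0 if parent and child both still see the pattern, 1 if either
 * sees a mismatch, and -1 on a setup, fork, or wait error.
 */
static int run_fork_swap_check(unsigned idle_seconds)
{
    test_setup();
    if (test_check() != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    sleep(idle_seconds);
    size_t bad = test_check();

    if (pid == 0)
        _exit(bad == 0 ? 0 : 1); /* child reports through its exit status */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    free(dyn_region);
    dyn_region = NULL;

    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    return (bad == 0 && child_ok) ? 0 : 1;
}
```

Calling run_fork_swap_check() from main() with an idle time long enough to push both processes out to swap would be the interesting case; on a system without the problem it should return 0 for any idle time.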
=== Mark Millard markmi at dsl-only.net From owner-freebsd-stable@freebsd.org Wed Mar 15 03:09:07 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E50EFD0C0B7 for ; Wed, 15 Mar 2017 03:09:07 +0000 (UTC) (envelope-from delborrell@googlemail.com) Received: from mail-ua0-x22a.google.com (mail-ua0-x22a.google.com [IPv6:2607:f8b0:400c:c08::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A51BB13A1 for ; Wed, 15 Mar 2017 03:09:07 +0000 (UTC) (envelope-from delborrell@googlemail.com) Received: by mail-ua0-x22a.google.com with SMTP id 72so2442554uaf.3 for ; Tue, 14 Mar 2017 20:09:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=P6c50XjU4JSPI2nuAGxRQJmiqCfK4UegjIZG+PwqKHo=; b=Vu8PiTLijULDGEvaXQ95vSOHwxpdZAHMcaM4SwUbbjjEiolQBHH3F/OcECnRFs3Gn+ tV4/FCYkx/yKHUcrGISIdAzqHzd81X+Nje9ReaTcEZlI4Z+932v7N/kFC4Ml1vOur9Mf Niq0vNgNjB4w3r3ZRJCPqdvMvWOZmdyrYz2I8GVByYCfFFf5RRNvYtn3m9NtX31yEkMG B+w/Fkp1GbpOjnTSvQ02vd2JIH+XvLJf/8iJBVeDAzEYznNbXrRQk3fX5mb8TGFgqhBl sLA65H4RwXa7DHerBNf3fuMiFDIGn7zGF2mM/JGzJE4LNI4etmV6foS/WbonLsOfNMzI wHIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=P6c50XjU4JSPI2nuAGxRQJmiqCfK4UegjIZG+PwqKHo=; b=XMlQFIqafwtu4U4m60HhOGZIP6kA7odp6nmz2EJuK+mCRbmzYdOyMsoohs77NcNa62 byzEt2FC6fpMVwQL/1Qq1YGkR3uMrBv0H0e/d0QhN+I4s1s198wXY0FYQyPpcSh0U27F E+YBRlDMZdoTYuaiBw5RKGvH9o6w+yUMigh26PoemW3gq8r1BwXYTl3cF0Tye6C6uF/c H4sfma+i1gVRJFA8Qfhymg6nlxs3gi+cdksUuF8XehaMbzujIjuh94nvZwUH3GpiNv3Q gNZVtJJl3a+U8OdEoLt43Q1G5beuJE2fQfLbKoexhROBny8AIG97k0lM4h6ZtBT4LCqb tdNA== 
X-Gm-Message-State: AFeK/H3pYYq/ILdi78lbG9Tz9AheHpXvRuaNU5r3zLG+/OQ4kecb0i3ITf1vV43jNNj2loOEHkdOBjqNlbRyiA== X-Received: by 10.176.84.213 with SMTP id q21mr445060uaa.24.1489547346504; Tue, 14 Mar 2017 20:09:06 -0700 (PDT) MIME-Version: 1.0 Received: by 10.31.48.213 with HTTP; Tue, 14 Mar 2017 20:09:06 -0700 (PDT) From: Steven Borrelli Date: Tue, 14 Mar 2017 22:09:06 -0500 Message-ID: Subject: IPFW kernel build failing on 11.0-STABLE To: freebsd-stable@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Mar 2017 03:09:08 -0000

Tried building IPFW into my kernel and it failed midway with this:

cc -c -O2 -pipe -fno-strict-aliasing -g -nostdinc -I. -I/usr/src/sys -I/usr/src/sys/contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -MD -MF.depend.ppb_1284.o -MTppb_1284.o -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -Wno-error-unused-function -Wno-error-pointer-sign -Wno-error-shift-negative-value -mno-aes -mno-avx -std=iso9899:1999 -Werror /usr/src/sys/dev/ppbus/ppb_1284.c
/usr/src/sys/dev/ppbus/ppb_1284.c:296:46: error: implicit conversion from 'int' to 'char' changes value from 144 to -112 [-Werror,-Wconstant-conversion]
        if ((error = do_peripheral_wait(bus, SELECT | nBUSY, 0))) {
                     ~~~~~~~~~~~~~~~~~~      ~~~~~~~^~~~~~~
/usr/src/sys/dev/ppbus/ppb_1284.c:785:48: error: implicit conversion from 'int' to 'char' changes value from 240 to -16 [-Werror,-Wconstant-conversion]
        if (do_1284_wait(bus, nACK | SELECT | PERROR | nBUSY,
            ~~~~~~~~~~~~      ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
/usr/src/sys/dev/ppbus/ppb_1284.c:786:29: error: implicit conversion from 'int' to 'char' changes value from 240 to -16 [-Werror,-Wconstant-conversion]
            nACK | SELECT | PERROR | nBUSY)) {
            ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
/usr/src/sys/dev/ppbus/ppb_1284.c:841:37: error: implicit conversion from 'int' to 'char' changes value from 200 to -56 [-Werror,-Wconstant-conversion]
        if (do_1284_wait(bus, nACK | nBUSY | nFAULT, nFAULT)) {
            ~~~~~~~~~~~~      ~~~~~~~~~~~~~^~~~~~~~
4 errors generated.
*** Error code 1

Stop.
make[2]: stopped in /usr/obj/usr/src/sys/IPFW
*** Error code 1

Stop.
make[1]: stopped in /usr/src
*** Error code 1

Stop.
make: stopped in /usr/src
q
[1] Exit 1 ( make buildkernel KERNCONF=IPFW && make installkernel KERNCONF=IPFW )
root@****:/usr/src # uname -a
FreeBSD **** 11.0-STABLE FreeBSD 11.0-STABLE #0 r314941: Thu Mar 9 19:39:31 UTC 2017 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
From owner-freebsd-stable@freebsd.org Wed Mar 15 04:33:12 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1A6F9D0D88A for ; Wed, 15 Mar 2017 04:33:12 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-175.reflexion.net [208.70.211.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C98131D4C for ; Wed, 15 Mar 2017 04:33:11 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 17698 invoked from network); 15 Mar 2017 04:33:09 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local)
(10.81.19.2) by 0 (rfx-qmail) with SMTP; 15 Mar 2017 04:33:09 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.30.2) with SMTP; Wed, 15 Mar 2017 00:33:09 -0400 (EDT) Received: (qmail 1852 invoked from network); 15 Mar 2017 04:33:09 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 15 Mar 2017 04:33:09 -0000 Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id DDA81EC8534; Tue, 14 Mar 2017 21:33:08 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!] From: Mark Millard In-Reply-To: Date: Tue, 14 Mar 2017 21:33:08 -0700 Cc: FreeBSD-STABLE Mailing List Content-Transfer-Encoding: quoted-printable Message-Id: <10F50F1C-FD26-4142-9350-966312822438@dsl-only.net> References: <01735A68-FED6-4E63-964F-0820FE5C446C@dsl-only.net> <16B3D614-62E1-4E58-B409-8DB9DBB35BCB@dsl-only.net> <5BEAFC6C-DA80-4D7B-AB55-977E585D1ACC@dsl-only.net> To: Andrew Turner , freebsd-arm , FreeBSD Current X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Mar 2017 04:33:12 -0000 A single Byte access to a 4K Byte aligned region between the fork and wait/sleep/swap-out prevents that specific 4K Byte region from having the (bad) zeros. Sounds like a page sized unit of behavior to me. Details follow. On 2017-Mar-14, at 3:28 PM, Mark Millard wrote: > [test_check() between the fork and the wait/sleep prevents the > failure from occurring. Even a small access to the memory at > that stage prevents the failure. Details follow.] 
>
> On 2017-Mar-14, at 11:07 AM, Mark Millard wrote:
>
>> [This is just a correction to the subject-line text to say arm64
>> instead of amd64.]
>>
>> On 2017-Mar-14, at 12:58 AM, Mark Millard wrote:
>>
>> [Another correction I'm afraid --about alternative program variations
>> this time.]
>>
>> On 2017-Mar-13, at 11:52 PM, Mark Millard wrote:
>>
>>> I'm still at a loss about how to figure out what stages are messed
>>> up. (Memory coherency? Some memory not swapped out? Bad data swapped
>>> out? Wrong data swapped in?)
>>>
>>> But at least I've found a much smaller/simpler example to demonstrate
>>> some problem with in my Pine64+ 2GB context.
>>>
>>> The Pine64+ 2GB is the only amd64 context that I have access to.
>>
>> Someday I'll learn to type arm64 the first time instead of amd64.
>>
>>> The following program fails its check for data
>>> having its expected byte pattern in dynamically
>>> allocated memory after a fork/swap-out/swap-in
>>> sequence.
>>>
>>> I'll note that the program sleeps for 60s after
>>> forking to give time to do something else to
>>> cause the parent and child processes to swap
>>> out (RES=0 as seen in top).
>>
>> The following about the extra test_check() was
>> wrong.
>>
>>> Note the source code line:
>>>
>>> // test_check(); // Adding this line prevents failure.
>>>
>>> It seems that accessing the region contents before forking
>>> and swapping avoids the problem. But there is a problem
>>> if the region was only written-to before the fork/swap.
>
> There is a place that if a test_check call is put then the
> problem does not happen at any stage: I tried putting a
> call between the fork and the later wait/sleep code:

I changed the byte sequence patterns to avoid zero values since the bad values are zeros:

static value_type value(size_t v) { return (value_type)((v&0xFEu)|0x1u); }
                  // value now avoids the zero value since the failures
                  // are zeros.

With that I can then test accurately which bytes have bad values and which do not. I also changed to:

void partial_test_check(void) {
    if (value(0u)!=gbl_region.array[0]) raise(SIGABRT);
    if (value(0u)!=(*dyn_region).array[0]) raise(SIGABRT);
}

since previously [0] had a zero value and so I'd used [1].

On this basis I'm now using the below. See the comments tied to partial_test_check() calls:

extern void test_setup(void); // Sets up the memory byte patterns.
extern void test_check(void); // Tests the memory byte patterns.
extern void partial_test_check(void); // Tests just [0] of each region
                                      // (gbl_region and dyn_region).

int main(void) {
    test_setup();
    test_check(); // Before fork() [passes]

    pid_t pid = fork();
    int wait_status = 0;

    // After fork; before wait/sleep/swap-out.

    if (0==pid) partial_test_check();
                // Even the above is sufficient by
                // itself to prevent failure for
                // region_size 1u through
                // 4u*1024u!
                // But 4u*1024u+1u and above fail
                // with this access to memory.
                // The failing test is of
                // (*dyn_region).array[4096u].
                // This test never fails here.

    if (0
> This suggests to me that the small access is forcing one or more things to
> be initialized for memory access that fork is not establishing of itself.
> It appears that if established correctly then the swap-out/swap-in
> sequence would work okay without needing the manual access to the memory.
>
>
> So far via this test I've not seen any evidence of problems with the global
> region but only the dynamically allocated region.
>
> However, the symptoms that started this investigation in a much more
> complicated context had an area of global memory from a .so that ended
> up being zero.
>
> I think that things should be fixed for this simpler context first and
> that further investigation of the sh/su related should wait to see what
> things are like after this test case works.
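
The observation in this message (a single byte read from a 4 KiB-aligned region between the fork and the swap-out protects only that region) suggests a page-granular diagnostic: read one byte from every page of the region right after fork(). The helper below is a sketch under that assumption only; it is not a fix and not part of the original test program. touch_each_page() is a hypothetical name, and the page size is taken from sysconf(_SC_PAGESIZE) with an assumed 4096-byte fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read one byte from every page overlapped by [base, base + len).
 * Returns the number of pages touched (0 for an empty range).
 */
static size_t touch_each_page(const void *base, size_t len)
{
    assert(base != NULL || len == 0);

    long ps = sysconf(_SC_PAGESIZE);
    size_t page = (ps > 0) ? (size_t)ps : 4096u; /* fallback is an assumption */

    const volatile unsigned char *p = base;
    size_t touched = 0;
    size_t off = 0;

    while (off < len) {
        (void)p[off]; /* volatile read, so the access is not optimized away */
        touched++;

        /* Distance from the current byte to the first byte of the next page. */
        size_t in_page = (size_t)(((uintptr_t)base + off) % page);
        size_t step = page - in_page;
        if (step > len - off)
            break; /* the next page start lies beyond the range */
        off += step;
    }
    return touched;
}
```

In the child branch of a test like the one above, a call such as touch_each_page((*dyn_region).array, region_size) in place of partial_test_check() would show whether touching every page, rather than only the first, avoids the zeroed data for region sizes above 4u*1024u (assuming region_size is the byte length of the array).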
===
Mark Millard
markmi at dsl-only.net
From owner-freebsd-stable@freebsd.org Wed Mar 15 12:42:35 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 131D4D0DFBA for ; Wed, 15 Mar 2017 12:42:35 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 018F4B14 for ; Wed, 15 Mar 2017 12:42:35 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: by mailman.ysv.freebsd.org (Postfix) id 00F67D0DFB9; Wed, 15 Mar 2017 12:42:35 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 00A5DD0DFB8 for ; Wed, 15 Mar 2017 12:42:35 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from constantine.ingresso.co.uk (unknown [IPv6:2a02:b90:3002:411::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B53D4B12 for ; Wed, 15 Mar 2017 12:42:34 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from dilbert.london-internal.ingresso.co.uk ([10.64.50.6] helo=dilbert.ingresso.co.uk) by constantine.ingresso.co.uk with esmtps (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.88 (FreeBSD)) (envelope-from ) id 1co8G8-000OLy-3I for stable@freebsd.org; Wed, 15 Mar 2017 12:42:32 +0000 Received: from petefrench by dilbert.ingresso.co.uk with local (Exim 4.89 (FreeBSD)) (envelope-from ) id 1co8G8-00014I-0u for stable@freebsd.org; Wed, 15 Mar 2017 12:42:32 +0000 To: stable@freebsd.org Subject: 11-STABLE fails to build with MK_OFED enabled Message-Id: From: Pete French Date: Wed, 15 Mar 2017 12:42:32 +0000 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23
Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Mar 2017 12:42:35 -0000

/usr/src/sys/modules/mlx4ib/../../ofed/drivers/infiniband/hw/mlx4/sysfs.c:90:22: error: format specifies type 'unsigned long long *' but the argument has type 'u64 *' (aka 'unsigned long *') [-Werror,-Wformat]
        sscanf(buf, "%llx", &sysadmin_ag_val);
                     ~~~~   ^~~~~~~~~~~~~~~~
                     %lx

Fairly trivial to fix, obviously - I changed it to %lx - but I'm not sure that would work on non-64-bit platforms.

-pete.
From owner-freebsd-stable@freebsd.org Wed Mar 15 12:47:01 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 40E98D0C276 for ; Wed, 15 Mar 2017 12:47:01 +0000 (UTC) (envelope-from david@catwhisker.org) Received: from albert.catwhisker.org (mx.catwhisker.org [198.144.209.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id EAC62D8B for ; Wed, 15 Mar 2017 12:47:00 +0000 (UTC) (envelope-from david@catwhisker.org) Received: from albert.catwhisker.org (localhost [127.0.0.1]) by albert.catwhisker.org (8.15.2/8.15.2) with ESMTP id v2FCkwjY048158; Wed, 15 Mar 2017 12:46:58 GMT (envelope-from david@albert.catwhisker.org) Received: (from david@localhost) by albert.catwhisker.org (8.15.2/8.15.2/Submit) id v2FCkwIK048157; Wed, 15 Mar 2017 05:46:58 -0700 (PDT) (envelope-from david) Date: Wed, 15 Mar 2017 05:46:58 -0700 From: David Wolfskill To: Steven Borrelli Cc: freebsd-stable@freebsd.org Subject: Re: IPFW kernel build failing on 11.0-STABLE Message-ID: <20170315124658.GR1341@albert.catwhisker.org> Mail-Followup-To: David Wolfskill , Steven Borrelli , freebsd-stable@freebsd.org References: MIME-Version: 1.0 Content-Type: multipart/signed;
micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="QD1r0wI5tTrL9hxl" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.8.0 (2017-02-23) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Mar 2017 12:47:01 -0000 --QD1r0wI5tTrL9hxl Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Tue, Mar 14, 2017 at 10:09:06PM -0500, Steven Borrelli via freebsd-stable wrote:
> Tried building IPFW into my kernel and it failed midway with this:
> ....

My laptop's kernel (config file named "CANARY") includes IPFW; this morning's stable/11 update from:

FreeBSD g1-252.catwhisker.org 11.0-STABLE FreeBSD 11.0-STABLE #280 r315241M/315241:1100509: Tue Mar 14 04:48:24 PDT 2017 root@g1-252.catwhisker.org:/common/S1/obj/usr/src/sys/CANARY amd64

to

FreeBSD g1-252.catwhisker.org 11.0-STABLE FreeBSD 11.0-STABLE #281 r315302M/315302:1100509: Wed Mar 15 04:56:42 PDT 2017 root@g1-252.catwhisker.org:/common/S1/obj/usr/src/sys/CANARY amd64

was ... "uneventful" -- it Just Worked.

Peace,
david
-- 
David H. Wolfskill david@catwhisker.org
Claims that lack evidence are not a basis for rational decision-making.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
--QD1r0wI5tTrL9hxl Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQF8BAEBCgBmBQJYyTfCXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRDQ0I3Q0VGOTE3QTgwMUY0MzA2NEQ3N0Ix NTM5Q0M0MEEwNDlFRTE3AAoJEBU5zECgSe4XwpEH/iyBSTHRv347WSe6K6fPWXya 7VmPAogDOQXNyvkOT6hxltnSYnyOzfcxv0+qWRa13gkQxfQA5Pmi1CsgwKvdNFey IV5EGptwug1gqe69ldSQjfFLWZRIGs1EBCM0XTuKZOzjUaGS8exx6RX7YVv/w/38 DQrnvgEeBmoAZu0vvuRwoAvuk7faeEwVXnyhqXwUuI2ZADKv1jWY8r5e6AUBMJ8e KmiWgn0emjCiAHFvSpMX4VSefXqSQIQgWqr/T2EVqt6WWdoOHxcqusxR9JZTgWXk GWInQwcA4qFL2XapEtafDaNGn4tbI64jXuuLtYscYyCSSwPvvDMZjWog/GcZ1nc= =bQK2 -----END PGP SIGNATURE----- --QD1r0wI5tTrL9hxl-- From owner-freebsd-stable@freebsd.org Wed Mar 15 18:51:59 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 08CC3D0EFA7 for ; Wed, 15 Mar 2017 18:51:59 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-173.reflexion.net [208.70.211.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C205DEFE for ; Wed, 15 Mar 2017 18:51:58 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 459 invoked from network); 15 Mar 2017 18:51:51 -0000 Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 15 Mar 2017 18:51:51 -0000 Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v8.30.2) with SMTP; Wed, 15 Mar 2017 14:51:51 -0400 (EDT) Received: (qmail 16003 invoked from network); 15 Mar 2017 18:51:51 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 15 Mar 2017 18:51:51 -0000 Received: from [192.168.1.111] 
(c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 796CBEC9026; Wed, 15 Mar 2017 11:51:50 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!] From: Mark Millard In-Reply-To: <345EE889-A429-4C13-9B08-B762DA3F4D71@dsl-only.net> Date: Wed, 15 Mar 2017 11:51:49 -0700 Cc: freebsd-arm , FreeBSD Current , FreeBSD-STABLE Mailing List Content-Transfer-Encoding: quoted-printable Message-Id: References: <201703151315.v2FDFWOr028842@sdf.org> <345EE889-A429-4C13-9B08-B762DA3F4D71@dsl-only.net> To: Scott Bennett X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Mar 2017 18:51:59 -0000 [Something strange happened to the automatic CC: fill-in for my original reply. Also I should have mentioned that for my test program if a variant is made that does not fork the swapping works fine.] On 2017-Mar-15, at 9:37 AM, Mark Millard wrote: > On 2017-Mar-15, at 6:15 AM, Scott Bennett wrote: > >> On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard >> wrote: >>> On 2017-Mar-14, at 4:44 PM, Bernd Walter wrote: >>> >>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote: >>>>> [test_check() between the fork and the wait/sleep prevents the >>>>> failure from occurring. Even a small access to the memory at >>>>> that stage prevents the failure. Details follow.] >>>> >>>> Maybe a stupid question, since you might have written it somewhere. >>>> What medium do you swap to? >>>> I've seen broken firmware on microSD cards doing silent data >>>> corruption for some access patterns.
>>> >>> The root filesystem is on a USB SSD on a powered hub. >>> >>> Only the kernel is from the microSD card. >>> >>> I have several examples of the USB SSD model and have >>> never observed such problems in any other context. >>> >>> [remainder of irrelevant material deleted --SB] >> >> You gave a very long-winded non-answer to Bernd's question, so I'll >> repeat it here. What medium do you swap to? > > My wording of: > > The root filesystem is on a USB SSD on a powered hub. > > was definitely poor. It should have explicitly mentioned the > swap partition too: > > The root filesystem and swap partition are both on the same > USB SSD on a powered hub. > > More detail from dmesg -a for usb: > > usbus0: 12Mbps Full Speed USB v1.0 > usbus1: 480Mbps High Speed USB v2.0 > usbus2: 12Mbps Full Speed USB v1.0 > usbus3: 480Mbps High Speed USB v2.0 > ugen0.1: at usbus0 > uhub0: on usbus0 > ugen1.1: at usbus1 > uhub1: on usbus1 > ugen2.1: at usbus2 > uhub2: on usbus2 > ugen3.1: at usbus3 > uhub3: on usbus3 > . . . > uhub0: 1 port with 1 removable, self powered > uhub2: 1 port with 1 removable, self powered > uhub1: 1 port with 1 removable, self powered > uhub3: 1 port with 1 removable, self powered > ugen3.2: at usbus3 > uhub4 on uhub3 > uhub4: on usbus3 > uhub4: MTT enabled > uhub4: 4 ports with 4 removable, self powered > ugen3.3: at usbus3 > umass0 on uhub4 > umass0: on usbus3 > umass0: SCSI over Bulk-Only; quirks = 0x0100 > umass0:0:0: Attached to scbus0 > . . . > da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 > da0: Fixed Direct Access SPC-4 SCSI device > da0: Serial Number > da0: 40.000MB/s transfers > > (Edited a bit because there is other material interlaced, even > internal to some lines. Also: I removed the serial number of the > specific example device.) > >> I will further note that any kind of USB device cannot automatically >> be trusted to behave properly.
USB devices are notorious, for example, >> for momentarily dropping off-line and then immediately reconnecting. (ZFS >> reacts very poorly to such events, BTW.) This misbehavior can be caused >> by either processor involved, i.e., the one controlling either the >> upstream or the downstream device. Hubs are really bad about this, but >> any USB device can be guilty. You may have a defective storage device, >> its controller may be defective, or any controller in the chain all the >> way back to the motherboard may be defective or, not defective, but >> corrupted by having been connected to another device with corrupted >> (infected) firmware that tries to flash itself into the firmware flash >> chips in its potential victim. >> Flash memory chips, spinning disks, or {S,}{D,}RAM chips can be >> defective. Without parity bits, the devices may return bad data and lie >> about the presence of corrupted data. That, for example, is where ZFS >> is better than any kind of classical RAID because ZFS keeps checksums on >> everything, so it has a reasonable chance of detecting corruption even >> without parity support and, if there is any redundancy in the pool or the >> data set, fixing the bad data machine. Even having parity generally >> allows only the detection of single-bit errors, but not of repairing them. >> You should identify where you page/swap to and then try substituting >> a different device for that function as a test to eliminate the possibility >> of a bad storage device/controller. If the problem still occurs, that >> means there still remains the possibility that another controller or its >> firmware is defective instead. It could be a kernel bug, it is true, but >> making sure there is no hardware or firmware error occurring is important, >> and as I say, USB devices should always be considered suspect unless and >> until proven innocent. > > [FYI: This is a ufs context, not a zfs one.]
> > I'm aware of such things. There is no evidence that has resulted in > suggesting the USB devices that I can replace are a problem. Otherwise > I'd not be going down this path. I only have access to the one arm64 > device (a Pine64+ 2GB) so I've no ability to substitution-test what > is on that board. > > It would be neat if some folks used my code to test other arm64 > contexts and reported the results. I'd be very interested. > (This is easier to do on devices that do not have massive > amounts of RAM, which may limit the range of devices or > device configurations that are reasonable to test.) > > There is that other people using other devices have reported > the behavior that started this investigation. I can produce the > behavior that they reported, although I've not seen anyone else > listing specific steps that lead to the problem or ways to tell > if the symptom is going to happen before it actually does. Nor > have I seen any other core dump analysis. (I have bugzilla > submittals 217138 and 217239 tied to symptoms others have > reported as well as this test program material.) > > Also, considering that for my test program I can control which pages > get the zeroed-problem by read-accessing even one byte of any 4K > Byte page that I want to make work normally, doing so in the child > process of the fork, between the fork and the sleep/swap-out, it does > not suggest USB-device-specific behavior. The read-access is changing > the status of the page in some way as far as I can tell. > > (Such read-accesses in the parent process make no difference to the > behavior.) I should have noted another comparison/contrast between having memory corruption and not in my context: I've tried variants of my test program that do not fork but just sleep for 60s to allow me to force the swap-out. I did this before adding fork and before using parital_test_check, for example.
I gradually added things apparently involved in the reports others had made until I found a combination that produced a memory corruption test failure. These tests without fork involved find no problems with the memory content after the swap-in. For my test program it appears that fork-before-swap-out or the like is essential to having the problem occur. === Mark Millard markmi at dsl-only.net From owner-freebsd-stable@freebsd.org Wed Mar 15 20:12:23 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1A477D0E779 for ; Wed, 15 Mar 2017 20:12:23 +0000 (UTC) (envelope-from dim@FreeBSD.org) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 040951588 for ; Wed, 15 Mar 2017 20:12:23 +0000 (UTC) (envelope-from dim@FreeBSD.org) Received: by mailman.ysv.freebsd.org (Postfix) id 00707D0E778; Wed, 15 Mar 2017 20:12:23 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 00198D0E777 for ; Wed, 15 Mar 2017 20:12:22 +0000 (UTC) (envelope-from dim@FreeBSD.org) Received: from tensor.andric.com (tensor.andric.com [IPv6:2001:7b8:3a7:1:2d0:b7ff:fea0:8c26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "tensor.andric.com", Issuer "COMODO RSA Domain Validation Secure Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BD7891587 for ; Wed, 15 Mar 2017 20:12:22 +0000 (UTC) (envelope-from dim@FreeBSD.org) Received: from [IPv6:2001:7b8:3a7::446d:6557:1d94:f441] (unknown [IPv6:2001:7b8:3a7:0:446d:6557:1d94:f441]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by tensor.andric.com (Postfix) with ESMTPSA id 92B932EC65; Wed, 15 Mar 2017
21:12:19 +0100 (CET) From: Dimitry Andric Message-Id: <469F5DD8-7493-41C5-B999-931AFF207A5B@FreeBSD.org> Content-Type: multipart/signed; boundary="Apple-Mail=_7E10409F-AD4A-46AB-B6C7-39081416F8CA"; protocol="application/pgp-signature"; micalg=pgp-sha1 Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: 11-STABLE fails to build with MK_OFED enabled Date: Wed, 15 Mar 2017 21:12:11 +0100 In-Reply-To: Cc: stable@freebsd.org To: Pete French References: X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Mar 2017 20:12:23 -0000 --Apple-Mail=_7E10409F-AD4A-46AB-B6C7-39081416F8CA Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii On 15 Mar 2017, at 13:42, Pete French wrote: > > > > /usr/src/sys/modules/mlx4ib/../../ofed/drivers/infiniband/hw/mlx4/sysfs.c:90:22: error: > format specifies type 'unsigned long long *' but the argument has type > 'u64 *' (aka 'unsigned long *') [-Werror,-Wformat] > sscanf(buf, "%llx", &sysadmin_ag_val); > ~~~~ ^~~~~~~~~~~~~~~~ > %lx > > Fairly trivial to fix obviously - I changed it to %lx - but not sure that would > work on non-64-bit platforms. Hi Pete, I have merged the fix (r310232) in r315328.
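[Aside on the portability worry quoted above: this is a minimal userland sketch, not the content of r310232 and not code from the mlx4 driver. It assumes a plain uint64_t in place of the kernel's u64, and parse_hex_u64 / parse_hex_u64_scn are hypothetical helper names. It shows two common ways to read a hexadecimal 64-bit value with sscanf without depending on whether the 64-bit type is 'unsigned long' or 'unsigned long long' on a given platform.]

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a hexadecimal string into a 64-bit value by scanning into an
 * 'unsigned long long' temporary, which always matches "%llx", and then
 * converting. Returns 0 on success, -1 if no hex number could be read.
 * (Hypothetical helper for illustration only.)
 */
static int
parse_hex_u64(const char *buf, uint64_t *out)
{
	unsigned long long tmp;

	if (sscanf(buf, "%llx", &tmp) != 1)
		return (-1);
	*out = (uint64_t)tmp;
	return (0);
}

/*
 * Same result using the <inttypes.h> macro SCNx64, which expands to the
 * correct length modifier for uint64_t on the platform being compiled for.
 */
static int
parse_hex_u64_scn(const char *buf, uint64_t *out)
{
	return (sscanf(buf, "%" SCNx64, out) == 1 ? 0 : -1);
}
```

[Either helper can be called as parse_hex_u64(buf, &value). A bare "%lx" only matches the 64-bit type where 'unsigned long' is 64 bits wide, which is the concern raised in the quoted message.]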
-Dimitry --Apple-Mail=_7E10409F-AD4A-46AB-B6C7-39081416F8CA Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.30 iEYEARECAAYFAljJoCIACgkQsF6jCi4glqObmACcD3RmB3CL2iK3YpwrjhnzuR7g c1UAn1eWOvOqnKJtoPHQc9OTeCv+xZvA =utxF -----END PGP SIGNATURE----- --Apple-Mail=_7E10409F-AD4A-46AB-B6C7-39081416F8CA-- From owner-freebsd-stable@freebsd.org Wed Mar 15 20:23:50 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AACD0D0EC43 for ; Wed, 15 Mar 2017 20:23:50 +0000 (UTC) (envelope-from delborrell@googlemail.com) Received: from mail-vk0-x22f.google.com (mail-vk0-x22f.google.com [IPv6:2607:f8b0:400c:c05::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 664CF1C5A for ; Wed, 15 Mar 2017 20:23:50 +0000 (UTC) (envelope-from delborrell@googlemail.com) Received: by mail-vk0-x22f.google.com with SMTP id t8so13926509vke.3 for ; Wed, 15 Mar 2017 13:23:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=E0b8VT7R9ESTjKtvYFVyGyeuABXwUoDL0sX6WnVcplo=; b=fc4OjcNIW5quyTs28QkLuTSAu2h26ah2GI9KyczLHDq3DAxdKNc/ne1Qdh1beTiRzz RvNEe3vBRYc/EX8Juw6ipvcO7rp6G+S9tMmbB53JphiT/YRiApSRqmjUXmp62VRd5Rkv qEeNN9YjhRn/bWT4xDmown7Ge/3NxNdNpQTOYFNMyJBeH4XKZamQ5YIEwXePDOac1tNU QHoIUM1mqf56txkEFpwyIC1kiYoar8jsvfTbdORT1U15qG4FDe+tAZjyFqRxEymyamVz J+TlohPAFFlH22gfJ/Wu2tPTonb/BcG65JGed2u9D1tMU3lEcrs10M6KpCafo40E7QUX LoOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; 
s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=E0b8VT7R9ESTjKtvYFVyGyeuABXwUoDL0sX6WnVcplo=; b=MrE30bQv4l9/07L1V2bJz/0FRW56gndcFFOJk/qiQq7/sKVuYnqgPBgQkpAu6uUyMt ri9Ui+GHqtUJPsl20t1tkPXs3a7YD1YghfRWk6DVeSSRR4IEtzH/FnGazFZ+wR00mK7u HOv1mlK4OHOivVSKSB+pKeMBHBAUm0vGoSWuMUueZ3mRqkCgNiKZ2Dq401eFBPULBUF8 iyVWVsq9dZVij+aixqciRLU1zVy4bwGxUPNoHObLSgLEswSAdIsjcoMQatZcAYaxiYBJ MLebeguPbDhi41F3hOkl3cdaiDxu+bedqsqu2jQ72yFo496K1uOzJSdSD4m9IS2URjhu d+sQ== X-Gm-Message-State: AFeK/H06AzFPRfqewUJTRF+dJmmoecNN4Ko0lcEeg3SauNF5BTXb0TBwB4NC0rCtkQz5XYHhrd6izzxON/KASw== X-Received: by 10.31.196.194 with SMTP id u185mr2036459vkf.71.1489609429629; Wed, 15 Mar 2017 13:23:49 -0700 (PDT) MIME-Version: 1.0 Received: by 10.31.48.213 with HTTP; Wed, 15 Mar 2017 13:23:49 -0700 (PDT) In-Reply-To: <20170315124658.GR1341@albert.catwhisker.org> References: <20170315124658.GR1341@albert.catwhisker.org> From: Steven Borrelli Date: Wed, 15 Mar 2017 15:23:49 -0500 Message-ID: Subject: Re: IPFW kernel build failing on 11.0-STABLE To: David Wolfskill , Steven Borrelli , freebsd-stable@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Mar 2017 20:23:50 -0000 I figured it out. I had been running on a 11.0-STABLE image without /usr/src/*, went to download the src tarball and the closest match I could find on FreeBSD's FTP site was for 11.0-RELEASE. Ended up reloading 11.0-RELEASE (including /usr/src/*) and building 11.0-STABLE with IPFW from that. Worked like a charm. On Wed, Mar 15, 2017 at 7:46 AM, David Wolfskill wrote: > On Tue, Mar 14, 2017 at 10:09:06PM -0500, Steven Borrelli via freebsd-stable wrote: >> Tried building IPFW into my kernel and it failed midway with this: >> .... 
> > My laptop's kernel (config file named "CANARY") includes IPFW; this > morning's stable/11 update from: > > FreeBSD g1-252.catwhisker.org 11.0-STABLE FreeBSD 11.0-STABLE #280 r315241M/315241:1100509: Tue Mar 14 04:48:24 PDT 2017 root@g1-252.catwhisker.org:/common/S1/obj/usr/src/sys/CANARY amd64 > > to > > FreeBSD g1-252.catwhisker.org 11.0-STABLE FreeBSD 11.0-STABLE #281 r315302M/315302:1100509: Wed Mar 15 04:56:42 PDT 2017 root@g1-252.catwhisker.org:/common/S1/obj/usr/src/sys/CANARY amd64 > > was ... "uneventful" -- it Just Worked. > > Peace, > david > -- > David H. Wolfskill david@catwhisker.org > Claims that lack evidence are not a basis for rational decision-making. > > See http://www.catwhisker.org/~david/publickey.gpg for my public key. From owner-freebsd-stable@freebsd.org Thu Mar 16 06:08:06 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E86EAD0E0B1; Thu, 16 Mar 2017 06:08:06 +0000 (UTC) (envelope-from bennett@sdf.org) Received: from mx.sdf.org (mx.sdf.org [205.166.94.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "ol.sdf.org", Issuer "ol.sdf.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id B57F315F3; Thu, 16 Mar 2017 06:08:06 +0000 (UTC) (envelope-from bennett@sdf.org) Received: from sdf.org (IDENT:bennett@otaku.freeshell.org [205.166.94.9]) by mx.sdf.org (8.15.2/8.14.5) with ESMTPS id v2G67WMk026837 (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256 bits) verified NO); Thu, 16 Mar 2017 06:07:32 GMT Received: (from bennett@localhost) by sdf.org (8.15.2/8.12.8/Submit) id v2G67Vwe023153; Thu, 16 Mar 2017 01:07:31 -0500 (CDT) From: Scott Bennett Message-Id: <201703160607.v2G67Vwe023153@sdf.org> Date: Thu, 16 Mar 2017 01:07:31 -0500 To: markmi@dsl-only.net Subject: Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an 
example (Pine64+ 2GB context) [Corrected subject: arm64!] Cc: freebsd-current@freebsd.org, freebsd-arm@freebsd.org, freebsd-stable@freebsd.org References: <201703151315.v2FDFWOr028842@sdf.org> <345EE889-A429-4C13-9B08-B762DA3F4D71@dsl-only.net> In-Reply-To: User-Agent: Heirloom mailx 12.5 6/20/10 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Mar 2017 06:08:07 -0000 Mark Millard wrote: > [Something strange happened to the automatic CC: fill-in for my original > reply. Also I should have mentioned that for my test program if a > variant is made that does not fork the swapping works fine.] > > On 2017-Mar-15, at 9:37 AM, Mark Millard wrote: > > > On 2017-Mar-15, at 6:15 AM, Scott Bennett wrote: > > > >> On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard > >> wrote: > >>> On 2017-Mar-14, at 4:44 PM, Bernd Walter wrote: > >>> > >>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote: > >>>>> [test_check() between the fork and the wait/sleep prevents the > >>>>> failure from occurring. Even a small access to the memory at > >>>>> that stage prevents the failure. Details follow.] > >>>> > >>>> Maybe a stupid question, since you might have written it somewhere. > >>>> What medium do you swap to? > >>>> I've seen broken firmware on microSD cards doing silent data > >>>> corruption for some access patterns. > >>> > >>> The root filesystem is on a USB SSD on a powered hub. > >>> > >>> Only the kernel is from the microSD card. > >>> > >>> I have several examples of the USB SSD model and have > >>> never observed such problems in any other context. 
> >>> > >>> [remainder of irrelevant material deleted --SB] > >> > >> You gave a very long-winded non-answer to Bernd's question, so I'll > >> repeat it here. What medium do you swap to? > > > > My wording of: > > > > The root filesystem is on a USB SSD on a powered hub. > > > > was definitely poor. It should have explicitly mentioned the > > swap partition too: > > > > The root filesystem and swap partition are both on the same > > USB SSD on a powered hub. > > > > More detail from dmesg -a for usb: > > > > usbus0: 12Mbps Full Speed USB v1.0 > > usbus1: 480Mbps High Speed USB v2.0 > > usbus2: 12Mbps Full Speed USB v1.0 > > usbus3: 480Mbps High Speed USB v2.0 > > ugen0.1: at usbus0 > > uhub0: on usbus0 > > ugen1.1: at usbus1 > > uhub1: on usbus1 > > ugen2.1: at usbus2 > > uhub2: on usbus2 > > ugen3.1: at usbus3 > > uhub3: on usbus3 > > . . . > > uhub0: 1 port with 1 removable, self powered > > uhub2: 1 port with 1 removable, self powered > > uhub1: 1 port with 1 removable, self powered > > uhub3: 1 port with 1 removable, self powered > > ugen3.2: at usbus3 > > uhub4 on uhub3 > > uhub4: on usbus3 > > uhub4: MTT enabled > > uhub4: 4 ports with 4 removable, self powered > > ugen3.3: at usbus3 > > umass0 on uhub4 > > umass0: on usbus3 > > umass0: SCSI over Bulk-Only; quirks = 0x0100 > > umass0:0:0: Attached to scbus0 > > . . . > > da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 > > da0: Fixed Direct Access SPC-4 SCSI device > > da0: Serial Number > > da0: 40.000MB/s transfers > > > > (Edited a bit because there is other material interlaced, even > > internal to some lines. Also: I removed the serial number of the > > specific example device.) Thank you. That presents a much clearer picture. > > > >> I will further note that any kind of USB device cannot automatically > >> be trusted to behave properly. 
USB devices are notorious, for example, > >> > >> [reasons why deleted --SB] > >> > >> You should identify where you page/swap to and then try substituting > >> a different device for that function as a test to eliminate the possibility > >> of a bad storage device/controller. If the problem still occurs, that > >> means there still remains the possibility that another controller or its > >> firmware is defective instead. It could be a kernel bug, it is true, but > >> making sure there is no hardware or firmware error occurring is important, > >> and as I say, USB devices should always be considered suspect unless and > >> until proven innocent. > > > > [FYI: This is a ufs context, not a zfs one.] Right. It's only a Pi, after all. :-) > > > > I'm aware of such things. There is no evidence that has resulted in > > suggesting the USB devices that I can replace are a problem. Otherwise > > I'd not be going down this path. I only have access to the one arm64 > > device (a Pine64+ 2GB) so I've no ability to substitution-test what > > is on that board. There isn't even one open port on that hub that you could plug a flash drive into temporarily to be the paging device? You could then try your tests before returning to the normal configuration. If there isn't an open port, then how about plugging a second hub into one of the first hub's ports and moving the displaced device to the second hub? A flash drive could then be plugged in. That kind of configuration is obviously a bad idea for the long run, but just to try your tests it ought to work well enough. (BTW, if a USB storage device containing a paging area drops off-line even momentarily and the system needs to use it, that is the beginning of the end, even though it may take up to a few minutes for everything to lock up. You probably won't be able to do an orderly shutdown, but will instead have to crash it with the power switch. In the case of something like a Pi, this is an unpleasant fact of life, to be sure.)
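[For readers who want to try the kind of test being discussed in this thread: the following is a minimal sketch, not Mark's actual ~110 line program. It only follows the outline given in the quoted text: fill a region with a known pattern, fork, optionally read one byte per page in the child, leave the child idle long enough for the operator to force the pages out, then verify the contents in both processes. The region size, the pattern, and the names pattern_at, count_mismatches and run_fork_check are all assumptions made for illustration.]

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REGION_BYTES	(4u * 1024u * 1024u)	/* illustrative size only */

/* Deterministic pattern that is never zero, so a zeroed page shows up. */
static unsigned char
pattern_at(size_t i)
{
	return ((unsigned char)((i % 251u) + 1u));
}

/* Count the bytes in p[0..n) that no longer match the pattern. */
static size_t
count_mismatches(const volatile unsigned char *p, size_t n)
{
	size_t i, bad = 0;

	for (i = 0; i < n; i++)
		if (p[i] != pattern_at(i))
			bad++;
	return (bad);
}

/*
 * Fill a region, fork, keep the child idle for 'idle_seconds' (the
 * window in which the pages would be forced out by other means), then
 * verify the region in the child and in the parent. If 'touch_in_child'
 * is non-zero, the child reads one byte of every page right after fork.
 * Returns 0 if both processes verified the data, 1 if either saw a
 * mismatch, and -1 on a setup error.
 */
static int
run_fork_check(unsigned idle_seconds, int touch_in_child)
{
	volatile unsigned char *region;
	long pagesz = sysconf(_SC_PAGESIZE);
	size_t i;
	pid_t pid;
	int status, parent_bad;

	region = malloc(REGION_BYTES);
	if (region == NULL || pagesz <= 0) {
		free((void *)region);
		return (-1);
	}
	for (i = 0; i < REGION_BYTES; i++)
		region[i] = pattern_at(i);

	pid = fork();
	if (pid < 0) {
		free((void *)region);
		return (-1);
	}
	if (pid == 0) {
		/* Child: optional per-page read, idle, then verify. */
		if (touch_in_child)
			for (i = 0; i < REGION_BYTES; i += (size_t)pagesz)
				(void)region[i];
		sleep(idle_seconds);
		_exit(count_mismatches(region, REGION_BYTES) != 0);
	}

	/* Parent: block until the child is done, then verify its own copy. */
	if (waitpid(pid, &status, 0) != pid) {
		free((void *)region);
		return (-1);
	}
	parent_bad = count_mismatches(region, REGION_BYTES) != 0;
	free((void *)region);
	if (!WIFEXITED(status))
		return (-1);
	return (parent_bad || WEXITSTATUS(status) != 0);
}
```

[A call such as run_fork_check(60, 0) leaves a 60 second window, matching the 60s sleep mentioned in the thread; run_fork_check(60, 1) adds the per-page read in the child that the quoted text says prevents the failure. Forcing the pages out during the window is left to the operator, and on a healthy system both calls are expected to return 0.]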
I think I buy your arguments, given the evidence you've collected thus far, including what you've added below. I just like to eliminate possibilities that are much simpler to deal with before facing nastinesses like bugs in the VM subsystem. :-) > > > > It would be neat if some folks used my code to test other arm64 > > contexts and reported the results. I'd be very interested. > > (This is easier to do on devices that do not have massive > > amounts of RAM, which may limit the range of devices or > > device configurations that are reasonable to test.) > > > > There is that other people using other devices have reported > > the behavior that started this investigation. I can produce the > > behavior that they reported, although I've not seen anyone else > > listing specific steps that lead to the problem or ways to tell > > if the symptom is going to happen before it actually does. Nor > > have I seen any other core dump analysis. (I have bugzilla > > submittals 217138 and 217239 tied to symptoms others have > > reported as well as this test program material.) > > > > Also, considering that for my test program I can control which pages > > get the zeroed-problem by read-accessing even one byte of any 4K > > Byte page that I want to make work normally, doing so in the child > > process of the fork, between the fork and the sleep/swap-out, it does > > not suggest USB-device-specific behavior. The read-access is changing > > the status of the page in some way as far as I can tell. > > > > (Such read-accesses in the parent process make no difference to the > > behavior.) > > I should have noted another comparison/contrast between > having memory corruption and not in my context: > > I've tried variants of my test program that do not fork but > just sleep for 60s to allow me to force the swap-out. I > did this before adding fork and before using > parital_test_check, for example. 
I gradually added things > apparently involved in the reports others had made > until I found a combination that produced a memory > corruption test failure. > > These tests without fork involved find no problems with > the memory content after the swap-in. > > For my test program it appears that fork-before-swap-out > or the like is essential to having the problem occur. > A comment about terminology seems in order here. It bothers me considerably to see you writing "swap out" or "swapping" where it seems like you mean to write "page out" or "paging". A BSD system whose swapping mechanism gets activated has already waded very deeply into the quicksand and frequently cannot be gotten out in a reasonable amount of time even with manual assistance. It is often quicker to crash it, reboot, and wait for the fsck(8) cleanups to complete. Orderly shutdowns, even of the kind that results from a quick poke to the power button, typically get mired in the same mess that already has the system in knots. Also, BSD systems since 3.0BSD, unlike older AT&T (pre-SysVR2.3) systems, do not swap in, just out. A swapped out process, once the system determines that it has adequate resources again to attempt to run the process, will have the interrupted text page paged in and the rest will be paged in by the normal mechanism of page faults and page-in operations. I assume you must already know all this, which is a large part of why it grates on me that you appear to be using the wrong terms. Scott Bennett, Comm. ASMELG, CFIAG ********************************************************************** * Internet: bennett at sdf.org *xor* bennett at freeshell.org * *--------------------------------------------------------------------* * "A well regulated and disciplined militia, is at all times a good * * objection to the introduction of that bane of all free governments * * -- a standing army." * * -- Gov. 
John Hancock, New York Journal, 28 January 1790 * ********************************************************************** From owner-freebsd-stable@freebsd.org Thu Mar 16 07:52:19 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8DE59D0F928 for ; Thu, 16 Mar 2017 07:52:19 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 7AD521F7D for ; Thu, 16 Mar 2017 07:52:19 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: by mailman.ysv.freebsd.org (Postfix) id 7A21BD0F927; Thu, 16 Mar 2017 07:52:19 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 79C61D0F926 for ; Thu, 16 Mar 2017 07:52:19 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id C943F1F7A for ; Thu, 16 Mar 2017 07:52:18 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id JAA09827; Thu, 16 Mar 2017 09:52:09 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1coQCf-0000qU-8U; Thu, 16 Mar 2017 09:52:09 +0200 Subject: Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0 To: stable@FreeBSD.org References: <20170313190728.GA2967@brick> Cc: Pete French From: Andriy Gapon Message-ID: Date: Thu, 16 Mar 2017 09:51:13 +0200 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 
1.0 In-Reply-To: <20170313190728.GA2967@brick> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Mar 2017 07:52:19 -0000 On 13/03/2017 21:07, Edward Tomasz Napierała wrote: > Are you sure the above transcript is right? There are three reasons > I'm asking. First, you'll see the "Root mount waiting" message, > which means the root mount code is, well, waiting for storvsc, exactly > as expected. Second - there is no "Trying to mount root". But most > of all - for some reason the "Mounting failed" is shown _before_ the > "Root mount waiting", and I have no idea how this could ever happen. Edward, your observation is not completely correct. https://www.twisted.org.uk/~pete/914893a3-249e-4a91-851c-f467fc185eec.txt We have: Trying to mount root from zfs:rpool/ROOT/default []... <=== vmbus0: version 3.0 ... storvsc0: on vmbus0 Solaris: NOTICE: Cannot find the pool label for 'rpool' Mounting from zfs:rpool/ROOT/default failed with error 5. <=== Root mount waiting for: storvsc <=== ... So, the kernel attempted to mount the root even before vmbus was attached and, thus, before storvsc appeared and informed the kernel that it might be holding the root. How was ZFS supposed to know that vmbus was ever going to appear? To me this sounds more like a problem with the Hyper-V drivers.
-- Andriy Gapon From owner-freebsd-stable@freebsd.org Thu Mar 16 09:07:32 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 417B3D0F9E9 for ; Thu, 16 Mar 2017 09:07:32 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-181.reflexion.net [208.70.211.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0310E102F for ; Thu, 16 Mar 2017 09:07:31 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 3026 invoked from network); 16 Mar 2017 09:08:15 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 16 Mar 2017 09:08:15 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.30.2) with SMTP; Thu, 16 Mar 2017 05:07:25 -0400 (EDT) Received: (qmail 28883 invoked from network); 16 Mar 2017 09:07:25 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 16 Mar 2017 09:07:25 -0000 Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 29159EC7ED9; Thu, 16 Mar 2017 02:07:24 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!] 
From: Mark Millard In-Reply-To: <201703160607.v2G67Vwe023153@sdf.org> Date: Thu, 16 Mar 2017 02:07:23 -0700 Cc: FreeBSD Current , freebsd-arm@freebsd.org, freebsd-stable@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <1019DBB4-5A92-41FE-90B5-63F3F658CF3D@dsl-only.net> References: <201703151315.v2FDFWOr028842@sdf.org> <345EE889-A429-4C13-9B08-B762DA3F4D71@dsl-only.net> <201703160607.v2G67Vwe023153@sdf.org> To: Scott Bennett X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Mar 2017 09:07:32 -0000 On 2017-Mar-15, at 11:07 PM, Scott Bennett wrote: > Mark Millard wrote: > >> [Something strange happened to the automatic CC: fill-in for my original >> reply. Also I should have mentioned that for my test program if a >> variant is made that does not fork the swapping works fine.] >> >> On 2017-Mar-15, at 9:37 AM, Mark Millard wrote: >> >>> On 2017-Mar-15, at 6:15 AM, Scott Bennett wrote: >>> >>>> On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard >>>> wrote: >>>>> On 2017-Mar-14, at 4:44 PM, Bernd Walter wrote: >>>>> >>>>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote: >>>>>>> [test_check() between the fork and the wait/sleep prevents the >>>>>>> failure from occurring. Even a small access to the memory at >>>>>>> that stage prevents the failure. Details follow.] >>>>>> >>>>>> Maybe a stupid question, since you might have written it somewhere. >>>>>> What medium do you swap to? >>>>>> I've seen broken firmware on microSD cards doing silent data >>>>>> corruption for some access patterns. >>>>> >>>>> The root filesystem is on a USB SSD on a powered hub. >>>>> >>>>> Only the kernel is from the microSD card. 
>>>>> >>>>> I have several examples of the USB SSD model and have >>>>> never observed such problems in any other context. >>>>> >>>>> [remainder of irrelevant material deleted --SB] >>>> >>>> You gave a very long-winded non-answer to Bernd's question, so I'll >>>> repeat it here. What medium do you swap to? >>> >>> My wording of: >>> >>> The root filesystem is on a USB SSD on a powered hub. >>> >>> was definitely poor. It should have explicitly mentioned the >>> swap partition too: >>> >>> The root filesystem and swap partition are both on the same >>> USB SSD on a powered hub. >>> >>> More detail from dmesg -a for usb: >>> >>> usbus0: 12Mbps Full Speed USB v1.0 >>> usbus1: 480Mbps High Speed USB v2.0 >>> usbus2: 12Mbps Full Speed USB v1.0 >>> usbus3: 480Mbps High Speed USB v2.0 >>> ugen0.1: at usbus0 >>> uhub0: on usbus0 >>> ugen1.1: at usbus1 >>> uhub1: on usbus1 >>> ugen2.1: at usbus2 >>> uhub2: on usbus2 >>> ugen3.1: at usbus3 >>> uhub3: on usbus3 >>> . . . >>> uhub0: 1 port with 1 removable, self powered >>> uhub2: 1 port with 1 removable, self powered >>> uhub1: 1 port with 1 removable, self powered >>> uhub3: 1 port with 1 removable, self powered >>> ugen3.2: at usbus3 >>> uhub4 on uhub3 >>> uhub4: on usbus3 >>> uhub4: MTT enabled >>> uhub4: 4 ports with 4 removable, self powered >>> ugen3.3: at usbus3 >>> umass0 on uhub4 >>> umass0: on usbus3 >>> umass0: SCSI over Bulk-Only; quirks = 0x0100 >>> umass0:0:0: Attached to scbus0 >>> . . . >>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 >>> da0: Fixed Direct Access SPC-4 SCSI device >>> da0: Serial Number >>> da0: 40.000MB/s transfers >>> >>> (Edited a bit because there is other material interlaced, even >>> internal to some lines. Also: I removed the serial number of the >>> specific example device.) > > Thank you. That presents a much clearer picture. 
>>> >>>> I will further note that any kind of USB device cannot automatically >>>> be trusted to behave properly. USB devices are notorious, for example, >>>> >>>> [reasons why deleted --SB] >>>> >>>> You should identify where you page/swap to and then try substituting >>>> a different device for that function as a test to eliminate the possibility >>>> of a bad storage device/controller. If the problem still occurs, that >>>> means there still remains the possibility that another controller or its >>>> firmware is defective instead. It could be a kernel bug, it is true, but >>>> making sure there is no hardware or firmware error occurring is important, >>>> and as I say, USB devices should always be considered suspect unless and >>>> until proven innocent. >>> >>> [FYI: This is a ufs context, not a zfs one.] > > Right. It's only a Pi, after all. :-) It is a Pine64+ 2GB, not an rpi3. >>> >>> I'm aware of such things. There is no evidence that has resulted in >>> suggesting the USB devices that I can replace are a problem. Otherwise >>> I'd not be going down this path. I only have access to the one arm64 >>> device (a Pine64+ 2GB) so I've no ability to substitution-test what >>> is on that board. > > There isn't even one open port on that hub that you could plug a > flash drive into temporarily to be the paging device? Why do you think that I've never tried alternative devices? It is just that the result was no evidence that my usually-in-use SSD is having a special/local problem: the behavior continues across all such contexts when the Pine64+ 2GB is involved. (Again I have not had access to an alternate to the one arm64 board. That limits my substitution testing possibilities.) Why would you expect a Flash drive to be better than another SSD for such testing? (The SSD that I usually use even happens to be a USB 3.0 SSD, capable of USB 3.0 speeds in USB 3.0 contexts. So is the hub that I usually use for that matter.) 
> You could then > try your tests before returning to the normal configuration. If there > isn't an open port, then how about plugging a second hub into one of > the first hub's ports and moving the displaced device to the second > hub? A flash drive could then be plugged in. That kind of configuration > is obviously a bad idea for the long run, but just to try your tests it > ought to work well enough. I have access to more SSDs that I can use than I do to Flash drives. I see no reason to specifically use a Flash drive. > (BTW, if a USB storage device containing a > paging area drops off-line even momentarily and the system needs to use > it, that is the beginning of the end, even though it may take up to a few > minutes for everything to lock up. The system does not lock up, even days or weeks later, with having done dozens of experiments that show memory corruption failures over those days. The only processes showing memory corruption so far are those that were the parent or child for a fork that were later swapped out to have zero RES(ident memory) and then even later swapped back in. The context has no such issues. You are inventing problems that do not exist in my context. That is why none of my list submittals mention such problems: they did not occur. > You probably won't be able to do an > orderly shutdown, but will instead have to crash it with the power switch. > In the case of something like a Pi, this is an unpleasant fact of life, > to be sure.) Such things did not occur and have nothing to do with my context so far. > I think I buy your arguments, given the evidence you've collected > thus far, including what you've added below. I just like to eliminate > possibilities that are much simpler to deal with before facing nastinesses > like bugs in the VM subsystem. :-) When I started this I found no evidence of device-specific problems. My investigation activity goes back to long before my list submittals. 
And I repeat: Other people have reported the symptoms that started this investigation. They did so before I ever started my activities. They were using none of the specific devices that I have access to. Likely the types of devices were frequently even different, such as a rpi3 instead of a Pine64+ 2GB or a different USB drive. I was able to get the symptoms that they reported. >>> It would be neat if some folks used my code to test other arm64 >>> contexts and reported the results. I'd be very interested. >>> (This is easier to do on devices that do not have massive >>> amounts of RAM, which may limit the range of devices or >>> device configurations that are reasonable to test.) >>> >>> There is that other people using other devices have reported >>> the behavior that started this investigation. I can produce the >>> behavior that they reported, although I've not seen anyone else >>> listing specific steps that lead to the problem or ways to tell >>> if the symptom is going to happen before it actually does. Nor >>> have I seen any other core dump analysis. (I have bugzilla >>> submittals 217138 and 217239 tied to symptoms others have >>> reported as well as this test program material.) >>> >>> Also, considering that for my test program I can control which pages >>> get the zeroed-problem by read-accessing even one byte of any 4K >>> Byte page that I want to make work normally, doing so in the child >>> process of the fork, between the fork and the sleep/swap-out, it does >>> not suggest USB-device-specific behavior. The read-access is changing >>> the status of the page in some way as far as I can tell. >>> >>> (Such read-accesses in the parent process make no difference to the >>> behavior.) >> >> I should have noted another comparison/contrast between >> having memory corruption and not in my context: >> >> I've tried variants of my test program that do not fork but >> just sleep for 60s to allow me to force the swap-out. 
I >> did this before adding fork and before using >> parital_test_check, for example. I gradually added things >> apparently involved in the reports others had made >> until I found a combination that produced a memory >> corruption test failure. >> >> These tests without fork involved find no problems with >> the memory content after the swap-in. >> >> For my test program it appears that fork-before-swap-out >> or the like is essential to having the problem occur. >> > A comment about terminology seems in order here. It bothers > me considerably to see you writing "swap out" or "swapping" where > it seems like you mean to write "page out" or "paging". A BSD > system whose swapping mechanism gets activated has already waded > very deeply into the quicksand and frequently cannot be gotten out > in a reasonable amount of time even with manual assistance. It is > often quicker to crash it, reboot, and wait for the fsck(8) cleanups > to complete. Orderly shutdowns, even of the kind that results from > a quick poke to the power button, typically get mired in the same > mess that already has the system in knots. Also, BSD systems since > 3.0BSD, unlike older AT&T (pre-SysVR2.3) systems, do not swap in, > just out. A swapped out process, once the system determines that it > has adequate resources again to attempt to run the process, will have > the interrupted text page paged in and the rest will be paged in by > the normal mechanism of page faults and page-in operations. I assume > you must already know all this, which is a large part of why it grates > on me that you appear to be using the wrong terms. You apparently did not read any of the material about how the test is done or are unfamiliar with what "stress -m 1 --vm-bytes 1800M" does when there is only 2GB of RAM. I am deliberately inducing swapping in other processes, including the 2 from my test program (after the fork), not just paging. (stress is a port, not part of the base system.) 
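[Illustrative sketch, not the ~110 line test program discussed in this thread.] The procedure described here (fill memory with a known pattern, fork, leave both processes idle long enough to be forced out to swap, then verify the pattern on both sides) might look roughly like the following. The names fork_sleep_check and region_intact, the fill pattern, and the return codes are assumptions made for illustration only. Note that the child deliberately does not touch the buffer between the fork and the sleep, since such an access is reported above to mask the failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical pattern: byte i holds (i % 251) + 1, which is never zero,
 * so a page that comes back zeroed is easy to detect. */
static int region_intact(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != (unsigned char)(i % 251 + 1))
            return 0;
    return 1;
}

/*
 * Fill a region, fork, sleep in both processes, then re-check the region.
 * The sleep is the window in which a separate memory hog would be run to
 * push both processes out to swap.
 * Returns 0 if parent and child both see intact memory, 1 if either one
 * sees corruption, and -1 on a setup failure.
 */
int fork_sleep_check(size_t len, unsigned sleep_seconds)
{
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(i % 251 + 1);

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }

    /* No access to buf here, in either process. */
    sleep(sleep_seconds);

    int ok = region_intact(buf, len);
    if (pid == 0)
        _exit(ok ? 0 : 1);          /* child reports through its exit status */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        free(buf);
        return -1;
    }
    free(buf);

    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    return (ok && child_ok) ? 0 : 1;
}
```

With a 60 second sleep and a memory hog started in another shell during that window, a nonzero return would correspond to the corruption being reported; on a system without the problem the function is expected to return 0.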
When I say swap-out and swap-in I mean it. From the source code of my test program: sleep(60); // During this manually force this process to // swap out. I use something like: // stress -m 1 --vm-bytes 1800M // in another shell and ^C'ing it after top // shows the swapped status desired. 1800M // just happened to work on the Pine64+ 2GB // that I was using. I watch with top -PCwaopid . That type of stress run uses about 1.8 GiBytes after a bit, which is enough to cause the swapping of other processes, including the two that I am testing (post-fork). (Some RAM is in use already before the stress run, which explains not needing 2 GiBytes to be in use by stress.) Look at a "top -PCwaopid" display: there are columns for RES(ident memory) and SWAP. I cause my 2 test processes to show zero RES and everything under SWAP, starting sometime during the 60s sleep/wait. Why would I cause swapping? Because buildworld causes such swap-outs at times when there is only 2GBytes of RAM, including processes that forked earlier, and as a result the corrupted memory problems show up later in some processes that were swapped out at the time. The build eventually stops for process failures tied to the corruptions of memory in the failing processes. (At least that is what my testing strongly suggests.) But that is a very complicated context to use for analysis or testing of the problem. My test program is vastly simpler and easier/quicker to set up and test when used with stress as well. Such was the kind of thing I was trying to find. I want the Pine64+ 2GB to work well enough to be able to have buildworld (-j 4) complete correctly without having to restart the build --even when everything has to be rebuilt. So I'm trying to find and provide enough evidence to help someone fix the problems that are observed to block such buildworld activity. Again: others have reported such arm64 problems on the lists before I ever got into this activity. 
The evidence is that the issues are not a local property of my environment. Swapping is supposed to work. I can do buildworld (-j 4) on armv6 (really -mcpu=cortex-a7 so armv7-a) and the swapping it causes works fine. This is true for both a bpim3 (2 GiBytes of RAM) and a rpi2 (1 GiByte of RAM so even more swapping). On a powerpc64 with 16 GiBytes I've built things that caused 26 GiBytes of swap to be in use some of the time (during 4 ld's running in parallel), with lots of processes having zero for RES(ident memory) and all their space listed under SWAP in a "top -PCwaopid" display. This too has no problems with swapping of previously forked processes (or of any other processes). For the likes of a Pine64+ 2GB to be "self hosted" for source-code based updates, swapping of previously forked processes must work and currently such swapping is unreliable. === Mark Millard markmi at dsl-only.net
From owner-freebsd-stable@freebsd.org Thu Mar 16 10:47:30 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 52FD9D0F0E0 for ; Thu, 16 Mar 2017 10:47:30 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 4001F10B0 for ; Thu, 16 Mar 2017 10:47:30 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: by mailman.ysv.freebsd.org (Postfix) id 3C575D0F0DF; Thu, 16 Mar 2017 10:47:30 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3BFE3D0F0DE for ; Thu, 16 Mar 2017 10:47:30 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from constantine.ingresso.co.uk (unknown [IPv6:2a02:b90:3002:411::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DFFC210AF; Thu, 16 Mar 2017 10:47:29 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from dilbert.london-internal.ingresso.co.uk ([10.64.50.6]) by constantine.ingresso.co.uk with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.88 (FreeBSD)) (envelope-from ) id 1coSwH-000MdS-Pp; Thu, 16 Mar 2017 10:47:25 +0000 Subject: Re: 11-STABLE fails to build with MK_OFED enabled To: Dimitry Andric References: <469F5DD8-7493-41C5-B999-931AFF207A5B@FreeBSD.org> Cc: stable@freebsd.org From: Pete French Message-ID: 
<09213e48-9ee5-20f0-ed10-de11e45fbb37@ingresso.co.uk> Date: Thu, 16 Mar 2017 10:47:25 +0000 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <469F5DD8-7493-41C5-B999-931AFF207A5B@FreeBSD.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Mar 2017 10:47:30 -0000 Thanks - that is a better fix than my hack ;-) On 03/15/17 20:12, Dimitry Andric wrote: > On 15 Mar 2017, at 13:42, Pete French wrote: >> >> >> >> /usr/src/sys/modules/mlx4ib/../../ofed/drivers/infiniband/hw/mlx4/sysfs.c:90:22: error: >> format specifies type 'unsigned long long *' but the argument has type >> 'u64 *' (aka 'unsigned long *') [-Werror,-Wformat] >> sscanf(buf, "%llx", &sysadmin_ag_val); >> ~~~~ ^~~~~~~~~~~~~~~~ >> %lx >> >> Fairly trivial to fix obviously - I chnaged it to %lx - but not sure that would >> work on non 64 bit platforms. > > Hi Pete, > > I have merged the fix (r310232) in r315328. 
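[Illustrative sketch, not the change committed in r310232.] The warning quoted above arises because u64 is 'unsigned long' on this platform while "%llx" expects 'unsigned long long *', and switching to "%lx" would then be wrong wherever 'unsigned long' is only 32 bits. In userland C one width-independent way to express this is the SCNx64 macro from <inttypes.h>; whether the kernel-side code can use that header is not established here, and parse_hex_u64 is a hypothetical helper name.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a hexadecimal string into a 64-bit value without assuming whether
 * uint64_t is 'unsigned long' or 'unsigned long long' on this platform.
 * Returns 0 on success and -1 if no hexadecimal number could be read.
 */
int parse_hex_u64(const char *buf, uint64_t *out)
{
    uint64_t val;

    /* SCNx64 expands to the correct scanf conversion for uint64_t. */
    if (sscanf(buf, "%" SCNx64, &val) != 1)
        return -1;
    *out = val;
    return 0;
}
```

An alternative with the same effect is to scan into a temporary 'unsigned long long' with "%llx" and then assign the result to the 64-bit variable.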
> > -Dimitry > From owner-freebsd-stable@freebsd.org Thu Mar 16 11:18:15 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 64080D0FB98 for ; Thu, 16 Mar 2017 11:18:15 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 5059410F1 for ; Thu, 16 Mar 2017 11:18:15 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: by mailman.ysv.freebsd.org (Postfix) id 4FAF9D0FB97; Thu, 16 Mar 2017 11:18:15 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4F50FD0FB96 for ; Thu, 16 Mar 2017 11:18:15 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from constantine.ingresso.co.uk (unknown [IPv6:2a02:b90:3002:411::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1EE5610F0; Thu, 16 Mar 2017 11:18:15 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from dilbert.london-internal.ingresso.co.uk ([10.64.50.6]) by constantine.ingresso.co.uk with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.88 (FreeBSD)) (envelope-from ) id 1coTQ5-0006KP-Qb; Thu, 16 Mar 2017 11:18:13 +0000 Subject: Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0 To: Andriy Gapon , stable@FreeBSD.org References: <20170313190728.GA2967@brick> From: Pete French Message-ID: <5da1d6d9-27fd-4b53-4d8f-8fe52b5ac846@ingresso.co.uk> Date: Thu, 16 Mar 2017 11:18:13 +0000 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; 
format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Mar 2017 11:18:15 -0000 > So, the kernel attempted to mount the root even before vmbus was attached and, > thus, before storvsc appeared and informed the kernel that it might be holding > the root. > How ZFS was supposed to know that vmbus is ever going to appear? > To me this sounds more like a problem with the Hyper-V drivers. I am currently running with the patch which waits for a number of seconds and retries the mount, and that appears to fix it. However, I don't really like running a patched OS. How would I set about reporting this to Microsoft and getting it fixed, or getting the timeout patch committed? Preferably both, as the timeout patch is generally a useful thing to have working for ZFS, I think. -pete. 
From owner-freebsd-stable@freebsd.org Thu Mar 16 11:32:41 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 39385D0D25A for ; Thu, 16 Mar 2017 11:32:41 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 256561B0E for ; Thu, 16 Mar 2017 11:32:41 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: by mailman.ysv.freebsd.org (Postfix) id 24C10D0D258; Thu, 16 Mar 2017 11:32:41 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 24673D0D257 for ; Thu, 16 Mar 2017 11:32:41 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 75B711B0B for ; Thu, 16 Mar 2017 11:32:40 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA10486; Thu, 16 Mar 2017 13:32:37 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1coTe1-00011T-N8; Thu, 16 Mar 2017 13:32:37 +0200 Subject: Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0 To: Pete French , stable@FreeBSD.org References: <20170313190728.GA2967@brick> <5da1d6d9-27fd-4b53-4d8f-8fe52b5ac846@ingresso.co.uk> From: Andriy Gapon Message-ID: <6b397d83-e802-78ca-e24e-6d0713f07212@FreeBSD.org> Date: Thu, 16 Mar 2017 13:31:36 +0200 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: 
<5da1d6d9-27fd-4b53-4d8f-8fe52b5ac846@ingresso.co.uk> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Mar 2017 11:32:41 -0000 On 16/03/2017 13:18, Pete French wrote: > >> So, the kernel attempted to mount the root even before vmbus was attached and, >> thus, before storvsc appeared and informed the kernel that it might be holding >> the root. >> How ZFS was supposed to know that vmbus is ever going to appear? >> To me this sounds more like a problem with the Hyper-V drivers. > > I am currently running with the patch which waits for a number of seconds and > retries the mount, and that appears to fix it. However, I don't really like running > a patched OS. How would I set about reporting this to Microsoft and getting it > fixed, or getting the timeout patch committed? Preferably both, as the timeout > patch is generally a useful thing to have working for ZFS, I think. I don't like the delay and retry approach at all. Imagine that you told the kernel that you want to mount your root from a ZFS pool which is on a USB drive which you have already thrown out. Should the kernel just keep waiting for that pool to appear? Microsoft provides support for FreeBSD Hyper-V drivers. Please try to discuss this problem on virtualization@ or with sephe@ directly. 
-- Andriy Gapon From owner-freebsd-stable@freebsd.org Thu Mar 16 12:06:24 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C80B2D0EC6D for ; Thu, 16 Mar 2017 12:06:24 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id B398C1726 for ; Thu, 16 Mar 2017 12:06:24 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: by mailman.ysv.freebsd.org (Postfix) id B011FD0EC6C; Thu, 16 Mar 2017 12:06:24 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AFC2AD0EC6B for ; Thu, 16 Mar 2017 12:06:24 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from constantine.ingresso.co.uk (unknown [IPv6:2a02:b90:3002:411::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 30BBC1725; Thu, 16 Mar 2017 12:06:24 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from dilbert.london-internal.ingresso.co.uk ([10.64.50.6] helo=dilbert.ingresso.co.uk) by constantine.ingresso.co.uk with esmtps (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.88 (FreeBSD)) (envelope-from ) id 1coUAY-0007mN-AU; Thu, 16 Mar 2017 12:06:14 +0000 Received: from petefrench by dilbert.ingresso.co.uk with local (Exim 4.89 (FreeBSD)) (envelope-from ) id 1coUAY-0000ou-8i; Thu, 16 Mar 2017 12:06:14 +0000 To: avg@FreeBSD.org, petefrench@ingresso.co.uk, stable@FreeBSD.org Subject: Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0 In-Reply-To: <6b397d83-e802-78ca-e24e-6d0713f07212@FreeBSD.org> Message-Id: 
From: Pete French Date: Thu, 16 Mar 2017 12:06:14 +0000 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Mar 2017 12:06:24 -0000 > I don't like the delay and retry approach at all. It's not ideal, but it is what we do for UFS after all... > Imagine that you told the kernel that you want to mount your root from a ZFS > pool which is on a USB drive which you have already thrown out. Should the > kernel just keep waiting for that pool to appear? I'm not talking about an infinite loop here, just making it honour the 'vfs.mountroot.timeout' setting like it does for UFS. So it should wait for the timeout I have set and then proceed as it would if there had been no timeout. The default behaviour is for it to behave as it does now; it's only when you need the retry that you enable it. Right now this works for UFS, but not for ZFS, which is an inconsistency that I don't like, and also means I am being forced down a UFS root path if I require this. > Microsoft provides support for FreeBSD Hyper-V drivers. > Please try to discuss this problem on virtualization@ or with sephe@ directly. OK, will do, thanks... -pete. 
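[Illustrative sketch, not the patch mentioned in this thread and not kernel code.] The behaviour being asked for here is a bounded retry: attempt the mount, and on failure keep retrying once a second until a configured timeout expires, with a timeout of zero meaning a single attempt exactly as happens now. A userland analogue might look like the following; mount_with_timeout, mount_attempt_fn and attempt_after_delay are hypothetical names, and sleep() stands in for whatever wait primitive real code would use.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical callback type: returns 0 on success or an errno value. */
typedef int (*mount_attempt_fn)(void *ctx);

/*
 * Try once, then retry once per second until either the attempt succeeds
 * or timeout_seconds have been spent waiting. A timeout of 0 gives exactly
 * one attempt. Returns the result of the last attempt.
 */
int mount_with_timeout(mount_attempt_fn attempt, void *ctx, int timeout_seconds)
{
    int error = attempt(ctx);

    for (int waited = 0; error != 0 && waited < timeout_seconds; waited++) {
        sleep(1);
        error = attempt(ctx);
    }
    return error;
}

/*
 * Stand-in for a real mount attempt, for demonstration only: fails with
 * EIO (the "error 5" in the boot log) until the counter reaches zero,
 * imitating a disk that shows up a little late.
 */
int attempt_after_delay(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return EIO;
    }
    return 0;
}
```

Because the loop is bounded, a pool that never appears still produces the usual failure once the timeout has passed, which addresses the concern about waiting forever for a device that has been removed.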
From owner-freebsd-stable@freebsd.org Thu Mar 16 16:01:37 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 24994D0F085 for ; Thu, 16 Mar 2017 16:01:37 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 0301C1142 for ; Thu, 16 Mar 2017 16:01:37 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mailman.ysv.freebsd.org (Postfix) id 02114D0F084; Thu, 16 Mar 2017 16:01:37 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 01B12D0F083 for ; Thu, 16 Mar 2017 16:01:37 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-it0-x234.google.com (mail-it0-x234.google.com [IPv6:2607:f8b0:4001:c0b::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C6E22113E for ; Thu, 16 Mar 2017 16:01:36 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-it0-x234.google.com with SMTP id g138so49170333itb.0 for ; Thu, 16 Mar 2017 09:01:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=oZmRsnBGn0zPaE2QXetdFys2m1AtFZhubF+gVNfUbcE=; b=i/lC6sPEHrEhnN1KNHRSyBG/DtxQ/5QRy5NdwcaoF3nETLJdRLQIWPnL1uNhnpvG/W
7xqhuzqF6CQTw33wO0r2R3KhdvaqdQQs1N4l/5HuKjt0wvVY45MOk7CCiyyybqY2kZYu zWebdtnvAevJKxI5FIxk52D1LNA+51hxvs5nJTenYL050Jak6VCUtkhwfQSmmKjQwqmI d4CgwM3wgcs2XnyYGC7c78ji/mkd4LZHGKnYpkCbNxs7NkfaHgV7EHw+eow1hlVMMd6I 6HfIFMk+D4bShWCIov18v2JWngNzyQI9RaMTJyMS+gEF0pLzM4awRfdPgV2bIhOgN0ma XNWA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=oZmRsnBGn0zPaE2QXetdFys2m1AtFZhubF+gVNfUbcE=; b=N019ocPefj0OT7sa8A8QIJVGCrQkMtmpH/2B/b7KW2hBcSVfcjMs8nB0WyePIkWRTY 3OA2CL/IMVMl9WqRqXJuU6ZTjlfA/YmhQrSzN1rjV+bLlvSyYBUMI/lAEJmWJ/z84jfk wEFzBwExIKHtLjlfLRK013Gg1uFmGWW72Kpo+NOVCQUfNsY0UaWaVBAZu6aCzD+GxBFC Rq77BuAtKi36UN8G7pSi4vtF+AGaMzrrSr/2PGQliOFLvt1UI2c4XN39cn8rhmfbaSw3 J9OEIeOUx0SSTkry2zcFbSlzu+9Y+K4eu9WQ/SouBh9R1fcNnoRodYeAvX1qdCayhIU1 VzvQ== X-Gm-Message-State: AFeK/H2Bc641SSbkUqvtcd540fGK1DJUG2ANe1XJlHR62McpmhihyrfClrKFGXQEPdl8uqeJ15T2vJez2BO+Gw== X-Received: by 10.107.174.220 with SMTP id n89mr11693976ioo.166.1489680093844; Thu, 16 Mar 2017 09:01:33 -0700 (PDT) MIME-Version: 1.0 Sender: wlosh@bsdimp.com Received: by 10.79.134.129 with HTTP; Thu, 16 Mar 2017 09:01:33 -0700 (PDT) X-Originating-IP: [2607:fb10:7021:1::ad] In-Reply-To: References: <6b397d83-e802-78ca-e24e-6d0713f07212@FreeBSD.org> From: Warner Losh Date: Thu, 16 Mar 2017 10:01:33 -0600 X-Google-Sender-Auth: uz-sR8VUeqlE6YQ9dj2u8Xpx1io Message-ID: Subject: Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0 To: Pete French Cc: Andriy Gapon , stable@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Mar 2017 16:01:37 -0000 On Thu, Mar 16, 2017 at 6:06 AM, Pete French wrote: >> I don't like the delay and 
retry approach at all.
>
> It's not ideal, but it is what we do for UFS after all...
>
>> Imagine that you told the kernel that you want to mount your root from a ZFS
>> pool which is on a USB drive which you have already thrown out. Should the
>> kernel just keep waiting for that pool to appear?
>
> I'm not talking about an infinite loop here, just making it honour
> the 'vfs.mountroot.timeout' setting like it does for UFS. So it
> should wait for the timeout I have set and then proceed as it would if
> there had been no timeout. Default behaviour is for it to behave as it
> does now; it's only when you need the retry that you enable it.

Put another way: with UFS it keeps retrying until the timeout expires.
If the first try succeeds, the boot is immediate.

> Right now this works for UFS, but not for ZFS, which is an inconsistency
> that I don't like, and it also means I am being forced down a UFS root
> path if I require this.

Yes. ZFS is special, but I don't think the assumptions behind its
specialness are quite right:

        /*
         * In case of ZFS and NFS we don't have a way to wait for
         * specific device.  Also do the wait if the user forced that
         * behaviour by setting vfs.root_mount_always_wait=1.
         */
        if (strcmp(fs, "zfs") == 0 || strstr(fs, "nfs") != NULL ||
            dev[0] == '\0' || root_mount_always_wait != 0) {
                vfs_mountroot_wait();
                return (0);
        }

So you can make it always succeed by forcing the wait, but that's lame...
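
The "keep retrying until the timeout expires, but boot immediately if the first try succeeds" behaviour discussed above can be sketched in user-space C. This is only a minimal sketch under stated assumptions, not the kernel's implementation: `mount_with_retry`, `probe_fn`, `struct fake_disk` and `fake_disk_ready` are hypothetical names made up for the example, and the one-second retry interval is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true once the root can be mounted. */
typedef bool (*probe_fn)(void *ctx);

/*
 * Try to mount, retrying once per interval until timeout_s intervals have
 * passed.  Returns the number of intervals waited on success (0 means the
 * first try worked and nothing was delayed), or -1 once the timeout expires.
 * delay_1s may be NULL, which skips the actual sleep (handy for testing).
 */
static int
mount_with_retry(probe_fn try_mount, void *ctx, int timeout_s,
    void (*delay_1s)(void))
{
	assert(try_mount != NULL && timeout_s >= 0);
	for (int waited = 0; ; waited++) {
		if (try_mount(ctx))
			return (waited);
		if (waited >= timeout_s)
			return (-1);	/* give up; proceed as with no timeout */
		if (delay_1s != NULL)
			delay_1s();
	}
}

/* Hypothetical stand-in for a disk (such as da0) that attaches late. */
struct fake_disk {
	int appears_after;	/* number of polls that fail first */
	int polls;		/* polls made so far */
};

static bool
fake_disk_ready(void *ctx)
{
	struct fake_disk *d = ctx;

	return (d->polls++ >= d->appears_after);
}
```

With a timeout of 0 this degenerates to a single attempt, which is the "default behaves as it does now" case; for example, `mount_with_retry(fake_disk_ready, &disk, 10, NULL)` polls a `struct fake_disk` without sleeping.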
From owner-freebsd-stable@freebsd.org Thu Mar 16 16:04:29 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9572DD0F2C6 for ; Thu, 16 Mar 2017 16:04:29 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 734E114C6 for ; Thu, 16 Mar 2017 16:04:29 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mailman.ysv.freebsd.org (Postfix) id 72B39D0F2C5; Thu, 16 Mar 2017 16:04:29 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 725CAD0F2C4 for ; Thu, 16 Mar 2017 16:04:29 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-it0-x244.google.com (mail-it0-x244.google.com [IPv6:2607:f8b0:4001:c0b::244]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3A7F314C4 for ; Thu, 16 Mar 2017 16:04:29 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-it0-x244.google.com with SMTP id u69so7676970ita.3 for ; Thu, 16 Mar 2017 09:04:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=/k6LudOukSOZvTaJTp2IV6yhnOL+n4EJL/YQAjadGWM=; b=eWAu8osUQvIubuOeOY9LafstAh6D7p6fkS1dJD9Uk9t36a8VJrSaaBERYUZFTcLAHl ZsDhP8iIR+snmKJsLoffNtOmU8vqE6VWDMpQNwQddpW9W87h6Gic54pc3wWtnAozXCon +k466NLou9GT4oahJppEgR23UsavL2H2p7ggbDBylFo93XIQEVCuy7gGQQnwA21DN8pN AsRh/YCAm8Lf2GK1LJPaSllOvXaUGspdPUJGistBVyzEVY/1FhihD3ErOxXbRqO3UQye q7PlrPBEKbbhfk3BbG/nexL2MOIg8+5Ci8ueamLxx19qVOP2wOPZv1iet/mPE5jucihR hcfg== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=/k6LudOukSOZvTaJTp2IV6yhnOL+n4EJL/YQAjadGWM=; b=AUUcY6ZN/U03AnVSnh2RE7g3wUxg6A79U48NYEDfBxEsNQ3NF0Xw0cVF36sD2c1pLf CmqeU/7xrLsiFqNHLs+sFKrJDK4Hv/9yX3zU2LwWMcgJah65sYnoC4AOOfop7AjB6tJ0 AgA8U4uQKO/eadMYVHoUAk/eG/oMMRuCmVx3qH6kVpG3JqS/qhlmWlUGIFzWjt+AqnQ7 a4MEu5aoaIiULxiunyB129wpkMwSCPr+gwKmcSw0jg7zUZlK7fXLRO60Rv+U/fOTsx+E 9UMqUmFP0B9E1O+oZXVCnYGcgagTjtuX6XgZ64MzNd9W9uTt2xWMT0Lp6xQ9zygPH++I lT1g== X-Gm-Message-State: AFeK/H1qapUzcAx2E+9efKXymQ20m9nqPMf7NmKz2CVsSny7/d4zRw0fqoqTdsAuU9A0FVtekGumlPUCUNh05A== X-Received: by 10.107.198.193 with SMTP id w184mr11044452iof.19.1489680268458; Thu, 16 Mar 2017 09:04:28 -0700 (PDT) MIME-Version: 1.0 Sender: wlosh@bsdimp.com Received: by 10.79.134.129 with HTTP; Thu, 16 Mar 2017 09:04:27 -0700 (PDT) X-Originating-IP: [2607:fb10:7021:1::ad] In-Reply-To: References: <6b397d83-e802-78ca-e24e-6d0713f07212@FreeBSD.org> From: Warner Losh Date: Thu, 16 Mar 2017 10:04:27 -0600 X-Google-Sender-Auth: 0Hp18A9KOqPwy5N30aO1Vq_6coc Message-ID: Subject: Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0 To: Pete French Cc: Andriy Gapon , stable@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Mar 2017 16:04:29 -0000 [[ stupid mouse ]] On Thu, Mar 16, 2017 at 10:01 AM, Warner Losh wrote: > On Thu, Mar 16, 2017 at 6:06 AM, Pete French wrote: >>> I don't like the delay and retry approach at all. >> >> Its not ideal, but it is what we do for UFS after all... 
>>
>>> Imagine that you told the kernel that you want to mount your root from a ZFS
>>> pool which is on a USB drive which you have already thrown out. Should the
>>> kernel just keep waiting for that pool to appear?
>>
>> I'm not talking about an infinite loop here, just making it honour
>> the 'vfs.mountroot.timeout' setting like it does for UFS. So it
>> should wait for the timeout I have set and then proceed as it would if
>> there had been no timeout. Default behaviour is for it to behave as it
>> does now; it's only when you need the retry that you enable it.
>
> Put another way: with UFS it keeps retrying until the timeout expires.
> If the first try succeeds, the boot is immediate.
>
>> Right now this works for UFS, but not for ZFS, which is an inconsistency
>> that I don't like, and it also means I am being forced down a UFS root
>> path if I require this.
>
> Yes. ZFS is special, but I don't think the assumptions behind its
> specialness are quite right:
>
>         /*
>          * In case of ZFS and NFS we don't have a way to wait for
>          * specific device.  Also do the wait if the user forced that
>          * behaviour by setting vfs.root_mount_always_wait=1.
>          */
>         if (strcmp(fs, "zfs") == 0 || strstr(fs, "nfs") != NULL ||
>             dev[0] == '\0' || root_mount_always_wait != 0) {
>                 vfs_mountroot_wait();
>                 return (0);
>         }
>
> So you can make it always succeed by forcing the wait, but that's lame...

Later we check to see if a device by a given name is present. Since
ZFS doesn't present its pool names as devices to the rest of the
system, that's not going to work quite right. That's the real reason
that ZFS is special. It isn't that we can't wait for individual
devices, it's that we can't wait for the 'mount token' that we use for
what to mount to be 'ready'. NFS suffers from the same problem, but
because NFS is stateless its device is always ready, so it isn't as
noticeable.

Warner From owner-freebsd-stable@freebsd.org Sat Mar 18 13:26:58 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 15DCED12C7E for ; Sat, 18 Mar 2017 13:26:58 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-173.reflexion.net [208.70.211.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BF56E1791 for ; Sat, 18 Mar 2017 13:26:57 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 14352 invoked from network); 18 Mar 2017 13:27:41 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 18 Mar 2017 13:27:41 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.30.2) with SMTP; Sat, 18 Mar 2017 09:26:50 -0400 (EDT) Received: (qmail 28762 invoked from network); 18 Mar 2017 13:26:49 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 18 Mar 2017 13:26:49 -0000 Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 274FEEC805D; Sat, 18 Mar 2017 06:26:49 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!] 
From: Mark Millard In-Reply-To: <1019DBB4-5A92-41FE-90B5-63F3F658CF3D@dsl-only.net> Date: Sat, 18 Mar 2017 06:26:48 -0700 Cc: freebsd-arm , FreeBSD Current , FreeBSD-STABLE Mailing List Content-Transfer-Encoding: quoted-printable Message-Id: <826D525A-BDAF-4352-AD9F-A238B797BFAF@dsl-only.net> References: <201703151315.v2FDFWOr028842@sdf.org> <345EE889-A429-4C13-9B08-B762DA3F4D71@dsl-only.net> <201703160607.v2G67Vwe023153@sdf.org> <1019DBB4-5A92-41FE-90B5-63F3F658CF3D@dsl-only.net> To: Scott Bennett X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Mar 2017 13:26:58 -0000

[Summary: I've now tested on a rpi3 in addition to a pine64+ 2GB.
Both contexts show the problem.]

On 2017-Mar-16, at 2:07 AM, Mark Millard wrote:

> On 2017-Mar-15, at 11:07 PM, Scott Bennett wrote:
>
>> Mark Millard wrote:
>>
>>> [Something strange happened to the automatic CC: fill-in for my original
>>> reply. Also I should have mentioned that for my test program if a
>>> variant is made that does not fork the swapping works fine.]
>>>
>>> On 2017-Mar-15, at 9:37 AM, Mark Millard wrote:
>>>
>>>> On 2017-Mar-15, at 6:15 AM, Scott Bennett wrote:
>>>>
>>>>> On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
>>>>> wrote:
>>>>>> On 2017-Mar-14, at 4:44 PM, Bernd Walter wrote:
>>>>>>
>>>>>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>>>>>>>> [test_check() between the fork and the wait/sleep prevents the
>>>>>>>> failure from occurring. Even a small access to the memory at
>>>>>>>> that stage prevents the failure. Details follow.]
>>>>>>>
>>>>>>> Maybe a stupid question, since you might have written it somewhere.
>>>>>>> What medium do you swap to?
>>>>>>> I've seen broken firmware on microSD cards doing silent data
>>>>>>> corruption for some access patterns.
>>>>>>
>>>>>> The root filesystem is on a USB SSD on a powered hub.
>>>>>>
>>>>>> Only the kernel is from the microSD card.
>>>>>>
>>>>>> I have several examples of the USB SSD model and have
>>>>>> never observed such problems in any other context.
>>>>>>
>>>>>> [remainder of irrelevant material deleted --SB]
>>>>>
>>>>> You gave a very long-winded non-answer to Bernd's question, so I'll
>>>>> repeat it here. What medium do you swap to?
>>>>
>>>> My wording of:
>>>>
>>>> The root filesystem is on a USB SSD on a powered hub.
>>>>
>>>> was definitely poor. It should have explicitly mentioned the
>>>> swap partition too:
>>>>
>>>> The root filesystem and swap partition are both on the same
>>>> USB SSD on a powered hub.
>>>>
>>>> More detail from dmesg -a for usb:
>>>>
>>>> usbus0: 12Mbps Full Speed USB v1.0
>>>> usbus1: 480Mbps High Speed USB v2.0
>>>> usbus2: 12Mbps Full Speed USB v1.0
>>>> usbus3: 480Mbps High Speed USB v2.0
>>>> ugen0.1: at usbus0
>>>> uhub0: on usbus0
>>>> ugen1.1: at usbus1
>>>> uhub1: on usbus1
>>>> ugen2.1: at usbus2
>>>> uhub2: on usbus2
>>>> ugen3.1: at usbus3
>>>> uhub3: on usbus3
>>>> . . .
>>>> uhub0: 1 port with 1 removable, self powered
>>>> uhub2: 1 port with 1 removable, self powered
>>>> uhub1: 1 port with 1 removable, self powered
>>>> uhub3: 1 port with 1 removable, self powered
>>>> ugen3.2: at usbus3
>>>> uhub4 on uhub3
>>>> uhub4: on usbus3
>>>> uhub4: MTT enabled
>>>> uhub4: 4 ports with 4 removable, self powered
>>>> ugen3.3: at usbus3
>>>> umass0 on uhub4
>>>> umass0: on usbus3
>>>> umass0: SCSI over Bulk-Only; quirks = 0x0100
>>>> umass0:0:0: Attached to scbus0
>>>> . . .
>>>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>>>> da0: Fixed Direct Access SPC-4 SCSI device
>>>> da0: Serial Number
>>>> da0: 40.000MB/s transfers
>>>>
>>>> (Edited a bit because there is other material interlaced, even
>>>> internal to some lines. Also: I removed the serial number of the
>>>> specific example device.)
>>
>> Thank you. That presents a much clearer picture.
>>>>
>>>>> I will further note that any kind of USB device cannot automatically
>>>>> be trusted to behave properly. USB devices are notorious, for example,
>>>>>
>>>>> [reasons why deleted --SB]
>>>>>
>>>>> You should identify where you page/swap to and then try substituting
>>>>> a different device for that function as a test to eliminate the possibility
>>>>> of a bad storage device/controller. If the problem still occurs, that
>>>>> means there still remains the possibility that another controller or its
>>>>> firmware is defective instead. It could be a kernel bug, it is true, but
>>>>> making sure there is no hardware or firmware error occurring is important,
>>>>> and as I say, USB devices should always be considered suspect unless and
>>>>> until proven innocent.
>>>>
>>>> [FYI: This is a ufs context, not a zfs one.]
>>
>> Right. It's only a Pi, after all. :-)
>
> It is a Pine64+ 2GB, not an rpi3.
>
>>>>
>>>> I'm aware of such things. There is no evidence that has resulted in
>>>> suggesting the USB devices that I can replace are a problem. Otherwise
>>>> I'd not be going down this path. I only have access to the one arm64
>>>> device (a Pine64+ 2GB) so I've no ability to substitution-test what
>>>> is on that board.
>>
>> There isn't even one open port on that hub that you could plug a
>> flash drive into temporarily to be the paging device?
>
> Why do you think that I've never tried alternative devices? It
> is just that the result was no evidence that my usually-in-use
> SSD is having a special/local problem: the behavior continues
> across all such contexts when the Pine64+ 2GB is involved. (Again
> I have not had access to an alternate to the one arm64 board.
> That limits my substitution testing possibilities.)
>
> Why would you expect a Flash drive to be better than another SSD
> for such testing? (The SSD that I usually use even happens to be
> a USB 3.0 SSD, capable of USB 3.0 speeds in USB 3.0 contexts. So
> is the hub that I usually use for that matter.)

FYI: I now have access to a rpi3 in addition to a pine64+ 2GB.
I've tested on the rpi3 using a different USB hub and a different
SSD: no hardware device in common with the recent Pine64+ 2GB tests
(other than console cabling and what handles the serial console).

The fork-then-swap-out-then-swap-in failure happens in the rpi3
context as well.

Because the rpi3 has only 1 GiByte of RAM the stress commands that
I used were more like:

stress -m 1 --vm-bytes 1000M

to get zero RES(ident memory) for the two processes from my test
program after it forks while they are waiting/sleeping.

>> You could then
>> try your tests before returning to the normal configuration. If there
>> isn't an open port, then how about plugging a second hub into one of
>> the first hub's ports and moving the displaced device to the second
>> hub? A flash drive could then be plugged in. That kind of configuration
>> is obviously a bad idea for the long run, but just to try your tests it
>> ought to work well enough.
>
> I have access to more SSDs that I can use than I do to Flash drives. I
> see no reason to specifically use a Flash drive.
>
>> (BTW, if a USB storage device containing a
>> paging area drops off-line even momentarily and the system needs to use
>> it, that is the beginning of the end, even though it may take up to a few
>> minutes for everything to lock up.
>
> The system does not lock up, even days or weeks later, after having done
> dozens of experiments that show memory corruption failures over those
> days. The only processes showing memory corruption so far are those
> that were the parent or child for a fork that were later swapped out
> to have zero RES(ident memory) and then even later swapped back in.
>
> The context has no such issues. You are inventing problems that do
> not exist in my context. That is why none of my list submittals
> mention such problems: they did not occur.
>
>> You probably won't be able to do an
>> orderly shutdown, but will instead have to crash it with the power switch.
>> In the case of something like a Pi, this is an unpleasant fact of life,
>> to be sure.)
>
> Such things did not occur and have nothing to do with my context so far.
>
>> I think I buy your arguments, given the evidence you've collected
>> thus far, including what you've added below. I just like to eliminate
>> possibilities that are much simpler to deal with before facing nastinesses
>> like bugs in the VM subsystem. :-)
>
> When I started this I found no evidence of device-specific problems.
> My investigation activity goes back to long before my list submittals.
>
> And I repeat: Other people have reported the symptoms that started
> this investigation. They did so before I ever started my activities.
> They were using none of the specific devices that I have access to.
> Likely the types of devices were frequently even different, such as
> a rpi3 instead of a Pine64+ 2GB or a different USB drive. I was able
> to get the symptoms that they reported.
>
>>>> It would be neat if some folks used my code to test other arm64
>>>> contexts and reported the results. I'd be very interested.
>>>> (This is easier to do on devices that do not have massive
>>>> amounts of RAM, which may limit the range of devices or
>>>> device configurations that are reasonable to test.)
>>>>
>>>> Other people using other devices have reported
>>>> the behavior that started this investigation. I can produce the
>>>> behavior that they reported, although I've not seen anyone else
>>>> listing specific steps that lead to the problem or ways to tell
>>>> if the symptom is going to happen before it actually does. Nor
>>>> have I seen any other core dump analysis. (I have bugzilla
>>>> submittals 217138 and 217239 tied to symptoms others have
>>>> reported as well as this test program material.)
>>>>
>>>> Also, considering that for my test program I can control which pages
>>>> get the zeroed-problem by read-accessing even one byte of any 4K
>>>> Byte page that I want to make work normally, doing so in the child
>>>> process of the fork, between the fork and the sleep/swap-out, it does
>>>> not suggest USB-device-specific behavior. The read-access is changing
>>>> the status of the page in some way as far as I can tell.
>>>>
>>>> (Such read-accesses in the parent process make no difference to the
>>>> behavior.)
>>>
>>> I should have noted another comparison/contrast between
>>> having memory corruption and not in my context:
>>>
>>> I've tried variants of my test program that do not fork but
>>> just sleep for 60s to allow me to force the swap-out. I
>>> did this before adding fork and before using
>>> parital_test_check, for example. I gradually added things
>>> apparently involved in the reports others had made
>>> until I found a combination that produced a memory
>>> corruption test failure.
>>>
>>> These tests without fork involved found no problems with
>>> the memory content after the swap-in.
>>>
>>> For my test program it appears that fork-before-swap-out
>>> or the like is essential to having the problem occur.
>>>
>> A comment about terminology seems in order here. It bothers
>> me considerably to see you writing "swap out" or "swapping" where
>> it seems like you mean to write "page out" or "paging". A BSD
>> system whose swapping mechanism gets activated has already waded
>> very deeply into the quicksand and frequently cannot be gotten out
>> in a reasonable amount of time even with manual assistance. It is
>> often quicker to crash it, reboot, and wait for the fsck(8) cleanups
>> to complete. Orderly shutdowns, even of the kind that results from
>> a quick poke to the power button, typically get mired in the same
>> mess that already has the system in knots. Also, BSD systems since
>> 3.0BSD, unlike older AT&T (pre-SysVR2.3) systems, do not swap in,
>> just out. A swapped out process, once the system determines that it
>> has adequate resources again to attempt to run the process, will have
>> the interrupted text page paged in and the rest will be paged in by
>> the normal mechanism of page faults and page-in operations. I assume
>> you must already know all this, which is a large part of why it grates
>> on me that you appear to be using the wrong terms.
>
> You apparently did not read any of the material about how the test
> is done or are unfamiliar with what "stress -m 1 --vm-bytes 1800M"
> does when there is only 2GB of RAM. I am deliberately inducing
> swapping in other processes, including the 2 from my test program
> (after the fork), not just paging. (stress is a port, not part of
> the base system.)
>
> When I say swap-out and swap-in I mean it.
>
> From the source code of my test program:
>
> sleep(60);
>
> // During this manually force this process to
> // swap out. I use something like:
>
> // stress -m 1 --vm-bytes 1800M
>
> // in another shell and ^C'ing it after top
> // shows the swapped status desired. 1800M
> // just happened to work on the Pine64+ 2GB
> // that I was using. I watch with top -PCwaopid .
>
> That type of stress run uses about 1.8 GiBytes after a bit,
> which is enough to cause the swapping of other processes,
> including the two that I am testing (post-fork). (Some RAM
> is in use already before the stress run, which explains not
> needing 2 GiBytes to be in use by stress.)
>
> Look at a "top -PCwaopid" display: there are columns for
> RES(ident memory) and SWAP. I cause my 2 test processes to
> show zero RES and everything under SWAP, starting sometime
> during the 60s sleep/wait.
>
> Why would I cause swapping? Because buildworld causes such
> swap-outs at times when there is only 2GBytes of RAM,
> including processes that forked earlier, and as a result
> the corrupted memory problems show up later in some processes
> that were swapped out at the time. The build eventually
> stops for process failures tied to the corruptions of memory
> in the failing processes. (At least that is what my testing
> strongly suggests.)
>
> But that is a very complicated context to use for analysis or
> testing of the problem. My test program is vastly simpler
> and easier/quicker to set up and test when used with stress
> as well. Such was the kind of thing I was trying to find.
>
> I want the Pine64+ 2GB to work well enough to be able to have
> buildworld (-j 4) complete correctly without having to restart
> the build --even when everything has to be rebuilt. So I'm
> trying to find and provide enough evidence to help someone fix
> the problems that are observed to block such buildworld
> activity.
>
> Again: others have reported such arm64 problems on the lists
> before I ever got into this activity. The evidence is that
> the issues are not a local property of my environment.
>
> Swapping is supposed to work. I can do buildworld (-j 4)
> on armv6 (really -mcpu=cortex-a7 so armv7-a) and the
> swapping it causes works fine. This is true for both a
> bpim3 (2 GiBytes of RAM) and a rpi2 (1 GiByte of RAM
> so even more swapping). On a powerpc64 with 16 GiBytes
> I've built things that caused 26 GiBytes of swap to be
> in use some of the time (during 4 ld's running in
> parallel), with lots of processes having zero for
> RES(ident memory) and all their space listed under SWAP
> in a "top -PCwaopid" display. This too has no problems
> with swapping of previously forked processes (or of any
> other processes).
>
> For the likes of a Pine64+ 2GB to be "self hosted"
> for source-code based updates, swapping of previously
> forked processes must work and currently such
> swapping is unreliable.

===
Mark Millard
markmi at dsl-only.net
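
The fork-then-sleep-then-check pattern described in the message above can be outlined in C as follows. This is only a rough sketch under stated assumptions and is not the ~110 line test program referred to in the thread: `REGION_BYTES`, `fill_pattern`, `first_mismatch` and `fork_sleep_check` are hypothetical names, and the region size is arbitrary. On its own the sketch does not force any swapping; that has to be induced externally during the sleep, for example with the `stress` invocation quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REGION_BYTES (256u * 1024u)	/* hypothetical size; arbitrary */

/* Fill with a pattern that never contains a zero byte. */
static void
fill_pattern(unsigned char *p, size_t n)
{
	assert(p != NULL);
	for (size_t i = 0; i < n; i++)
		p[i] = (unsigned char)(i % 251u + 1u);
}

/* Index of the first byte that differs from the pattern, or n if intact. */
static size_t
first_mismatch(const unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (p[i] != (unsigned char)(i % 251u + 1u))
			return (i);
	return (n);
}

/*
 * Fill a region, fork, sleep in both processes (the window in which the
 * tester forces swap-out), then re-check the region in both processes.
 * Returns a bit mask: bit 0 set if the parent saw a mismatch, bit 1 set
 * if the child did (or died abnormally); -1 on setup errors.
 */
static int
fork_sleep_check(unsigned sleep_seconds)
{
	unsigned char *region = malloc(REGION_BYTES);
	if (region == NULL)
		return (-1);
	fill_pattern(region, REGION_BYTES);

	pid_t pid = fork();
	if (pid < 0) {
		free(region);
		return (-1);
	}

	sleep(sleep_seconds);
	int bad = first_mismatch(region, REGION_BYTES) != REGION_BYTES;
	if (pid == 0)
		_exit(bad ? 1 : 0);	/* child reports via exit status */

	int status = 0;
	pid_t waited = waitpid(pid, &status, 0);
	free(region);
	if (waited < 0)
		return (-1);
	int child_bad = !WIFEXITED(status) || WEXITSTATUS(status) != 0;
	return ((bad ? 1 : 0) | (child_bad ? 2 : 0));
}
```

Calling `fork_sleep_check(60)` and forcing both processes to zero RES during the sleep would mirror the procedure described above; a nonzero result would indicate that the pattern did not survive.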