From: Mark Millard
Date: Sat, 25 Feb 2017 01:05:59 -0800
To: Mateusz Guzik
Cc: Justin Hibbits, mjg@freebsd.org, FreeBSD Current, svn-src-head@freebsd.org, FreeBSD PowerPC ML
Subject: Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
Message-Id: <477BA631-AB85-4E77-8BA3-CD2AFAD5E405@dsl-only.net>
References: <2FD12B8F-2255-470A-98D4-2DCE9C7495F5@dsl-only.net>
 <20170220191044.GA8526@dft-labs.eu>
 <5D5235E1-6F84-4329-8ED5-35FCDB0A6A71@dsl-only.net>
 <20170225002300.GC19697@dft-labs.eu>
 <12339EDD-5663-40E0-8553-821EF9B6CDEB@dsl-only.net>

On 2017-Feb-24, at 11:46 PM, Mark Millard wrote:

> On 2017-Feb-24, at 8:25 PM, Mark Millard wrote:
>
>> On 2017-Feb-24, at 4:23 PM, Mateusz Guzik wrote:
>>>
>>> On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote:
>>>> [Back to the powerpc64 context.]
>>>>
>>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik wrote:
>>>>
>>>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>>>>> [Note: I experiment with clang based powerpc64 builds,
>>>>>> reporting problems that I find. Justin is familiar
>>>>>> with this, as is Nathan.]
>>>>>>
>>>>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>>>>> that I have access to from head -r312761 to -r313864 and
>>>>>> ended up with random panics and hang ups in fairly short
>>>>>> order after booting.
>>>>>>
>>>>>> Some approximate bisecting for the kernel led to:
>>>>>> (sometimes getting part way into a buildkernel attempt
>>>>>> for a different version before a failure happens)
>>>>>>
>>>>>> -r313266: works (just before use of atomic_fcmpset)
>>>>>> vs.
>>>>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>>>>
>>>>>> (I did not try -r313268 through -r313270 as the use was
>>>>>> gradually added.)
>>>>>>
>>>>>> So I'm currently running a -r313864 world with a -r313266
>>>>>> kernel.
>>>>>>
>>>>>> No kernel that I tried that was from before -r313266 had the
>>>>>> problems.
>>>>>>
>>>>>> Any kernel that I tried that was from after -r313271 had the
>>>>>> problems.
>>>>>>
>>>>>> Of course I did not try them all in either direction. :)
>>>>>>
>>>>>
>>>>> I found that spin mutexes were not properly handling this, fixed in
>>>>> r313996.
>>>>>
>>>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>>>> fcmpset to simulate failures. Everything works, while it would easily
>>>>> fail without the patch.
>>>>>
>>>>> That said, I hope this concludes the 'missing check for not-reread value
>>>>> of failed fcmpset' saga.
>>>>>
>>>>> --
>>>>> Mateusz Guzik
>>>>
>>>> -r313999 is an improvement for powerpc64: it boots and I can
>>>> log in on the old PowerMac G5 so-called "Quad Core".
>>>>
>>>> But, e.g., buildworld buildkernel eventually hangs and later
>>>> the powerpc64 panics for "spin lock held too long".
>>>>
>>>
>>> Alright, play time is over.
>>>
>>> Can you please:
>>> 1. verify r313254 is stable for you
>>> 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and
>>> https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry
>>> the test?
>>>
>>> This is a workaround which effectively disables the powerpc-specific
>>> primitive and makes it use a cmpset wrapper instead. I don't have the
>>> hardware to test right now and my attempts to boot in qemu also failed.
>>>
>>> That said, it does not look like there are general fcmpset bugs left and
>>> the remaining issue seems powerpc-specific.
>>>
>>> If this works, I'll commit the workaround for the time being as in a few
>>> weeks I'd like to start merging the work back to stable/11.
>>>
>>> --
>>> Mateusz Guzik
>>
>> I've started a self-hosted powerpc64 -r313254 build
>> based on running the -r313266 kernel. (The context I
>> sometimes do cross builds in is tied up with other
>> things.
>> -r313266 is what my prior bisection came up
>> with as the last apparently-working kernel at the
>> time.)
>>
>> So it will be a while before I have a -r313254 in
>> place to try: the self-hosted build takes longer
>> and so will not be installed for a while.
>>
>> To judge stability I'll probably have -r313254 build
>> the patched update that you want me to test, initially
>> doing a cleanworld. So that too will take a while.
>>
>> (The above wording presumes all goes well.)
>>
>> I'll let you know as I go along if I run into anything
>> interesting.
>>
>>
>> My builds are rebuilding both world and kernel since
>> what turns into /usr/include/sys/* has changes in your
>> patch.
>>
>> The builds are without MALLOC_PRODUCTION but are
>> otherwise not debug builds.
>>
>>
>> I've not seen anything indicating that anyone has
>> been trying TARGET_ARCH=powerpc. I've been trying
>> TARGET_ARCH=powerpc64 .
>>
>> While I do not have access to a true
>> TARGET_ARCH=powerpc machine currently, such a build
>> can be used on a PowerMac G5 so-called "Quad Core".
>> So I could eventually build and try such on the one
>> powerpc family machine that I currently have access
>> to.
>>
>> clang 3.9.1 has a significant code generation problem
>> for TARGET_ARCH=powerpc and so I'd have to use
>> a gcc 4.2.1 based build for that sort of experiment.
>> (There is no xtoolchain for 32-bit powerpc.)
>>
>> I use clang 3.9.1 or xtoolchain for
>> TARGET_ARCH=powerpc64 and have been using clang 3.9.1
>> in recent times. My primary powerpc family use has
>> been to experiment with building based on the
>> modern libc++ and reporting issues discovered in the
>> attempts. This explains the clang/xtoolchain context.
>>
>> clang 3.9.1 has major problems for C++ exception
>> handling for both powerpc64 and powerpc but a
>> lot of FreeBSD is independent of throwing C++
>> exceptions.
>> By contrast xtoolchain-based works
>> for C++ exception handling but lib32 fails
>> to operate when built by a xtoolchain build.
>
> -r313254 had no trouble booting or building
> the patched version or anything else involved
> in getting there or installing.
>
> But the patched version failed quickly just
> attempting cleanworld's recursive remove. (So
> it did boot and let me log in.) The panic
> description was:
>
> panic: vn_finished_secondary_write: neg cnt
>
>
> The sources that are different from svn's -r313254
> are (some tied to arm64 experiments, most everything
> else tied to powerpc64 and/or powerpc, those not
> from your patches are long standing from my
> investigations or from Justin H.):
>
> # svnlite status /usr/src | sort
> . . . (ignoring the ? lines) . . .
> M /usr/src/bin/sh/jobs.c
> M /usr/src/bin/sh/miscbltin.c
> M /usr/src/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td
> M /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp
> M /usr/src/lib/csu/powerpc64/Makefile
> M /usr/src/libexec/rtld-elf/Makefile
> M /usr/src/sys/arm/arm/gic.c
> M /usr/src/sys/boot/ofw/Makefile.inc
> M /usr/src/sys/boot/powerpc/Makefile.inc
> M /usr/src/sys/boot/powerpc/kboot/Makefile
> M /usr/src/sys/boot/uboot/Makefile.inc
> M /usr/src/sys/conf/kmod.mk
> M /usr/src/sys/ddb/db_main.c
> M /usr/src/sys/ddb/db_script.c
> M /usr/src/sys/kern/init_main.c
> M /usr/src/sys/kern/kern_condvar.c
> M /usr/src/sys/kern/kern_lock.c
> M /usr/src/sys/kern/kern_lockstat.c
> M /usr/src/sys/kern/kern_mutex.c
> M /usr/src/sys/kern/kern_rwlock.c
> M /usr/src/sys/kern/kern_sx.c
> M /usr/src/sys/kern/kern_synch.c
> M /usr/src/sys/kern/kern_thread.c
> M /usr/src/sys/kern/subr_lock.c
> M /usr/src/sys/kern/vfs_default.c
> M /usr/src/sys/kern/vfs_subr.c
> M /usr/src/sys/powerpc/include/atomic.h
> M /usr/src/sys/powerpc/ofw/ofw_machdep.c
> M /usr/src/sys/sys/lock.h
> M /usr/src/sys/sys/lockmgr.h
> M /usr/src/sys/sys/lockstat.h
> M /usr/src/sys/sys/mutex.h
> M /usr/src/sys/sys/rwlock.h
> M /usr/src/sys/sys/sdt.h
> M /usr/src/sys/sys/sx.h
> M /usr/src/sys/sys/systm.h

To recover from the problem and again have a
buildworld buildkernel present I've booted based on:

A) The -r313254 kernel without your patches (kernel.old).
B) The -r313254 world (which had your patches in its build).

I've reverted /usr/src/ to not have your patches
(but it still has my prior ones from prior activity).

I repeated the cleanworld to let it finish after its prior
failure (which happened during an SSD trim activity).

I've started buildworld buildkernel (with -j 4 as is
normal for my context). So far this combination seems
to be working fine.

This suggests that the sys/sys/*.h files that ended
up in /usr/include/sys/ and the sys/powerpc/include/atomic.h
that ended up in /usr/include/machine/ were not problems
as used in the world code, since those uses are still in
place in the binaries being used.

Only the kernel binaries seem to be a problem (not
necessarily all of them).

===
Mark Millard
markmi at dsl-only.net
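
For readers following the thread, the fcmpset behavior discussed in the quoted messages can be illustrated with a minimal user-space C sketch. This is not the FreeBSD kernel code and not the content of ppc.diff. The sketch_* names and SKETCH_UNOWNED are hypothetical, C11 <stdatomic.h> stands in for the kernel atomics, and the every-other-call failure only imitates the "cpu_tick() % 2" experiment mentioned above. It assumes that fcmpset returns nonzero on success, writes the observed value into *expect on a real failure, and may also fail without changing *expect. The shape of the cmpset-based wrapper is likewise an assumption about what such a workaround might look like.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_UNOWNED 0UL	/* hypothetical "lock is free" value */

static atomic_ulong sketch_calls;	/* drives the injected failures */

/*
 * Stand-in for fcmpset. Every other call fails artificially and leaves
 * *expect untouched; otherwise a strong compare-exchange is done, which
 * writes the observed value into *expect when it fails.
 */
static int
sketch_fcmpset(atomic_ulong *dst, unsigned long *expect, unsigned long desired)
{
	if (atomic_fetch_add(&sketch_calls, 1UL) % 2 == 0)
		return (0);
	return (atomic_compare_exchange_strong(dst, expect, desired));
}

/*
 * Acquire loop that re-checks the value after every failure instead of
 * assuming a failure means the lock is owned by someone else.
 * Returns the number of loop iterations used.
 */
static unsigned long
sketch_lock_acquire(atomic_ulong *lock, unsigned long self)
{
	unsigned long v = SKETCH_UNOWNED;
	unsigned long attempts = 0;

	assert(self != SKETCH_UNOWNED);
	for (;;) {
		attempts++;
		if (v == SKETCH_UNOWNED) {
			if (sketch_fcmpset(lock, &v, self))
				return (attempts);
			/* v may or may not have changed: loop and look again. */
			continue;
		}
		/* Held by another owner: re-read until it is released. */
		v = atomic_load(lock);
	}
}

/* Plain compare-and-set: the caller's expected value is passed by value. */
static int
sketch_cmpset(atomic_ulong *dst, unsigned long expect, unsigned long desired)
{
	return (atomic_compare_exchange_strong(dst, &expect, desired));
}

/* fcmpset built on cmpset: on failure, re-read the current value. */
static int
sketch_fcmpset_via_cmpset(atomic_ulong *dst, unsigned long *expect,
    unsigned long desired)
{
	if (sketch_cmpset(dst, *expect, desired))
		return (1);
	*expect = atomic_load(dst);
	return (0);
}
```

A caller would start with a lock word set to SKETCH_UNOWNED and call sketch_lock_acquire(&lock, id) with a nonzero id. A loop that treated every failure as "the lock is owned" would misread an injected failure, since the value it holds may still be the free value.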