From owner-svn-src-all@freebsd.org Mon Nov 27 18:06:49 2017 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3C8E3DBB222; Mon, 27 Nov 2017 18:06:49 +0000 (UTC) (envelope-from andrew@fubar.geek.nz) Received: from fry.fubar.geek.nz (fry.fubar.geek.nz [139.59.165.16]) by mx1.freebsd.org (Postfix) with ESMTP id CAF537E0A6; Mon, 27 Nov 2017 18:06:48 +0000 (UTC) (envelope-from andrew@fubar.geek.nz) Received: from dhcp-10-248-127-139.eduroam.wireless.private.cam.ac.uk (global-5-144.nat-2.net.cam.ac.uk [131.111.5.144]) by fry.fubar.geek.nz (Postfix) with ESMTPSA id 00B9F4E6DC; Mon, 27 Nov 2017 18:06:11 +0000 (UTC) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: svn commit: r326218 - head/sys/kern From: Andrew Turner In-Reply-To: <62864c22-e458-2f13-2b33-9dc075ed2ef2@freebsd.org> Date: Mon, 27 Nov 2017 18:06:11 +0000 Cc: src-committers , svn-src-all@freebsd.org, svn-src-head@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <201711252341.vAPNf5Qx001464@repo.freebsd.org> <59383A7D-FC99-4748-9561-53D968C4B150@fubar.geek.nz> <62864c22-e458-2f13-2b33-9dc075ed2ef2@freebsd.org> To: Nathan Whitehorn X-Mailer: Apple Mail (2.3273) X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Nov 2017 18:06:49 -0000 > On 26 Nov 2017, at 17:07, Nathan Whitehorn = wrote: >=20 >=20 >=20 > On 11/26/17 01:46, Andrew Turner wrote: >> On 25 Nov 2017, at 23:41, Nathan Whitehorn = wrote: >>> Author: nwhitehorn >>> Date: Sat Nov 25 23:41:05 2017 >>> New Revision: 326218 >>> URL: https://svnweb.freebsd.org/changeset/base/326218 >>>=20 >>> Log: >>> Remove some, but not all, assumptions that the BSP is CPU 0 and = that CPUs >>> are numbered densely from there to n_cpus. >>>=20 >>> MFC after: 1 month >>>=20 >>> Modified: >>> head/sys/kern/kern_clock.c >>> head/sys/kern/kern_clocksource.c >>> head/sys/kern/kern_shutdown.c >>> head/sys/kern/kern_timeout.c >>> head/sys/kern/sched_ule.c >>> head/sys/kern/subr_pcpu.c >>>=20 >> ... >>> Modified: head/sys/kern/subr_pcpu.c >>> = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D >>> --- head/sys/kern/subr_pcpu.c Sat Nov 25 23:23:24 2017 = (r326217) >>> +++ head/sys/kern/subr_pcpu.c Sat Nov 25 23:41:05 2017 = (r326218) >>> @@ -279,6 +279,8 @@ pcpu_destroy(struct pcpu *pcpu) >>> struct pcpu * >>> pcpu_find(u_int cpuid) >>> { >>> + KASSERT(cpuid_to_pcpu[cpuid] !=3D NULL, >>> + ("Getting uninitialized PCPU %d", cpuid)); >>>=20 >>> return (cpuid_to_pcpu[cpuid]); >>> } >> This breaks on one arm64 simulator I have where the device tree lists = 8 cpus, but only 2 are enabled in the simulation. The ofw_cpu device = nodes for these are cpu0 and cpu4 so the call to cpu_find in = ofw_cpu_attach will hit this KASSERT when the CPU has not been enabled. >>=20 >> Andrew >>=20 >> cpu0: on cpulist0 >> cpu1: on cpulist0 >> panic: Getting uninitialized PCPU 1 >> cpuid =3D 0 >> time =3D 1 >> KDB: stack backtrace: >> db_trace_self() at db_trace_self_wrapper+0x28 >> pc =3D 0xffff0000005f03e8 lr =3D 0xffff000000087048 >> sp =3D 0xffff0000000105a0 fp =3D 0xffff0000000107b0 >=20 > That's unfortunate. Removing the new KASSERT is probably not the right = option, since all it is doing is indicating a pre-existing bug. = Specifically, it is preventing ofw_cpu from using a NULL pointer for CPU = 4, which I suppose it was previously happily doing (it would only get = dereferenced if you had cpufreq support, hence no previous panic). I would expect cpu4 to have a non-NULL value as its cpu pointer will = have been set. It=E2=80=99s cpu1 that is the issue as it=E2=80=99s in = the device tree, but doesn=E2=80=99t exist in =E2=80=9Chardware=E2=80=9D. = Because of this it fails when we try to enable it so free it=E2=80=99s = pcpu data clear the pointer. >=20 > A super quick-and-dirty fix would be to be fail attach on = CPU_ABSENT(device_get_unit(dev)), which restores the previous buggy = behavior but without the panic. If you do not actually use ofw_cpu for = anything, we could also just disable it temporarily on your platform = (it's off for some PPC systems, you will note, for similar reasons). We used to use it to find CPUs to start, however this is no longer the = case. >=20 > A real fix is somewhat complex, since the driver relies on being able = to get a mapping from platform-specific numbers in the device tree to an = entry in pcpu, which intrinsically relies on reverse-engineering the = platform's mapping between some kind of hardware CPU ID and the kernel = CPU numbering. I can't think of a way to do that internally. We could = make some kind of platform macro, but the whole concept is even somewhat = dubious at the moment since a number of IBM systems with SMT don't even = have a 1:1 relationship between CPUs as FreeBSD defines them and device = tree nodes, so it's possible we need a serious rearchitecture of the = driver. On arm64 we use the ID from the device tree to assign the cpuid so there = is a 1:1 mapping between the two. In most cases both are identical, the = only case that=E2=80=99s not correct is when we boot from a non-zero = CPU, and I know of only one SoC where this is true. Andrew