Date: Mon, 27 Nov 2017 10:45:39 -0800 From: Nathan Whitehorn <nwhitehorn@freebsd.org> To: Andrew Turner <andrew@fubar.geek.nz> Cc: src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r326218 - head/sys/kern Message-ID: <3deeb9e5-4f6b-0f7f-3303-c4c8ad7ad624@freebsd.org> In-Reply-To: <F4676F95-5AA5-4838-92D2-0A164F7EB79C@fubar.geek.nz> References: <201711252341.vAPNf5Qx001464@repo.freebsd.org> <59383A7D-FC99-4748-9561-53D968C4B150@fubar.geek.nz> <62864c22-e458-2f13-2b33-9dc075ed2ef2@freebsd.org> <F4676F95-5AA5-4838-92D2-0A164F7EB79C@fubar.geek.nz>
next in thread | previous in thread | raw e-mail | index | archive | help
On 11/27/17 10:06, Andrew Turner wrote: >> On 26 Nov 2017, at 17:07, Nathan Whitehorn <nwhitehorn@freebsd.org> wrote: >> >> >> >> On 11/26/17 01:46, Andrew Turner wrote: >>> On 25 Nov 2017, at 23:41, Nathan Whitehorn <nwhitehorn@freebsd.org> wrote: >>>> Author: nwhitehorn >>>> Date: Sat Nov 25 23:41:05 2017 >>>> New Revision: 326218 >>>> URL: https://svnweb.freebsd.org/changeset/base/326218 >>>> >>>> Log: >>>> Remove some, but not all, assumptions that the BSP is CPU 0 and that CPUs >>>> are numbered densely from there to n_cpus. >>>> >>>> MFC after: 1 month >>>> >>>> Modified: >>>> head/sys/kern/kern_clock.c >>>> head/sys/kern/kern_clocksource.c >>>> head/sys/kern/kern_shutdown.c >>>> head/sys/kern/kern_timeout.c >>>> head/sys/kern/sched_ule.c >>>> head/sys/kern/subr_pcpu.c >>>> >>> ... >>>> Modified: head/sys/kern/subr_pcpu.c >>>> ============================================================================== >>>> --- head/sys/kern/subr_pcpu.c Sat Nov 25 23:23:24 2017 (r326217) >>>> +++ head/sys/kern/subr_pcpu.c Sat Nov 25 23:41:05 2017 (r326218) >>>> @@ -279,6 +279,8 @@ pcpu_destroy(struct pcpu *pcpu) >>>> struct pcpu * >>>> pcpu_find(u_int cpuid) >>>> { >>>> + KASSERT(cpuid_to_pcpu[cpuid] != NULL, >>>> + ("Getting uninitialized PCPU %d", cpuid)); >>>> >>>> return (cpuid_to_pcpu[cpuid]); >>>> } >>> This breaks on one arm64 simulator I have where the device tree lists 8 cpus, but only 2 are enabled in the simulation. The ofw_cpu device nodes for these are cpu0 and cpu4 so the call to cpu_find in ofw_cpu_attach will hit this KASSERT when the CPU has not been enabled. >>> >>> Andrew >>> >>> cpu0: <Open Firmware CPU> on cpulist0 >>> cpu1: <Open Firmware CPU> on cpulist0 >>> panic: Getting uninitialized PCPU 1 >>> cpuid = 0 >>> time = 1 >>> KDB: stack backtrace: >>> db_trace_self() at db_trace_self_wrapper+0x28 >>> pc = 0xffff0000005f03e8 lr = 0xffff000000087048 >>> sp = 0xffff0000000105a0 fp = 0xffff0000000107b0 >> That's unfortunate. Removing the new KASSERT is probably not the right option, since all it is doing is indicating a pre-existing bug. Specifically, it is preventing ofw_cpu from using a NULL pointer for CPU 4, which I suppose it was previously happily doing (it would only get dereferenced if you had cpufreq support, hence no previous panic). > I would expect cpu4 to have a non-NULL value as its cpu pointer will have been set. It’s cpu1 that is the issue as it’s in the device tree, but doesn’t exist in “hardware”. Because of this it fails when we try to enable it so free it’s pcpu data clear the pointer. Yes, I phrased that badly. As you say, the CPU ID that ofw_cpu has is "1", which doesn't exist. To fix this, we need to change the CPU ID that ofw_cpu has to match what the kernel thinks it is using. > >> A super quick-and-dirty fix would be to be fail attach on CPU_ABSENT(device_get_unit(dev)), which restores the previous buggy behavior but without the panic. If you do not actually use ofw_cpu for anything, we could also just disable it temporarily on your platform (it's off for some PPC systems, you will note, for similar reasons). > We used to use it to find CPUs to start, however this is no longer the case. OK, so disabling it works fine for now. > >> A real fix is somewhat complex, since the driver relies on being able to get a mapping from platform-specific numbers in the device tree to an entry in pcpu, which intrinsically relies on reverse-engineering the platform's mapping between some kind of hardware CPU ID and the kernel CPU numbering. I can't think of a way to do that internally. We could make some kind of platform macro, but the whole concept is even somewhat dubious at the moment since a number of IBM systems with SMT don't even have a 1:1 relationship between CPUs as FreeBSD defines them and device tree nodes, so it's possible we need a serious rearchitecture of the driver. > On arm64 we use the ID from the device tree to assign the cpuid so there is a 1:1 mapping between the two. In most cases both are identical, the only case that’s not correct is when we boot from a non-zero CPU, and I know of only one SoC where this is true. The "ID from the device tree" being the value of "reg"? Or something else? I see a lot of variation in the encoding in sys/dts and there does not seem to be an obvious single answer. This is making me think that the only workable solution is something like a new ofw_machdep function that gets the phandle for a given CPU ID. -Nathan > > Andrew > >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3deeb9e5-4f6b-0f7f-3303-c4c8ad7ad624>