From owner-svn-src-all@freebsd.org Sun Nov 26 17:07:32 2017 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 335E7DEBB8D; Sun, 26 Nov 2017 17:07:32 +0000 (UTC) (envelope-from nwhitehorn@freebsd.org) Received: from c.mail.sonic.net (c.mail.sonic.net [64.142.111.80]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id EFF007069A; Sun, 26 Nov 2017 17:07:31 +0000 (UTC) (envelope-from nwhitehorn@freebsd.org) Received: from comporellon.tachypleus.net (cpe-75-82-218-62.socal.res.rr.com [75.82.218.62]) (authenticated bits=0) by c.mail.sonic.net (8.15.1/8.15.1) with ESMTPSA id vAQH7Smu022084 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Sun, 26 Nov 2017 09:07:29 -0800 Subject: Re: svn commit: r326218 - head/sys/kern To: Andrew Turner Cc: src-committers , svn-src-all@freebsd.org, svn-src-head@freebsd.org References: <201711252341.vAPNf5Qx001464@repo.freebsd.org> <59383A7D-FC99-4748-9561-53D968C4B150@fubar.geek.nz> From: Nathan Whitehorn Message-ID: <62864c22-e458-2f13-2b33-9dc075ed2ef2@freebsd.org> Date: Sun, 26 Nov 2017 09:07:27 -0800 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 In-Reply-To: <59383A7D-FC99-4748-9561-53D968C4B150@fubar.geek.nz> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US X-Sonic-CAuth: UmFuZG9tSVbOx2+VNdBpwjTv0t7WeLmL/f9MFlSF+kMdCphldoJW1OC/H5DuW/WvOmN0mi8SkQTadR+QjIfgIpdF/k6r3TrKvJqdY78bre8= X-Sonic-ID: C;tBkvSMzS5xGNausnWtmBlw== M;0hCSSMzS5xGNausnWtmBlw== X-Spam-Flag: No X-Sonic-Spam-Details: 0.0/5.0 by cerberusd X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Nov 2017 17:07:32 -0000 On 11/26/17 01:46, Andrew Turner wrote: > On 25 Nov 2017, at 23:41, Nathan Whitehorn wrote: >> Author: nwhitehorn >> Date: Sat Nov 25 23:41:05 2017 >> New Revision: 326218 >> URL: https://svnweb.freebsd.org/changeset/base/326218 >> >> Log: >> Remove some, but not all, assumptions that the BSP is CPU 0 and that CPUs >> are numbered densely from there to n_cpus. >> >> MFC after: 1 month >> >> Modified: >> head/sys/kern/kern_clock.c >> head/sys/kern/kern_clocksource.c >> head/sys/kern/kern_shutdown.c >> head/sys/kern/kern_timeout.c >> head/sys/kern/sched_ule.c >> head/sys/kern/subr_pcpu.c >> > ... >> Modified: head/sys/kern/subr_pcpu.c >> ============================================================================== >> --- head/sys/kern/subr_pcpu.c Sat Nov 25 23:23:24 2017 (r326217) >> +++ head/sys/kern/subr_pcpu.c Sat Nov 25 23:41:05 2017 (r326218) >> @@ -279,6 +279,8 @@ pcpu_destroy(struct pcpu *pcpu) >> struct pcpu * >> pcpu_find(u_int cpuid) >> { >> + KASSERT(cpuid_to_pcpu[cpuid] != NULL, >> + ("Getting uninitialized PCPU %d", cpuid)); >> >> return (cpuid_to_pcpu[cpuid]); >> } > This breaks on one arm64 simulator I have where the device tree lists 8 cpus, but only 2 are enabled in the simulation. The ofw_cpu device nodes for these are cpu0 and cpu4 so the call to cpu_find in ofw_cpu_attach will hit this KASSERT when the CPU has not been enabled. > > Andrew > > cpu0: on cpulist0 > cpu1: on cpulist0 > panic: Getting uninitialized PCPU 1 > cpuid = 0 > time = 1 > KDB: stack backtrace: > db_trace_self() at db_trace_self_wrapper+0x28 > pc = 0xffff0000005f03e8 lr = 0xffff000000087048 > sp = 0xffff0000000105a0 fp = 0xffff0000000107b0 That's unfortunate. Removing the new KASSERT is probably not the right option, since all it is doing is indicating a pre-existing bug. Specifically, it is preventing ofw_cpu from using a NULL pointer for CPU 4, which I suppose it was previously happily doing (it would only get dereferenced if you had cpufreq support, hence no previous panic). A super quick-and-dirty fix would be to be fail attach on CPU_ABSENT(device_get_unit(dev)), which restores the previous buggy behavior but without the panic. If you do not actually use ofw_cpu for anything, we could also just disable it temporarily on your platform (it's off for some PPC systems, you will note, for similar reasons). A real fix is somewhat complex, since the driver relies on being able to get a mapping from platform-specific numbers in the device tree to an entry in pcpu, which intrinsically relies on reverse-engineering the platform's mapping between some kind of hardware CPU ID and the kernel CPU numbering. I can't think of a way to do that internally. We could make some kind of platform macro, but the whole concept is even somewhat dubious at the moment since a number of IBM systems with SMT don't even have a 1:1 relationship between CPUs as FreeBSD defines them and device tree nodes, so it's possible we need a serious rearchitecture of the driver. Please let me know how I can help get this fixed. -Nathan