Date: Tue, 24 Aug 2010 22:50:06 GMT
Message-Id: <201008242250.o7OMo6TN031205@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Garrett Cooper
Subject: Re: kern/145385: [cpu] Logical processor cannot be disabled for some SMT-enabled Intel procs

The following reply was made to PR kern/145385; it has been noted by GNATS.

From: Garrett Cooper
To: Jeff Roberson
Cc: Garrett Cooper, bug-followup@freebsd.org, jkim@freebsd.org, Attilio Rao, jeff@freebsd.org
Subject: Re: kern/145385: [cpu] Logical processor cannot be disabled for some SMT-enabled Intel procs
Date: Tue, 24 Aug 2010 15:45:15 -0700

On Tue, Aug 24, 2010 at 2:51 PM, Garrett Cooper wrote:
> On Aug 24, 2010, at 2:03 PM, Jeff Roberson wrote:
>
> > On Tue, 24 Aug 2010, Garrett Cooper wrote:
> > On Tue, Aug 24, 2010 at 12:22 PM, Jeff Roberson wrote:
> > On Tue, 24 Aug 2010, Garrett Cooper wrote:
> > On Mon, Aug 23, 2010 at 6:33 AM, John Baldwin wrote:
> > On Sunday, August 22, 2010 4:17:37 am Garrett Cooper wrote:
> >       The following trivial patch fixes the issue on my W3520 processor;
> > AFAICS it's what should be done after reading several of the specs because
> > the logical count that's tracked with ebx is exactly what is needed for
> > logical_cpus (it's an absolute quantity). I need to verify it with a
> > multi-cpu topology at work (the two r710s I was testing with E-series
> > Xeons on aren't available remotely right now).
> > Thanks!
> > -Garrett
> > Jung-uk Kim and Attilio Rao have both been looking at this code recently
> > and are in a better position to review the patch in the PR.
> > (Moving jhb@ to BCC, adding jeff@ for possible input on ULE)
> > The patch works as expected (it now properly detects the SMT CPUs as
> > logical CPUs), but setting machdep.hlt_logical_cpus=1 causes other
> > problems with scheduling tasks because certain kernel threads get
> > stuck at boot when netbooting (in particular I've seen problems with
> > usbhub* and a few other bits), so in order for
> > machdep.hlt_logical_cpus to be fixed on SMT processors, it might
> > require some changes to the ULE scheduler to shuffle around the
> > threads to available cores/processors?
>
> > hlt_logical_cpus should be rewritten to use cpusets to change the default
> > system set rather than specifically halting those cpus.  There are a number
> > of loops in the kernel that iterate over all cpus and attempt to bind and
> > perform some task.  I think there are a number of other reasons to prefer a
> > less aggressive approach to avoiding the logical cpus as well. Simply
> > preventing user thread schedule will achieve the intent of the sysctl in any
> > event.
> >   Ok... in that event then the bug is ok, but maybe I should add
> > some code to the patch to warn the user about functional issues
> > associated with halting logical CPUs?
>
> I don't think the bug is ok.  We probably shouldn't have sysctls which
> readily break the kernel.  As I said we should instead have the sysctl
> backend to cpuset.  It shouldn't take more than an hour to code and test.

Ok.. I'll look at this once I have my other system back online so I
can actively break something until I get it to work.
Thanks,
-Garrett
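
For illustration only, the cpuset-based approach suggested above (shrinking the default set that user threads run on, rather than halting the logical CPUs) can be modelled roughly as follows. This is a minimal sketch in Python, not kernel code: the `topology` mapping, the `default_cpu_set` helper and the 4-core, 2-thread layout are all hypothetical and do not come from the PR or from any FreeBSD API.

```python
def default_cpu_set(topology, avoid_logical):
    """Return the set of CPU ids that user threads may run on by default.

    topology is a hypothetical mapping of core id -> list of hardware
    thread (logical CPU) ids belonging to that core.
    """
    allowed = set()
    for threads in topology.values():
        if avoid_logical:
            # Keep only the first hardware thread of each core.
            allowed.add(min(threads))
        else:
            # Keep every hardware thread.
            allowed.update(threads)
    return allowed


# Hypothetical 4-core layout with 2 hardware threads per core.
topology = {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}

# No CPU is halted, so code that iterates over all CPUs and binds to
# each one still finds every CPU running.
all_cpus = default_cpu_set(topology, avoid_logical=False)

# Only the default set for user threads shrinks.
user_cpus = default_cpu_set(topology, avoid_logical=True)
print(sorted(user_cpus))
```

In this model the SMT siblings stay online and are only left out of the default set, which is the less aggressive behaviour described in the quoted reply.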