From owner-freebsd-current@FreeBSD.ORG Thu May 29 21:02:34 2014
From: John Baldwin
To: Konstantin Belousov
Cc: freebsd-current
Subject: Re: Processor cores not properly detected/activated?
Date: Thu, 29 May 2014 16:22:12 -0400
Message-Id: <201405291622.12543.jhb@freebsd.org>
In-Reply-To: <20140529192756.GI3991@kib.kiev.ua>
References: <20140524014713.GF13462@carrick-users.bishnet.net> <201405291444.19497.jhb@freebsd.org> <20140529192756.GI3991@kib.kiev.ua>
List-Id: Discussions about the use of FreeBSD-current

On Thursday, May 29, 2014 3:27:57 pm Konstantin Belousov wrote:
> On Thu, May 29, 2014 at 02:44:19PM -0400, John Baldwin wrote:
> > On Thursday, May 29, 2014 2:24:45 pm Adrian Chadd wrote:
> > > On 29
May 2014 10:18, John Baldwin wrote:
> > >
> > > >> > It costs wired memory to increase it for the kernel. The userland set size
> > > >> > can be increased rather arbitrarily, so we don't need to make it but so large
> > > >> > as it is easy to bump later (even with a branch).
> > > >>
> > > >> Well, what about making the API/KBI use cpuset_t pointers for things
> > > >> rather than including it as a bitmask? Do you think there'd be a
> > > >> noticeable performance overhead for the bits where it's indirecting
> > > >> through a pointer to get to the bitmask data?
> > > >
> > > > The wired memory is not due to cpuset_t. The wired memory usage is due to things
> > > > that do 'struct foo foo_bits[MAXCPU]'. The KBI issues I mentioned above are
> > > > 'struct rmlock' (so now you want any rmlock users to malloc space, or you
> > > > want rmlock_init() call malloc? (that seems like a bad idea)). The other one
> > > > is smp_rendezvous. Plus, it's not just a pointer, you really need a (pointer,
> > > > size_t) tuple similar to what cpuset_getaffinity(), etc. use.
> > >
> > > Why would calling malloc be a problem? Except for the initial setup of
> > > things, anything dynamically allocating structs with embedded things
> > > like rmlocks are already dynamically allocating them via malloc or
> > > uma.
> > >
> > > There's a larger fundamental problem with malloc, fragmentation and
> > > getting the required larger allocations for things. But even a 4096
> > > CPU box would require a 512 byte malloc. That shouldn't be that hard
> > > to do. It'd just be from some memory that isn't close to the rest of
> > > the lock state.
> >
> > Other similar APIs like mtx_init() don't call malloc(), so it would be
> > unusual behavior. However, we have several other problems before we can
> > move beyond 256 anyway (like pf).
>
> What is pf ?

The firewall, though it might be fixable without too much trouble:

#define PFID_CPUBITS 8
#define PFID_CPUSHIFT (sizeof(uint64_t) * NBBY - PFID_CPUBITS)
#define PFID_CPUMASK ((uint64_t)((1 << PFID_CPUBITS) - 1) << PFID_CPUSHIFT)
#define PFID_MAXID (~PFID_CPUMASK)
CTASSERT((1 << PFID_CPUBITS) >= MAXCPU);

In theory we can just bump up PFID_CPUBITS to 32, though I'm not sure how many
bits PFID_MAXID should have (e.g. does that cap your rules, or does that cap
your state entries, etc.)?

> We definitely have a problem with the legacy APIC mode, we must start
> using x2APIC, I believe.

Correct. Handling X2APIC entries isn't that hard, but right now the x86 MP
code uses an array indexed by APIC ID to map to CPU IDs early on, and we'll
probably need to replace that with a flat array indexed by CPU ID and resort
to linear searches to check for dupes, etc. The first thing we would need
would be a machine that actually did X2APIC and created X2APIC MADT entries,
etc. Shouldn't be too hard to simulate this in bhyve though.

-- 
John Baldwin
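
The PFID_CPUBITS change discussed above can be sketched in userland C. This is
only an illustration under stated assumptions, not pf's actual code: the SK_
names, the 4096 CPU limit, and the helper functions are all hypothetical, and
the layout (CPU number in the top bits, a counter in the rest) is inferred from
the mask macros in the message. One detail worth checking when widening the
field: with a 32-bit int, the expression (1 << 32) is undefined in C, so the
sketch shifts a 64-bit constant instead.

```c
#include <assert.h>	/* static_assert (C11), standing in for CTASSERT */
#include <limits.h>	/* CHAR_BIT, standing in for the kernel's NBBY */
#include <stdint.h>

/* Hypothetical names and limit; none of these come from pf itself. */
#define SK_MAXCPU	4096
#define SK_CPUBITS	32
#define SK_CPUSHIFT	(sizeof(uint64_t) * CHAR_BIT - SK_CPUBITS)
/* UINT64_C(1), not plain 1: shifting a 32-bit int by 32 is undefined. */
#define SK_CPUMASK	(((UINT64_C(1) << SK_CPUBITS) - 1) << SK_CPUSHIFT)
#define SK_MAXID	(~SK_CPUMASK)

static_assert((UINT64_C(1) << SK_CPUBITS) >= SK_MAXCPU,
    "CPU field too narrow for SK_MAXCPU");

/* Pack a CPU number into the top bits and a counter into the low bits. */
static uint64_t
sk_make_id(uint32_t cpu, uint64_t counter)
{
	return (((uint64_t)cpu << SK_CPUSHIFT) | (counter & SK_MAXID));
}

/* Recover the CPU number from a packed ID. */
static uint32_t
sk_id_cpu(uint64_t id)
{
	return ((uint32_t)(id >> SK_CPUSHIFT));
}

/* Recover the counter portion from a packed ID. */
static uint64_t
sk_id_counter(uint64_t id)
{
	return (id & SK_MAXID);
}
```

With SK_CPUBITS at 32 the counter portion is left with the remaining 32 bits
of the 64-bit ID. Whether that is enough depends on what the ID actually caps,
which is the open question raised in the message.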