From owner-freebsd-current@FreeBSD.ORG  Thu May 29 21:02:34 2014
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
From: John Baldwin <jhb@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Processor cores not properly detected/activated?
Date: Thu, 29 May 2014 16:22:12 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; )
References: <20140524014713.GF13462@carrick-users.bishnet.net>
 <201405291444.19497.jhb@freebsd.org> <20140529192756.GI3991@kib.kiev.ua>
In-Reply-To: <20140529192756.GI3991@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <201405291622.12543.jhb@freebsd.org>
Cc: freebsd-current <freebsd-current@freebsd.org>
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>

On Thursday, May 29, 2014 3:27:57 pm Konstantin Belousov wrote:
> On Thu, May 29, 2014 at 02:44:19PM -0400, John Baldwin wrote:
> > On Thursday, May 29, 2014 2:24:45 pm Adrian Chadd wrote:
> > > On 29 May 2014 10:18, John Baldwin <jhb@freebsd.org> wrote:
> > > 
> > > >> > It costs wired memory to increase it for the kernel.  The userland set size
> > > >> > can be increased rather arbitrarily, so we don't need to make it but so large
> > > >> > as it is easy to bump later (even with a branch).
> > > >>
> > > >> Well, what about making the API/KBI use cpuset_t pointers for things
> > > >> rather than including it as a bitmask? Do you think there'd be a
> > > >> noticeable performance overhead for the bits where it's indirecting
> > > >> through a pointer to get to the bitmask data?
> > > >
> > > > The wired memory is not due to cpuset_t.  The wired memory usage is due to things
> > > > that do 'struct foo foo_bits[MAXCPU]'.  The KBI issues I mentioned above are
> > > > 'struct rmlock' (so now you want any rmlock users to malloc space, or you
> > > > want rmlock_init() to call malloc?  (that seems like a bad idea)).  The other one
> > > > is smp_rendezvous.  Plus, it's not just a pointer, you really need a (pointer,
> > > > size_t) tuple similar to what cpuset_getaffinity(), etc. use.
> > > 
> > > Why would calling malloc be a problem? Except for the initial setup of
> > > things, anything dynamically allocating structs with embedded things
> > > like rmlocks are already dynamically allocating them via malloc or
> > > uma.
> > > 
> > > There's a larger fundamental problem with malloc, fragmentation and
> > > getting the required larger allocations for things. But even a 4096
> > > CPU box would require a 512 byte malloc. That shouldn't be that hard
> > > to do. It'd just be from some memory that isn't close to the rest of
> > > the lock state.
> > 
> > Other similar APIs like mtx_init() don't call malloc(), so it would be
> > unusual behavior.  However, we have several other problems before we can
> > move beyond 256 anyway (like pf).
> 
> What is pf ?

The firewall, though it might be fixable without too much trouble:

#define	PFID_CPUBITS	8
#define	PFID_CPUSHIFT	(sizeof(uint64_t) * NBBY - PFID_CPUBITS)
#define	PFID_CPUMASK	((uint64_t)((1 << PFID_CPUBITS) - 1) <<	PFID_CPUSHIFT)
#define	PFID_MAXID	(~PFID_CPUMASK)
CTASSERT((1 << PFID_CPUBITS) >= MAXCPU);

In theory we can just bump PFID_CPUBITS up to 32, though I'm not sure how many
bits PFID_MAXID needs to keep (e.g. does it cap the number of rules, or the
number of state entries, etc.).

> We definitely have a problem with the legacy APIC mode, we must start
> using x2APIC, I believe.

Correct.  Handling x2APIC entries isn't that hard, but right now the x86
MP code uses an array indexed by APIC ID to map to CPU IDs early on, and
we'll probably need to replace that with a flat array indexed by CPU ID
and resort to linear searches to check for dupes, etc.  The first thing
we would need is a machine that actually does x2APIC and creates
x2APIC MADT entries, etc.  That shouldn't be too hard to simulate in
bhyve, though.

-- 
John Baldwin