Date:      Tue, 06 Aug 2002 15:17:46 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Julian Elischer <julian@elischer.org>
Cc:        Luigi Rizzo <rizzo@icir.org>, Peter Wemm <peter@wemm.org>, smp@freebsd.org
Subject:   Re: how to create per-cpu variables in SMP kernels ?
Message-ID:  <3D504B0A.9FDB3A47@mindspring.com>
References:  <Pine.BSF.4.21.0208061005150.65715-100000@InterJet.elischer.org>


Julian Elischer wrote:
> On Tue, 6 Aug 2002, Terry Lambert wrote:
> > Luigi Rizzo wrote:
> >
> > NO.  This can not work.
> >
> > The problem is that the per-CPU area is mapped into the same
> > location on each CPU -- and is *totally inaccessible* to other CPUs.
> 
> Terry, -current does not have multiple page directories, one per CPU
> (not any more).  We use the %fs register, which is not used for
> anything else, to have a special 'per-cpu segment'.  The per-CPU
> mappings were being used at one stage, but not any more... each pcpu
> area lives at a different virtual address now.
> 
> Basically, we do *all_pcpu[MAXCPU], where the index is achieved
> using the otherwise unused %fs register to make the indexing work.
> 
> In some places in the kernel we actually iterate through the
> elements; if they were not at different addresses this would not be
> possible.
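
[For concreteness, here is a rough C sketch of the two access paths
described above.  all_pcpu and MAXCPU are from the mail; the struct
fields, the helper names, and the inline assembly are illustrative,
not the actual FreeBSD PCPU macros.]

----------------------------------------------------------------------
#include <stddef.h>

#define MAXCPU  16                      /* illustrative value */

struct pcpu {
        int     pc_cpuid;               /* must stay at offset 0 here */
        long    pc_cnt;                 /* some per-CPU counter */
};

/*
 * One slot per CPU, each at a distinct kernel virtual address, so
 * any CPU can walk the whole array.
 */
struct pcpu *all_pcpu[MAXCPU];

/*
 * Fast path: the running CPU finds its own slot through %fs.  The
 * kernel points %fs at a per-CPU segment, so an %fs-relative load of
 * offset 0 (pc_cpuid above) yields the current CPU's id without the
 * CPU having to know it beforehand.
 */
static __inline struct pcpu *
curpcpu(void)
{
        int id;

        __asm__ __volatile__("movl %%fs:0,%0" : "=r" (id));
        return (all_pcpu[id]);
}

/*
 * Slow path: any CPU can iterate every other CPU's data, which is
 * what the distinct-address layout buys you.
 */
static long
sum_all_counters(void)
{
        int i;
        long total = 0;

        for (i = 0; i < MAXCPU; i++)
                if (all_pcpu[i] != NULL)
                        total += all_pcpu[i]->pc_cnt;
        return (total);
}
----------------------------------------------------------------------

The iteration in sum_all_counters() is exactly what the old
same-address-on-every-CPU mapping could not support.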

Julian, I'm not sure that Luigi is dealing with SMP on -current
rather than SMP on -stable.

However, looking at the HEAD branch of /sys/i386/i386/locore.s,
I still see:

----------------------------------------------------------------------
#ifdef SMP
/*
 * Define layout of per-cpu address space.
 * This is "constructed" in locore.s on the BSP and in mp_machdep.c
 * for each AP.  DO NOT REORDER THESE WITHOUT UPDATING THE REST!
 */
        .globl  SMP_prvspace, lapic
        .set    SMP_prvspace,(MPPTDI << PDRSHIFT)
        .set    lapic,SMP_prvspace + (NPTEPG-1) * PAGE_SIZE
#endif /* SMP */
...
#ifdef SMP
                .globl  cpu0prvpage
cpu0pp:         .long   0               /* phys addr cpu0 private pg */
cpu0prvpage:    .long   0               /* relocated version */

                .globl  SMPpt
SMPptpa:        .long   0               /* phys addr SMP page table */
SMPpt:          .long   0               /* relocated version */
#endif /* SMP */
...
#ifdef SMP
        .globl  KPTphys
#endif
...
#ifdef SMP
/* Allocate cpu0's private data page */
        ALLOCPAGES(1)
        movl    %esi,R(cpu0pp)
        addl    $KERNBASE, %esi
        movl    %esi, R(cpu0prvpage)    /* relocated to KVM space */

/* Allocate SMP page table page */
        ALLOCPAGES(1)
        movl    %esi,R(SMPptpa)
        addl    $KERNBASE, %esi
        movl    %esi, R(SMPpt)          /* relocated to KVM space */
#endif  /* SMP */
----------------------------------------------------------------------

This indicates that the per-CPU private space still exists.
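
[The address math in those .set lines is easier to see as plain C.
A small sketch; the MPPTDI value below is only a placeholder, since
the real index lives in <machine/pmap.h>.]

----------------------------------------------------------------------
#include <stdio.h>

/*
 * PAGE_SIZE, PDRSHIFT and NPTEPG are the standard i386 4K-page
 * values; MPPTDI (the page directory index reserved for the per-CPU
 * private space) is a placeholder -- see <machine/pmap.h>.
 */
#define PAGE_SIZE       4096UL
#define PDRSHIFT        22              /* one PDE maps 4MB */
#define NPTEPG          1024            /* PTEs per page table page */
#define MPPTDI          1022UL          /* placeholder index */

int
main(void)
{
        unsigned long prvspace = MPPTDI << PDRSHIFT;
        unsigned long lapic = prvspace + (NPTEPG - 1) * PAGE_SIZE;

        /*
         * Every CPU maps its own private page at prvspace, so this
         * one virtual address names different physical memory on
         * each CPU -- convenient locally, invisible to the others.
         */
        printf("SMP_prvspace = %#lx\n", prvspace);
        printf("lapic        = %#lx\n", lapic);
        return (0);
}
----------------------------------------------------------------------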

Now I understand how %fs is being used; however, I object to it,
as I object to anything that moves away from a per-CPU resource for
truly per-CPU things and a shared resource for truly shared things.
IMO, %fs is being abused here for information that should be
maintained in design state, rather than in memory state.

The main problem here is that Luigi is talking about per-CPU,
per-process state.

If you guys keep going down this road, you are not going to be
able to think about CPU cycles as if they were anonymous resources.

It would be an incredible mistake, IMO, for Luigi to maintain
per-CPU state off %fs.  His problem is that he wants to have
CPU state that is accessible from other CPUs and is non-statistical,
which puts it into a contention domain where locking is required,
because a read of the data must return a precise value, rather
than a statistic (which can be a snapshot).
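
[A userland sketch of that distinction, with pthreads standing in
for kernel mutexes; all names here are illustrative.]

----------------------------------------------------------------------
#include <pthread.h>

#define NCPU    4                       /* illustrative */

/*
 * Illustrative per-CPU state; in the kernel this would live in the
 * pcpu area rather than in a plain array.
 */
struct cpu_state {
        pthread_mutex_t lock;           /* stand-in for a kernel mutex */
        long            exact;          /* precise: readers must lock */
        volatile long   snapshot;       /* statistic: readers just load */
};

static struct cpu_state cpus[NCPU];

static void
init_cpus(void)
{
        int i;

        for (i = 0; i < NCPU; i++)
                pthread_mutex_init(&cpus[i].lock, NULL);
}

/*
 * Precise cross-CPU read: because the value is non-statistical, the
 * reader must take the owner's lock, dragging both CPUs into the
 * same contention domain.
 */
static long
read_exact(int cpu)
{
        long v;

        pthread_mutex_lock(&cpus[cpu].lock);
        v = cpus[cpu].exact;
        pthread_mutex_unlock(&cpus[cpu].lock);
        return (v);
}

/*
 * Statistical read: a possibly stale snapshot is acceptable, so no
 * lock is taken and the owning CPU is never slowed down.
 */
static long
read_snapshot(int cpu)
{
        return (cpus[cpu].snapshot);
}
----------------------------------------------------------------------

Once read_exact() exists, every writer on the owning CPU has to take
the lock too, and the "per-CPU" data is no longer really private.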

Doing this effectively breaks the ability both to maintain the
work he is doing and to provide, in the future, for per-CPU run
queues whose only real cross-CPU interaction is non-blocking and
statistical.

If this happens, then FreeBSD will forever be limited to 4 CPUs
before it hits the point of diminishing returns, and Hyperthreading
affinity cannot be worked in hierarchically so that there are
multiple preference sets (e.g. "I prefer to stay on the same CPU,
but if I can't, I prefer to stay on a CPU on the same Hyperthreaded
chip, but if I can't, then I will migrate elsewhere").  Being able
to represent arbitrarily scoped preference arrangements is necessary
to support NUMA, and clustering with cluster migration, at some time
in the future.  I would really prefer that you guys not "legislate"
against the ability to run on NUMA systems right now, before you've
even thought about the problem or the benefits.  Perhaps you can
talk to Chuck about it?
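
[To sketch what "arbitrarily scoped preference sets" could look
like -- nothing like this exists in the tree; the scope names and
the capacity test are hypothetical.]

----------------------------------------------------------------------
/*
 * A hypothetical scope hierarchy for scheduling affinity: a thread
 * prefers the innermost scope it last ran in and widens outward
 * only when that scope has no capacity.
 */
enum aff_scope {
        AFF_SAME_CPU,           /* stay on the last CPU */
        AFF_SAME_CHIP,          /* else a sibling on the same HTT chip */
        AFF_SAME_NODE,          /* else the same NUMA node */
        AFF_ANYWHERE,           /* else migrate anywhere */
};

/*
 * Widen outward from the innermost scope until one has capacity.
 * have_capacity() is a placeholder for whatever load metric a
 * scheduler would consult for a given scope.
 */
static enum aff_scope
pick_scope(int (*have_capacity)(enum aff_scope))
{
        enum aff_scope s;

        for (s = AFF_SAME_CPU; s < AFF_ANYWHERE; s++)
                if (have_capacity(s))
                        break;
        return (s);             /* widest scope actually needed */
}
----------------------------------------------------------------------

Adding a cluster scope to the same walk is how cluster migration
would fall out of it later.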

-- Terry




