To: Luigi Rizzo
Cc: Terry Lambert, smp@freebsd.org
Subject: Re: how to create per-cpu variables in SMP kernels ?
In-Reply-To: <20020805015340.A17716@iguana.icir.org>
Date: Mon, 05 Aug 2002 15:14:15 -0700
From: Peter Wemm
Message-Id: <20020805221415.B0C732A7D6@canning.wemm.org>
Sender: owner-freebsd-smp@FreeBSD.ORG

Luigi Rizzo wrote:
> On Sun, Aug 04, 2002 at 11:44:27PM -0700, Terry Lambert wrote:
> > > I would like to know how does the FreeBSD kernel (both in -current
> > > and -stable) handle per-cpu variables such as curproc/curthread, cpuid,
> ...
> > > How expensive is to access them compared to regular variables ?
> >
> > Depends on the specific variable's implementation.  If you are asking
> > because you want to add one, then don't.  8-).  They damage symmetry
>
> i am asking because in the code I see several instance of things like
>
>     p = curproc;
>
> in a context where curproc is not supposed to change. Is there a
> performance bonus in doing this, or not ?

Sort-of.  There is both a compile time issue and a runtime issue.
Using the %fs:variable segment overrides doesn't make a lot of difference,
but the compiler is effectively wired so that they are treated as volatile.
ie:

    p = curproc;
    foo(curproc);
    bar(curproc);
    return curproc;

.. will cause *4* memory references with segment overrides.  However:

    p = curproc;
    foo(p);
    bar(p);
    return p;

.. will use *1*.

Actually, this isn't quite correct on -current since there isn't a curproc
percpu variable.  It is really:

    #define curproc (curthread->td_proc)

so the example above has actually got 8 memory references vs 2.  Sure, you
will probably hit L1 cache, but there is no guarantee of that.  In the 'p'
cases, it will probably end up as a register, but that is up to the compiler
to figure out the best use of resources.

Secondly, there is a compile time issue.  "curproc" and "curthread" expand
to monster macros that the compiler has to untangle and optimize.  It
contributes to compile time and memory to represent it in the rtl tree.
Minimizing unnecessary overuse of them adds up over time.

An example from -current..  This:

    static __inline int
    sigonstack(size_t sp)
    {
        register struct thread *td = curthread;
        struct proc *p = td->td_proc;

        return ((p->p_flag & P_ALTSTACK) ?
            ((sp - (size_t)p->p_sigstk.ss_sp) < p->p_sigstk.ss_size) : 0);
    }

Becomes:

    static __inline int
    sigonstack(size_t sp)
    {
        register struct thread *td = ({ __typeof(((struct pcpu *)0)->pc_curthread) __result;
          if (sizeof(__result) == 1) { u_char __b;
            __asm volatile("movb %%fs:%1,%0" : "=r" (__b) : "m" (*(u_char *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
            __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__b;
          } else if (sizeof(__result) == 2) { u_short __w;
            __asm volatile("movw %%fs:%1,%0" : "=r" (__w) : "m" (*(u_short *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
            __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__w;
          } else if (sizeof(__result) == 4) { u_int __i;
            __asm volatile("movl %%fs:%1,%0" : "=r" (__i) : "m" (*(u_int *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
            __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__i;
          } else { __result = *({ __typeof(((struct pcpu *)0)->pc_curthread) *__p;
            __asm volatile("movl %%fs:%1,%0; addl %2,%0" : "=r" (__p) : "m" (*(struct pcpu *)(((size_t)(&((struct pcpu *)0)->pc_prvspace)))), "i" (((size_t)(&((struct pcpu *)0)->pc_curthread))));
            __p; }); }
          __result; });
        struct proc *p = td->td_proc;

        return ((p->p_flag & 0x4000000) ?
            ((sp - (size_t)p->p_sigstk.ss_sp) < p->p_sigstk.ss_size) : 0);
    }

However, if I change it like this:

    static __inline int
    sigonstack(size_t sp)
    {
        return ((curproc->p_flag & P_ALTSTACK) ?
            ((sp - (size_t)curproc->p_sigstk.ss_sp) < curproc->p_sigstk.ss_size) : 0);
    }

it becomes:

    static __inline int
    sigonstack(size_t sp)
    {
        return (((({ __typeof(((struct pcpu *)0)->pc_curthread) __result;
          if (sizeof(__result) == 1) { u_char __b;
            __asm volatile("movb %%fs:%1,%0" : "=r" (__b) : "m" (*(u_char *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
            __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__b;
          } else if (sizeof(__result) == 2) { u_short __w;
            __asm volatile("movw %%fs:%1,%0" : "=r" (__w) : "m" (*(u_short *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
            __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__w;
          } else if (sizeof(__result) == 4) { u_int __i;
            __asm volatile("movl %%fs:%1,%0" : "=r" (__i) : "m" (*(u_int *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
            __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__i;
          } else { __result = *({ __typeof(((struct pcpu *)0)->pc_curthread) *__p;
            __asm volatile("movl %%fs:%1,%0; addl %2,%0" : "=r" (__p) : "m" (*(struct pcpu *)(((size_t)(&((struct pcpu *)0)->pc_prvspace)))), "i" (((size_t)(&((struct pcpu *)0)->pc_curthread))));
            __p; }); }
          __result; })->td_proc)->p_flag & 0x4000000) ?
            ((sp - (size_t)(({ __typeof(((struct pcpu *)0)->pc_curthread) __result;
          if (sizeof(__result) == 1) { u_char __b;
            __asm volatile("movb %%fs:%1,%0" : "=r" (__b) : "m" (*(u_char *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
            __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__b;
          } else if (sizeof(__result) == 2) { u_short __w;
            __asm volatile("movw %%fs:%1,%0" : "=r" (__w) : "m" (*(u_short *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
            __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__w;
          } else if (sizeof(__result) == 4) { u_int __i;
            __asm volatile("movl %%fs:%1,%0" : "=r" (__i) : "m" (*(u_int *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
            __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__i;
          } else { __result = *({ __typeof(((struct pcpu *)0)->pc_curthread) *__p;
            __asm volatile("movl %%fs:%1,%0; addl %2,%0" : "=r" (__p) : "m" (*(struct pcpu *)(((size_t)(&((struct pcpu *)0)->pc_prvspace)))), "i" (((size_t)(&((struct pcpu *)0)->pc_curthread))));
            __p; }); }
          __result; })->td_proc)->p_sigstk.ss_sp) <
            (({ __typeof(((struct pcpu *)0)->pc_curthread) __result;
          if (sizeof(__result) == 1) { u_char __b;
            __asm volatile("movb %%fs:%1,%0" : "=r" (__b) : "m" (*(u_char *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
            __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__b;
          } else if (sizeof(__result) == 2) { u_short __w;
            __asm volatile("movw %%fs:%1,%0" : "=r" (__w) : "m" (*(u_short *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
            __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__w;
          } else if (sizeof(__result) == 4) { u_int __i;
            __asm volatile("movl %%fs:%1,%0" : "=r" (__i) : "m" (*(u_int *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
            __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__i;
          } else { __result = *({ __typeof(((struct pcpu *)0)->pc_curthread) *__p;
            __asm volatile("movl %%fs:%1,%0; addl %2,%0" : "=r" (__p) : "m" (*(struct pcpu *)(((size_t)(&((struct pcpu *)0)->pc_prvspace)))), "i" (((size_t)(&((struct pcpu *)0)->pc_curthread))));
            __p; }); }
          __result; })->td_proc)->p_sigstk.ss_size) : 0);
    }

Also, when you get a syntax error due to a #define collision in the middle
of that mess, which would you rather be trying to debug the preprocessor
output from?

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
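
The runtime point above (4 references versus 1, or 8 versus 2 once curproc
goes through curthread->td_proc) can be illustrated with a small user-space
sketch.  This is only a model, not kernel code: the structs are cut-down
stand-ins, and load_curthread(), load_td_proc(), mem_refs and count_refs()
are hypothetical names made up for the illustration.  A counted function
call plays the role of the volatile %fs-relative load, on the assumption
that the compiler may not merge or drop either one.

```c
/* Hypothetical stand-ins; not the real FreeBSD struct proc / struct thread. */
struct proc { int p_flag; };
struct thread { struct proc *td_proc; };

static struct proc demo_proc = { 0 };
static struct thread demo_thread = { &demo_proc };

/* Counts every simulated memory reference (assumption: one per load). */
static unsigned long mem_refs;

/* Models the %fs-relative load of the per-CPU curthread slot. */
static struct thread *
load_curthread(void)
{
    mem_refs++;
    return (&demo_thread);
}

/* Models the dependent load of td->td_proc. */
static struct proc *
load_td_proc(struct thread *td)
{
    mem_refs++;
    return (td->td_proc);
}

/* Same shape as the -current definition: curproc goes through curthread. */
#define curthread (load_curthread())
#define curproc   (load_td_proc(curthread))

static void foo(struct proc *p) { (void)p; }
static void bar(struct proc *p) { (void)p; }

/* Re-evaluates curproc at every use: 4 uses, 2 loads each. */
static struct proc *
uncached(void)
{
    struct proc *p = curproc;

    (void)p;
    foo(curproc);
    bar(curproc);
    return (curproc);
}

/* Evaluates curproc once and reuses the local: 2 loads in total. */
static struct proc *
cached(void)
{
    struct proc *p = curproc;

    foo(p);
    bar(p);
    return (p);
}

/* Runs fn and reports how many simulated references it made. */
static unsigned long
count_refs(struct proc *(*fn)(void))
{
    mem_refs = 0;
    (void)fn();
    return (mem_refs);
}
```

In this model count_refs(uncached) comes out as 8 and count_refs(cached) as
2, matching the counts given above.  Whether the real loads hit L1 cache or
the local ends up in a register is still up to the hardware and the
compiler.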