Date: Thu, 21 Feb 2013 10:43:49 -0500 From: John Baldwin <jhb@freebsd.org> To: Svatopluk Kraus <onwahe@gmail.com> Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-current@freebsd.org Subject: Re: [patch] i386 pmap sysmaps_pcpu[] atomic access Message-ID: <201302211043.49906.jhb@freebsd.org> In-Reply-To: <CAFHCsPXEDiaKSkHNbycs%2B1DKkL7BcYJBk5eN%2BKpsLY8B%2B-u5gQ@mail.gmail.com> References: <CAFHCsPUVTM9jfrnzY72YsPszLWkg-UaJcycTR4xXcS%2BfPzS1Vg@mail.gmail.com> <201302201022.29294.jhb@freebsd.org> <CAFHCsPXEDiaKSkHNbycs%2B1DKkL7BcYJBk5eN%2BKpsLY8B%2B-u5gQ@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thursday, February 21, 2013 7:53:52 am Svatopluk Kraus wrote: > On Wed, Feb 20, 2013 at 4:22 PM, John Baldwin <jhb@freebsd.org> wrote: > > On Wednesday, February 20, 2013 7:31:08 am Svatopluk Kraus wrote: > >> On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov > >> <kostikbel@gmail.com> wrote: > >> > On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus wrote: > >> >> On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov > >> >> <kostikbel@gmail.com> wrote: > >> >> Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP case was > >> >> simple. SMP case is more complex and rather new for me. Recently, I > >> >> was solving a problem with PCPU stuff. For example, PCPU_GET is > >> >> implemented by one instruction on i386 arch. So, a need of atomicity > >> >> with respect to interrupts can be overlooked. On load-store archs, the > >> >> implementation which works in SMP case is not so simple. And what > >> >> works in UP case (single PCPU), not works in SMP case. Believe me, > >> >> mysterious and sporadic 'mutex not owned' assertions and others ones > >> >> caused by curthreads mess, it takes a while ... > >> > Note that PCPU_GET() is not needed to be atomic. The reason is that the code > >> > which uses its result would not be atomic (single-instruction or whatever, see > >> > below). Thus, either the preemption should not matter since the action with > >> > the per-cpu data is advisory, as is in the case of i386 pmap you noted. > >> > or some external measures should be applied in advance to the containing > >> > region (which you proposed, by bracing with sched_pin()). > >> > >> So, it's advisory in the case of i386 pmap... Well, if you can live > >> with that, I can too. > >> > >> > > >> > Also, note that it is not interrupts but preemption which is concern. > >> > >> Yes and no. In theory, yes, a preemption is a concern. In FreeBSD, > >> however, sched_pin() and critical_enter() and their counterparts are > >> implemented with help of curthread. And curthread definition falls to > >> PCPU_GET(curthread) if not defined in other way. So, curthread should > >> be atomic with respect to interrupts and in general, PCPU_GET() should > >> be too. Note that spinlock_enter() definitions often (always) use > >> curthread too. Anyhow, it's defined in MD code and can be defined for > >> each arch separately. > > > > curthread is a bit magic. :) If you perform a context switch during an > > interrupt (which will change 'curthread') you also change your register state. > > When you resume, the register state is also restored. This means that while > > something like 'PCPU_GET(cpuid)' might be stale after you read it, 'curthread' > > never is. However, it is true that actually reading curthread has to be > > atomic. If your read of curthread looks like: > > > > mov <pcpu reg>, r0 > > add <offset of pc_curthread>, r0 > > ld r0, r1 > > > > Then that will indeed break. Alpha used a fixed register for 'pcpu_reg' > > (as does ia64 IIRC). OTOH, you might also be able to depend on the fact that > > pc_curthread is the first thing in 'struct pcpu' (and always will be, you could > > add a CTASSERT to future-proof). In that case, you can remove the 'add' > > instruction and instead just do: > > > > ld <pcpu reg>, r1 > > > > which is fine. > > Just for the record. There are three extra (coprocessor) registers per > a core in arm11mpcore (armv6k). Unfortunately only one is Privileged. > The other ones are respectively User Read only and User Read Write. > For now, we are using the Privileged one to store pcpu pointer > (curthread is correctly set during each context switch). Thus, using a > coprocessor register means that reading of curthread is not a single > instruction implementation now. After we figured out the curthread > issue in SMP case, using a fixed (processor) register for pcpu is an > option. Meanwhile, we disable interrupts before reading of curthread > and enable them after. The same is done for other PCPU stuff. For now > we have not stable system enough for profiling, however, when it will > be, it would be interesting to learn how various implementations of > curthread reading impact system performance. curthread is read _a lot_, so I would expect this to hurt. What we did on Alpha was to use a fixed register for pcpu access, but we used the equivalent of a coprocessor register to also store the value so we could set that fixed register on entry to the kernel (userland was free to use the register for its own purposes). -- John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201302211043.49906.jhb>