Date: Mon, 16 May 2011 23:54:54 +0200
From: Attilio Rao <attilio@freebsd.org>
To: Max Laier <max@love2party.net>
Cc: Andriy Gapon <avg@freebsd.org>, FreeBSD current <freebsd-current@freebsd.org>, Peter Grehan <grehan@freebsd.org>, neel@freebsd.org
Subject: Re: proposed smp_rendezvous change
Message-ID: <BANLkTik8CqtiP9OgvBpL08dqK6Aj+LQ3OA@mail.gmail.com>
In-Reply-To: <201105161738.53366.max@love2party.net>
References: <4DCD357D.6000109@FreeBSD.org> <201105161630.44577.max@love2party.net> <201105161646.03338.jhb@freebsd.org> <201105161738.53366.max@love2party.net>
2011/5/16 Max Laier <max@love2party.net>:
> On Monday 16 May 2011 16:46:03 John Baldwin wrote:
>> On Monday, May 16, 2011 4:30:44 pm Max Laier wrote:
>> > On Monday 16 May 2011 14:21:27 John Baldwin wrote:
>> > > Yes, we need to fix that.  Hmm, it doesn't preempt when you do a
>> > > critical_exit() though?  Or do you use a hand-rolled critical exit that
>> > > doesn't do a deferred preemption?
>> >
>> > Right now I just did a manual td_critnest++/--, but I guess ...
>>
>> Ah, ok, so you would "lose" a preemption.  That's not really ideal.
>>
>> > > Actually, I'm curious how the spin unlock inside the IPI could yield
>> > > the CPU.  Oh, is rmlock doing a wakeup inside the IPI handler?  I
>> > > guess that is ok as long as the critical_exit() just defers the
>> > > preemption to the end of the IPI handler.
>> >
>> > ... the earliest point where it is safe to preempt is after doing the
>> >
>> >    atomic_add_int(&smp_rv_waiters[2], 1);
>> >
>> > so that we can start other IPIs again.  However, since we don't accept
>> > new IPIs until we signal EOI in the MD code (on amd64), this might still
>> > not be a good place to do the yield?!?
>>
>> Hmm, yeah, you would want to do the EOI before you yield.  However, we
>> could actually move the EOI up before calling the MI code so long as we
>> leave interrupts disabled for the duration of the handler (which we do).
>>
>> > The spin unlock boils down to a critical_exit() and unless we did a
>> > critical_enter() at some point during the rendezvous setup, we will
>> > yield() if we owepreempt.  I'm not quite sure how that can happen, but
>> > it seems like there is a path that allows the scheduler to set it from a
>> > foreign CPU.
>>
>> No, it is only set on curthread by curthread.  This is actually my main
>> question.  I've no idea how this could happen unless the rmlock code is
>> actually triggering a wakeup or sched_add() in its rendezvous handler.
>>
>> I don't see anything in rm_cleanIPI() that would do that however.
>>
>> I wonder if your original issue was really fixed just by the first patch
>> you had which fixed the race in smp_rendezvous()?
>
> I found the stack that led me to this patch in the first place:
>
> #0  sched_switch (td=0xffffff011a970000, newtd=0xffffff006e6784b0,
>     flags=4) at src/sys/kern/sched_ule.c:1939
> #1  0xffffffff80285c7f in mi_switch (flags=6, newtd=0x0) at
>     src/sys/kern/kern_synch.c:475
> #2  0xffffffff802a2cb3 in critical_exit () at src/sys/kern/kern_switch.c:185
> #3  0xffffffff80465807 in spinlock_exit () at
>     src/sys/amd64/amd64/machdep.c:1458
> #4  0xffffffff8027adea in rm_cleanIPI (arg=<value optimized out>) at
>     src/sys/kern/kern_rmlock.c:180
> #5  0xffffffff802b9887 in smp_rendezvous_action () at
>     src/sys/kern/subr_smp.c:402
> #6  0xffffffff8045e2a4 in Xrendezvous () at
>     src/sys/amd64/amd64/apic_vector.S:235
> #7  0xffffffff802a2c6e in critical_exit () at src/sys/kern/kern_switch.c:179
> #8  0xffffffff804365ba in uma_zfree_arg (zone=0xffffff009ff4b5a0,
>     item=0xffffff000f34cd40, udata=0xffffff000f34ce08) at
>     src/sys/vm/uma_core.c:2370
> .
> .
> .
>
> and now that I look at it again, it is clear that critical_exit() just isn't
> interrupt safe.  I'm not sure how to fix that, yet ... but this:
>
>        if (td->td_critnest == 1) {
>                td->td_critnest = 0;
>                if (td->td_owepreempt) {
>                        td->td_critnest = 1;
>
> clearly doesn't work.
I'm sorry I didn't reply to the whole rendezvous thread, but with so many people already taking care of it, I'll stay hidden in my corner. I just wanted to say that I think you are misunderstanding what a critical section is supposed to do.

When an interrupt fires, it runs on the old "interrupt/kernel context", which means it has no context of its own. That is the reason we disable interrupts on spinlocks (or use a similar workaround on !x86 architectures), and it is why spinlocks are the only protection usable in code that runs in interrupt context. Preempting just means another thread will be scheduled in the middle of the current thread's execution path. This code is perfectly fine if you consider that curthread won't be descheduled while it is executing.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein