From owner-freebsd-current@FreeBSD.ORG Mon May 16 22:05:55 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 55258106566C; Mon, 16 May 2011 22:05:55 +0000 (UTC) (envelope-from max@love2party.net) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.17.10]) by mx1.freebsd.org (Postfix) with ESMTP id E6BC88FC14; Mon, 16 May 2011 22:05:54 +0000 (UTC) Received: from maxlap.localnet (gw-100.extranet.sea01.isilon.com [74.85.160.100]) by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis) id 0MN728-1QJrwG2q9g-007Joq; Tue, 17 May 2011 00:05:53 +0200 From: Max Laier To: Attilio Rao Date: Mon, 16 May 2011 18:05:50 -0400 User-Agent: KMail/1.13.6 (FreeBSD/9.0-CURRENT; KDE/4.6.1; amd64; ; ) References: <4DCD357D.6000109@FreeBSD.org> <201105161738.53366.max@love2party.net> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201105161805.51188.max@love2party.net> X-Provags-ID: V02:K0:J1eHK2hscrXA73KTICIFY/Ps0IbGJK51Y/Byo2xpKRr mlq+/ZfDyVf4iPcBt+XI+q4raXjt8WwMbNQazbd/Ou+gQ4Th3V 59Xf+wPTpz6SLJyu6jr3cwy+qZIbC3atvEcSAvE2Rzt1c4/H83 w0yEFqsWe216F8Mqh3Y6wyPDG2jhcIuLQvVH3qELqV/jZQrV/A OWvfHEYdOG36iE+925Fog== Cc: Andriy Gapon , FreeBSD current , Peter Grehan , neel@freebsd.org Subject: Re: proposed smp_rendezvous change X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 May 2011 22:05:55 -0000 On Monday 16 May 2011 17:54:54 Attilio Rao wrote: > 2011/5/16 Max Laier : > > On Monday 16 May 2011 16:46:03 John Baldwin wrote: > >> On Monday, May 16, 2011 4:30:44 pm Max Laier wrote: > >> > On Monday 16 May 2011 14:21:27 John Baldwin wrote: > >> > > Yes, we need to fix that. Humm, it doesn't preempt when you do a > >> > > critical_exit() though? Or do you use a hand-rolled critical exit > >> > > that doesn't do a deferred preemption? > >> > > >> > Right now I just did a manual td_critnest++/--, but I guess ... > >> > >> Ah, ok, so you would "lose" a preemption. That's not really ideal. > >> > >> > > Actually, I'm curious how the spin unlock inside the IPI could yield > >> > > the CPU. Oh, is rmlock doing a wakeup inside the IPI handler? I > >> > > guess that is ok as long as the critical_exit() just defers the > >> > > preemption to the end of the IPI handler. > >> > > >> > ... the earliest point where it is safe to preempt is after doing the > >> > > >> > atomic_add_int(&smp_rv_waiters[2], 1); > >> > > >> > so that we can start other IPIs again. However, since we don't accept > >> > new IPIs until we signal EOI in the MD code (on amd64), this might > >> > still not be a good place to do the yield?!? > >> > >> Hmm, yeah, you would want to do the EOI before you yield. However, we > >> could actually move the EOI up before calling the MI code so long as we > >> leave interrupts disabled for the duration of the handler (which we do). > >> > >> > The spin unlock boils down to a critical_exit() and unless we did a > >> > critical_enter() at some point during the redenvouz setup, we will > >> > yield() if we owepreempt. I'm not quite sure how that can happen, but > >> > it seems like there is a path that allows the scheduler to set it from > >> > a foreign CPU. > >> > >> No, it is only set on curthread by curthread. This is actually my main > >> question. I've no idea how this could happen unless the rmlock code is > >> actually triggering a wakeup or sched_add() in its rendezvous handler. > >> > >> I don't see anything in rm_cleanIPI() that would do that however. > >> > >> I wonder if your original issue was really fixed just by the first > >> patch you had which fixed the race in smp_rendezvous()? > > > > I found the stack that lead me to this patch in the first place: > > > > #0 sched_switch (td=0xffffff011a970000, newtd=0xffffff006e6784b0, > > flags=4) at src/sys/kern/sched_ule.c:1939 > > #1 0xffffffff80285c7f in mi_switch (flags=6, newtd=0x0) at > > src/sys/kern/kern_synch.c:475 > > #2 0xffffffff802a2cb3 in critical_exit () at > > src/sys/kern/kern_switch.c:185 #3 0xffffffff80465807 in spinlock_exit > > () at > > src/sys/amd64/amd64/machdep.c:1458 > > #4 0xffffffff8027adea in rm_cleanIPI (arg=) at > > src/sys/kern/kern_rmlock.c:180 > > #5 0xffffffff802b9887 in smp_rendezvous_action () at > > src/sys/kern/subr_smp.c:402 > > #6 0xffffffff8045e2a4 in Xrendezvous () at > > src/sys/amd64/amd64/apic_vector.S:235 > > #7 0xffffffff802a2c6e in critical_exit () at > > src/sys/kern/kern_switch.c:179 #8 0xffffffff804365ba in uma_zfree_arg > > (zone=0xffffff009ff4b5a0, item=0xffffff000f34cd40, > > udata=0xffffff000f34ce08) at > > src/sys/vm/uma_core.c:2370 > > . > > . > > . > > > > and now that I look at it again, it is clear that critical_exit() just > > isn't interrupt safe. I'm not sure how to fix that, yet ... but this: > > > > > > if (td->td_critnest == 1) { > > td->td_critnest = 0; > > if (td->td_owepreempt) { > > td->td_critnest = 1; > > > > clearly doesn't work. > > I'm sorry if I didn't reply to the whole rendezvous thread, but as > long as there is so many people taking care of it, I'll stay hidden in > my corner. > > I just wanted to tell that I think you are misunderstanding what > critical section is supposed to do. > > When an interrupt fires, it goes on the old "interrupt/kernel context" > which means it has not a context of his own. That is the reason why we > disable interrupts on spinlocks (or similar workaround for !x86 > architectures) and this is why spinlocks are the only protection > usable in code that runs in interrupt context. > > Preempting just means another thread will be scheduler in the middle > of another thread execution path. > > This code is perfectly fine if you consider curthread won't be > descheduled while it is executing. Well, no - it is not. With this you can end up with a curthread that has td_critnest=0 and td_owepreempt=1 in interrupt context. If you use a spinlock on such a thread, it will do the preemption at the point where you drop the spinlock, this is bad in some circumstances. One example is the smp_rendevous case we are discussing here. Max