From owner-freebsd-current@FreeBSD.ORG Mon May 16 22:12:30 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 44C08106566B; Mon, 16 May 2011 22:12:30 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-yw0-f54.google.com (mail-yw0-f54.google.com [209.85.213.54]) by mx1.freebsd.org (Postfix) with ESMTP id B7FC98FC15; Mon, 16 May 2011 22:12:29 +0000 (UTC) Received: by ywf7 with SMTP id 7so2106033ywf.13 for ; Mon, 16 May 2011 15:12:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=4MRN5oeAB+eMpTmIWPz8Ewly09ZafaL+31zB8/0seSw=; b=kowkfuf11TvAN7YN6K0D0k9NCrJVFhy8KoKcVcMO0mgaGed3a8sah0vJNOhLURJiw2 SDbPbBctLQsI+GJSz4qVO0y37XyQCYPUgLxSWv+sj9/ltPZdNPcRy7bFz/a+S/rJR8tD HgvnXjhtMnWvGbz5wGAhAM4fZjeko6jGxt6HM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=ECtXGWeQP4YQyB+v2/ROntwOl27E8uwyIK1XJLxDPNF9O+UARLIM9CljPpSUVjLEBK nuvYCS4fD9QoqtC4gd9/8BbCG0WClQmTBk9FUP1KMbw/AXgqoW2Iu6+wjndPTQlWsemA Hh66hPBbqmn2F/Q5Z/iL6/dW6AgfsXfsx5f/8= MIME-Version: 1.0 Received: by 10.236.80.41 with SMTP id j29mr4554606yhe.214.1305583948935; Mon, 16 May 2011 15:12:28 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.236.103.130 with HTTP; Mon, 16 May 2011 15:12:28 -0700 (PDT) In-Reply-To: <201105161805.51188.max@love2party.net> References: <4DCD357D.6000109@FreeBSD.org> <201105161738.53366.max@love2party.net> <201105161805.51188.max@love2party.net> Date: Tue, 17 May 2011 00:12:28 +0200 X-Google-Sender-Auth: ihWvFVVwVW1IrnT_S9j2BSX-VWQ Message-ID: From: Attilio Rao To: Max Laier Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: FreeBSD current , Peter Grehan , Andriy Gapon , neel@freebsd.org Subject: Re: proposed smp_rendezvous change X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 May 2011 22:12:30 -0000 2011/5/17 Max Laier : > On Monday 16 May 2011 17:54:54 Attilio Rao wrote: >> 2011/5/16 Max Laier : >> > On Monday 16 May 2011 16:46:03 John Baldwin wrote: >> >> On Monday, May 16, 2011 4:30:44 pm Max Laier wrote: >> >> > On Monday 16 May 2011 14:21:27 John Baldwin wrote: >> >> > > Yes, we need to fix that. =C2=A0Humm, it doesn't preempt when you= do a >> >> > > critical_exit() though? =C2=A0Or do you use a hand-rolled critica= l exit >> >> > > that doesn't do a deferred preemption? >> >> > >> >> > Right now I just did a manual td_critnest++/--, but I guess ... >> >> >> >> Ah, ok, so you would "lose" a preemption. =C2=A0That's not really ide= al. >> >> >> >> > > Actually, I'm curious how the spin unlock inside the IPI could yi= eld >> >> > > the CPU. =C2=A0Oh, is rmlock doing a wakeup inside the IPI handle= r? =C2=A0I >> >> > > guess that is ok as long as the critical_exit() just defers the >> >> > > preemption to the end of the IPI handler. >> >> > >> >> > ... the earliest point where it is safe to preempt is after doing t= he >> >> > >> >> > =C2=A0 =C2=A0atomic_add_int(&smp_rv_waiters[2], 1); >> >> > >> >> > so that we can start other IPIs again. =C2=A0However, since we don'= t accept >> >> > new IPIs until we signal EOI in the MD code (on amd64), this might >> >> > still not be a good place to do the yield?!? >> >> >> >> Hmm, yeah, you would want to do the EOI before you yield. =C2=A0Howev= er, we >> >> could actually move the EOI up before calling the MI code so long as = we >> >> leave interrupts disabled for the duration of the handler (which we d= o). >> >> >> >> > The spin unlock boils down to a critical_exit() and unless we did a >> >> > critical_enter() at some point during the redenvouz setup, we will >> >> > yield() if we owepreempt. =C2=A0I'm not quite sure how that can hap= pen, but >> >> > it seems like there is a path that allows the scheduler to set it f= rom >> >> > a foreign CPU. >> >> >> >> No, it is only set on curthread by curthread. =C2=A0This is actually = my main >> >> question. =C2=A0I've no idea how this could happen unless the rmlock = code is >> >> actually triggering a wakeup or sched_add() in its rendezvous handler= . >> >> >> >> I don't see anything in rm_cleanIPI() that would do that however. >> >> >> >> I wonder if your original issue was really fixed =C2=A0just by the fi= rst >> >> patch you had which fixed the race in smp_rendezvous()? >> > >> > I found the stack that lead me to this patch in the first place: >> > >> > #0 =C2=A0sched_switch (td=3D0xffffff011a970000, newtd=3D0xffffff006e67= 84b0, >> > flags=3D4) at src/sys/kern/sched_ule.c:1939 >> > #1 =C2=A00xffffffff80285c7f in mi_switch (flags=3D6, newtd=3D0x0) at >> > src/sys/kern/kern_synch.c:475 >> > #2 =C2=A00xffffffff802a2cb3 in critical_exit () at >> > src/sys/kern/kern_switch.c:185 #3 =C2=A00xffffffff80465807 in spinlock= _exit >> > () at >> > src/sys/amd64/amd64/machdep.c:1458 >> > #4 =C2=A00xffffffff8027adea in rm_cleanIPI (arg=3D) at >> > src/sys/kern/kern_rmlock.c:180 >> > #5 =C2=A00xffffffff802b9887 in smp_rendezvous_action () at >> > src/sys/kern/subr_smp.c:402 >> > #6 =C2=A00xffffffff8045e2a4 in Xrendezvous () at >> > src/sys/amd64/amd64/apic_vector.S:235 >> > #7 =C2=A00xffffffff802a2c6e in critical_exit () at >> > src/sys/kern/kern_switch.c:179 #8 =C2=A00xffffffff804365ba in uma_zfre= e_arg >> > (zone=3D0xffffff009ff4b5a0, item=3D0xffffff000f34cd40, >> > udata=3D0xffffff000f34ce08) at >> > src/sys/vm/uma_core.c:2370 >> > . >> > . >> > . >> > >> > and now that I look at it again, it is clear that critical_exit() just >> > isn't interrupt safe. =C2=A0I'm not sure how to fix that, yet ... but = this: >> > >> > >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0if (td->td_critnest =3D=3D 1) { >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0td->td_critnest= =3D 0; >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (td->td_owep= reempt) { >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0td->td_critnest =3D 1; >> > >> > clearly doesn't work. >> >> I'm sorry if I didn't reply to the whole rendezvous thread, but as >> long as there is so many people taking care of it, I'll stay hidden in >> my corner. >> >> I just wanted to tell that I think you are misunderstanding what >> critical section is supposed to do. >> >> When an interrupt fires, it goes on the old "interrupt/kernel context" >> which means it has not a context of his own. That is the reason why we >> disable interrupts on spinlocks (or similar workaround for !x86 >> architectures) and this is why spinlocks are the only protection >> usable in code that runs in interrupt context. >> >> Preempting just means another thread will be scheduler in the middle >> of another thread execution path. >> >> This code is perfectly fine if you consider curthread won't be >> descheduled while it is executing. > > Well, no - it is not. =C2=A0With this you can end up with a curthread tha= t has > td_critnest=3D0 and td_owepreempt=3D1 in interrupt context. =C2=A0If you = use a spinlock > on such a thread, it will do the preemption at the point where you drop t= he > spinlock, this is bad in some circumstances. =C2=A0One example is the smp= _rendevous > case we are discussing here. This circumstances are further protected by another call to critical_enter(), by consumers or however upper layer calls. This is why, for example, spinlock_enter() does call critical_enter() even if it actually disables interrupts or why we disable preemption in other cases where the interrupts are already disabled. Attilio --=20 Peace can only be achieved by understanding - A. Einstein