From owner-freebsd-current@FreeBSD.ORG Mon May 16 22:22:39 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B8581065670; Mon, 16 May 2011 22:22:39 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id 1CA2C8FC12; Mon, 16 May 2011 22:22:38 +0000 (UTC) Received: by yxl31 with SMTP id 31so2113350yxl.13 for ; Mon, 16 May 2011 15:22:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=sEEGKaB60OhtBpbWkg18Mh4/aFvdGMGZQJMqmQVwt+k=; b=U9qrKfJDe75GxxEEsAg7M/H17AdvX2f8Z2xRjZyZ8NkSjrIfe+hIpPUWX1tLRj82DK SdudYZgK/ru5kvHmuBcu1Dw7NUZADPEMyvpwzuD2l11ik28RKguiS90GaiOckvnwzP3f NG22ygXevTZUmEF+ODYZdeZ02AXBA9MhHlmZ0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=Bld46MqwrPbpD8cTDu4l6eph7wkXr+M1OdwudU5T9nroE3xKR+ERGNBARo4lnjVNPu kEkK8kFbg6Bc1aiRqg9d7jqZH7Dtm4vi7b21xkNMqOJCMk2PfsLLyXQUnexkjM94sNtC zcEZi5dyGOM9M0w9H/2UJ23TjvlzlBegaU1Ns= MIME-Version: 1.0 Received: by 10.236.80.41 with SMTP id j29mr4543221yhe.214.1305582894873; Mon, 16 May 2011 14:54:54 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.236.103.130 with HTTP; Mon, 16 May 2011 14:54:54 -0700 (PDT) In-Reply-To: <201105161738.53366.max@love2party.net> References: <4DCD357D.6000109@FreeBSD.org> <201105161630.44577.max@love2party.net> <201105161646.03338.jhb@freebsd.org> <201105161738.53366.max@love2party.net> Date: Mon, 16 May 2011 23:54:54 +0200 X-Google-Sender-Auth: YUynXOhpRrimBc3cZn_zGD9p2MM Message-ID: From: Attilio Rao To: Max Laier Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: Andriy Gapon , FreeBSD current , Peter Grehan , neel@freebsd.org Subject: Re: proposed smp_rendezvous change X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 May 2011 22:22:39 -0000 2011/5/16 Max Laier : > On Monday 16 May 2011 16:46:03 John Baldwin wrote: >> On Monday, May 16, 2011 4:30:44 pm Max Laier wrote: >> > On Monday 16 May 2011 14:21:27 John Baldwin wrote: >> > > Yes, we need to fix that. =C2=A0Humm, it doesn't preempt when you do= a >> > > critical_exit() though? =C2=A0Or do you use a hand-rolled critical e= xit that >> > > doesn't do a deferred preemption? >> > >> > Right now I just did a manual td_critnest++/--, but I guess ... >> >> Ah, ok, so you would "lose" a preemption. =C2=A0That's not really ideal. >> >> > > Actually, I'm curious how the spin unlock inside the IPI could yield >> > > the CPU. =C2=A0Oh, is rmlock doing a wakeup inside the IPI handler? = =C2=A0I >> > > guess that is ok as long as the critical_exit() just defers the >> > > preemption to the end of the IPI handler. >> > >> > ... the earliest point where it is safe to preempt is after doing the >> > >> > =C2=A0 =C2=A0atomic_add_int(&smp_rv_waiters[2], 1); >> > >> > so that we can start other IPIs again. =C2=A0However, since we don't a= ccept >> > new IPIs until we signal EOI in the MD code (on amd64), this might sti= ll >> > not be a good place to do the yield?!? >> >> Hmm, yeah, you would want to do the EOI before you yield. =C2=A0However,= we >> could actually move the EOI up before calling the MI code so long as we >> leave interrupts disabled for the duration of the handler (which we do). >> >> > The spin unlock boils down to a critical_exit() and unless we did a >> > critical_enter() at some point during the redenvouz setup, we will >> > yield() if we owepreempt. =C2=A0I'm not quite sure how that can happen= , but >> > it seems like there is a path that allows the scheduler to set it from= a >> > foreign CPU. >> >> No, it is only set on curthread by curthread. =C2=A0This is actually my = main >> question. =C2=A0I've no idea how this could happen unless the rmlock cod= e is >> actually triggering a wakeup or sched_add() in its rendezvous handler. >> >> I don't see anything in rm_cleanIPI() that would do that however. >> >> I wonder if your original issue was really fixed =C2=A0just by the first= patch >> you had which fixed the race in smp_rendezvous()? > > I found the stack that lead me to this patch in the first place: > > #0 =C2=A0sched_switch (td=3D0xffffff011a970000, newtd=3D0xffffff006e6784b= 0, > flags=3D4) at src/sys/kern/sched_ule.c:1939 > #1 =C2=A00xffffffff80285c7f in mi_switch (flags=3D6, newtd=3D0x0) at > src/sys/kern/kern_synch.c:475 > #2 =C2=A00xffffffff802a2cb3 in critical_exit () at src/sys/kern/kern_swit= ch.c:185 > #3 =C2=A00xffffffff80465807 in spinlock_exit () at > src/sys/amd64/amd64/machdep.c:1458 > #4 =C2=A00xffffffff8027adea in rm_cleanIPI (arg=3D) = at > src/sys/kern/kern_rmlock.c:180 > #5 =C2=A00xffffffff802b9887 in smp_rendezvous_action () at > src/sys/kern/subr_smp.c:402 > #6 =C2=A00xffffffff8045e2a4 in Xrendezvous () at > src/sys/amd64/amd64/apic_vector.S:235 > #7 =C2=A00xffffffff802a2c6e in critical_exit () at src/sys/kern/kern_swit= ch.c:179 > #8 =C2=A00xffffffff804365ba in uma_zfree_arg (zone=3D0xffffff009ff4b5a0, > item=3D0xffffff000f34cd40, udata=3D0xffffff000f34ce08) at > src/sys/vm/uma_core.c:2370 > . > . > . > > and now that I look at it again, it is clear that critical_exit() just is= n't > interrupt safe. =C2=A0I'm not sure how to fix that, yet ... but this: > > > =C2=A0 =C2=A0 =C2=A0 =C2=A0if (td->td_critnest =3D=3D 1) { > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0td->td_critnest = =3D 0; > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (td->td_owepree= mpt) { > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0td->td_critnest =3D 1; > > clearly doesn't work. I'm sorry if I didn't reply to the whole rendezvous thread, but as long as there is so many people taking care of it, I'll stay hidden in my corner. I just wanted to tell that I think you are misunderstanding what critical section is supposed to do. When an interrupt fires, it goes on the old "interrupt/kernel context" which means it has not a context of his own. That is the reason why we disable interrupts on spinlocks (or similar workaround for !x86 architectures) and this is why spinlocks are the only protection usable in code that runs in interrupt context. Preempting just means another thread will be scheduler in the middle of another thread execution path. This code is perfectly fine if you consider curthread won't be descheduled while it is executing. Attilio --=20 Peace can only be achieved by understanding - A. Einstein