From: Chris Torek <torek@elf.torek.net>
Date: Tue, 11 Apr 2017 16:11:04 -0700 (PDT)
To: ablacktshirt@gmail.com, imp@bsdimp.com
Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com
Subject: Re: Understanding the FreeBSD locking mechanism
Message-Id: <201704112311.v3BNB4fc094085@elf.torek.net>
In-Reply-To: <4768e26a-cdec-6f40-1463-ece9847ca34d@gmail.com>
List-Id: Technical Discussions relating to FreeBSD

>The difference between the "ithread" and "interrupt filter" things
>is that ithread has its own thread context, while interrupt handling
>through
>interrupt filter shares the same kernel stack.

Right -- though rather than "the same" I would just say "shares a
stack", i.e., we're not concerned with *whose* stack and/or thread
we're borrowing, just that we have one borrowed.

>So, for ithread, we should use the MTX_DEF, which don't disable
>interrupt, and for "interrupt filter", we should use the MTX_SPIN,
>which disable interrupt.

Right.

>What really confuses me is that I don't really see how owning an
>"independent" thread context(i.e ithread) makes a thread run in the
>"top-half" and how sharing the same kernel stack makes a thread run
>in the "bottom-half".

It's not that it *makes* it run that way, it's that it *allows* it to
run that way -- and then the scheduler *does* run it that way.

>I did read your long explanation in the previous mail. For the
>non-SMP case, the "top-half/bottom-half" model goes well and I
>understand how the *code* path/*data* path things go. But I cannot
>still fully understand the model for the SMP case.

It's fundamentally fairly tricky, but we start with that same first
notion:

 * If you have your own state (i.e., stack), you can be suspended
   (stopped in the scheduler, giving the CPU to other threads):
   *your* (private) state is preserved on *your* (private) stack.

 * If you have borrowed someone else's state, anything that suspends
   you, suspends them too.  Since this may deadlock, you are not
   allowed to do it at all.

Once we block interrupts locally (as for MTX_SPIN, or automatically
inside a filter style or "bottom half" interrupt), we are in a
special state: we may not take *any* MTX_DEF locks at all (the kernel
should panic if we do).  This in turn means that data structures are
protected *either* by a spin mutex *or* by a default (non-spin)
mutex, never both.

So if you need to touch a spin-mutex data structure from thread-y
("top half") code, you obtain the spin mutex, and now no interrupts
will occur *on this CPU*, and as a key side effect, you won't move
*off* this CPU either.
If an interrupt occurs on another CPU and it goes to take the spin
lock that protects that data, it loops at that point, not switching
tasks, waiting for the MTX_SPIN mutex to be released:

          CPU 1               |          CPU 2
  ----------------------------|-----------------------------
  func() {                    | ... code not involving mtx
    mtx_lock_spin(&mtx);      |
    ... do some work          | mtx_lock_spin(&mtx); /* loops */
    .                         | [stuck]
    .                         | [stuck]
    .                         | [stuck]
    mtx_unlock_spin(&mtx);    | [unstuck]
    ...                       | do some work

If an interrupt occurs on CPU 2, and that interrupt-handling code
wants to touch the data protected by the spin lock, that code obtains
the spin lock as usual.  Meanwhile the interrupt *cannot* occur on
CPU 1, as holding the spin lock has blocked interrupts.  So the code
path on CPU 2 blocks -- looping in mtx_lock_spin(), not giving CPU 2
over to the scheduler -- for as long as CPU 1 holds the spin lock.
The corresponding code path is already blocked on CPU 1, the same way
it was back in the non-SMP, single-CPU days.

This means it is unwise to hold spin locks for long periods.  In
fact, if CPU 2 waits too long in that [stuck] section, it will panic,
on the assumption that CPU 1 has done something terrible and the
system is now hung.  This is also what gives rise to the constraint
that you must take MTX_SPIN locks "inside" any outer MTX_DEF locks.

Chris