Date: Sat, 24 Jan 2009 10:43:12 -0800 From: Maksim Yevmenkin <maksim.yevmenkin@gmail.com> To: Hans Petter Selasky <hselasky@c2i.net> Cc: freebsd-current@freebsd.org, current@freebsd.org, Alfred Perlstein <alfred@freebsd.org> Subject: Re: panic: mutex Giant not owned at /usr/src/sys/kern/tty_ttydisc.c:1127 Message-ID: <bb4a86c70901241043i107b3199hf0532f0824cc5972@mail.gmail.com> In-Reply-To: <200901241055.20054.hselasky@c2i.net> References: <20090123154336.GJ60948@e.0x20.net> <200901240952.21670.hselasky@c2i.net> <bb4a86c70901240138g6a221fd4rbab3945193e4617@mail.gmail.com> <200901241055.20054.hselasky@c2i.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Jan 24, 2009 at 1:55 AM, Hans Petter Selasky <hselasky@c2i.net> wrote: > On Saturday 24 January 2009, Maksim Yevmenkin wrote: >> Hans Petter, >> >> i'm sorry, did i mention there is no sleeping in netgraph? :-) > > Can you elaborate this? Is netgraph run from fast interrupt context? So that > only spin locks are possible? Or is it run from a thread? i already told you that netgraph is essentially single-threaded. for all intents and purposes think that only _ONE_ thread services _ALL_ netgraph nodes. if something can not be done immediately, netgraph queues it and returns back to it later. _ANY_ delay/sleep would stall _ENTIRE_ netgraph. think poll/select/callback driven programming model. >> from what i can see you are _NOT_ using _SPIN_ mutexes (aka spin >> locks). you are using regular mutexes. let me quote locking(9) man >> page >> >> " >> Mutexes >> Basically (regular) mutexes will deschedule the thread if the mutex >> can not be acquired. A non-spin mutex can be considered to be equivalent >> to getting a write lock on an rw_lock (see below), >> " >> >> so, if thread can not get mutex it will be descheduled. this >> absolutely can not happen in netgraph! > > There are mutexes inside the taskqueue aswell. The problem will be the same > there if you don't use a so-called fast tasqueue. well, yes. technically, its not 100% correct, but it has to be like it because there is simply no way around this. if you read the blob at the top of the ng_ubt2.c you know that it uses regular sc_mbufq_mtx (which can sleep). let me quote the blob here again " Access to the outgoing queues and task flags is controlled by the sc_mbufq_mtx lock. It is an unavoidable evil. Again, sc_mbufq_mtx should really be a spin lock." the sc_mbufq_mtx is _NOT_ marked as SPIN mutex only because WITNESS has to know about spin locks. it is generally NOT a good idea to add a spin lock when someone feels like it. now back to taskqueue_enqueue(), yes, taskqueue_enqueue_fast() is more appropriate here. however, taskqueue_enqueue() was chosen because taskqueue_enqueue() does not do anything that could cause sleep in TQ_LOCK()/TQ_UNLOCK() (its just basically a lookup). from taskqueue man page "Enqueueing a task does not perform any memory allocation which makes it suitable for calling from an interrupt handler." USB_BUS_LOCK()/USB_BUS_UNLOCK() are different is this regard because those locks used when usb2 does things with hardware. also, like i said, there could be few usb devices sharing the same bus and thus the same USB_BUS_LOCK. once again the rule is: NO SLEEPING/STALLING IN NETGRAPH. it DOES NOT mean that you can not use mutexes. it only means that you have to be very careful of how the mutex is used. >> first of all, i do not think crashes are caused by detach(). in fact, >> detach() is clean. i've tested it and it worked for me. i tried to >> start/stop device while doing flood l2ping. i also tried to yank the >> device while doing flood l2ping. it works correctly. i think, the >> issue is related to stalled transfers. there is still something wrong >> with the code path that deals with stalled transfers. stalls do not >> happen on my test box, so i can not test it. also there is NO code >> duplication. asynchronous path is required to decouple netgraph from >> usb2. > > Only if netgraph is run from fast interrupt context. no, no, no. we are going back in circles. i would very much like to do things synchronously, but i can not. imo, i've given you plenty reasoning behind async design. please understand, asyn design has to stay. even if "sync" code appears to work, its still wrong. >> regular mutexes can sleep. we are not allowed to sleep in netgraph. >> therefor we must transition out of netgraph context before calling >> into any code that tries to grab regular mutex. the async design is >> there not because i want to make things complicated. its there because >> it is needed. > > I think there are two definitions of sleeping. > > 1) When a thread is waiting for a mutex it is not sleeping in the same way > like if it was to call "tsleep()". > > 2) When a thread is waiting for a wakeup it is surely sleeping, which can > happen inside sx_lock() and tsleep(). when i say "sleeping" i mean that the thread that is trying to get the lock is de-scheduled. that is, it is not running anymore. i'm not talking about [tm]sleep() etc. just the fact that thread is not running. oh, and to possibly answer your next question, freebsd's mutexes are ADAPTIVE, meaning that thread will will spin on a mutex for a little bit if it can not grab it right away BEFORE going to sleep. this means that the sc_mbufq_mtx mutex is very likely to be an equivalent of a spin lock because it is mosty used to protect queue while en/dequeuing mbuf etc. the same COULD apply for TQ_LOCK in taskqueue_enqeue(). thanks, max
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bb4a86c70901241043i107b3199hf0532f0824cc5972>