From owner-freebsd-current@FreeBSD.ORG Fri Jan 23 21:45:34 2009 Return-Path: Delivered-To: current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E4B9F106564A; Fri, 23 Jan 2009 21:45:34 +0000 (UTC) (envelope-from maksim.yevmenkin@gmail.com) Received: from rv-out-0506.google.com (rv-out-0506.google.com [209.85.198.235]) by mx1.freebsd.org (Postfix) with ESMTP id A940E8FC1B; Fri, 23 Jan 2009 21:45:34 +0000 (UTC) (envelope-from maksim.yevmenkin@gmail.com) Received: by rv-out-0506.google.com with SMTP id b25so5016028rvf.43 for ; Fri, 23 Jan 2009 13:45:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=HuJgWXD0zp0jCQUls7SQCykM4HPpWKYtPf6KwbKbK0Y=; b=lxMY5qOtxToLKWOvSQJH819LEn23NKQEub2oCWKLuuQ462/GYPA3p3mHD3qrhsVWKT zfDf726mDOr54sI0aBDkmeJ7oHL2by+wefnABfR7DKKHfpf0KRXcgWYbc3PlmSttQfre 0g6X+0pqNIkizD4ObcY7jnzLdr6rp5urvtwPo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=HX8wUwdfmjlNEklE2fuMFOb8vQO9p4yfmfoyLI/wq0UF9c13KIT6fKnbXiGglyKbgO APNFEWtDGCn1hLRg1H3Xsuc+K88qNttUHpygCNbHd6fop9Di+q5Qlpgn1/cM1ELITUon M5UjsGYtFv1B/vO1HCPIdZORp/ItqoDe5SExE= MIME-Version: 1.0 Received: by 10.140.165.21 with SMTP id n21mr3240292rve.240.1232747134303; Fri, 23 Jan 2009 13:45:34 -0800 (PST) In-Reply-To: <200901232146.58360.hselasky@c2i.net> References: <20090123154336.GJ60948@e.0x20.net> <200901232029.10538.hselasky@c2i.net> <200901232146.58360.hselasky@c2i.net> Date: Fri, 23 Jan 2009 13:45:34 -0800 Message-ID: From: Maksim Yevmenkin To: Hans Petter Selasky Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-current@freebsd.org, current@freebsd.org, Alfred Perlstein Subject: Re: panic: mutex Giant not owned at /usr/src/sys/kern/tty_ttydisc.c:1127 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jan 2009 21:45:35 -0000 Hans Petter, [...] >> > First of all I've made some patches which try to tidy up the USB blutooth >> > driver. Please see: >> > >> > http://perforce.freebsd.org/chv.cgi?CH=156577 >> > http://perforce.freebsd.org/chv.cgi?CH=156579 >> >> thanks! unfortunately, i have few problems with those patches. please >> read below. >> >> >> that should not be needed (in theory) but i will add it. >> > >> > Yes! We are multithreaded! >> >> please tell me that you read the blob about locking strategy at the >> top of ng_ubt2.c file. when ubt_task() is pending, it holds a >> reference to the netgraph node. that means the pointer is still >> pointing to a valid structure, but the node is marked as "dead" and >> NG_NODE[_NOT]_VALID() macros can be use to check for that. so, even if >> task is still pending while the rest of ng_ubt2 stuff is gone, it does >> not matter because task holds netgraph node. so the next time task is >> executed it will see that the node is "dead", release the node and >> simply return doing nothing. > > This kind of complexity is unneccesary. We should be able to drain a node, or > sleep until a node is released. > > 0) stop all USB activity > 1) free node > 2) wait until the node is actually freed > 3) free USB resources which might access node fields > > Conclusion: > > No checks are required inside any of the USB callbacks. i've added taskqueue_drain() to detach() in my "stall" diff that i posted few hours back. is that addresses some of your concerns? also keep in mind that while netgraph node is created and destroyed primarily by the driver in response to hardware arrival and departure, nothing protects it from the netgraph side. in other words there is another way to manipulate netgraph node using netgraph api. for example user can request shutdown of a netgraph node, etc. so all of this has to be taking into consideration. in fact, if i understand you correctly, with addition of taskqueue_drain() to detach(), the later becomes completely synchronous. that is 1) NG_NODE_REF() 2) ng_rmnode_self (i.e. mark node as possibly dead, detach all the hooks, etc. etc.) 3) taskqueue_drain() (i.e. sleep until task has finished) 4) usb2_transfer_unsetup() (which does "usb2_transfer_drain() and sleeps) 5) NG_NODE_UNREF() 6) free everything up in theory, items (1) and (5) above just an additional protection because i was not sure about inner workings of usb2. i think, when we sort everything out they can be removed. >> >> > 2) You drain all USB transfers: "usb2_transfer_drain()" called >> >> > unlocked. >> >> >> >> yes, usb2_transfer_unsetup() does that, does it not? >> > >> > That is correct, but sometimes you need to stop the transfers at an >> > earlier point. It depends on your design. right >> when detach() is called its already out of driver's hands. all it can >> do now is to call unsetup(), or am i missing something here? > > The correct thing is to do a usb2_transfer_drain() which will wait until the > callback has been called back, if a transfer is pending. i do not understand. if usb2_transfer_unsetup() calls usb2_transfer_drain() and sleeps, then what difference does it make in detach()? i just followed original code that did it this way, i.e. drain and unsetup all transfers in one call. > In USB2 a "usb2_start_hardware()" call is always paired with a USB transfer > callback, no matter what you do! This way of design has been chosen so that > you can do refcounts inside the callback! ok, and that is exactly what code does. the only thing is that refcounts are on netgraph node and not on softc structure. again the reason for that is because netgraph node can be accessed from both usb2 driver and netgraph subsystem directly. > > I see in your code that you try to do things Asynchronously. This >> > complicates stuff quite a lot. In my patches I've tried to do some things >> > synchronously. >> >> no, No, NO (sorry for shouting). we _CAN_NOT_ sleep in netgraph. >> period. you _CAN_NOT_ grab usb interface mutexes unless you can >> absolutely guarantee that they will not sleep for any significant >> amount of time. > > usb2_transfer_start() and usb2_transfer_stop() are non-blocking and will never > sleep. The exception is "usb2_transfer_drain()" which can sleep for a few > milliseconds and is only called when the hook is disconnected. I do not see > that as a problem versus having "X" more checks in the code. well, when i looked at the code, i saw that both functions do USB_BUS_LOCK() and USB_BUS_UNLOCK(). i do not know enough about those locks and usb2 in general, so i decided to err on the side of caution and move all the operations that could potentially sleep into the taskqueue context. this actually completely decouples netgraph from usb2, and that is a "good thing (tm)" imo :) a little bit of complexity is totally justified here imo. also, now that you mentioned it, i should call usb2_transfer_drain() in ubt_task() instead of usb2_transfer_stop() to make transfer stop completely synchronous as well. please let me reiterate, sleeping in netgraph is NOT allowed. even if its for a few milliseconds. please accept it as a fact. think of all netgraph methods as fast interrupt handlers. it is very typical for fast interrupt handlers to do minimal amount of work and schedule swi to do more heavy processing. that is exactly what the code does here. >> netgraph is essentially single threaded and by >> sleeping you will stall entire netgraph subsystem and not just current >> thread. > > If we sleep 10 milliseconds on a UBT hook disconnect, is that a big problem? > How often will UBT hook disconnect be called? it does not matter how often it will or will not be. please read my comments about. once again, no sleeping in netgraph :) >> so, objection #1 (the biggest one) is: please leave sc_mbufq_mtx and >> ubt_task() glue in place. it is needed as far as i can tell. > > What I see is that USB already has a separate process handling the callback so > that when you call "usb2_transfer_start()" this other process is woken up. > Are you sure that there is a measurable gain using "ubt_task()" versus the > way I'm doing it? yes, i'm sure. i did not invent anything new here. it all follows typical fast interrupt handler/swi (upper/bottom half for linux foks :) model. there is really nothing complicated about it, except "scary" calls to NG_NODE_REF/UNREF and NG_NODE[_NOT]_VALID() :) i'm quite happy to make an additional cleanups that you would require, but asynchronous design is something that need to stay in place, imo. >> objection #2 (somewhat major). please do NOT remove NG_NODE_REF() call >> in detach() before calling ng_rmnode_self() (unless you remove >> NG_NODE_UNREF() below as well). again the reason here is to tell >> netgraph node that we are dying, but make sure we keep node_p pointer >> pointing to a valid structure as long as possible. to summarize, > > We are already holding a reference to the node from ubt_attach(), right? Why > do we need another node reference? there is no NG_NODE_REF in attach(). when we create node is comes with one reference and that reference is removed by ng_rmnode_self(). please read my comments about detach() above. hopefully it will make sense now. [...] >> more of a question than objection. why you insist on having single >> mutex for both interfaces? isoc transfers callbacks will be called >> much more often that control, bulk and interrupt callbacks, so >> wouldn't it make sense to use different lock for isoc transfers? > > Because a single mutex is much easier to maintain and I see no gain fine > graining more for a bluetooth device :-) i do not really have strong opinion about it. i just thought there would be less contention between isoc and bulk/intr/ctrl transfers when they use different locks. probably does not matter, since everything is going over the same bus. i'm fine with this change utill someone has credible evidence otherwise. >> changes in attach() and usb2_config structures are fine (i guess). but >> please keep asynchronous design in place. it is required. > > Is it possible we can make a compromise here? From my experience synchronous > code is smaller and easier to understand than asynchronous code! The only > disadvantage about synchronous code is that more threads might be required. > And hence Netgraph is single threaded other signalling might be delayed, but > is that a real problem if we are talking about delays around 1x10ms ever time > you disconnect the hook of an UBT node? like i said, i'm quite happy to compromise here. i will make any change that would help to understand/maintain code better. unfortunately, asynchronous design would have to stay in one form or another. netgraph has to be completely decoupled from any other subsystem, imo. it is just what it is. not really related to this conversation, but think about how many changes went into freebsd between version 6 and 8, and, yet, netgraph (including bluetooth) code changed very little in comparison. so keeping things "separate" is a "good thing(tm)" :) thanks, max