From: Lawrence Stewart
Date: Fri, 25 May 2007 13:26:58 +1000
To: Dag-Erling Smørgrav
Cc: freebsd-hackers@freebsd.org
Subject: Re: Writing a plain text file to disk from kernel space
Message-ID: <46565781.2030407@room52.net>
In-Reply-To: <86odkcugev.fsf@dwp.des.no>

Comments inline...

Dag-Erling Smørgrav wrote:
> Lawrence Stewart writes:
>
>> Dag-Erling Smørgrav writes:
>>
>>> Since you are writing kernel code, I assume you have KDB/DDB in your
>>> kernel and know how to use it.
>>>
>> I don't know how to use them really. Thus far I haven't had a need for
>> really low level debugging tools... seems that may have changed
>> though! Any good tutorials/pointers on how to get started with kernel
>> debugging?
>>
>
> The handbook and FAQ have information on debugging panics. Greg Lehey
> (grog@) does a tutorial on kernel debugging, you can probably find
> slides online (or just ask him)
>

For reference, I found what looks to be a very comprehensive kernel
debugging reference here:

http://www.lemis.com/grog/Papers/Debug-tutorial/tutorial.pdf

Greg certainly knows the ins and outs of kernel debugging!

>
>>> kio_write probably blocks waiting for the write to complete. You can't
>>> do that while holding a non-sleepable lock.
>>>
>> So this is where my knowledge/understanding gets very hazy...
>>
>> When a thread blocks waiting for some operation to complete or event
>> to happen, the thread effectively goes to sleep, correct?
>>
>
> It depends on the type of lock used, but mostly, yes.
>
>
>> Looking at the kio_write code in subr_kernio.c, I'm guessing the lock
>> that is causing the trouble is related to the "vn_lock" function call?
>>
>
> What matters is that kio_write() may sleep and therefore can't be called
> while holding a non-sleepable lock.
>
>
>> I don't understand though why the vnode lock would be set up in such a
>> way that when the write blocks whilst waiting for the underlying
>> filesystem to signal everything is ok, it causes the kernel to panic!
>>
>
> You cannot sleep while holding a non-sleepable lock. You need to find
> out which locks are held at the point where you call kio_write(), and
> figure out a way to delay the kio_write() call until those locks are
> released.
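
One way to picture "delay the write until the locks are released" is a
small queue: the code that runs under the lock only copies the record
into memory, and a separate thread does the blocking write later with no
lock held. What follows is a userland analogy using pthreads, not kernel
code and not anything from kio or pfil. The names log_queue, log_enqueue,
log_writer and log_shutdown, and the two size constants, are all made up
for illustration. In the kernel the corresponding pieces would presumably
be a mutex-protected queue plus a kernel thread or taskqueue(9), but that
is an assumption, not something tested here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define QUEUE_CAP 64    /* hypothetical: max records waiting to be written */
#define REC_LEN   128   /* hypothetical: max bytes per record, incl. NUL */

/* Hypothetical bounded ring buffer of pending log records. */
struct log_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    char            recs[QUEUE_CAP][REC_LEN];
    int             head;           /* index of the oldest record */
    int             count;          /* number of queued records */
    int             shutting_down;  /* set once by log_shutdown() */
    unsigned long   dropped;        /* records discarded because full */
};

static struct log_queue q = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .nonempty = PTHREAD_COND_INITIALIZER,
};

/*
 * Called from the "hook" side. It only copies into memory and never
 * does I/O, so it never waits for the disk. If the queue is full the
 * record is dropped rather than waiting for space.
 * Returns 0 if queued, -1 if dropped.
 */
int log_enqueue(const char *msg)
{
    int rc = -1;

    pthread_mutex_lock(&q.lock);
    if (q.count < QUEUE_CAP) {
        int tail = (q.head + q.count) % QUEUE_CAP;
        snprintf(q.recs[tail], REC_LEN, "%s", msg);
        q.count++;
        rc = 0;
        pthread_cond_signal(&q.nonempty);
    } else {
        q.dropped++;
    }
    pthread_mutex_unlock(&q.lock);
    return rc;
}

/*
 * Worker thread: takes one record at a time off the queue, releases
 * the lock, and only then performs the (possibly blocking) write.
 * Exits once shutdown has been requested and the queue is drained.
 */
void *log_writer(void *arg)
{
    FILE *fp = arg;
    char rec[REC_LEN];

    assert(fp != NULL);
    for (;;) {
        pthread_mutex_lock(&q.lock);
        while (q.count == 0 && !q.shutting_down)
            pthread_cond_wait(&q.nonempty, &q.lock);
        if (q.count == 0) {         /* shutting down, nothing left */
            pthread_mutex_unlock(&q.lock);
            break;
        }
        memcpy(rec, q.recs[q.head], REC_LEN);
        q.head = (q.head + 1) % QUEUE_CAP;
        q.count--;
        pthread_mutex_unlock(&q.lock);

        fputs(rec, fp);             /* blocking I/O, no lock held */
        fputc('\n', fp);
    }
    fflush(fp);
    return NULL;
}

/* Ask the worker to finish writing what is queued and then exit. */
void log_shutdown(void)
{
    pthread_mutex_lock(&q.lock);
    q.shutting_down = 1;
    pthread_cond_broadcast(&q.nonempty);
    pthread_mutex_unlock(&q.lock);
}
```

To use it, start log_writer with pthread_create(), passing an open
FILE *, call log_enqueue() from the code path that must not block, and
call log_shutdown() followed by pthread_join() at teardown. Dropping
records when the queue is full is a deliberate choice in this sketch:
the enqueue side must never wait, so it loses data rather than stalling.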
>
>
>> How do I make the lock "sleepable" or make sure the thread doesn't try
>> go to sleep whilst holding the lock?
>>
>
> You can't make an unsleepable lock sleepable. You might be able to
> replace it with a sleepable lock, but you would have to go through every
> part of the kernel that uses the lock and make sure that it works
> correctly with a sleepable lock. Most likely, it won't.
>
>

Thanks for the explanations. I'm starting to get a better picture of
what's actually going on.

So it seems that there is no way I can call kio_write from within the
function that is acting as a pfil output hook, because it blocks at some
point whilst doing the disk write, which makes the kernel unhappy
because pfil code is holding a non-sleepable mutex somewhere.

If you read my other message from yesterday, I still can't figure out
why this only happens with outbound TCP traffic, but anyways...

I'll have a bit more of a think about it and get back to the list
shortly...

Cheers,
Lawrence