From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Lawrence Stewart, Dag-Erling Smørgrav
Date: Tue, 5 Jun 2007 15:39:22 -0400
Subject: Re: Writing a plain text file to disk from kernel space
Message-Id: <200706051539.22662.jhb@freebsd.org>
In-Reply-To: <46565781.2030407@room52.net>

On Thursday 24 May 2007 11:26:58 pm Lawrence Stewart wrote:
> Comments inline...
>
> Dag-Erling Smørgrav wrote:
> > Lawrence Stewart writes:
> >
> >> Dag-Erling Smørgrav writes:
> >>
> >>> Since you are writing kernel code, I assume you have KDB/DDB in your
> >>> kernel and know how to use it.
> >>>
> >> I don't know how to use them really. Thus far I haven't had a need for
> >> really low level debugging tools... seems that may have changed
> >> though! Any good tutorials/pointers on how to get started with kernel
> >> debugging?
> >>
> >
> > The handbook and FAQ have information on debugging panics. Greg Lehey
> > (grog@) does a tutorial on kernel debugging, you can probably find
> > slides online (or just ask him)
> >
>
>
> For reference, I found what looks to be a very comprehensive kernel
> debugging reference here:
> http://www.lemis.com/grog/Papers/Debug-tutorial/tutorial.pdf
>
> Greg certainly knows the ins and outs of kernel debugging!
>
> >
> >>> kio_write probably blocks waiting for the write to complete. You can't
> >>> do that while holding a non-sleepable lock.
> >>>
> >> So this is where my knowledge/understanding gets very hazy...
> >>
> >> When a thread blocks waiting for some operation to complete or event
> >> to happen, the thread effectively goes to sleep, correct?
> >>
> >
> > It depends on the type of lock used, but mostly, yes.
> >
> >
> >> Looking at the kio_write code in subr_kernio.c, I'm guessing the lock
> >> that is causing the trouble is related to the "vn_lock" function call?
> >>
> >
> > What matters is that kio_write() may sleep and therefore can't be called
> > while holding a non-sleepable lock.
> >
> >
> >> I don't understand though why the vnode lock would be set up in such a
> >> way that when the write blocks whilst waiting for the underlying
> >> filesystem to signal everything is ok, it causes the kernel to panic!
> >>
> >
> > You cannot sleep while holding a non-sleepable lock. You need to find
> > out which locks are held at the point where you call kio_write(), and
> > figure out a way to delay the kio_write() call until those locks are
> > released.
> >
> >
> >> How do I make the lock "sleepable" or make sure the thread doesn't try
> >> go to sleep whilst holding the lock?
> >>
> >
> > You can't make an unsleepable lock sleepable. You might be able to
> > replace it with a sleepable lock, but you would have to go through every
> > part of the kernel that uses the lock and make sure that it works
> > correctly with a sleepable lock. Most likely, it won't.
> >
> >
>
>
> Thanks for the explanations. I'm starting to get a better picture of
> what's actually going on.
>
> So it seems that there is no way I can call kio_write from within the
> function that is acting as a pfil output hook, because it blocks at some
> point whilst doing the disk write, which makes the kernel unhappy
> because pfil code is holding a non-sleepable mutex somewhere.
>
> If you read my other message from yesterday, I still can't figure out
> why this only happens with outbound TCP traffic, but anyways...
>
> I'll have a bit more of a think about it and get back to the list shortly...

Use a task to defer the kio_write() to a taskqueue. To do this properly you
have to malloc() the state for the deferred write using M_NOWAIT, which can
fail, so be prepared for that. If you are doing this for every packet, you
are probably better off using malloc() to throw items into a queue and
having a single global task that drains the queue each time it runs,
calling kio_write() for each object.

Regarding sleepable vs. non-sleepable locks:
Getting preempted by an interrupt is not considered "sleeping". Sleeping
means voluntarily yielding the CPU to wait for an event, for example via
msleep() or a condition variable. Note that interrupt handlers can acquire
non-sleepable locks. If you sleep while holding a non-sleepable lock, an
interrupt handler that needs that lock may be unable to run until some
async event (like disk I/O) completes.

-- 
John Baldwin
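
For illustration, here is a minimal user-space sketch of the "queue the
items, drain them from a task" approach described above. It is not kernel
code and not the code from this thread: the names log_item, log_enqueue and
log_drain are hypothetical, plain malloc() stands in for malloc(9) with
M_NOWAIT, and fwrite() stands in for kio_write(). In a real module the queue
would be protected by a mutex, and log_drain() would be the task function
scheduled through taskqueue(9).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One queued record (hypothetical layout, fixed-size payload). */
struct log_item {
	struct log_item *next;
	size_t len;
	char data[128];
};

/*
 * Global FIFO of pending records. In the kernel this would be guarded
 * by a mutex that is held only while linking and unlinking items.
 */
static struct log_item *q_head = NULL;
static struct log_item **q_tail = &q_head;
static unsigned long q_dropped = 0;

/*
 * Producer side, called from the hook path, where sleeping is not
 * allowed. It only allocates, copies and links. Returns 0 on success,
 * or -1 if the record was dropped because the allocation failed.
 */
int
log_enqueue(const char *msg)
{
	struct log_item *it;

	assert(msg != NULL);
	it = malloc(sizeof(*it));	/* stand-in for an M_NOWAIT allocation */
	if (it == NULL) {
		q_dropped++;		/* allocation may fail: drop the record */
		return (-1);
	}
	it->len = strlen(msg);
	if (it->len > sizeof(it->data))
		it->len = sizeof(it->data);	/* truncate oversized records */
	memcpy(it->data, msg, it->len);
	it->next = NULL;
	*q_tail = it;
	q_tail = &it->next;
	/* Kernel version: schedule the drain task here (taskqueue_enqueue). */
	return (0);
}

/*
 * Consumer side, run later from a context that is allowed to sleep.
 * Detaches the whole list first (the only step that would need the
 * queue lock), then does the blocking writes with no lock held.
 * Returns the number of records written.
 */
int
log_drain(FILE *out)
{
	struct log_item *it, *next;
	int n = 0;

	it = q_head;
	q_head = NULL;
	q_tail = &q_head;
	for (; it != NULL; it = next) {
		next = it->next;
		fwrite(it->data, 1, it->len, out);	/* stand-in for kio_write() */
		free(it);
		n++;
	}
	return (n);
}
```

The point of the split is that log_enqueue() never waits for anything, so
it is safe to call with a non-sleepable lock held, while all the blocking
work happens in log_drain() after the queue has been detached.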