From owner-freebsd-hackers@FreeBSD.ORG Wed Dec 5 16:20:05 2012
Message-ID: <50BF7431.8080109@FreeBSD.org>
Date: Wed, 05 Dec 2012 18:20:01 +0200
From: Andriy Gapon
To: Alexandr Matveev
Subject: Re: sleepq problem
References: <50BF6E6B.2070203@timon.net.nz>
In-Reply-To: <50BF6E6B.2070203@timon.net.nz>
Cc: freebsd-hackers@FreeBSD.org
List-Id: Technical Discussions relating to FreeBSD

on 05/12/2012 17:55 Alexandr Matveev said the following:
> Hello,
>
> I'm writing a storage controller driver for 9.0-RELEASE-p4 and I'm using
> sleepq at initialization to sleep until command is processed by controller:
>
> struct command {
>     <...>
>     uint8_t done;
> };
>
> void send_command_and_wait(struct command *cmd)
> {
>     cmd->done = 0;
>
>     send_command(cmd);
>
>     for (;;) {
>         sleepq_lock(&cmd->done);
>         if (cmd->done)
>             break;
>         sleepq_add(&cmd->done, NULL, "wait for completion",
>             SLEEPQ_SLEEP, 0);
>         sleepq_wait(&cmd->done, 0);
>     }
>     sleepq_release(&cmd->done);
> }
>
> Interrupt handler calls special function when command is processed:
>
> void command_finish(struct command *cmd)
> {
>     sleepq_lock(&cmd->done);
>     cmd->done = 1;
>     sleepq_signal(&cmd->done, SLEEPQ_SLEEP, 0, 0);
>     sleepq_release(&cmd->done);
> }
>
> This code panics very often with following messages:
>
> Sleeping thread (tid 100248, pid 1859) owns a non-sleepable lock
> sched_switch() at sched_switch+0xf1
> mi_switch() at mi_switch+0x170
> sleepq_wait() at sleepq_wait+0x44
> send_command_and_wait() at send_command_with_retry+0x77
> <...>
> panic: sleeping thread
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> propagate_priority() at propagate_priority+0x161
> turnstile_wait() at turnstile_wait+0x1b8
> _mtx_lock_sleep() at _mtx_lock_sleep+0xb0
> _mtx_lock_flags() at _mtx_lock_flags+0x96
> softclock() at softclock+0x25e
> intr_event_execute_handlers() at intr_event_execute_handlers+0x66
> ithread_loop() at ithread_loop+0x96
> fork_exit() at fork_exit+0x11d
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff80002fad00, rbp = 0 ---
>
> Where tid 100248 is my driver thread which is sleeping & waiting for command
> completion:
> db> show thread 100248
> Thread 100243 at 0xfffffe0146aa98c0:
>  proc (pid 1859): 0xfffffe02a6815488
>  name: kldload
>  stack: 0xffffff8464bf2000-0xffffff8464bf5fff
>  flags: 0x4 pflags: 0
>  state: INHIBITED: {SLEEPING}
>  wmesg: wait for completion wchan: 0xffffff8464c1e244
>  priority: 127
>  container lock: sleepq chain (0xffffffff81101af8)
>
> But I can't understand what goes wrong.
> Sleepq chain lock is owned by
> the other thread:
> db> show lock 0xffffffff81101af8
>  class: spin mutex
>  name: sleepq chain
>  flags: {SPIN, RECURSE}
>  state: {OWNED}
>  owner: 0xfffffe0008377000 (tid 100019, pid 12, "swi4: clock")
>
> Unfortunately, I can't find any examples of using sleepq in drivers.
> What am I missing or don't understand?
>

You should not use sleepq, it's too low level.
See locking(9) and follow references from there.

-- 
Andriy Gapon
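
For readers following the pointer to locking(9): the higher-level pattern it leads to is usually a mutex that protects the "done" flag plus a sleep/wakeup primitive tied to that mutex, for example mtx_sleep(9) with wakeup(9), or condvar(9). The sketch below is only a userland analogy of that pattern using POSIX threads. It is not kernel code and not the driver discussed in this thread. The names command_init and fake_controller are hypothetical, and the controller is simulated by a helper thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct command {
	pthread_mutex_t lock;	/* protects done */
	pthread_cond_t  cv;	/* signalled when done becomes 1 */
	uint8_t         done;
};

/* Hypothetical helper: prepare the lock, condition variable and flag. */
void
command_init(struct command *cmd)
{
	assert(cmd != NULL);
	pthread_mutex_init(&cmd->lock, NULL);
	pthread_cond_init(&cmd->cv, NULL);
	cmd->done = 0;
}

/* Completion path: set the flag and wake the waiter under the same lock. */
void
command_finish(struct command *cmd)
{
	pthread_mutex_lock(&cmd->lock);
	cmd->done = 1;
	pthread_cond_signal(&cmd->cv);
	pthread_mutex_unlock(&cmd->lock);
}

/* Hypothetical stand-in for the hardware: completes the command at once. */
static void *
fake_controller(void *arg)
{
	command_finish(arg);
	return (NULL);
}

/* Returns 0 on success, or the pthread_create() error code. */
int
send_command_and_wait(struct command *cmd)
{
	pthread_t hw;
	int error;

	pthread_mutex_lock(&cmd->lock);
	cmd->done = 0;
	pthread_mutex_unlock(&cmd->lock);

	/* Stands in for send_command(cmd) in the original post. */
	error = pthread_create(&hw, NULL, fake_controller, cmd);
	if (error != 0)
		return (error);

	/* Re-check the flag after every wakeup; the wait drops the lock. */
	pthread_mutex_lock(&cmd->lock);
	while (!cmd->done)
		pthread_cond_wait(&cmd->cv, &cmd->lock);
	pthread_mutex_unlock(&cmd->lock);

	pthread_join(hw, NULL);
	return (0);
}
```

In a kernel driver the rough counterparts would be mtx_lock(9) with mtx_sleep(9) or cv_wait(9) in the waiter, and wakeup(9) or cv_signal(9) in the completion path. Which lock type is allowed depends on the context the completion handler runs in, so the man pages should be checked before adapting this.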