From: Stephane LAPIE
Date: Wed, 03 Feb 2010 20:23:15 +0900
To: Andriy Gapon
Cc: freebsd-fs@freebsd.org, Julian Elischer, freebsd-hardware@freebsd.org
Subject: Re: [zfs][hardware] Reproducible kernel panic in 8.0-STABLE
Message-ID: <4B695CA3.50008@darkbsd.org>
In-Reply-To: <4B68641D.9000201@icyb.net.ua>
References: <4B682972.6030604@darkbsd.org> <4B682F29.90505@icyb.net.ua> <4B686324.2090308@elischer.org> <4B68641D.9000201@icyb.net.ua>

Andriy Gapon wrote:
> on 02/02/2010 19:38 Julian Elischer said the following:
>> Andriy Gapon wrote:
>>> on 02/02/2010 15:32 Stephane LAPIE said the following:
>>>> I have a case of kernel panic that can be consistently reproduced, and
>>>> which I guess is related to the hardware I'm using (Marvell controllers,
>>>> check my pciconf -lv output below).
>>>>
>>>> The kernel panic message is always, consistently, the following :
>>>>
>>>> Sleeping thread (tid 100021, pid 0) owns a non-sleepable lock
>>> I probably won't be able to help you, but to kickstart debugging could
>>> you please
>>> run 'procstat -t 0' and determine what kernel thread has tid 100021 on
>>> your system?
>> or in the kernel debugger after the panic, do: bt
>
> I think that in this case it may not help. I mean the stack trace.
> Because, I think that this panic happens after the taskqueue thread is done with
> its tasks and is parked waiting.
>
>> you DO have options kdb and ddb right? (I never leave home without them)
>>
>
>

I just rebuilt a kernel with debugger options, and obtained the
following output upon pulling out one disk:

Sleeping thread (tid 100024, pid 0) owns a non-sleepable lock
sched_switch() at sched_switch+0xf8
mi_switch() at mi_switch+0x16f
sleepq_timedwait() at sleepq_timedwait+0x42
_cv_timedwait() at _cv_timedwait+0x129
_sema_timedwait() at _sema_timedwait+0x55
ata_queue_request() at ata_queue_request+0x526
ata_controlcmd() at ata_controlcmd+0xa1
ata_setmode() at ata_setmode+0xdc
ad_init() at ad_init+0x27
ad_reinit() at ad_reinit+0x48
ata_reinit() at ata_reinit+0x268
ata_conn_event() at ata_conn_event+0x49
taskqueue_run() at taskqueue_run+0x93
taskqueue_thread_loop() at taskqueue_thread_loop+0x46
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff80000aad30, rbp = 0 ---
panic: sleeping thread
cpuid = 2
KDB: enter: panic
[thread pid 12 tid 100008 ]
Stopped at      kdb_enter+0x3d: movq    $0,0x4943d0(%rip)

I think the output below is not really relevant though.

db> bt
Tracing pid 12 tid 100008 td 0xffffff000187e000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
turnstile_adjust() at turnstile_adjust
turnstile_wait() at turnstile_wait+0x1aa
_mtx_lock_sleep() at _mtx_lock_sleep+0xb0
softclock() at softclock+0x2a9
intr_event_execute_handlers() at intr_event_execute_handlers+0xfd
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800005ad30, rbp = 0 ---

If there is anything else I can run to obtain further information, all
hints are welcome, though this clearly seems to point to a problem with
my controller event handling as I initially thought.
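
For anyone wanting to read traces like the first one above more quickly,
here is a rough sketch of a script that turns DDB frames into a call
chain, outermost caller first. It is only an illustration: the names
parse_frames and call_chain are made up for this example and are not part
of any FreeBSD tool, and the regular expression assumes the
"func() at func+0xNN" frame format seen in this message.

```python
import re

# Matches one DDB frame such as "ata_setmode() at ata_setmode+0xdc".
# The "+0x..." offset is optional ("turnstile_adjust() at turnstile_adjust").
FRAME_RE = re.compile(r"(\w+)\(\) at (\w+)(?:\+0x([0-9a-fA-F]+))?")


def parse_frames(trace):
    """Return (function, offset) pairs in trace order, innermost frame first."""
    frames = []
    for match in FRAME_RE.finditer(trace):
        offset = int(match.group(3), 16) if match.group(3) else 0
        frames.append((match.group(2), offset))
    return frames


def call_chain(trace):
    """Return function names ordered from outermost caller to innermost."""
    return [name for name, _ in reversed(parse_frames(trace))]


# A few frames copied from the sleeping thread's trace in this message.
SAMPLE = """\
_sema_timedwait() at _sema_timedwait+0x55
ata_queue_request() at ata_queue_request+0x526
ata_controlcmd() at ata_controlcmd+0xa1
ata_conn_event() at ata_conn_event+0x49
taskqueue_run() at taskqueue_run+0x93
"""

if __name__ == "__main__":
    print(" -> ".join(call_chain(SAMPLE)))
```

Read this way, the excerpt shows the taskqueue thread entering
ata_conn_event and ending up in _sema_timedwait via ata_queue_request,
which is the sleep the panic message refers to.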

I am also very suspicious of that controller because it tends to drop
two disks at exactly the same time, and these alas belong to the same
raidz1 vdev. After this happens, the BIOS cannot properly reset the port
or redetect the disks, so I have to go through a cold boot. The disks
themselves could be damaged, but I don't see any abnormal SMART readings
(Reallocated Sectors or such). I am seriously thinking of moving some of
these disks to the AHCI controller on my motherboard, and in the
meantime I will at the very least fall back on my spares.

Thanks for your time,
-- 
Stephane LAPIE, EPITA SRS, Promo 2005
"Even when they have digital readouts, I can't understand them."
--MegaTokyo