From owner-freebsd-bugs@FreeBSD.ORG Fri Jun 29 15:10:04 2007 Return-Path: X-Original-To: freebsd-bugs@hub.freebsd.org Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 14D1816A468 for ; Fri, 29 Jun 2007 15:10:04 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [69.147.83.40]) by mx1.freebsd.org (Postfix) with ESMTP id E2EC713C465 for ; Fri, 29 Jun 2007 15:10:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.13.4/8.13.4) with ESMTP id l5TFA3SH016670 for ; Fri, 29 Jun 2007 15:10:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.13.4/8.13.4/Submit) id l5TFA3xl016669; Fri, 29 Jun 2007 15:10:03 GMT (envelope-from gnats) Resent-Date: Fri, 29 Jun 2007 15:10:03 GMT Resent-Message-Id: <200706291510.l5TFA3xl016669@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Herve Boulouis Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B268E16A421 for ; Fri, 29 Jun 2007 15:01:19 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (www.freebsd.org [69.147.83.33]) by mx1.freebsd.org (Postfix) with ESMTP id 9703013C44C for ; Fri, 29 Jun 2007 15:01:19 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.13.1/8.13.1) with ESMTP id l5TF1JOY042195 for ; Fri, 29 Jun 2007 15:01:19 GMT (envelope-from nobody@www.freebsd.org) Received: (from nobody@localhost) by www.freebsd.org (8.13.1/8.13.1/Submit) id l5TF1JWw042194; Fri, 29 Jun 2007 15:01:19 GMT (envelope-from nobody) Message-Id: <200706291501.l5TF1JWw042194@www.freebsd.org> Date: Fri, 29 Jun 2007 15:01:19 GMT From: Herve Boulouis To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-3.0 Cc: Subject: kern/114133: Kernel panic: locking problems between aac driver and aaccli X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Jun 2007 15:10:04 -0000 >Number: 114133 >Category: kern >Synopsis: Kernel panic: locking problems between aac driver and aaccli >Confidential: no >Severity: serious >Priority: medium >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Fri Jun 29 15:10:03 GMT 2007 >Closed-Date: >Last-Modified: >Originator: Herve Boulouis >Release: FreeBSD 6.2-STABLE >Organization: >Environment: FreeBSD ridley.nerim.net 6.2-STABLE FreeBSD 6.2-STABLE #0: Thu Jun 21 16:13:11 CEST 2007 root@ridley.nerim.net:/usr/src/sys/i386/compile/RIDLEY i386 >Description: I got the following panic on 2 different servers : panic: _mtx_lock_sleep: recursed on non-recursive mutex AAC I/O lock @ ../../../dev/aac/aac.c:2985 This is triggered by a supervision script which runs aaccli to check the raid volume health. The 2 machines are supermicro Xeon servers with Adaptec 2200S cards. I have other servers with same hardware which do not exhibit the problem thought the 2 crashing ones are doing more disk i/o than the others. Compiling the kernel with the standard (from the handbook) lock debugging options gave the following info (same kernel backtrace, same 'sendfib' status & backtrace for aaccli process in all crashes) db> ps pid ppid pgrp uid state wmesg wchan cmd 66921 66920 3208 0 S sendfib 0xecdb5c14 aaccli 66920 66919 3208 0 S wait 0xc669c860 sh 66919 3209 3208 0 S piperd 0xc67e7198 perl5.8.6 66918 66629 66625 8 S nanslp 0xc0887d6c sleep 66629 66625 66625 8 S wait 0xc6c1a218 sh 66625 66622 66625 8 Ss wait 0xc69e8218 sh 66622 707 707 0 S piperd 0xc66c4b28 cron 66604 738 719 8 S nanslp 0xc0887d6c sleep 51466 734 734 8 S select 0xc08d28c4 innfeed 51465 734 734 8 S sbwait 0xc692979c perl5.8.6 3209 1 3208 0 S select 0xc08d28c4 snmpd 785 1 785 0 Ss+ ttyin 0xc64d2810 getty 784 1 784 0 Ss+ ttyin 0xc64d9410 getty 783 1 783 0 Ss+ ttyin 0xc64db010 getty 782 1 782 0 Ss+ ttyin 0xc64db410 getty 781 1 781 0 Ss+ ttyin 0xc64dac10 getty 780 1 780 0 Ss+ ttyin 0xc64d9c10 getty 779 1 779 0 Ss+ ttyin 0xc64ce410 getty 778 1 778 0 Ss+ ttyin 0xc64dbc10 getty --More-- 777 1 777 0 Ss+ ttyin 0xc64dc010 getty 739 736 719 8 S+ nanslp 0xc0887d6c perl5.8.6 738 735 719 8 S+ wait 0xc690d218 sh 736 1 719 8 S+ wait 0xc64b6a78 sh 735 1 719 8 S+ wait 0xc669cc90 sh 734 1 734 8 SLs vmpfw 0xc2a72df8 innd 707 1 707 0 Ss nanslp 0xc0887d6c cron 701 1 701 25 Ss pause 0xc690d464 sendmail 697 1 697 0 Ss select 0xc08d28c4 sendmail 691 1 691 0 Ss select 0xc08d28c4 sshd 634 1 633 80 S kqread 0xc6727000 lighttpd 567 1 567 0 Ss select 0xc08d28c4 syslogd 550 1 550 0 Ss nanslp 0xc0887d6c ipmon 510 1 510 0 Ss select 0xc08d28c4 devd 41 0 0 0 SL - 0xe6abacf8 [schedcpu] 40 0 0 0 SL sdflush 0xc08e0254 [softdepflush] 39 0 0 0 SL syncer 0xc0887adc [syncer] 38 0 0 0 SL vlruwt 0xc6452c90 [vnlru] 37 0 0 0 SL psleep 0xc08d2e4c [bufdaemon] 36 0 0 0 SL pgzero 0xc08e11c4 [pagezero] --More-- 35 0 0 0 SL psleep 0xc08e0d14 [vmdaemon] 34 0 0 0 SL psleep 0xc08e0cd0 [pagedaemon] 33 0 0 0 SL - 0xc6451a3c [fdc0] 32 0 0 0 WL [swi0: sio] 31 0 0 0 WL [irq1: atkbd0] 30 0 0 0 WL [irq17: ichsmb0] 29 0 0 0 WL [irq15: ata1] 28 0 0 0 WL [irq14: ata0] 27 0 0 0 WL [irq55: em1] 26 0 0 0 RL [irq54: em0] 25 0 0 0 SL idle 0xc6497000 [aic_recovery1] 24 0 0 0 WL [irq33: ahd1] 23 0 0 0 SL idle 0xc6495000 [aic_recovery0] 22 0 0 0 WL [irq32: ahd0] 21 0 0 0 SL aifthd 0xc6452430 [aac0aif] 20 0 0 0 WL [irq9: acpi0] 19 0 0 0 WL [swi2: cambio] 18 0 0 0 SL ccb_scan 0xc087e584 [xpt_thrd] 9 0 0 0 SL - 0xc63da880 [acpi_task_2] 8 0 0 0 SL - 0xc63da880 [acpi_task_1] --More-- 7 0 0 0 SL - 0xc63da880 [acpi_task_0] 17 0 0 0 RL CPU 0 [swi5: +] 6 0 0 0 SL - 0xc63daa00 [thread taskq] 16 0 0 0 WL [swi6: Giant taskq] 15 0 0 0 WL [swi6: task queue] 5 0 0 0 SL - 0xc63dad00 [kqueue taskq] 14 0 0 0 SL - 0xc0884500 [yarrow] 4 0 0 0 SL - 0xc0885228 [g_down] 3 0 0 0 SL - 0xc0885224 [g_up] 2 0 0 0 SL - 0xc088521c [g_event] 13 0 0 0 WL [swi1: net] 12 0 0 0 WL [swi3: vm] 11 0 0 0 RL [swi4: clock sio] 10 0 0 0 RL [idle] 1 0 1 0 SLs wait 0xc6391000 [init] 0 0 0 0 WLs [swapper] db> show pcpu cpuid = 0 curthread = 0xc6392300: pid 17 "swi5: +" curpcb = 0xe5066d90 fpcurthread = none idlethread = 0xc638c600: pid 10 "idle" APIC ID = 0 currentldt = 0x50 spin locks held: db> show locks exclusive sleep mutex AAC I/O lock r = 0 (0xc6476134) locked @ dev/aac/aac.c:1009 db> show alllocks Process 17 (swi5: +) thread 0xc6392300 (100011) exclusive sleep mutex AAC I/O lock r = 0 (0xc6476134) locked @ dev/aac/aac.c:1009 db> show lockedvnods Locked vnodes db> trace Tracing pid 17 tid 100011 td 0xc6392300 kdb_enter(c0815a6b,c0887740,c0814ee3,e5066bb8,100,...) at kdb_enter+0x30 panic(c0814ee3,c07f3b05,c07f3c43,ba9,c6476134,...) at panic+0xd5 _mtx_lock_sleep(c6476134,c6392300,0,c07f3c43,ba9,...) at _mtx_lock_sleep+0x48 _mtx_lock_flags(c6476134,0,c07f3c43,ba9,c6476134,...) at _mtx_lock_flags+0xe1 aac_ioctl_event(c6476000,c6c4e3e0,ecdb5c14) at aac_ioctl_event+0x4b aac_release_command(c647e380,c6476158,c647e380,c6476000,e5066c78,...) at aac_release_command+0x112 aac_bio_complete(c647e380,4,e5066c68,e5066c6c,2c,...) at aac_bio_complete+0x2e aac_complete(c6476000,1,c0818c67,50,c63da99c,...) at aac_complete+0xfb taskqueue_run(c63da980,e5066cd8,c05dad78,0,e5066cd8,...) at taskqueue_run+0x110 taskqueue_fast_run(0,e5066cd8,c05e9b1c,c0887440,1,...) at taskqueue_fast_run+0x13 ithread_execute_handlers(c6391860,c63da900,c0812eb5,30d,c6392300,...) at ithread_execute_handlers+0x128 ithread_loop(c6372970,e5066d38,c0812cb7,32d,0,...) at ithread_loop+0x83 fork_exit(c05dae70,c6372970,e5066d38) at fork_exit+0xc1 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xe5066d6c, ebp = 0 --- db> alltrace Tracing command aaccli pid 66921 tid 100065 td 0xc69e7000 sched_switch(c69e7000,0,1,15e,5eef17a,...) at sched_switch+0x190 mi_switch(1,0,c0818796,1bf,1,...) at mi_switch+0x2e6 sleepq_switch(ecdb5c14,0,c0818796,211,ecdb5bf8,...) at sleepq_switch+0x112 sleepq_wait(ecdb5c14,c6476134,c07f41f0,0,0,...) at sleepq_wait+0x65 msleep(ecdb5c14,c6476134,0,c07f41f0,0,...) at msleep+0x339 aac_ioctl_sendfib(c6476000,bfbfc704,131,2,c6458b00,...) at aac_ioctl_sendfib+0xc4 aac_ioctl(c6458b00,20003802,ecdb5d0c,7,c69e7000,...) at aac_ioctl+0x1a0 giant_ioctl(c6458b00,20003802,ecdb5d0c,7,c69e7000,...) at giant_ioctl+0x56 devfs_ioctl_f(c6adf6c0,20003802,ecdb5d0c,c6a6b700,c69e7000,...) at devfs_ioctl_f+0xe7 ioctl(c69e7000,ecdb5d04,c,c69e7000,3,...) at ioctl+0x48d syscall(3b,3b,3b,20003802,6,...) at syscall+0x300 Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x815f468, esp = 0xbfbfc5fc, ebp = 0xbfbfc628 --- My aac.c version is 1.109.2.5 >How-To-Repeat: Run aaccli regularly (I only use "container list /full") on a machine doing a lot of disk i/o (the servers which crash here run inn and cricket) >Fix: The only workaround I found is to to stop running aaccli :( >Release-Note: >Audit-Trail: >Unformatted: