From owner-freebsd-scsi@FreeBSD.ORG Sun Jun 26 08:14:02 2005
Date: Sun, 26 Jun 2005 11:14:01 +0300
From: Panagiotis Christias
Reply-To: Panagiotis Christias
To: freebsd-stable@freebsd.org
Cc: freebsd-scsi@freebsd.org
Subject: Strange SCSI behavior after upgrading from 5.2.1 to 5.4 (and a panic)
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem

Hello,

on Thursday we upgraded one of our last 5.2.1 servers to 5.4. Tonight the server panicked and crashed, and I had to power it off and on.
Here are the logs before the panic:

Jun 26 03:45:50 patroklos kernel: (da0:ahc0:0:0:0): lost device
Jun 26 03:45:50 patroklos kernel: (da0:ahc0:0:0:0): Invalidating pack
Jun 26 03:46:00 patroklos last message repeated 2 times
Jun 26 03:46:06 patroklos kernel: initiate_write_filepage: already started
Jun 26 03:46:07 patroklos last message repeated 9 times
Jun 26 03:46:07 patroklos kernel: (da0:ahc0:0:0:0): READ(10). CDB: 28 0 72 f 16 0 0 0 80 0
Jun 26 03:46:07 patroklos kernel: (da0:ahc0:0:0:0): CAM Status: SCSI Status Error
Jun 26 03:46:07 patroklos kernel: (da0:ahc0:0:0:0): SCSI Status: Check Condition
Jun 26 03:46:07 patroklos kernel: (da0:ahc0:0:0:0): UNIT ATTENTION asc:29,0
Jun 26 03:46:07 patroklos kernel: (da0:ahc0:0:0:0): Power on, reset, or bus device reset occurred
Jun 26 03:46:07 patroklos kernel: (da0:ahc0:0:0:0): Retries Exhausted
Jun 26 03:46:07 patroklos kernel: (da0:ahc0:0:0:0): Invalidating pack
Jun 26 03:46:31 patroklos kernel: initiate_write_filepage: already started
Jun 26 03:46:43 patroklos kernel: panic: initiate_write_inodeblock_ufs2: already started

As I said, the machine could not recover from the panic, so there is no crashdump.

The 5.4 version of the dmesg output is available at:
http://noc.ntua.gr/~christia/tmp/dmesg-5.4.txt

The 5.2.1 version of the dmesg output is available at:
http://noc.ntua.gr/~christia/tmp/dmesg-5.2.1.txt

da0 is a 1302GB external IDE-to-SCSI RAID (8x200GB IDE drives in a RAID5 configuration, with a SCSI U160 interface). FreeBSD 5.4 connects to da0 at 80MB/s (40.000MHz, offset 31, 16bit, Tagged Queueing Enabled), while FreeBSD 5.2.1 (and FreeBSD 5.3; I just tried booting the 5.3-RELEASE-i386-miniinst.iso) connects happily at 160MB/s (80.000MHz, offset 62, 16bit, Tagged Queueing Enabled), which is the transfer rate supported by both the RAID device and the SCSI card (Adaptec 3960D Ultra160 SCSI adapter/aic7899).

Any ideas what the problem could be, or where it could be? What has changed in 5.4?
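Two numbers in the report above can be checked by hand. The sketch below is illustrative only: it assumes the MHz figure printed by the driver is the synchronous transfer rate in mega-transfers per second, and it uses the standard 10-byte READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8). The function names are made up for this example and do not come from any FreeBSD tool.

```python
def sync_rate_mb_per_s(freq_mhz: float, width_bits: int) -> float:
    """Peak bus rate: transfers per microsecond times bytes per transfer."""
    return freq_mhz * (width_bits // 8)


def decode_read10(cdb_text: str) -> dict:
    """Decode a READ(10) CDB given as space-separated hex bytes (as CAM logs it)."""
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        # Bytes 2..5 hold the logical block address, most significant byte first.
        "lba": int.from_bytes(bytes(cdb[2:6]), "big"),
        # Bytes 7..8 hold the number of blocks to transfer.
        "blocks": int.from_bytes(bytes(cdb[7:9]), "big"),
    }


if __name__ == "__main__":
    print(sync_rate_mb_per_s(40.000, 16))   # rate negotiated under 5.4
    print(sync_rate_mb_per_s(80.000, 16))   # rate negotiated under 5.2.1 / 5.3
    print(decode_read10("28 0 72 f 16 0 0 0 80 0"))
```

With the values from the logs, the two negotiated settings work out to 80 and 160 MB/s, matching the figures quoted, and the failed READ(10) was a 128-block read starting at LBA 0x720f1600.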
We had preserved the 5.2.1 system disks, and after the crash we moved back to 5.2.1 until further notice. Now I'm thinking of trying 5.3, which seems to have the same behavior as 5.2.1 and will still be supported for a year or so.

Thanks,
Panagiotis

From owner-freebsd-scsi@FreeBSD.ORG Mon Jun 27 11:01:58 2005
Date: Mon, 27 Jun 2005 11:01:57 GMT
Message-Id: <200506271101.j5RB1vjI043143@freefall.freebsd.org>
From: FreeBSD bugmaster
To: freebsd-scsi@FreeBSD.org
Subject: Current problem reports assigned to you

Current FreeBSD problem reports

Critical problems

Serious problems

S Submitted    Tracker     Resp.  Description
-------------------------------------------------------------------------------
o [2001/05/03] kern/27059  scsi   (symbios) SCSI subsystem hangs under heav
o [2001/06/29] kern/28508  scsi   problems with backup to Tandberg SLR40 st
o [2002/06/17] kern/39388  scsi   ncr/sym drivers fail with 53c810 and more
o [2002/07/22] kern/40895  scsi   wierd kernel / device driver bug
s [2003/09/30] kern/57398  scsi   Current fails to install on mly(4) based
o [2003/12/26] kern/60598  scsi   wire down of scsi devices conflicts with
a [2004/01/10] kern/61165  scsi   [panic] kernel page fault after calling c
o [2004/09/15] kern/71778  scsi   5.3 BETA3 doesnt see Adaptec 2015S FW Rev
o [2004/12/02] kern/74607  scsi   FreeBSD 5.3 install CD crashes on SCSI de
o [2004/12/02] kern/74627  scsi   Adaptec 2940U2W Can't boot 5.3

10 problems total.

Non-critical problems

S Submitted    Tracker     Resp.  Description
-------------------------------------------------------------------------------
o [2000/12/06] kern/23314  scsi   aic driver fails to detect Adaptec 1520B
o [2001/08/15] kern/29727  scsi   [amr] [patch] amr_enquiry3 structure in a
o [2002/02/23] kern/35234  scsi   World access to /dev/pass? (for scanner)
o [2002/06/02] kern/38828  scsi   [feature request] DPT PM2012B/90 doesn't
o [2002/10/29] kern/44587  scsi   dev/dpt/dpt.h is missing defines required
o [2003/10/01] kern/57469  scsi   [patch] Quirk for Conner CP3500

6 problems total.

From owner-freebsd-scsi@FreeBSD.ORG Mon Jun 27 20:38:26 2005
Message-ID: <42C06470.9080700@chiaro.com>
Date: Mon, 27 Jun 2005 15:41:20 -0500
From: dave baukus
Organization: Chiaro Networks
To: freebsd-scsi@freebsd.org
Subject: iur crash ? -- long

I have a crash on BSD4.10 w/ a heavily modified network stack, but the disk/scsi subsystem is ostensibly unmodified. I'm reasonably certain that the crash is caused by iir_intr(void *arg) passing 0xc0dedead to bcopy() as a length. We have INVARIANTS enabled.

The quick question is: has this been seen?
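One possible reading of that length value, offered as a sketch rather than a diagnosis: if the INVARIANTS allocator overwrites freed memory with the 32-bit pattern 0xdeadc0de (an assumption about this kernel, not something stated in the report), then on a little-endian i386 a 32-bit load that starts two bytes into the filled region returns 0xc0dedead, and 16-bit loads return 0xc0de or 0xdead. The helper names below are hypothetical.

```python
import struct

FILL = 0xDEADC0DE  # assumed poison value written over freed memory


def poisoned(nbytes: int) -> bytes:
    """Return nbytes of memory filled with FILL as little-endian 32-bit words."""
    reps = (nbytes + 3) // 4
    return (struct.pack("<I", FILL) * reps)[:nbytes]


def load_u32(buf: bytes, offset: int) -> int:
    """Little-endian 32-bit load at an arbitrary (possibly unaligned) offset."""
    return struct.unpack_from("<I", buf, offset)[0]


def load_u16(buf: bytes, offset: int) -> int:
    """Little-endian 16-bit load at an arbitrary offset."""
    return struct.unpack_from("<H", buf, offset)[0]


if __name__ == "__main__":
    mem = poisoned(16)
    print(hex(load_u32(mem, 0)))  # aligned load sees the fill value itself
    print(hex(load_u32(mem, 2)))  # load shifted by two bytes
    print(hex(load_u16(mem, 0)), hex(load_u16(mem, 2)))
```

Under that assumption, a packed structure whose 32-bit members sit at offsets of the form 4k+2 would show 0xc0dedead in every such member once its memory has been freed.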

The long question goes like this:
----------------------
Here's the stack trace:

#0  dumpsys () at ../../kern/kern_shutdown.c:519
#1  0xc01e7253 in boot (howto=0x100) at ../../kern/kern_shutdown.c:331
#2  0xc01e76cb in panic (fmt=0xc04cb120 "vm_fault: fault on nofault entry, addr: %lx")
    at ../../kern/kern_shutdown.c:635
#3  0xc03c810c in vm_fault (map=0xc059fa78, vaddr=0xf5315000, fault_type=0x1, fault_flags=0x0)
    at ../../vm/vm_fault.c:240
#4  0xc041ca5e in trap_pfault (frame=0xf8225d90, usermode=0x0, eva=0xf5315000)
    at ../../i386/i386/trap.c:921
#5  0xc041c5e0 in trap (frame={tf_fs = 0xc01e0010, tf_es = 0xffff0010, tf_ds = 0xcb3c0010,
      tf_edi = 0xcbaeb52e, tf_esi = 0xf5315000, tf_ebp = 0xf8225e0c, tf_isp = 0xf8225dbc,
      tf_ebx = 0xc0dedead, tf_edx = 0xf5222b20, tf_ecx = 0x3033ee73, tf_eax = 0xd67d652e,
      tf_trapno = 0xc, tf_err = 0x0, tf_eip = 0xc041b502, tf_cs = 0x8, tf_eflags = 0x10202,
      tf_esp = 0xcadf9000, tf_ss = 0xcb9f9000}) at ../../i386/i386/trap.c:500
#6  0xc041b502 in generic_bcopy ()
#7  0xc04252fd in intr_mux (arg=x) at ../../i386/isa/intr_machdep.c:609
#8  0xc040e74e in vec9 ()
#9  0xc01df10b in exit1 (p=0xf80c88a0, rv=0xf) at ../../kern/kern_exit.c:225
#10 0xc01e9262 in sigexit (p=0xf80c88a0, sig=0xf) at ../../kern/kern_sig.c:1519
#11 0xc01e8fa4 in postsig (sig=0xf) at ../../kern/kern_sig.c:1422
#12 0xc041d254 in syscall2 (frame={tf_fs = 0xbfbf002f, tf_es = 0x8fe002f, tf_ds = 0xbfbf002f,
      tf_edi = 0xbfbff800, tf_esi = 0x8fe5540, tf_ebp = 0xbfbff800, tf_isp = 0xf8225fd4,
      tf_ebx = 0x64, tf_edx = 0xbfbff780, tf_ecx = 0xbfbff700, tf_eax = 0x4, tf_trapno = 0x7,
      tf_err = 0x2, tf_eip = 0x88fe6ef4, tf_cs = 0x1f, tf_eflags = 0x203, tf_esp = 0xbfbff604,
      tf_ss = 0x2f}) at ../../i386/i386/trap.c:177
--------------------------------
Since intr_mux() does not call generic_bcopy()/bcopy(), the stack frames must be mangled.
The frames 4 to 7 look like:

0xf8225d10: 0xf8225d40 0xc01eabd3 0xffffffff frame 4 0xf8225d44 trap_pfault
0xf8225d20: 0xc041ca5e 0xc059fa78 0xf5315000 0x00000001
0xf8225d30: 0x00000000 0x0000000c 0xf80c88a0 0xf5315000 frame 5 trap
0xf8225d40: 0xcb3cdf01 0xf8225d88 0xc041c5e0 0xf8225d90
0xf8225d50: 0x00000000 0xf5315000 0x006c0200 0xf5315000
0xf8225d60: 0xcbaeb52e 0x00000000 0xc0dedead 0x00000000
0xf8225d70: 0xcbae4e5a 0xf8225d80 0xc0421783 0xc04217e9 calltrap
0xf8225d80: 0xf8225e0c 0xc040dcb4 0xf8225e0c 0xc040d0c0
0xf8225d90: 0xc01e0010 0xffff0010 0xcb3c0010 0xcbaeb52e
0xf8225da0: 0xf5315000 0xf8225e0c 0xf8225dbc 0xc0dedead
0xf8225db0: 0xf5222b20 0x3033ee73 0xd67d652e 0x0000000c
0xf8225dc0: 0x00000000 0xc041b502 0x00000008 0x00010202
0xf8225dd0: 0xcadf9000 0xcb9f9000 0xc01b7e11 0xf5222b20
0xf8225de0: 0xcb9f904e 0xc0dedead 0xc6048460 0x00400200
0xf8225df0: 0x00000000 0xf5222b20 0x006c0200 0x00000000
0xf8225e00: 0x00000000 0x00090001 0x006fc67b frame 7 0xf8225e24 intr_mux
0xf8225e10: 0xc04252fd 0xcadf9000 0x006c0200 0x00000000
0xf8225e20: 0xc2ee216c 0xf8225e90 0xc040e1d7 0xc6048460
------------------------------------
It's between frame 7 (intr_mux()) and frame 5 (trap()) that I begin guessing at the sequence of events. Based on the 0xcadf9000 at 0xf8225e14, I speculate that iir_intr was the last interrupt routine called.
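The raw words above can be lined up with the trap frame that gdb printed for frame 5. The following is a minimal sketch under two assumptions: the frame starts at 0xf8225d90 (the frame= argument shown for trap_pfault), and the fields are consecutive 32-bit words in the order gdb printed them. The function names are hypothetical.

```python
# Field order as printed by gdb for frame 5 in the stack trace.
TRAPFRAME_FIELDS = [
    "tf_fs", "tf_es", "tf_ds", "tf_edi", "tf_esi", "tf_ebp", "tf_isp",
    "tf_ebx", "tf_edx", "tf_ecx", "tf_eax", "tf_trapno", "tf_err",
    "tf_eip", "tf_cs", "tf_eflags", "tf_esp", "tf_ss",
]

# Rows copied from the dump, covering 0xf8225d90 through 0xf8225ddc.
DUMP = """
0xf8225d90: 0xc01e0010 0xffff0010 0xcb3c0010 0xcbaeb52e
0xf8225da0: 0xf5315000 0xf8225e0c 0xf8225dbc 0xc0dedead
0xf8225db0: 0xf5222b20 0x3033ee73 0xd67d652e 0x0000000c
0xf8225dc0: 0x00000000 0xc041b502 0x00000008 0x00010202
0xf8225dd0: 0xcadf9000 0xcb9f9000 0xc01b7e11 0xf5222b20
"""


def parse_dump(text: str) -> dict:
    """Map each 32-bit word's address to its value."""
    words = {}
    for line in text.strip().splitlines():
        addr_text, rest = line.split(":", 1)
        base = int(addr_text, 16)
        for i, tok in enumerate(rest.split()):
            words[base + 4 * i] = int(tok, 16)
    return words


def decode_trapframe(words: dict, frame_addr: int) -> dict:
    """Read consecutive words starting at frame_addr into named fields."""
    return {name: words[frame_addr + 4 * i]
            for i, name in enumerate(TRAPFRAME_FIELDS)}


if __name__ == "__main__":
    tf = decode_trapframe(parse_dump(DUMP), 0xF8225D90)
    for name in TRAPFRAME_FIELDS:
        print(f"{name:10s} = {tf[name]:#x}")
```

The decoded fields agree with the values gdb printed for frame 5. One general x86 point to keep in mind when reading them: for a trap taken in kernel mode the CPU does not push esp and ss, so the tf_esp and tf_ss slots simply hold whatever was on the interrupted stack at that moment.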

Here is the intrec * list passed to intr_mux():

set $P=(intrec *)0xc6048460
(kgdb) intrecwalk $P
$186 = {mask = 0x6c0200, handler = 0xc01b7b90 , argument = 0xcadf9000, next = 0xc60483e0,
  name = 0xcadf3b80 "iir0", intr = 0x9, maskptr = 0xc0552b54, flags = 0x0}
$187 = {mask = 0x660200, handler = 0xc03ef350 , argument = 0xcadfc000, next = 0xc6048260,
  name = 0xcadf3a60 "em0", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$188 = {mask = 0x660200, handler = 0xc03ef350 , argument = 0xcadfd000, next = 0xcae01e60,
  name = 0xcadf3950 "em1", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$189 = {mask = 0x660200, handler = 0xc03ef350 , argument = 0xcadff000, next = 0xcae01ce0,
  name = 0xcadf3810 "em2", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$190 = {mask = 0x660200, handler = 0xc03ef350 , argument = 0xcae03000, next = 0xcae01ba0,
  name = 0xcadf3700 "em3", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$191 = {mask = 0x630212, handler = 0xc03b6570 , argument = 0x0, next = 0xcae01b00,
  name = 0xcadf3620 "ics0", intr = 0x9, maskptr = 0xc0552b48, flags = 0x0}
$192 = {mask = 0x660200, handler = 0xc03ef350 , argument = 0xcae04000, next = 0xcae01a00,
  name = 0xcadf3520 "em4", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$193 = {mask = 0x660200, handler = 0xc03ef350 , argument = 0xcae06000, next = 0xcae01880,
  name = 0xcadf3410 "em5", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$194 = {mask = 0x660200, handler = 0xc03ef350 , argument = 0xcae07000, next = 0xcae01760,
  name = 0xcadf3300 "em6", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
$195 = {mask = 0x68c640, handler = 0xc03d74b0 , argument = 0xcae09000, next = 0x0,
  name = 0xcadf32c0 "uhci0", intr = 0x9, maskptr = 0xc0552b4c, flags = 0x0}
-----------------------------------------------
Since I think it's iir_intr that was called, I poke around in the stack frames between frame 7 and 5. At 0xf8225dd8 I see the value 0xc01b7e11:

(kgdb) x 0xc01b7e11
0xc01b7e11 : 0x830cc483
(kgdb) disass :
...
...
...
0xc01b7dff : test %ebx,%ebx
0xc01b7e01 : je 0xc01b7e14
0xc01b7e03 : push %ebx
0xc01b7e04 : lea 0x4e(%esi),%eax
0xc01b7e07 : push %eax
0xc01b7e08 : mov 0xffffffe8(%ebp),%edx
0xc01b7e0b : push %edx
0xc01b7e0c : call 0xc041b4d8
0xc01b7e11 : add $0xc,%esp
0xc01b7e14 : cmpl $0x0,0x42(%esi)
-----------------------------------------------------
Therefore, 0xcadf9000 is the struct gdt_softc * argument to iir_intr():

(kgdb) set $SC=(struct gdt_softc *)0xcadf9000
(kgdb) p *$SC
$217 = {sc_hanum = 0x0, sc_class = 0x5, sc_bus = 0x4, sc_slot = 0x8,
  sc_device = 0x600, sc_subdevice = 0x1af, sc_fw_vers = 0x22a,
  sc_init_level = 0x6, sc_state = 0x0, sc_dev = 0xcadf4000, sc_dpmemt = 0x1,
  sc_dpmemh = 0xf31c7000, sc_dpmembase = 0xf8000000,
  sc_parent_dmat = 0xcadfad00, sc_buffer_dmat = 0xcadfacc0,
  sc_gccb_dmat = 0xcadfac80, sc_gccb_dmamap = 0x0, sc_gccb_busbase = 0x1d000,
  sc_gccbs = 0xf51c7000, sc_free_gccb = {
    slh_first = 0xf51e1860}, sc_pending_gccb = {slh_first = 0xf5211440},
  sc_ccb_queue = {tqh_first = 0x0, tqh_last = 0xcadf9050},
  sc_ucmd_queue = {tqh_first = 0x0, tqh_last = 0xcadf9058},
  sc_ic_all_size = 0x2fc0, sc_cmd_len = 0x24, sc_cmd_off = 0x24,
  sc_cmd_cnt = 0x1,
  sc_cmd = "\000\000\000\000d\000\000\000\002\000\000\000¿#\235\000\200\000\000\000ÿÿÿÿ\001\000\000\000\000\000\006â\000\000\001", '\000' ,
  sc_info = 0x0, sc_info2 = 0x0, sc_status = 0x1000, sc_service = 0x0,
  sc_bus_cnt = 0x3, sc_virt_bus = 0x2, sc_bus_id = "\a\a\000\000\000",
  sc_more_proc = 0x0, sc_hdr = {{
      hd_present = 0x1, hd_is_logdrv = 0x0, hd_is_arraydrv = 0x0,
      hd_is_master = 0x0, hd_is_parity = 0x0, hd_is_hotfix = 0x0,
      hd_master_no = 0x0, hd_lock = 0x0, hd_heads = 0xff, hd_secs = 0x3f,
      hd_devtype = 0x0, hd_size = 0x88efe6a, hd_ldr_no = 0x0,
      hd_rw_attribs = 0x0, hd_start_sec = 0x0}, {hd_present = 0x0,
      hd_is_logdrv = 0x0, hd_is_arraydrv = 0x0, hd_is_master = 0x0,
      hd_is_parity = 0x0, hd_is_hotfix = 0x0, hd_master_no = 0x0,
      hd_lock = 0x0, hd_heads = 0x0, hd_secs = 0x0, hd_devtype = 0x0,
      hd_size = 0x0, hd_ldr_no = 0x0, hd_rw_attribs = 0x0,
      hd_start_sec = 0x0} }, sc_raw_feat = 0x1, sc_cache_feat = 0x101,
  sc_dvr = {size = 0x0, eu = {stream = '\000' , driver = {ionode = 0x0,
        service = 0x0, index = 0x0}, async = {ionode = 0x0, service = 0x0,
        status = 0x0, info = 0x0, scsi_coord = "\000\000"}, sync = {
        ionode = 0x0, service = 0x0, status = 0x0, info = 0x0,
        hostdrive = 0x0, scsi_coord = "\000\000", sense_key = 0x0}, test = {
        l1 = 0x0, l2 = 0x0, l3 = 0x0, l4 = 0x0}}, severity = 0x0,
    event_string = '\000' }, sims = {0xcadfac00, 0xcadfab40, 0xcadfaa80,
    0x0, 0x0, 0x0}, paths = {
    0xcadf3c20, 0xcadf3bf0, 0xcadf3bc0, 0x0, 0x0, 0x0},
  sc_copy_cmd = 0xc01b90d4 , sc_get_status = 0xc01b9190 ,
  sc_intr = 0xc01b91b4 , sc_release_event = 0xc01b92d0 ,
  sc_set_sema0 = 0xc01b92f0 , sc_test_busy = 0xc01b9310 ,
  links = {tqe_next = 0x0, tqe_prev = 0xc04ffe80}}
-----------------------------------------------------
Now I try to figure out which iir_intr() code path was executed. Only the case GDT_GCF_IOCTL: code path leads to a bcopy(). I walked all the struct gdt_ccb * in gdt->sc_gccbs[]; only one has a non-zero gccb->gc_flags value, and its value is 4 (GDT_GCF_IOCTL):

(kgdb) set $SCBS=(struct gdt_ccb *)&$SC->sc_gccbs[121]
(kgdb) p *$SCBS
$218 = {gc_scratch = "\001\000\0013", '\000' , gc_ccb = 0xcb16c400,
  gc_ucmd = 0xcb9f9000, gc_dmamap = 0x0, gc_map_flag = 0x1, gc_timeout = 0x0,
  gc_state = 0x0, gc_service = 0x9, gc_cmd_index = 0x7b, gc_flags = 0x4,
  sle = {sle_next = 0x0}}
---------------------------------------
Down to the bcopy(): the bcopy() decision is based on values in gc_ucmd, and nothing good can come from using most of these values:

(kgdb) set $UCMD=(gdt_ucmd_t *)$SCBS->gc_ucmd
(kgdb) p *$UCMD
$219 = {io_node = 0xc0de, service = 0xdead, timeout = 0xc05076a0,
  status = 0x1, info = 0x0, BoardNode = 0xc0ded8b2,
  CommandIndex = 0xc0dedead, OpCode = 0xdead, u = {cache = {
      DeviceNo = 0xc0de, BlockNo = 0xc0dedead, BlockCnt = 0xc0dedead,
      DestAddr = 0xc0dedead}, ioctl = {param_size = 0xc0de,
      subfunc = 0xc0dedead, channel = 0xc0dedead, p_param = 0xc0dedead},
    raw = {reserved = 0xc0de, direction = 0xc0dedead,
      mdisc_time = 0xc0dedead, mcon_time = 0xc0dedead, sdata = 0xc0dedead,
      sdlen = 0xc0dedead, clen = 0xc0dedead, cmd = "­ÞÞÀ­ÞÞÀ­ÞÞÀ",
      target = 0xad, lun = 0xde, bus = 0x1, priority = 0x0, sense_len = 0x0,
      sense_data = 0x0, link_p = 0x10}}, data = "\001\000\0013", '\000' ,
  complete_flag = 0xcb16c400, links = {
    tqe_next = 0xcb9f9000, tqe_prev = 0x0}}
----------------------------
Not knowing anything about iir/scsi, it appears to me that gdt->sc_gccbs[121]->gc_ucmd has been freed and yet is still referenced and in use.

How is this gdt_ucmd_t * gc_ucmd data managed? Is it actively malloc()ed and free()d?

Any clues or pointers will be appreciated.

--
Dave Baukus
dbaukus@chiaro.com
Chiaro Networks Ltd.
Richardson, Texas USA

From owner-freebsd-scsi@FreeBSD.ORG Tue Jun 28 18:29:01 2005
Date: Tue, 28 Jun 2005 11:28:42 -0700
From: DK Duncan
To: freebsd-scsi@freebsd.org
Message-id: <200506281828.j5SISh5h014595@mutha.quantumlogic.net>
Cc: duncan@quantumlogic.net
Subject: 5.4-RELEASE - aac0: COMMAND 0xXXXXXXXX TIMEOUT AFTER XXX SECONDS

Hi Scott et al -

I have an older Supermicro 370DL3 SMP motherboard with an Adaptec 2810sa controller and 8 300GB SATA Maxtor drives attached in RAID5. Under heavy load I get errors of the form:

aac0: COMMAND 0xXXXXXXXX TIMEOUT AFTER XXX SECONDS

When this occurs, the system is alive but the filesystem is hung. A cold boot clears the condition. Under lighter loads it appears to be fine. The OS is FreeBSD 5.3-RELEASE upgraded from source to 5.4-RELEASE with acpi disabled. The problem is reproducible by extracting a large tarball that resides on the filesystem onto that same filesystem.

What information do you need to debug the problem?

mutha<86> uname -a
FreeBSD mutha 5.4-RELEASE FreeBSD 5.4-RELEASE #0: Sun Jun 26 09:54:14 PDT 2005 duncan@mutha:/usr/src/sys/i386/compile/MUTHA.smp.pcm i386

mutha<87> more /usr/src/sys/i386/conf/MUTHA.smp.pcm
#
#
include GENERIC
options SMP

# For non-pnp sound cards with no bridge drivers only:
#device pcm0 at isa? irq 10 drq 1 flags 0x0
#
# For PnP/PCI sound cards
device sound
device "snd_es137x"
#
# For ata
device atapicam

Thanks and Best Regards,
~Don
--------

From owner-freebsd-scsi@FreeBSD.ORG Sat Jul 2 18:39:44 2005
Message-ID: <42C6DF6D.3070909@rojer.pp.ru>
Date: Sat, 02 Jul 2005 22:39:41 +0400
From: Deomid Ryabkov
To: freebsd-scsi@freebsd.org
Subject: mpt driver fixes (i386/67047)

I have posted a patch in a followup to i386/67047.
http://www.freebsd.org/cgi/query-pr.cgi?pr=i386/67047

Looking forward to any feedback.

--
Deomid Ryabkov aka Rojer
myself@rojer.pp.ru
rojer@sysadmins.ru
ICQ: 8025844