From owner-freebsd-stable@FreeBSD.ORG Thu Aug 23 15:51:12 2007
Message-ID: <46CDA657.9080201@chillt.de>
Date: Thu, 23 Aug 2007 16:23:03 +0100
From: Bartosz Fabianowski
To: stable@freebsd.org
Subject: Panic and reboot with USB hard disk

Hi list

A 2.5" USB hard disk I recently got has been giving me a lot of trouble. When using the disk, I routinely get panics or random data corruption. This happens with two separate machines, both running 6-STABLE. I found that one file residing on the disk, when read, always makes the kernel panic.

While I know this smells of a hardware error (bad sector, reads failing), the disk repeatedly passed badblocks' tests, both read-only and read-write, with no errors. I am therefore thinking that this may have something to do with FreeBSD's USB stack.

A kernel with no debugger simply reboots when it encounters the error, without producing a crash dump.
When KDB and DDB are compiled in, I end up in the debugger prompt where "trace" points to a routine apparently handling USB interrupts. Unfortunately, I have to run "call doadump" to get a crash dump, after which kgdb seems to show backtraces of the doadump call, not of the original error.

I would really appreciate any help in debugging this problem. I have debug kernels on both machines, have a working test case and am happy to run any debugger commands required. The output of a kgdb backtrace is attached, although I fear it's not of much use.

As a final note, the disk is 160GB in size, has a single UFS partition and is GELI encrypted.

panic: vm_fault: fault on nofault entry, addr: db4f9000
KDB: enter: panic
panic: from debugger
Uptime: 30s
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xdb4f9000
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc06d7580
stack pointer           = 0x28:0xde342464
frame pointer           = 0x28:0xde342498
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 21 (irq11: cbb0 bfe0+*)
Dumping 767 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 767MB (196270 pages) 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc044d196 in db_fncall (dummy1=0, dummy2=0, dummy3=1999, dummy4=0xde342294 "")
    at /usr/src/sys/ddb/db_command.c:492
#2  0xc044cf12 in db_command (last_cmdp=0xc07761a4, cmd_table=0x0,
    aux_cmd_tablep=0xc07367e0, aux_cmd_tablep_end=0xc07367e4)
    at /usr/src/sys/ddb/db_command.c:350
#3  0xc044d025 in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
#4  0xc044f265 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:222
#5  0xc0575f07 in kdb_trap (type=0, code=0, tf=0xde342424)
    at /usr/src/sys/kern/subr_kdb.c:473
#6  0xc06d98db in trap_fatal (frame=0xde342424, eva=0)
    at /usr/src/sys/i386/i386/trap.c:829
#7  0xc06d8ef4 in trap (frame=
      {tf_fs = -567017464, tf_es = -1066532824, tf_ds = -567017432,
       tf_edi = -615542784, tf_esi = -402886656, tf_ebp = -567008104,
       tf_isp = -567008176, tf_ebx = -1001486592, tf_edx = 0, tf_ecx = 1024,
       tf_eax = -615563264, tf_trapno = 12, tf_err = 2, tf_eip = -1066568320,
       tf_cs = 32, tf_eflags = 589830, tf_esp = -1001501696, tf_ss = 0})
    at /usr/src/sys/i386/i386/trap.c:270
#8  0xc06c376a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#9  0xc06d7580 in memcpy () at /usr/src/sys/i386/i386/support.s:681
Previous frame inner to this frame (corrupt stack?)
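
A note for anyone reading the backtrace: kgdb prints the trap frame in frame #7 as signed decimal integers, which makes it hard to compare against the hexadecimal addresses in the panic message. The following is only a minimal Python sketch of the conversion. The names TRAP_FRAME, to_u32 and decode are illustrative and are not part of any FreeBSD tool; the register values are copied from the backtrace above.

```python
# Register values copied from the trap frame shown in frame #7.
# kgdb printed them as signed 32-bit integers.
TRAP_FRAME = {
    "tf_edi": -615542784,
    "tf_esi": -402886656,
    "tf_ebp": -567008104,
    "tf_ebx": -1001486592,
    "tf_ecx": 1024,
    "tf_eax": -615563264,
    "tf_eip": -1066568320,
}


def to_u32(value):
    """Reinterpret a signed 32-bit integer as an unsigned one."""
    return value & 0xFFFFFFFF


def decode(frame):
    """Return the registers as zero-padded hexadecimal strings."""
    return {name: f"0x{to_u32(value):08x}" for name, value in frame.items()}


if __name__ == "__main__":
    for name, hexval in decode(TRAP_FRAME).items():
        print(f"{name:7s} = {hexval}")
```

Decoded this way, tf_eip is 0xc06d7580, which is the instruction pointer from the panic message and the address of the memcpy frame (#9). tf_edi is 0xdb4f9000, the fault virtual address, and tf_ebp is 0xde342498, the reported frame pointer. So the trap frame does describe the original fault rather than the doadump call, and in kgdb something like "info line *0xc06d7580" or "frame 9" should lead to the faulting instruction, assuming the debug kernel matches the dump.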