From: Stephen McKay
To: freebsd-hardware@freebsd.org
cc: Stephen McKay
Subject: Re: ATA READ command timeout (and worse)
Date: Tue, 24 Jun 2003 00:12:49 +1000
Message-Id: <200306231412.h5NECn7K006239@dungeon.home>
In-Reply-To: <200306181052.h5IAqTu2008960@dungeon.home>
References: <200306171554.h5HFs2DQ041575@mail.synology.com> <200306181052.h5IAqTu2008960@dungeon.home>

On Wednesday, 18th June 2003, Stephen McKay wrote:

>I recompiled the kernel with DDB.
>A few test runs and I got this:
>
>Jun 18 19:19:44 peon /kernel: ad4: no status, reselecting device
>Jun 18 19:19:44 peon /kernel: ad4: timeout sending command=c8 s=ff e=00
>Jun 18 19:19:44 peon /kernel: ad4: error executing command - resetting
>Jun 18 19:19:44 peon /kernel: ata2: resetting devices ..
>Jun 18 19:19:44 peon /kernel: ad4: removed from configuration
>Jun 18 19:19:44 peon /kernel: ad5: removed from configuration
>Jun 18 19:19:44 peon /kernel: done
>
>Fatal trap 12: page fault while in kernel mode
>fault virtual address = 0x63657865

After I compiled with INVARIANTS, my crash changed a little:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xdeadc0de
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc012c7f0
stack pointer           = 0x10:0xcd7fdbbc
frame pointer           = 0x10:0xcd7fdbc8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 397 (diff)
interrupt mask          = bio
kernel: type 12 trap, code=0
Stopped at      ad_detach+0x34: cmpl    %esi,0(%ebx)
db> trace
ad_detach(c11ba12c,0) at ad_detach+0x34
ata_reinit(c11ba100,c11ba100,0,0,0) at ata_reinit+0x86
ad_transfer(c1302280) at ad_transfer+0x49c
ata_start(c11ba100,0,c12468a4,c651b860,c1246958) at ata_start+0x98
adstrategy(c651b860,c651b860,cd05e780,cd7fdc74,c0192262) at adstrategy+0x95
diskstrategy(c651b860,c1253800,c651b860,c1293a00,cd7fdc80) at diskstrategy+0x95
...

The "0xdeadc0de" address implies the illegal reuse of a freed structure.
It looks (after poking about a bit in DDB) like ad_detach is using a
data structure after it has been freed. And indeed, it is. After the
ad_free(request) call, the request->chain field is used implicitly by
the TAILQ_FOREACH() macro to step to the next element, causing hideous
painful death.

The solution is to do the TAILQ_FOREACH() manually, and a little more
carefully. Then I'll be able to see if having my disks disappear is
recoverable.

Woo hoo! First bug found.
I'll see if I can find more. :-)

Stephen.
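
For anyone following along, here is a minimal userland sketch of the pattern described above. It is not the ata driver code: struct request, request_queue, queue_add() and queue_drain() are hypothetical stand-ins, and only the "chain" field name is borrowed from the description. TAILQ_FOREACH() advances by reading the link field of the current element, so freeing that element inside the loop body means the next step reads freed memory. Doing the loop by hand lets you fetch the next pointer first.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a driver request; not the real ata structure. */
struct request {
    int id;
    TAILQ_ENTRY(request) chain;     /* linkage field, name taken from the mail */
};

TAILQ_HEAD(request_queue, request);

/* Append a newly allocated request. Returns 0 on success, -1 if malloc fails. */
static int
queue_add(struct request_queue *q, int id)
{
    struct request *r = malloc(sizeof(*r));

    if (r == NULL)
        return -1;
    r->id = id;
    TAILQ_INSERT_TAIL(q, r, chain);
    return 0;
}

/*
 * Free every request on the queue and return how many were freed.
 *
 * Writing this as TAILQ_FOREACH(r, q, chain) { free(r); } would be a
 * use-after-free: the macro reads r->chain to advance after the body
 * has already freed r.  Here the next pointer is saved before r goes away.
 */
static int
queue_drain(struct request_queue *q)
{
    struct request *r, *next;
    int freed = 0;

    for (r = TAILQ_FIRST(q); r != NULL; r = next) {
        next = TAILQ_NEXT(r, chain);    /* read the link while r is still valid */
        TAILQ_REMOVE(q, r, chain);
        free(r);
        freed++;
    }
    return freed;
}
```

Where the local queue.h provides TAILQ_FOREACH_SAFE(), that macro does the same save-the-next-pointer dance; the hand-written loop is shown because it works with older headers too.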