Date: Mon, 1 Dec 2003 13:09:23 +0100
From: Thomas Moestl
To: Robert Watson
Cc: sparc@freebsd.org
Subject: Re: panic: trap: memory address not aligned in ata_prtdev() with Nov 18 GENERIC

On Sun, 2003/11/30 at 20:29:09 -0500, Robert Watson wrote:
> Unfortunately, I didn't have dumps set up on this box. On the other hand,
> given that the panic was in the ata code, perhaps I wouldn't have got a
> dump anyway. This was with a November 18th GENERIC kernel on a blade100.
> dmesg also below. This appears to be highly reproduceable, and might be a
> property of the bgfsck running on the system.
>
> [...]
>
> db> show msgbuf
> msgbufp = 0xfffff80000407fe0
> magic = 63062, size = 32736, r= 4790, w = 4860, ptr = 0xfffff80000400000,
> cksum=
> 377365
> panic: trap: memory address not aligned
> cpuid = 0;
> Debugger("panic")
> ...
> db> trace
> panic() at panic+0x174
> trap() at trap+0x3b4
> -- memory address not aligned sfar=0xdedeadc0ee sfsr=0x40029
> %o7=0xc007eda8 --
> ata_prtdev() at ata_prtdev+0x14
> ata_timeout() at ata_timeout+0x130
> softclock() at softclock+0x1a0
> ithread_loop() at ithread_loop+0x1b8
> fork_exit() at fork_exit+0x84
> fork_trampoline() at fork_trampoline+0x8

This can happen when an ATA operation times out; it is caused by an access
to a freed structure. ata_timeout() calls the channel's interrupt handler to
try to finish the command, and that may complete and free the request; the
warning message is then printed using request->device, which is read from
freed memory. I have attached a workaround; IIRC sos is developing a more
complete fix for this.

ISTR the timeouts were caused by the fact that Blade 100s come with
ATA66-capable disks and controllers, but a non-ATA66 (40-pin) cable, and
that for some reason the driver check to catch this situation did not work.
I am not seeing this on my machine because I replaced the cable long ago
when I added another disk. Can you confirm that your box has only a 40-pin
cable?
	- Thomas

-- 
Thomas Moestl	http://www.tu-bs.de/~y0015675/
		http://people.FreeBSD.org/~tmm/
PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C

[Attachment: ata-timo.diff]

Index: ata-queue.c
===================================================================
RCS file: /vol/ncvs/src/sys/dev/ata/ata-queue.c,v
retrieving revision 1.11
diff -u -r1.11 ata-queue.c
--- ata-queue.c	20 Oct 2003 14:28:37 -0000	1.11
+++ ata-queue.c	20 Nov 2003 00:56:48 -0000
@@ -316,6 +316,8 @@
 ata_timeout(struct ata_request *request)
 {
     struct ata_channel *ch = request->device->channel;
+    struct ata_device *reqdev = request->device;
+    char *reqstr = ata_cmd2str(request);
     int quiet = request->flags & ATA_R_QUIET;
 
     /* clear timeout etc */
@@ -324,10 +326,11 @@
     /* call hw.interrupt to try finish up the command */
     ch->hw.interrupt(request->device->channel);
     if (ch->running != request) {
+	/* request might already be freed - use copies. */
 	if (!quiet)
-	    ata_prtdev(request->device,
+	    ata_prtdev(reqdev,
 		       "WARNING - %s recovered from missing interrupt\n",
-		       ata_cmd2str(request));
+		       reqstr);
 	return;
     }
 