From owner-freebsd-sparc64@FreeBSD.ORG Thu Mar 13 00:50:03 2008
Return-Path:
Delivered-To: freebsd-sparc64@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AB994106566B
	for ; Thu, 13 Mar 2008 00:50:03 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id A62588FC12
	for ; Thu, 13 Mar 2008 00:50:03 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m2D0o3ZS008343
	for ; Thu, 13 Mar 2008 00:50:03 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m2D0o3oa008342;
	Thu, 13 Mar 2008 00:50:03 GMT (envelope-from gnats)
Date: Thu, 13 Mar 2008 00:50:03 GMT
Message-Id: <200803130050.m2D0o3oa008342@freefall.freebsd.org>
To: freebsd-sparc64@FreeBSD.org
From: jpd@dsb.tudelft.nl
Cc:
Subject: Re: sparc64/121539: Interrupt storm booting 7.0-R/sparc64 on ultra5
X-BeenThere: freebsd-sparc64@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: jpd@dsb.tudelft.nl
List-Id: Porting FreeBSD to the Sparc
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 13 Mar 2008 00:50:03 -0000

The following reply was made to PR sparc64/121539; it has been noted by GNATS.

From: jpd@dsb.tudelft.nl
To: Marius Strobl
Cc: bug-followup@freebsd.org
Subject: Re: sparc64/121539: Interrupt storm booting 7.0-R/sparc64 on ultra5
Date: Thu, 13 Mar 2008 01:13:39 +0100

On Wed, Mar 12, 2008 at 23:54:45 +0100, Marius Strobl wrote:
[snip!]
> Vector 2016 is the ATA controller and the ata(4)/acd(4) apparently
> has some problems accessing the CD. Could you please check whether
> the cabling and the drive are ok and functional?

Apologies for the narrative.
The answer to your question is in the next paragraph and in the
paragraphs just before the last interrupt storm. The rest is me
attempting to be thorough.

In short: Yes, overall I think they're ok. I just checked and the cable
on the hard drive end said 'click' when I pushed on it on the drive's
side. A marginal connection seems likely. The other connections seem to
be ok, if old ata33-only cables. The cdrom I swapped with a then-new dvd
drive (i.e. it's not sun-original) and it should be ok. It was used for
installing 5.4 and solaris 10 from dvd a while back. The system has been
mostly offline in the meantime.

I'd like to note that booting 5.4 (which I did before and after trying
to boot 7.0 for the first time) didn't have the problem, but 7.0 did,
both while booting from cdrom and from hard drive, so whether that was
an actual marginal connection, I guess we'll find out next (see below).
I probably should've made the connection between the one and the other
notice, although not knowing what vector 2016 was, I substituted
ignorance and went ahead.

I noticed that *eventually* it'll go through, maybe prodded along by
sending a couple of breaks, at which point I rolled a 7.0 base+man over
the previous one. Once it booted it stopped complaining, mostly. Then I
checked out src and built a custom kernel. Installing it would get me
DMA errors when it got to the twe module, although (again) brute force
eventually got around it.

On a lark I checked out the relevant bits of ports and installed
smartmontools, and ran an offline test. The output looked all green
except for a non-zero but low (14) Reallocated_Event_Count. So I think
the hard disk drive and presumably the dvd drive are in reasonable
shape.

While I'm writing this the machine sat twirling away just as it did
before, *very slowly* twirling away loading the kernel (which it did do
much faster even with the interrupt storm messages coming up later) and
eventually getting to a bootstage, but it will then panic.
If this keeps up after I get a fresh image on it, I'll ask for help
about that.

Consoles: Open Firmware console

Booting with sun4u support.

FreeBSD/sparc64 bootstrap loader, Revision 1.0
(root@obrian.cse.buffalo.edu, Sun Feb 24 17:36:50 UTC 2008)
bootpath="/pci@1f,0/pci@1,1/ide@3/disk@0,0:a"
Loading /boot/defaults/loader.conf
/boot/kernel/kernel data=0x412648+0x5b2a8 syms=[0x8+0x59340+0x8+0x4e312]
/
Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [/boot/kernel/kernel]...
nothing to autoload yet.
jumping to kernel entry at 0xc0060000.
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-RELEASE #1: Tue Mar 11 21:11:39 UTC 2008
    root@aquablue.local:/usr/src/sys/sparc64/compile/AQUABLUE
panic: trap: memory address not aligned
Uptime: 1s

I might very well have forgotten something important in the compile,
but I can't help but wonder why it started to load so slowly after I
installed my custom kernel. Compile #0 worked, though. I'll see what
happens when I get it to boot GENERIC again, compile again, and so
forth.

Now, long story short: I double-checked the connections, closed up the
case, and booted GENERIC from the install cd again.

Booting with hw.ata.atapi_dma=0 and hw.ata.ata_dma=0 makes the interrupt
storm go away, although it will still complain:

acd0: FAILURE - READ_BIG ILLEGAL REQUEST asc=0x64 ascq=0x00
GEOM_LABEL: Label for provider acd0 is iso9660/FreeBSD_Install.
acd0: FAILURE - READ_BIG ILLEGAL REQUEST asc=0x64 ascq=0x00

Only three lines though.

atapi_dma=0 and ata_dma=1 does the same.

atapi_dma=1 and ata_dma=0 brings the interrupt storms back again.
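
For anyone wanting to repeat the test on an installed system, here is a
minimal sketch of the same two tunables as a /boot/loader.conf fragment.
It assumes the stock loader and ata(4) driver; putting them in
loader.conf is my assumption of a convenient place, not something done
in the tests above, where they were set for a single boot only. At the
loader prompt the equivalent is `set hw.ata.atapi_dma=0' followed by
`boot'.

```
# /boot/loader.conf fragment (sketch; assumes an installed system,
# since the install CD's copy cannot be edited)

# Disable DMA for ATAPI devices such as the CD/DVD drive. In the tests
# above, this was the setting that made the boot-time storm go away.
hw.ata.atapi_dma="0"

# Disable DMA for ATA disks as well. The storm at boot went away with
# this at either 0 or 1, so it is optional here.
hw.ata.ata_dma="0"
```

Both settings trade throughput for PIO transfers, so they are a
diagnostic aid rather than a fix.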

While in an emergency shell booted with hw.ata.atapi_dma=0 I managed to
trigger an interrupt storm by accessing the cdrom (`ls') anyway:

interrupt storm detected on "vec2016:"; throttling interrupt source
interrupt storm detected on "vec2016:"; throttling interrupt source
interrupt storm detected on "vec2016:"; throttling interrupt source
ata2: reiniting channel ..
ata2: reset tp1 mask=03 ostat0=51 ostat1=00
ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
ata2: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
ata2: reset tp2 stat0=50 stat1=00 devices=0x1
ad0: setting PIO4 on CMD 646 chip
ad0: setting WDMA2 on CMD 646 chip
ata2: reinit done ..
ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=376800
interrupt storm detected on "vec2016:"; throttling interrupt source
interrupt storm detected on "vec2016:"; throttling interrupt source
interrupt storm detected on "vec2016:"; throttling interrupt source
interrupt storm detected on "vec2016:"; throttling interrupt source
ata2: reiniting channel ..
ata2: reset tp1 mask=03 ostat0=51 ostat1=00
ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
ata2: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
ata2: reset tp2 stat0=50 stat1=00 devices=0x1
ad0: setting PIO4 on CMD 646 chip
ad0: setting WDMA2 on CMD 646 chip
ata2: reinit done ..
ad0: TIMEOUT - READ_DMA retrying (0 retries left) LBA=376800
interrupt storm detected on "vec2016:"; throttling interrupt source
interrupt storm detected on "vec2016:"; throttling interrupt source
interrupt storm detected on "vec2016:"; throttling interrupt source
ata2: reiniting channel ..
ata2: reset tp1 mask=03 ostat0=51 ostat1=00
ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
ata2: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
ata2: reset tp2 stat0=50 stat1=00 devices=0x1
ad0: setting PIO4 on CMD 646 chip
ad0: setting WDMA2 on CMD 646 chip
ata2: reinit done ..
ad0: FAILURE - READ_DMA timed out LBA=376800
g_vfs_done():ad0a[READ(offset=192921600, length=16384)]error = 5
interrupt storm detected on "vec2016:"; throttling interrupt source
interrupt storm detected on "vec2016:"; throttling interrupt source
interrupt storm detected on "vec2016:"; throttling interrupt source
ata2: reiniting channel ..
ata2: reset tp1 mask=03 ostat0=51 ostat1=00
ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
ata2: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
ata2: reset tp2 stat0=50 stat1=00 devices=0x1
ad0: setting PIO4 on CMD 646 chip
ad0: setting WDMA2 on CMD 646 chip
ata2: reinit done ..
ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=376800
interrupt storm detected on "vec2016:"; throttling interrupt source
interrupt storm detected on "vec2016:"; throttling interrupt source
interrupt storm detected on "vec2016:"; throttling interrupt source
interrupt storm detected on "vec2016:"; throttling interrupt source
ata2: reiniting channel ..
ata2: reset tp1 mask=03 ostat0=51 ostat1=00
ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
ata2: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
ata2: reset tp2 stat0=50 stat1=00 devices=0x1
ad0: setting PIO4 on CMD 646 chip
ad0: setting WDMA2 on CMD 646 chip
ata2: reinit done ..
ad0: FAILURE - READ_DMA timed out LBA=376800
g_vfs_done():ad0a[READ(offset=192921600, length=16384)]error = 5
ls: firmware: Input/output error
ls: kernel.generic: Input/output error
ls: zfs: Input/output error
boot1           kernel/         loader.4th      loader.rc
defaults/       kernel.5.4/     loader.conf     modules/
device.hints    loader*         loader.help     support.4th
Fixit# mount
/dev/md0 on / (ufs, local)
devfs on /dev (devfs, local)
/dev/acd0 on /dist (cd9660, local, read-only)
/dev/ad0a on /mnt (ufs, local)
/dev/ad0d on /mnt/usr/local (ufs, local, soft-updates)
Fixit#

I'm not sure why `zfs' reports an i/o error.