From: Doug White
To: sparc64@freebsd.org
Date: Fri, 14 Nov 2003 11:11:11 -0800 (PST)
Subject: ultra5/cmd646 hang
Message-ID: <20031114105853.A92204@carver.gumbysoft.com>
List-Id: Porting FreeBSD to the Sparc

In my continuing quest to get -current working on the ultra5 here, I've
been able to get it to break into the debugger (yay
DEBUGGER_ON_POWERFAIL).

Initial details:

FreeBSD 5.1-CURRENT #2: Fri Nov 14 10:51:48 PST 2003
    dwhite@dwsparc.looksmart.com:/usr/src/sys/sparc64/compile/SPARC
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0330000.
Timecounter "tick" frequency 270000000 Hz quality 0
real memory  = 134217728 (128 MB)
avail memory = 104366080 (99 MB)
cpu0: Sun Microsystems UltraSparc-IIi Processor (270.00 MHz CPU)
atapci0: port 0xc00020-0xc0002f,0xc00018-0xc0001b,0xc00010-0xc00017,0xc00008-0xc0000b,0xc00000-0xc00007 at device 3.0 on pci1

Note that this system is running source as of 11/1, to rule out the
interrupt code changes as a source of the problem.

The problem:

Timecounters tick every 10.000 msec
ad0: WARNING - SETFEATURES recovered from missing interrupt
ad0: WARNING - SETFEATURES recovered from missing interrupt
ad0: WARNING - SET_MULTI recovered from missing interrupt
ad0: WARNING - SETFEATURES recovered from missing interrupt
GEOM: create disk ad0 dp=0xfffff8001075eec0
ad0: 38182MB [77578/16/63] at ata2-master WDMA2
ad0: WARNING - READ_DMA recovered from missing interrupt
acd0: WARNING - MODE_SENSE_BIG recovered from missing interrupt
ad0: WARNING - READ_DMA recovered from missing interrupt
ata3: resetting devices ..
### we are hanging here ###

Analysis:

With WITNESS & INVARIANTS compiled in, none of these messages appear;
the system locks solid just after the "timecounter" message. I'm
thinking there is a timing issue a la the log message for
ata-lowlevel.c rev 1.21.

Setting the loader tunable hw.ata.ata_dma=0 has no effect on the
problem (although the disk comes up at PIO4 as expected).

After breaking into the debugger with the power switch, "show intr"
suggests the ATA chip is getting stuck with its interrupt asserted:

db> show intr
fast     pil13    5
ithrd    pil2     3016775
pcib0    vec2021  2
sab0     vec2027  5
atapci0  vec2016  3016775
tick     pil14    5067

Next steps:

ata-chipset.c has a special interrupt override for 648 and newer
chipsets, in the SiI bits. I'll try changing the interrupt handler
registration to use SIIINTR.

Hints appreciated :-)

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org
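
For reading "show intr" counts like the ones quoted in this message, here is a minimal sketch in Python. It is not part of the kernel or ddb. It flags device vectors whose count dwarfs the tick count, which is the pattern that points at atapci0 here. The function names and the 100x threshold are assumptions chosen for illustration, and the sample text assumes the ddb output is one "name source count" entry per line.

```python
# Sample "show intr" output, copied from the message (assumed one entry per line).
SHOW_INTR = """\
fast pil13 5
ithrd pil2 3016775
pcib0 vec2021 2
sab0 vec2027 5
atapci0 vec2016 3016775
tick pil14 5067
"""


def parse_show_intr(text):
    """Parse 'name source count' lines into (name, source, count) tuples.

    Lines that do not have exactly three fields with a numeric count
    (for example the 'db> show intr' prompt line) are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        name, source, count = parts
        rows.append((name, source, int(count)))
    return rows


def stuck_candidates(rows, ratio=100):
    """Return device vectors whose count is at least `ratio` times the tick count.

    The ratio is a hypothetical rule of thumb, not a FreeBSD convention:
    a device that interrupts far more often than the clock ticks during
    early boot is a plausible candidate for a stuck (never deasserted)
    interrupt line.
    """
    tick = next((count for name, _, count in rows if name == "tick"), None)
    if not tick:
        return []
    return [
        (name, source, count)
        for name, source, count in rows
        if source.startswith("vec") and count >= ratio * tick
    ]


if __name__ == "__main__":
    for name, source, count in stuck_candidates(parse_show_intr(SHOW_INTR)):
        print(f"{name} ({source}): {count} interrupts")
```

On the counts above, only atapci0 is flagged. Its count also equals the ithrd pil2 count, which is consistent with the ATA interrupt being the only thing feeding that interrupt thread.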