From owner-freebsd-sparc64@FreeBSD.ORG Mon Jul 21 12:44:38 2003
Return-Path:
Delivered-To: freebsd-sparc64@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1756F37B401
	for ; Mon, 21 Jul 2003 12:44:38 -0700 (PDT)
Received: from collab.or8.net (collab.or8.net [209.94.128.81])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8E48B43F93
	for ; Mon, 21 Jul 2003 12:44:37 -0700 (PDT)
	(envelope-from cjacknospamthanks@klatsch.org)
Received: by collab.or8.net (Postfix, from userid 1002)
	id E33345350; Mon, 21 Jul 2003 15:44:36 -0400 (EDT)
Date: Mon, 21 Jul 2003 15:44:36 -0400
From: Chris Jackman
To: freebsd-sparc64@freebsd.org
Message-ID: <20030721194436.GA42900@collab.or8.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.1i
Subject: correctable DMA error AFAR
X-BeenThere: freebsd-sparc64@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Porting FreeBSD to the Sparc
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 21 Jul 2003 19:44:38 -0000

Hola senores!

Error messages:

  pcib0: correctable DMA error AFAR 0x476d6140 AFSR 0x40e600003f800000

and

  pcib0: correctable DMA error AFAR 0x40adbc40 AFSR 0x40c400003f800000

My e250 has locked up twice in the last few weeks with these error messages.
The error is repeated over and over on the serial console, and I can't do
anything to the box except power cycle it.

The first time it happened, I was transferring about ten 5 GB files from
another machine on the same switch. The second time, the machine was idle.

I see the error message in /u/s/sys/sparc64/pci/psycho.c, in psycho_ce() at
line 751. My world and kernel are from around July 10th, and I have the
latest psycho.c (1.41). Why is this error treated as correctable, when the
errors handled by the other functions around this one are all uncorrectable?
Perhaps this function should also panic, since my machine is unusable when
this error occurs.

Also, is there a way to send a break over the serial console? I can send it
with cu using ~#, but the e250 doesn't respond to it. My guess is that
catching the break signal and dropping to the OpenBoot firmware is a Solaris
feature. I'll hook up a keyboard to this machine, with boot.conf settings
that send the console output to the serial port, and if the error happens
again I'll try Ctrl-Alt-Esc on the keyboard to try to get to the debugger.

Thanks!