From: Marius Strobl
To: Peter Jeremy
Cc: freebsd-sparc64@freebsd.org
Date: Fri, 27 May 2011 14:34:04 +0200
Subject: Re: 'make -j16 universe' gives SIReset
Message-ID: <20110527123404.GB78000@alchemy.franken.de>
In-Reply-To: <20110527120659.GA78000@alchemy.franken.de>

On Fri, May 27, 2011 at 02:06:59PM +0200, Marius Strobl wrote:
> On Fri, May 27, 2011 at 09:47:28AM +1000, Peter Jeremy wrote:
> > I tried a "make -j16 universe" using a recent 8-stable on a 16-CPU
> > V890 and after about 11 minutes, I got the following.
> > This box had been running Solaris without problem for several years so I'm
> > inclined to suspect a software issue.
>
> It probably doesn't hurt to check the hardware with SunVTS though.
>
> > Any suggestions?
> >
> > ERROR: CPU4 SIReset
> >
> >
> > System State (CPU4 reporting)
> >
> > BBC Devices:  0000.0000.0000.000f  0000.0000.0000.000f
> > BBC Arb:      0000.0000.0000.000f  0000.0000.0000.000f
> > BBC Quiesce:  0000.0000.0000.0003  0000.0000.0000.0003
> > BBC WDogAct:  0000.0000.0000.0000  0000.0000.0000.0000
> > BBC POR Gen:  0000.0000.0000.0000  0000.0000.0000.0000
> > BBC XIR Gen:  0000.0000.0000.0000  0000.0000.0000.0000
> > BBC POR Src:  0000.0000.0000.0000  0000.0000.0000.0000
> > BBC XIR Src:  0000.0000.0000.000f  0000.0000.0000.000f
> > BBC EBus TC:  014f.99fd.a7e6.3f29  014f.99fd.a7e6.3f29
> >
> > CMP0 Core Config/Control registers:
> >
> > CoreAvail:    0000.0000.0000.0003  0 1
> > CoreEnabled:  0000.0000.0000.0003  0 1
> > CoreRunning:  0000.0000.0000.0003  0 1
> > XIRSteering:  0000.0000.0000.0003  0 1
> > ErrSteering:  0000.0000.0000.0000
> >
> > CPU0 Config/Control/Status registers:
> >
> > CPUVersion:   003e.0018.3100.0507
> > SafConfig:    0caa.01bc.2000.8002  9:1 ID:0 HBM TOL:15
> > SafBaseAdr:   0000.0400.0000.0000
> > DispatchCtl:  0000.0000.0000.0009  MS SI
> > DCacheCtl:    0000.0200.0000.0010  WE
> > ECacheCtl:    0000.0000.01c5.5000  5:1 8MB mode=5-5-5(2) R/W-turn:2 Late-Sel ECC:off
> > ErrorEnable:  0000.0000.0000.000b  CEEN NCEEN UCEEN
> >
> > AFAR:         0000.0000.0000.0000
> > AFSR:         0000.0000.0000.0000  (no errors set)
> > AFAR 2:       0000.0000.8000.0000
> > AFSR 2:       0000.0000.0000.0000  (no errors set)
> >
> > DMMU SFAR:    0000.0000.f3f8.c300
> > DMMU SFSR:    0000.0000.0000.0000  (no status set)
> > IMMU SFSR:    0000.0000.0080.8000  TM
> >
>
> This doesn't indicate much, especially not the address of the instruction
> causing the SIR, except that there was an i-TLB miss, which seems innocuous.
> Generally, FreeBSD only triggers a SIR when something really unexpected
> happens in an environment where we can't or at least can't easily trigger
> a panic. The only exception to this which is not really fatal from the
> OS point of view is stray vector interrupts (IIRC even OpenSolaris just
> ignores a certain number of these). You could try whether the following
> patch makes any difference to the SIR you're seeing:
> http://people.freebsd.org/~marius/sparc64_intr_vector_stray.diff
> Generally, both USIV and V880 with USIII (which should be quite close to
> a V890) are rather quirky hardware; I've already hit two CPU bugs which
> are not documented in the publicly available errata. Two other things
> to try are to replace the following in cheetah.c:
>     val &= ~DCR_DTPE;
> once with:
>     val &= ~(DCR_DTPE | DCR_ITPE);
> and once with:
>     val &= ~DCR_SI;
> Besides that, IIRC I haven't added a workaround for the USIV+ erratum #4
> so far, which seems unlikely to be the cause of this problem though.
>

Err, wait, I've just noticed that your machine has USIV rather than USIV+
CPUs so the latter shouldn't apply. It's probably still worth a try whether
enabling single issue via
    val &= ~DCR_SI;
outside of the respective block also makes a difference in this case.

Marius