From: John Baldwin
To: gnn@freebsd.org
Cc: freebsd-amd64@freebsd.org
Date: Fri, 1 Feb 2008 14:55:25 -0500
Subject: Re: Recent problems with 6-STABLE...
Message-Id: <200802011455.25551.jhb@freebsd.org>
References: <200801310617.16333.jhb@freebsd.org>

On Thursday 31 January 2008 11:12:40 pm gnn@freebsd.org wrote:
> At Thu, 31 Jan 2008 06:17:16 -0500,
> John Baldwin wrote:
> >
> > On Thursday 31 January 2008 04:37:13 am gnn@freebsd.org wrote:
> > > At Tue, 29 Jan 2008 11:57:39 -0500,
> > > John Baldwin wrote:
> > > >
> > > > On Tuesday 29 January 2008 07:32:16 am gnn@freebsd.org wrote:
> > > > > Hi,
> > > > >
> > > > > I have two boxes running 6-STABLE, post 6.3 release, which have both
> > > > > spontaneously rebooted, one under load and one not under load. I have
> > > > > attached dmesg and some traceback information, from the one trace that
> > > > > looked interesting. Any thoughts or hints would be appreciated.
> > > > >
> > > > > To save you scanning all the dmesg first these are dual processor XEON
> > > > > boxes, each processor has 4 cores.
> > > >
> > > > Can you do 'x/i 0xffffffff80296642' to show which instruction faulted?
> > >
> > > (kgdb) x/i 0xffffffff80296642
> > > 0xffffffff80296642 : cmp %ecx,0x8(%rdx)
> >
> > Hmm, and rdx from your last post was:
> >
> > > printf "%x\n" 32491047111385957
> > 736e6f69746365
> >
> > > echo "0x73 0x6e 0x6f 0x69 0x74 0x63 0x65" | dh
> > snoitce
> >
> > so it appears you have a data corruption issue.  You could check the
> > hardware (RAM, etc.) but if that is ok you might want to see if you
> > can isolate it to a specific driver if a driver has a bug (or
> > hardware has an errata we don't work around yet).  Do you have any
> > custom drivers for hardware that does DMA?  If not, which storage
> > driver (including pciconf output if ATA) and NIC(s) does this box
> > have?  Also, how much RAM?
>
> Custom drivers? Not that I know of.
> This box uses Intel Pro/1000 network drivers and Adaptec AIC7902 SCSI
> for talking to the disks.
>
> The box has 8G of RAM in 2G chunks (which has now been subjected to 40
> memtests and passed).

Try hw.physmem=4g at the loader to see if it fixes it.  If so, it's a bug
with bounce buffering.

-- 
John Baldwin