From: John Baldwin
To: gnn@freebsd.org
Cc: freebsd-amd64@freebsd.org
Subject: Re: Recent problems with 6-STABLE...
Date: Thu, 31 Jan 2008 06:17:16 -0500
Message-Id: <200801310617.16333.jhb@freebsd.org>
References: <200801291157.39514.jhb@freebsd.org>

On Thursday 31 January 2008 04:37:13 am gnn@freebsd.org wrote:
> At Tue, 29 Jan 2008 11:57:39 -0500,
> John Baldwin wrote:
> >
> > On Tuesday 29 January 2008 07:32:16 am gnn@freebsd.org wrote:
> > > Hi,
> > >
> > > I have two boxes running 6-STABLE, post 6.3 release, which have both
> > > spontaneously rebooted, one under load and one not under load.  I have
> > > attached dmesg and some traceback information, from the one trace that
> > > looked interesting.  Any thoughts or hints would be appreciated.
> > >
> > > To save you scanning all the dmesg first these are dual processor XEON
> > > boxes, each processor has 4 cores.
> >
> > Can you do 'x/i 0xffffffff80296642' to show which instruction faulted?
>
> (kgdb) x/i 0xffffffff80296642
> 0xffffffff80296642 : cmp %ecx,0x8(%rdx)

Hmm, and rdx from your last post was:

> printf "%x\n" 32491047111385957
736e6f69746365
> echo "0x73 0x6e 0x6f 0x69 0x74 0x63 0x65" | dh
snoitce

so the faulting pointer in rdx is actually ASCII text, which means you
have a data corruption issue.  You could check the hardware (RAM, etc.),
but if that is OK you might want to try isolating it to a specific
driver, in case a driver has a bug (or the hardware has an erratum we
don't work around yet).  Do you have any custom drivers for hardware that
does DMA?  If not, which storage driver (including pciconf output if ATA)
and NIC(s) does this box have?  Also, how much RAM?

-- 
John Baldwin
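
For anyone who wants to repeat the rdx check without the two shell steps, here is a minimal Python sketch. The helper name decode_register_as_ascii is hypothetical (it is not part of kgdb or FreeBSD); the only input taken from the thread is the rdx value. Since amd64 is little-endian, the lowest byte of the register is the first byte in memory, so the sketch prints the bytes in memory order.

```python
def decode_register_as_ascii(value: int, width: int = 8) -> str:
    """Render a register value as the bytes it would occupy in memory.

    Hypothetical helper for illustration only. amd64 is little-endian,
    so the least significant byte comes first. Non-printable bytes are
    shown as '.'.
    """
    raw = value.to_bytes(width, byteorder="little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# rdx value quoted in the thread
rdx = 32491047111385957

print(hex(rdx))                       # 0x736e6f69746365
print(decode_register_as_ascii(rdx))  # ections.
```

Read most-significant byte first, the same value spells "snoitce", which matches the output quoted above. In memory order it is "ections" followed by a NUL byte, i.e. a fragment of a string where a pointer was expected.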