Date: Fri, 1 Feb 2008 14:55:25 -0500
From: John Baldwin <jhb@freebsd.org>
To: gnn@freebsd.org
Cc: freebsd-amd64@freebsd.org
Subject: Re: Recent problems with 6-STABLE...
Message-ID: <200802011455.25551.jhb@freebsd.org>
In-Reply-To: <m27ihp5n7r.wl%gnn@neville-neil.com>
References: <m2fxwgx167.wl%gnn@neville-neil.com> <200801310617.16333.jhb@freebsd.org> <m27ihp5n7r.wl%gnn@neville-neil.com>

On Thursday 31 January 2008 11:12:40 pm gnn@freebsd.org wrote:
> At Thu, 31 Jan 2008 06:17:16 -0500,
> John Baldwin wrote:
> >
> > On Thursday 31 January 2008 04:37:13 am gnn@freebsd.org wrote:
> > > At Tue, 29 Jan 2008 11:57:39 -0500,
> > > John Baldwin wrote:
> > > >
> > > > On Tuesday 29 January 2008 07:32:16 am gnn@freebsd.org wrote:
> > > > > Hi,
> > > > >
> > > > > I have two boxes running 6-STABLE, post 6.3 release, which have both
> > > > > spontaneously rebooted, one under load and one not under load. I have
> > > > > attached dmesg and some traceback information, from the one trace that
> > > > > looked interesting. Any thoughts or hints would be appreciated.
> > > > >
> > > > > To save you scanning all the dmesg first these are dual processor XEON
> > > > > boxes, each processor has 4 cores.
> > > >
> > > > Can you do 'x/i 0xffffffff80296642' to show which instruction faulted?
> > >
> > > (kgdb) x/i 0xffffffff80296642
> > > 0xffffffff80296642 <pfs_exit+114>: cmp %ecx,0x8(%rdx)
> >
> > Hmm, and rdx from your last post was:
> >
> > > printf "%x\n" 32491047111385957
> > 736e6f69746365
> >
> > > echo "0x73 0x6e 0x6f 0x69 0x74 0x63 0x65" | dh
> > snoitce
> >
> > so it appears you have a data corruption issue. You could check the
> > hardware (RAM, etc.) but if that is ok you might want to see if you
> > can isolate it to a specific driver if a driver has a bug (or
> > hardware has an errata we don't work around yet). Do you have any
> > custom drivers for hardware that does DMA? If not, which storage
> > driver (including pciconf output if ATA) and NIC(s) does this box
> > have? Also, how much RAM?
>
> Custom drivers? Not that I know of. This box uses Intel Pro/1000
> network drivers and Adaptec AIC7902 SCSI for talking to the disks.
>
> The box has 8G of RAM in 2G chunks (which has now been subjected to 40
> memtests and passed).

Try hw.physmem=4g at the loader to see if it fixes it. If so, it's a bug
with bounce buffering.

-- 
John Baldwin
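
For readers following the rdx decoding quoted above, here is a minimal sketch of the same check in Python. The helper name register_as_text is made up for illustration and is not part of any FreeBSD or kgdb tooling. It assumes an 8-byte register on a little-endian machine (as amd64 is), so the bytes as they would sit in memory read in the reverse order of the hex digits.

```python
# Value of %rdx reported earlier in the thread, in decimal.
RDX = 32491047111385957


def register_as_text(value, width=8):
    """Render a register value as ASCII in both byte orders.

    Hypothetical helper for illustration only. Non-printable bytes
    are shown as '.'.
    """
    def printable(raw):
        return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

    # Big-endian matches the order of the hex digits as printed.
    big = printable(value.to_bytes(width, "big"))
    # Little-endian matches the byte order in memory on amd64.
    little = printable(value.to_bytes(width, "little"))
    return big, little


if __name__ == "__main__":
    print(hex(RDX))
    print(register_as_text(RDX))
```

The big-endian rendering corresponds to the "snoitce" shown in the quoted text. The little-endian rendering is that same run of bytes reversed, which is how the text would appear in memory if a pointer had been overwritten with string data.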