Date: Fri, 4 Jul 2008 15:21:25 -0400
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Cc: gnn@freebsd.org
Subject: Re: Has anyone else seen any form of in memory or on disk corruption?
Message-ID: <200807041521.25711.jhb@freebsd.org>
In-Reply-To: <m2r6a9poww.wl%gnn@neville-neil.com>
References: <m2r6a9poww.wl%gnn@neville-neil.com>

On Friday 04 July 2008 12:58:07 pm gnn@freebsd.org wrote:
> Hi,
>
> I've been working on the following brain teasing (breaking?) problem
> for about a week now.  What I'm seeing is that on large memory
> machines, those with more than 4G of RAM, the ungzipping/untarring of
> files fails due to gzip thinking the file is corrupt.  The way to
> reproduce this is:
>
> 1) Create a bunch of gzip/tar balls in the 1-20MB range.
> 2) Reboot FreeBSD 7.0 release
> 3) Run gzip -t over all the files.
>
> I have hundreds of these files to run this over, and a full check
> takes about 3 hours, but I usually see some form of corruption within
> the first 20 minutes.
>
> Other important factors:
>
> 1) This is on very modern, 2P/4Core (8 cores total) hardware
> 2) The disks are 1TB SATA set up in JBOD.
> 3) The machines have 16G of RAM.
> 4) Corruption is seen only after a reboot, if the machines continue to
>    run corruption is never seen again, until another reboot.
> 5) The systems are all Xeon running amd64
> 6) The disk controller is an AMCC 9650, but we do see this very rarely
>    with the on board controller.

If this is one of the ATA controllers where it tries to use 63k transfers
(126 * DEV_BSIZE) instead of 64k, then change it to 32k (64 * DEV_BSIZE).
W/o this fix I see massive data corruption (couldn't even build a kernel
with the fix, had to reinstall the box) on HT1000 ATA chipsets.  Crashdumps
also don't seem to work reliably w/o changing that.

-- 
John Baldwin
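
The transfer sizes and the "gzip -t over all the files" step discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: DEV_BSIZE is taken to be 512 bytes (its usual FreeBSD value), and the helper names transfer_bytes, gzip_ok and find_corrupt are hypothetical, not part of any FreeBSD tool or of the setup described in this thread. The integrity check reads each archive to the end, which is roughly what gzip -t does.

```python
import gzip
import zlib
from pathlib import Path

# Assumption: DEV_BSIZE is 512 bytes, its usual value on FreeBSD.
DEV_BSIZE = 512


def transfer_bytes(blocks):
    """Size in bytes of a transfer made of `blocks` DEV_BSIZE blocks."""
    return blocks * DEV_BSIZE


def gzip_ok(path, chunk=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly."""
    try:
        with gzip.open(path, "rb") as f:
            # Read to the end so the CRC and length trailer get verified.
            while f.read(chunk):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Bad header or CRC, truncated stream, or damaged deflate data.
        return False


def find_corrupt(directory):
    """List the *.gz files under `directory` that fail the check."""
    return sorted(p for p in Path(directory).rglob("*.gz") if not gzip_ok(p))


if __name__ == "__main__":
    print("63k transfer:", transfer_bytes(126), "bytes")
    print("32k transfer:", transfer_bytes(64), "bytes")
    for bad in find_corrupt("."):
        print("corrupt:", bad)
```

Running this right after a reboot, and again once the machine has been up for a while, would give a rough way to compare the two cases described in the quoted message. It does not by itself say whether the corruption is on disk or in memory.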