From owner-freebsd-current@FreeBSD.ORG Fri Jul 4 19:21:54 2008 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8BE68106567A; Fri, 4 Jul 2008 19:21:54 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 0B89B8FC0C; Fri, 4 Jul 2008 19:21:53 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from localhost.corp.yahoo.com (john@localhost [IPv6:::1]) (authenticated bits=0) by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m64JLk0Y068377; Fri, 4 Jul 2008 15:21:47 -0400 (EDT) (envelope-from jhb@freebsd.org) From: John Baldwin To: freebsd-current@freebsd.org Date: Fri, 4 Jul 2008 15:21:25 -0400 User-Agent: KMail/1.9.7 References: In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200807041521.25711.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [IPv6:::1]); Fri, 04 Jul 2008 15:21:47 -0400 (EDT) X-Virus-Scanned: ClamAV 0.93.1/7638/Fri Jul 4 07:41:46 2008 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,NO_RELAYS autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx Cc: gnn@freebsd.org Subject: Re: Has anyone else seen any form of in memory or on disk corruption? X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Jul 2008 19:21:54 -0000 On Friday 04 July 2008 12:58:07 pm gnn@freebsd.org wrote: > Hi, > > I've been working on the following brain teasing (breaking?) problem > for about a week now. What I'm seeing is that on large memory > machines, those with more than 4G of RAM, the ungzipping/untarring of > files fails due to gzip thinking the file is corrupt. The way to > reproduce this is: > > 1) Create a bunch of gzip/tar balls in the 1-20MB range. > 2) Reboot FreeBSD 7.0 release > 3) Run gzip -t over all the files. > > I have hundreds of these files to run this over, and a full check > takes about 3 hours, but I usually see some form of corruption within > the first 20 minutes. > > Other important factors: > > 1) This is on very modern, 2P/4Core (8 cores total) hardware > 2) The disks are 1TB SATA set up in JBOD. > 3) The machines have 16G of RAM. > 4) Corruption is seen only after a reboot, if the machines continue to > run corruption is never seen again, until another reboot. > 5) The systems are all Xeon running amd64 > 6) The disk controller is an AMCC 9650, but we do see this very rarely > with the on board controlller. If this is one of the ATA controllers where it tries to use 63k transfers (126 * DEV_BSIZE) instead of 64k, then change it to 32k (64 * DEV_BSIZE). W/o this fix I see massive data corruption (couldn't even build a kernel with the fix, had to reinstall the box) on HT1000 ATA chipsets. Crashdumps also don't seem to work reliably w/o changing that. -- John Baldwin