From: gnn@freebsd.org
To: current@freebsd.org
Date: Fri, 04 Jul 2008 12:58:07 -0400
Subject: Has anyone else seen any form of in memory or on disk corruption?
Hi,

I've been working on the following brain-teasing (breaking?) problem for about a week now. What I'm seeing is that on large-memory machines, those with more than 4G of RAM, ungzipping/untarring files fails because gzip thinks the file is corrupt.

The way to reproduce this is:

1) Create a bunch of gzip/tar balls in the 1-20MB range.
2) Reboot the machine (FreeBSD 7.0-RELEASE).
3) Run gzip -t over all the files.

I have hundreds of these files to run this over, and a full check takes about 3 hours, but I usually see some form of corruption within the first 20 minutes.

Other important factors:

1) This is on very modern 2P/4Core hardware (two sockets, four cores each, 8 cores total).
2) The disks are 1TB SATA, set up as JBOD.
3) The machines have 16G of RAM.
4) Corruption is seen only after a reboot; if the machines continue to run, corruption is never seen again until the next reboot.
5) The systems are all Xeons running amd64.
6) The disk controller is an AMCC 9650, though we also see this, very rarely, with the on-board controller.
7) All boards are http://www.supermicro.com/products/motherboard/Xeon1333/5400/X7DWU.cfm
8) All machines have three 1TB drives.
9) The corruption is in 4K chunks, that is, N x 4K bytes.
10) Files are not normally corrupted on disk, but this can happen.

I have already tried a few of the obvious things, such as making sure that we sync pages before we shut down the twa driver. Given what I have seen, I believe this is something that happens at startup, not at shutdown.

Thoughts?

Best,
George
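Step 3 of the reproduction (running gzip -t over every file) and the 4K-granularity observation can be scripted roughly as follows. This is a minimal sketch, not the tooling used in the report. It assumes the tarballs live under one directory and end in .gz or .tgz. It relies on Python's gzip reader, which raises an error on a CRC-32 or length trailer mismatch, truncation, or invalid deflate data; that is comparable to, but not guaranteed identical to, what gzip -t checks. The helper names (gzip_ok, scan, bad_blocks) are hypothetical, and bad_blocks assumes you have a known-good reference copy of a file to compare against.

```python
import gzip
import os
import zlib

BLOCK_SIZE = 4096  # the 4K granularity reported in the post


def gzip_ok(path):
    """Decompress the whole file and return False on any integrity error.

    Python's gzip reader verifies each member's CRC-32 and length trailer
    at end of stream, so reading to EOF is a rough equivalent of gzip -t.
    """
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1 << 20):  # read in 1 MiB pieces, discard output
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers "Not a gzipped file" and "CRC check failed",
        # EOFError covers truncated streams, zlib.error bad deflate data.
        return False


def scan(root):
    """Yield paths of *.gz / *.tgz files under root that fail gzip_ok()."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.endswith((".gz", ".tgz")):  # also matches .tar.gz
                path = os.path.join(dirpath, name)
                if not gzip_ok(path):
                    yield path


def bad_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return indices of block_size-aligned blocks that differ between
    two copies of a file (e.g. a suspect copy and a hypothetical
    known-good reference copy)."""
    bad = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        index = 0
        while True:
            chunk_a = a.read(block_size)
            chunk_b = b.read(block_size)
            if not chunk_a and not chunk_b:
                break  # both files exhausted
            if chunk_a != chunk_b:
                bad.append(index)
            index += 1
    return bad
```

For example, `for p in scan("/path/to/tarballs"): print(p)` lists failing files after a reboot, and `bad_blocks(suspect, reference)` shows whether the damage is confined to whole, 4K-aligned blocks, which would match item 9 above.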