From: "Marc G. Fournier"
Date: Sun, 22 May 2005 10:40:50 -0300 (ADT)
To: freebsd-questions@freebsd.org
Cc: freebsd-stable@freebsd.org
Subject: [4.11-STABLE] sporatic directory corruption with unionfs ...
Message-ID: <20050522101434.W61528@ganymede.hub.org>

I have 6 servers in place right now, with the 6th being a brand new Intel 1U box, using a 3Ware 9500 controller to do RAID5 across 3 SATA drives ... yesterday, for the first time, I started to get a sort of file system corruption that I've never seen before ... it looks like the directory entries are getting corrupted, but the files themselves are fine ...

I have a jail, with /usr in the jail being a unionfs from a template ... so:

    mount_union -b /template/usr /jail/usr

In the template, there is a directory /usr/local/cyrus/bin, for instance ...
the binaries under /usr/local/cyrus/bin are literally disappearing, as if someone were entering that directory and doing an 'rm' of those files ...

I have a serial console attached to this server, and there are no errors being reported by the operating system ... tw_cli is showing that the drives and the unit are both functioning properly ... so I can find no reason for the apparent "corruption" ...

I just brought the server down to single user mode and unmounted everything so that I could do an fsck on the file system itself, and *it's* checking as being clean ... it's still running though, so there may be something at the end ...

When I first noticed this, yesterday, the server had been running 11 days, with nothing in /var/log/messages to indicate a problem ... today, after less than 24 hours, it's done it again, and again, no errors are being generated anywhere to indicate a problem ...

All the other 5 servers are running SCSI ... but unless there is a bug in the 9500 driver, I can't see it being hardware related ...

I've kinda always expected something like this might happen with unionfs as a result of the server crashing, but not when it's running fine ...

I don't know what else to add to this, unfortunately, since neither the hardware nor the operating system seems to want to give anything up :(

The OS is from April 11th, so it is relatively recent ...

Any suggestions/ideas on what else to look at would be much appreciated ...

Thank you ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664