Date: Tue, 5 Jul 2005 20:24:41 -0300 (ADT)
From: "Marc G. Fournier"
To: freebsd-stable@freebsd.org
Subject: FreeBSD 4.x - SATA problems ... ?
Message-ID: <20050705195656.B940@ganymede.hub.org>

Recently, I added a new server to our network, using the 3Ware RAID controller (the 9500S-4LP card) and 3x140G SATA drives ... overall, the system works, but I'm seeing some very odd behaviour that I've never seen before ...

I have a process that runs an rsync from another server to 'duplicate' the VPSs ... a 'live backup' sort of thing ... this is running on all our servers, without incident, *except*, it appears, the SATA server ... I had disabled it for a time, and just re-enabled it this morning, and somehow or another, it seems to be causing file system corruption ...

As most 'old timers' here know, we use UNIONFS on all our servers ...
when the corruption occurs, it looks like the "directory structures" are being changed ... this one is hard to explain :(

For example, /usr/local/cyrus/bin has a bunch of binaries in it ... the binaries are kept on the "lower layer", so the upper layer only has a /usr/local/cyrus/bin directory created/ghosted, but no copies of the binaries ... so, when you are in the VPS, and do an ls of that directory, you see:

# ls /usr/local/cyrus/bin
arbitron        cyr_expire      lmtpd           notifyd         smmapd
chk_cyrus       cyrdump         masssievec      pop3d           squatter
ctl_cyrusdb     deliver         master          pop3proxyd      timsieved
ctl_deliver     fud             mbexamine       quota           tls_prune
ctl_mboxlist    imapd           mbpath          reconstruct
cvt_cyrusdb     ipurge          mkimap          sievec

When the 'corruption' happens, those all disappear, almost as if someone did a 'rm -rf' of the directory within the VPS, and then a 'mkdir' ... except that, from what I've been able to tell, this only happens randomly, it happens on any of the VPSs *and* only around the time that the rsync process is running ...

As if, somehow, the rsync is taxing the system and causing bad writes ... but I can't find anything anywhere to indicate a problem ...

To "fix" things, I umount the UNIONFS layer, and then do a 'find / cpio' to copy the "top layer" back over to fix the directory structure itself ...

The thing is, I don't even know *where* to begin debugging this issue, since there aren't any errors being reported anywhere ... but maybe someone out there has an idea?

thanks ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
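
Since nothing is being logged, one possible starting point is to record what each ghosted directory looks like through the union mount before the rsync run and again afterwards, so the moment the entries vanish can at least be pinned down. The following is only a minimal sketch of that idea, not something from the original setup: the function names `snapshot` and `vanished` are hypothetical, and the demo runs against a scratch directory rather than a real VPS path such as /usr/local/cyrus/bin.

```python
import os
import tempfile


def snapshot(dirs):
    """Map each directory to the set of entry names currently visible in it."""
    seen = {}
    for d in dirs:
        try:
            seen[d] = frozenset(os.listdir(d))
        except OSError:
            # The directory itself is missing or unreadable: treat it as empty.
            seen[d] = frozenset()
    return seen


def vanished(before, after):
    """Return {directory: sorted names} for entries seen before but not after."""
    lost = {}
    for d, names in before.items():
        missing = names - after.get(d, frozenset())
        if missing:
            lost[d] = sorted(missing)
    return lost


if __name__ == "__main__":
    # Demo on a scratch directory; the file names are only placeholders.
    with tempfile.TemporaryDirectory() as root:
        for name in ("imapd", "lmtpd", "master"):
            open(os.path.join(root, name), "w").close()
        before = snapshot([root])
        os.remove(os.path.join(root, "lmtpd"))  # simulate a lost entry
        after = snapshot([root])
        for directory, names in vanished(before, after).items():
            print(directory, "lost:", ", ".join(names))
```

Taking the first snapshot just before the rsync job starts and the second right after it finishes, and logging any non-empty result with a timestamp, would show whether the loss really lines up with the rsync window. It would not explain the cause by itself.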