Date: Wed, 21 Jan 2004 10:42:37 -0800 (PST)
From: Julian Elischer
To: Jaye Mathisen
cc: current@freebsd.org
Subject: Re: 5.2 SMP data corruption problems...
In-Reply-To: <20040120171030.GR50677@backmaster.cdsnet.net>

On Tue, 20 Jan 2004, Jaye Mathisen wrote:

> 5.2-current as of 1/15. Mobo is Tyan HESL-T, BIOS rev is 1.04, dual
> P3 1G's. 2 3ware controllers, latest BIOS, 16 drives.
>
> Was seeing data corruption on large copies to the 3ware drives, via
> FTP/samba or even just tar from disk to disk. Small files never
> seemed to get corrupted (md5 checksum'd everything regularly), but
> files over 4G seemed to always get corrupted somewhere, although not
> at the same spots.

We see this under 4.x with old AMD-based systems and 3ware cards. I
think the 3ware cards have very suspect PCI bus interfaces.
We found that it was the writes to disk that were bad. As long as the
data was still in cache it was OK, but if you flushed it then the data
actually on disk was bad.

> Eventually the box panic'd with a lock order reversal, and would not
> let me fsck the large partition (900GB); it would keep panicking in
> pass 2 with another lock order reversal.
>
> I supped to current as of 1/19, tried again, same thing: file
> corruption, lots of panics.
>
> Finally, in the midst of just messing with stuff, I built a new
> kernel without the smp/apic stuff, and it's working fine.
>
> Disk-to-disk copies are fine, no corruption, nothing during uploads,
> no panics. And I can fsck the partition that I couldn't before, and
> it works fine.

How very interesting...

> I do not have the kernel dump info; the debugging was being done
> remotely over the phone, no way I was going to transcribe it that
> way.
>
> Anyway, just a heads up for those with potentially ServerWorks
> chipsets and 5.2, there's possibly something wrong. The corruption
> is silent; if I hadn't checked, there'd be no way to know.
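
The md5 checks mentioned in the quoted message could be scripted roughly as follows. This is only a minimal Python sketch, not what either poster actually ran; the names `md5_of` and `copy_and_verify` are hypothetical. One caveat follows from the reply above: the data looked fine while it was still in cache, so a comparison made right after the copy may read back from cache and pass even when the on-disk copy is bad.

```python
import hashlib
import os
import shutil

CHUNK = 1 << 20  # read 1 MiB at a time so multi-gigabyte files never sit in memory


def md5_of(path):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, ask the OS to flush it, and compare checksums.

    Returns True when both files hash to the same MD5 digest.
    """
    shutil.copyfile(src, dst)
    # fsync requests that the written data reach the device; it does not
    # force the later read-back to bypass the cache.
    with open(dst, "rb+") as fh:
        os.fsync(fh.fileno())
    return md5_of(src) == md5_of(dst)
```

To catch the kind of corruption described here, it is probably worth running `md5_of` on the destination again after the filesystem has been unmounted and remounted (or after a reboot), and comparing against the digest recorded for the source.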