From: Nick Piggin
To: Daniel Gerzo
Cc: freebsd-current@freebsd.org
Date: Sun, 11 Nov 2007 08:41:00 +1100
Subject: Re: FreeBSD corruption problems on barcelona
Message-Id: <200711110841.00411.nickpiggin@yahoo.com.au>
In-Reply-To: <1306808611.20071110124527@rulez.sk>
References: <20071109191821.V639@10.0.0.1> <1306808611.20071110124527@rulez.sk>
List-Id: Discussions about the use of FreeBSD-current

Hi Daniel,

On Saturday 10 November 2007 22:45, Daniel Gerzo wrote:
> Hello Nick,
>
> Saturday, November 10, 2007, 4:21:05 AM, you wrote:
> > Here it is attached. Now there is a cdrom error there, however I
> > don't believe it is the cause of the problem (or at least, there
> > is a bigger problem with the sata disk). The install has run
> > perfectly every time I've run it, so it is pulling the data off
> > the CD OK.
> >
> > Now I have actually got as far as root login, I filled up a 1MB
> > file with /dev/urandom and took an md5. Then copied that to 50
> > files on the /tmp filesystem, unmounted and remounted it, and then
> > read back the md5 sums. Practically all of them are wrong, but they
> > seem to be wrong in the same ways (eg. many share the same
> > incorrect md5 sum). Reading the files back from disk consistently
> > gives the same information, so it seems like reads are OK.
>
> Did you by any chance try to install some other OS and check whether
> that is really a FreeBSD problem? You mentioned that you've got a new
> box, so I suppose that you have tried only FreeBSD on it so far.

I have got Linux on it as well, no sign of problems (that doesn't
completely rule out a hardware problem, of course...)

> In the past, I had a somewhat similar problem. The symptoms were that I
> checked some file's md5 hash, then copied it to some other location and
> checked the new md5 hash of that file, and it was different. The problem
> was resolved after we replaced the CPU (AFAIR).

The thing is, the data doesn't get corrupted in the pagecache. If I copy
the files then read them back from cache, everything is fine. It's only
after dumping the pagecache (via unmount and remount) and reading the
data back into pagecache that the corruption can be seen. Subsequent
unmounting and remounting shows exactly the same data.

Also, the corruption isn't the usual kind of CPU corruption, like a
bitflip or cacheline corruption, but significant blocks of zeroes in
the files (which look like they're page or filesystem block size
aligned).

So it seems to be getting corrupted going from pagecache to disk. It
would be pretty unusual if it were a CPU problem, but it could be other
hardware, sure.

> So the things you are describing in your email seem to me more like a
> hardware problem than a FreeBSD problem, could you please run some
> kind of hardware test and try to replace your controllers, sata cable,
> disk and so on?

It's tricky. The controller is built in. Cable and disk I'm reluctant
to replace, given that reads are going across them just fine. But I can
run any specific test that you suggest.

Thanks,
Nick
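
The corruption pattern described above (aligned blocks of zeroes that
show up only after a remount) can be checked for mechanically. What
follows is a minimal sketch in Python and is not part of the original
test. The 4096-byte block size is an assumption to be adjusted to the
page or filesystem block size in use, and the names md5_of and
zeroed_blocks are hypothetical helpers introduced only for
illustration.

```python
import hashlib

# Assumed page / filesystem block size; adjust to match the system under test.
BLOCK_SIZE = 4096


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def zeroed_blocks(reference: bytes, candidate: bytes,
                  block_size: int = BLOCK_SIZE) -> list:
    """Return offsets of aligned blocks that are all zero in `candidate`
    but differ from the same block in `reference`."""
    offsets = []
    limit = min(len(reference), len(candidate))
    for off in range(0, limit, block_size):
        ref_blk = reference[off:off + block_size]
        cand_blk = candidate[off:off + block_size]
        # A block counts as "zeroed" only if it differs from the reference
        # and every byte in it is zero.
        if cand_blk != ref_blk and cand_blk.count(0) == len(cand_blk):
            offsets.append(off)
    return offsets
```

To use it, read the original 1MB file and each copy (after the unmount
and remount) as bytes, compare md5_of() of each copy against the
original, and call zeroed_blocks(original, copy) on any copy whose
digest differs. Offsets that are all multiples of the block size would
be consistent with whole blocks being lost on the way to disk, though
they would not by themselves say whether the driver or the hardware is
at fault.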