Date: Thu, 1 May 2003 11:07:48 +0200 (CEST)
From: Heiko Schaefer
To: freebsd-current@freebsd.org
cc: Poul-Henning Kamp
Message-ID: <20030501105847.O32907@daneel.foundation.hs>
In-Reply-To: <20030430151514.X27116@daneel.foundation.hs>
References: <8269.1051708000@critter.freebsd.dk> <20030430151514.X27116@daneel.foundation.hs>
Subject: poul: sorry, was: Re: still: Re: gbde data corruption?

Hi Poul,

sorry for all the fuss, you can now reattach your pulled-out hair. and i
need to find someone or something new to blame this issue on...

i have just reproduced the data-corruption on non-gbde filesystems (what a
way to start a day).

so gbde is very very likely not to blame for any of the stuff that i've
been complaining about (apart from the slowness while writing to disc,
which will bug me again after i've sorted out the corruption :) i'll get
back to you in a few weeks *g*).

now i have the questionable pleasure of ruling out pieces of hardware as
suspects, i guess. damn. sometimes cheap pc hardware really sucks. even
though i don't think it's clear if this is a hardware or a software issue.

if anyone has suggestions on how to rule out specific causes or what
pieces of hard- and software are particularly suspect, i'd be glad to hear
about it. first of all i am planning to rule out the sis nic in the server
(onboard nics have always been suspect to me)

sorry again, poul for blaming this on you right away.

regards,

Heiko

> [corruption of data]
> > That is really strange, the problems I've seen until now have all
> > resulted in data coming back scrambled beyond recognition, and therefore
> > practically incompressible, this sounds like they're filled with identical
> > bytes or sectors of some kind.
> >
> > Can you try to run "cmp -l oldfile newfile" and study the output
> > for a bit ? Any observations you can make will be helpful.
>
> the broken version of the file contains lots of 0-bytes (instead of high
> entropy values in the original file). seems by the output of cmp that
> every damaged value is replaced by 0.
>
> > >(potentially this could also be an nfs-issue, as i am copying onto the
> > >gbde partition via nfs from a 4.6-rc machine. but i can't really imagine
> > >that, never had anything like that in all of my non-gbde freebsd nfs
> > >experience. if it is an nfs issue, then it would probably be fbsd-5
> > >specific - is there any such known issue ?!)
> >
> > I doubt it is NFS, but it would be nice if you could verify the checksum
> > on both the client and server side, just to see that they agree.
>
> to clarify: i mount the (remote) gbde partition to a box which wishes to
> get rid of a lot of data - then i move stuff onto the gbde mount via nfs.
>
> the checking of the checksums was then done on the server (i.e. locally).
>
> > >the partition in question now looks like this:
> > >e: 117231392 16 4.2BSD 4096 16384 64 # (Cyl. 0*- 7297*)
> >
> > What does diskinfo(8) say about the encrypted (ad0e.bde) and unencrypted
> > (ad0e) devices (for some value of "ad0") ?
>
> zoidberg# diskinfo /dev/ad0s1e
> /dev/ad0s1e 512 29051207680 56740640 56290 16 63
> zoidberg# diskinfo /dev/ad0s1e.bde
> /dev/ad0s1e.bde 4096 28937551872 7064832
>
> another thing i just notice: /var/log/messages contains lots of
>
> [...]
> Apr 30 15:24:55 zoidberg kernel: ENOMEM 0xc4c62100 on 0xc45c6c80(ad2s1e.bde)
> Apr 30 15:25:19 zoidberg kernel: ENOMEM 0xc3fa5000 on 0xc45c6c80(ad2s1e.bde)
> Apr 30 15:25:57 zoidberg kernel: ENOMEM 0xc4b46100 on 0xc45c6c80(ad2s1e.bde)
> Apr 30 15:25:57 zoidberg kernel: ENOMEM 0xc4364500 on 0xc45c6c80(ad2s1e.bde)
> [...]
>
> i haven't yet checked the data on ad2s1e.bde, it might be partially
> corrupt or not.
>
> > >this time i inited gbde's sectorsize to "4096". last time i reported
> > >corruption, gbde's sectorsize was at its default (i presume 512). the
> > >corruption then 'felt' just the same. very sporadic - and somewhat
> > >non-deterministic from my point of view.
> >
> > The sectorsize is mainly a performance issue, it should not affect operation
>
> i feel that the issue i see is outside the realm of 'should' - so i try to
> give any information i can think of. even useless information :)
>
> also, i have the unpleasant feeling that i might be making some stupid
> mistake, and waste your time by looking entirely in the wrong direction.
>
> ...for all i know the hardware i use on the server-side (or the drivers
> for it ... for some reason the sis-based onboard nic comes to my mind,
> just now) could be subtly broken :/
>
> if you have no other things i could report or try, i might just throw away
> the gbde volumes and try the same copying with non-gbde partitions, just
> to be sure.
>
> regards,
>
> Heiko
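
The "cmp -l oldfile newfile" check suggested in the quoted text prints one line per differing byte, which gets hard to read on large files. A minimal sketch of a run-level summary follows. It is not from the thread: the function name damaged_runs and its parameters are hypothetical. It groups consecutive differing bytes into runs and notes whether each run in the damaged copy consists only of 0-bytes.

```python
def damaged_runs(good_path, bad_path, chunk_size=1 << 20):
    """Return (offset, length, all_zero) for each run of differing bytes.

    offset and length are in bytes; all_zero is True when every byte of
    the run in the damaged file (bad_path) is 0. Comparison stops at the
    end of the shorter file. Hypothetical helper, for illustration only.
    """
    runs = []
    start = None       # offset where the open run began, or None
    all_zero = True    # whether the open run has only 0-bytes so far
    offset = 0         # file offset of the current chunk
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        while True:
            a = good.read(chunk_size)
            b = bad.read(chunk_size)
            if not a and not b:
                break
            n = min(len(a), len(b))
            if a[:n] == b[:n]:
                # Identical chunk: a run left open by the previous chunk
                # ended exactly at this chunk's start.
                if start is not None:
                    runs.append((start, offset - start, all_zero))
                    start = None
            else:
                for i in range(n):
                    if a[i] != b[i]:
                        if start is None:
                            start = offset + i
                            all_zero = True
                        if b[i] != 0:
                            all_zero = False
                    elif start is not None:
                        runs.append((start, offset + i - start, all_zero))
                        start = None
            offset += n
            if len(a) != len(b):
                break  # one file is shorter; stop comparing here
    if start is not None:
        runs.append((start, offset - start, all_zero))
    return runs
```

Calling damaged_runs("oldfile", "newfile") and printing each offset modulo 512 and 4096 would show whether the damaged runs line up with the sector sizes mentioned above. What such alignment would mean for the cause is left open here.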