From: "Arno J. Klaassen"
To: Mike Tancsa
Date: 26 Apr 2008 17:14:46 +0200
Cc: stable@freebsd.org, pluknet@gmail.com
Subject: Re: nfs-server silent data corruption

Hello,

Mike Tancsa writes:

> At 02:35 PM 4/22/2008, Arno J. Klaassen wrote:
> >
> > > Also, you are using ULE or the 4BSD scheduler ? I
> > > still have 4BSD on the box I am testing on.
> >
> >Interesting, this is with ULE. I didn't really test 4BSD on this
> >box (I believed those who said SMP needs ULE *and* am quite
> >satisfied with overall performance). I'll try 4BSD though time
> >is getting short; I promised to deliver this box next thursday but will
> >still have some days for on-site testing.
>
>
> I have recompiled the kernel with ULE, and it seems fine as well. I
> ran 160 iterations of a 300MB file and there was no corruption. Same
> process - copy a junk random file over nfs mount, unmount the nfs
> mount, remount it copy it back, compare the files.

Let me summarise my investigations so far:

- in all failing cases just *one* byte is corrupted: either 4 or all 8
  of its bits are set to zero, *and* every non-zero nibble that gets
  cleared originally held one of the values {1, 8, 9}.
  Here is the output of `cmp -x $i/BIG $i/BIG2` for some failing cases I
  saved (offset, byte in BIG, byte in BIG2, all in hex):

    03869a48 09 00
    05209d88 09 00
    01777148 09 00
    00f10f88 09 00
    01f4c4c8 11 00
    06c3d6c8 11 00
    0725ca48 18 00
    01608008 09 00
    00f3b888 18 00
    07aa45c8 29 20

- it does *not* seem to depend on:

  - the interface: I could reproduce it using nfe0, nfe1 and re0 (a
    Netgear PCI card)
  - the distribution of the 4 GB of memory: installing 4G at CPU1, or 1G
    at CPU1 and 2G at CPU2, produces the same results (NB: all memory
    passed a complete run of memtest.iso in both situations)
  - the frequency control method: it is easier to produce with
    cpufreq/powerd, but in the end I can reproduce the corruption as
    well using acpi_ppc
  - the NFS client and mount options (not exhaustively tested, but the
    tests include i386-releng6, amd64-releng6 and Linux clients, and
    quite a set of trial-and-error mount_nfs options)

I am testing right now with a fixed frequency of 1 GHz.

I am not so inclined to test 4BSD, since my possibilities to reboot this
box are limited now, but next week I will set up a similar board
(S3992e) (if I can find a quad-core socket F CPU over here ...) and in a
certain sense I hope I can reproduce it on that board as well.

Best,
Arno
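For anyone wanting to repeat the test, the cycle Mike describes (copy a junk random file over the NFS mount, unmount, remount, copy it back, compare) can be sketched in Python roughly as follows. This is an illustrative sketch, not the script either of us actually ran: the mount point `/mnt/nfs`, the assumption that it has an `/etc/fstab` entry so that a bare `mount /mnt/nfs` works, and all function names are hypothetical. The remount step is a parameter, so `dry_run()` can exercise the loop against two local temporary directories without touching NFS.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Callable, List

# Hypothetical mount point; assumed to have an /etc/fstab entry.
NFS_MOUNT = "/mnt/nfs"


def remount_nfs(mountpoint: str) -> None:
    """Unmount and remount so the copy-back is not served from client cache."""
    subprocess.run(["umount", mountpoint], check=True)
    subprocess.run(["mount", mountpoint], check=True)


def write_random_file(path: str, size: int, chunk: int = 1 << 20) -> None:
    """Fill `path` with `size` bytes of junk random data."""
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(remaining, chunk)
            f.write(os.urandom(n))
            remaining -= n


def same_contents(a: str, b: str, chunk: int = 1 << 20) -> bool:
    """Byte-by-byte comparison, like cmp(1) but without the report."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            da, db = fa.read(chunk), fb.read(chunk)
            if da != db:
                return False
            if not da:  # both files reached EOF together
                return True


def round_trip(workdir: str, mountpoint: str, size: int, iterations: int,
               remount: Callable[[str], None] = remount_nfs) -> List[int]:
    """Copy out, remount, copy back, compare; return the failing iterations."""
    failures = []
    src = os.path.join(workdir, "BIG")
    back = os.path.join(workdir, "BIG2")
    remote = os.path.join(mountpoint, "BIG")
    for i in range(iterations):
        write_random_file(src, size)
        shutil.copyfile(src, remote)   # copy over the NFS mount
        remount(mountpoint)            # unmount and remount
        shutil.copyfile(remote, back)  # copy it back
        if not same_contents(src, back):
            failures.append(i)
    return failures


def inject_bit_flip(mountpoint: str, offset: int = 100) -> None:
    """Test hook: invert one byte of the remote copy to prove the check works."""
    path = os.path.join(mountpoint, "BIG")
    with open(path, "r+b") as f:
        f.seek(offset)
        byte = f.read(1)[0]
        f.seek(offset)
        f.write(bytes([byte ^ 0xFF]))


def dry_run(iterations: int = 3, size: int = 4096,
            remount: Callable[[str], None] = lambda m: None) -> List[int]:
    """Exercise the loop on two local temp dirs instead of a real NFS mount."""
    with tempfile.TemporaryDirectory() as work, \
            tempfile.TemporaryDirectory() as fake_mount:
        return round_trip(work, fake_mount, size, iterations, remount)


if __name__ == "__main__":
    # 160 iterations of a 300 MB file, as in Mike's run (paths hypothetical).
    print(round_trip("/var/tmp", NFS_MOUNT, 300 * 1024 * 1024, 160))
```

The unmount/remount between the two copies is presumably what makes the copy-back come from the server rather than from the client's cache; `inject_bit_flip` is only there to check that the harness notices a single bad byte.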
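The bit pattern in the `cmp -x` lines above can also be checked mechanically. The Python sketch below assumes the three-column hex output shown there (zero-based offset, byte in BIG, byte in BIG2). It decodes each line, verifies that the corruption only ever clears bits, and lists the original value of every non-zero nibble that was cleared. The function names are hypothetical, and the sample is just the saved output copied from above.

```python
from typing import List, NamedTuple


class Diff(NamedTuple):
    offset: int     # zero-based byte offset (cmp -x prints it in hex)
    original: int   # byte in the source file (BIG)
    corrupted: int  # byte read back (BIG2)


# The failing cases saved above, verbatim.
SAMPLE = """\
03869a48 09 00
05209d88 09 00
01777148 09 00
00f10f88 09 00
01f4c4c8 11 00
06c3d6c8 11 00
0725ca48 18 00
01608008 09 00
00f3b888 18 00
07aa45c8 29 20
"""


def parse_cmp_x(text: str) -> List[Diff]:
    """Parse `cmp -x` output: one 'offset byte1 byte2' triple per line, all hex."""
    diffs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unrelated lines
        off, a, b = (int(x, 16) for x in fields)
        diffs.append(Diff(off, a, b))
    return diffs


def only_bits_cleared(d: Diff) -> bool:
    """True if the corrupted byte is the original with some 1-bits zeroed."""
    return (d.corrupted & ~d.original) == 0 and d.corrupted != d.original


def cleared_nibbles(d: Diff) -> List[int]:
    """Original values of each nibble (low first) that went non-zero -> zero."""
    out = []
    for shift in (0, 4):
        before = (d.original >> shift) & 0xF
        after = (d.corrupted >> shift) & 0xF
        if before and not after:
            out.append(before)
    return out


if __name__ == "__main__":
    for d in parse_cmp_x(SAMPLE):
        print(f"{d.offset:08x} {d.original:02x}->{d.corrupted:02x} "
              f"bits only cleared: {only_bits_cleared(d)}, "
              f"cleared nibbles: {cleared_nibbles(d)}")
```

Working per nibble rather than per byte is what makes 0x11, 0x18 and 0x29 fit the {1, 8, 9} pattern: for example, 0x29 -> 0x20 clears only the low nibble, whose original value was 9.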