Date: 26 Apr 2008 17:14:46 +0200
From: "Arno J. Klaassen" <arno@heho.snv.jussieu.fr>
To: Mike Tancsa <mike@sentex.net>
Cc: stable@freebsd.org, pluknet@gmail.com
Subject: Re: nfs-server silent data corruption
Message-ID: <wp7iek4pi1.fsf@heho.snv.jussieu.fr>
In-Reply-To: <200804222155.m3MLtoKt093783@lava.sentex.ca>
References: <wpmyno2kqe.fsf@heho.snv.jussieu.fr>
 <20080421094718.GY25623@hub.freebsd.org>
 <wp63ubp8e0.fsf@heho.snv.jussieu.fr>
 <200804211537.m3LFbaZA086977@lava.sentex.ca>
 <wpy77650s0.fsf@heho.snv.jussieu.fr>
 <200804221501.m3MF1guW092221@lava.sentex.ca>
 <wpzlrlu6w7.fsf@heho.snv.jussieu.fr>
 <200804221741.m3MHfYjO092795@lava.sentex.ca>
 <wpabjln518.fsf@heho.snv.jussieu.fr>
 <200804221807.m3MI73bN092981@lava.sentex.ca>
 <wpk5ipkaaa.fsf@heho.snv.jussieu.fr>
 <200804222155.m3MLtoKt093783@lava.sentex.ca>
Hello,
Mike Tancsa <mike@sentex.net> writes:
> At 02:35 PM 4/22/2008, Arno J. Klaassen wrote:
>
> > > Also, you are using ULE or the 4BSD scheduler ? I
> > > still have 4BSD on the box I am testing on.
> >
> >Interesting, this is with ULE. I didn't really test 4BSD on this
> >box (I believed those who said SMP needs ULE *and* am quite
> >satisfied with overall performance). I'll try 4BSD though time
> >is getting short; I promised to deliver this box next thursday but will
> >still have some days for on-site testing.
>
>
> I have recompiled the kernel with ULE, and it seems fine as well. I
> ran 160 iterations of a 300MB file and there was no corruption. Same
> process - copy a junk random file over nfs mount, unmount the nfs
> mount, remount it copy it back, compare the files.
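The compare step of the round trip described above could be sketched as follows. This is only a minimal illustration, not the script either of us actually ran: the function name and the chunk size are hypothetical, and the copy, unmount and remount steps are left out because they depend on the local setup.

```python
def differing_bytes(path_a, path_b, chunk_size=1 << 20):
    """Yield (offset, byte_a, byte_b) for every position where two files differ.

    Hypothetical helper, roughly what `cmp -x` reports for same-length files.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                # Report each differing byte in the overlapping part.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        yield offset + i, x, y
                if len(a) != len(b):
                    raise ValueError("files differ in length")
            offset += len(a)
```

An empty result means the copy that came back matches the original; any tuple yielded is a corrupted byte with its offset.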
Let me summarise my investigations till now :
- in all failing cases just *one* byte is corrupted, 4 or all 8 bits
set to zero *and* the original value is one out of the limited
subset {1, 8, 9} ....
here is the output of `cmp -x $i/BIG $i/BIG2` for some failing
cases I saved :
03869a48 09 00
05209d88 09 00
01777148 09 00
00f10f88 09 00
01f4c4c8 11 00
06c3d6c8 11 00
0725ca48 18 00
01608008 09 00
00f3b888 18 00
07aa45c8 29 20
- it does *not* seem to depend on :
- the interface : I could produce it using nfe0, nfe1 and
re0 using some netgear pci-card
- the distribution of the 4Gig memory : installing 4G at
CPU1 or 1G at CPU1 and 2G at CPU2 produces same results
(NB, all memory passed memtest.iso in both situations
for a complete run)
- the frequency control method : easier to produce with
cpufreq/powerd, but finally I can reproduce the corruption
as well using acpi_ppc
- the nfs-client and options (not exhaustively tested, but the different
tests include i386-releng6, amd64-releng6 and linux, and quite
a set of different try-and-see mount_nfs options)
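For anyone who wants to look at the bit patterns, here is a small sketch that parses the ten `cmp -x` lines quoted earlier and reports which bits were cleared, whether any bit was newly set, and the offset modulo 64. The function and field names are hypothetical, and looking at offsets modulo 64 is purely my own choice of something to check, not a conclusion.

```python
# Lines copied from the cmp -x output quoted earlier in this message
# (columns: hex offset, original byte, byte read back).
CMP_OUTPUT = """\
03869a48 09 00
05209d88 09 00
01777148 09 00
00f10f88 09 00
01f4c4c8 11 00
06c3d6c8 11 00
0725ca48 18 00
01608008 09 00
00f3b888 18 00
07aa45c8 29 20
"""


def analyse(text):
    """Return one dict per differing byte with the derived bit masks."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        off_s, orig_s, got_s = line.split()
        offset, orig, got = int(off_s, 16), int(orig_s, 16), int(got_s, 16)
        rows.append({
            "offset": offset,
            "cleared": orig & ~got & 0xFF,    # bits that went 1 -> 0
            "newly_set": got & ~orig & 0xFF,  # bits that went 0 -> 1
            "mod64": offset % 64,             # assumption: 64 is just a guess
        })
    return rows


if __name__ == "__main__":
    for r in analyse(CMP_OUTPUT):
        print(f"{r['offset']:08x} cleared={r['cleared']:02x} "
              f"newly_set={r['newly_set']:02x} mod64={r['mod64']}")
```

On these ten samples no bit goes from 0 to 1, and every offset is 8 modulo 64. Ten cases is a small sample, so this is at most a hint.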
I am testing right now with a fixed frequency of 1 GHz.
I am not so inclined to test 4BSD, since reboot possibilities are
limited for me now on this box, but next week I will set up a similar
board (S3992e) (iff I can find quad-core socket F over here ...)
and in a certain sense hope I can reproduce it on that board as well.
Best, Arno
