Date: Tue, 27 Jan 2009 13:45:56 +0100
From: "Arno J. Klaassen" <arno@heho.snv.jussieu.fr>
To: Dmitry Marakasov <amdmi3@amdmi3.ru>
Cc: current@freebsd.org
Subject: Re: Data corruption with checksum offloading enabled
Message-ID: <wphc3kj4sb.fsf@heho.snv.jussieu.fr>
In-Reply-To: <20090126144044.GB6054@hades.panopticon> (Dmitry Marakasov's message of "Mon, 26 Jan 2009 14:40:44 +0000")
References: <20090123221826.GB30982@deprived.panopticon> <20090126144044.GB6054@hades.panopticon>
Hello,
Dmitry Marakasov <amdmi3@amdmi3.ru> writes:
> For now I have two cases of corruption - in both cases it is single
> difference of one 128 byte block with file offsets 0x65F872 and
> 0x61A072.
I had a similar problem last April on a 7-stable box, which I reported
in the 'nfs-server silent data corruption' thread.
I found:
- in all failing cases just *one* byte is corrupted, 4 or all 8 bits
set to zero *and* the original value is one out of the limited
subset {1, 8, 9} ....
here is the output of `cmp -x $i/BIG $i/BIG2` for some failing
cases I saved :
03869a48 09 00
05209d88 09 00
01777148 09 00
00f10f88 09 00
01f4c4c8 11 00
06c3d6c8 11 00
0725ca48 18 00
01608008 09 00
00f3b888 18 00
07aa45c8 29 20
Does your corruption fulfill these characterisations as well?
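For anyone wanting to check their own files against this pattern, here is a minimal sketch in Python. It is not the tool used above (that was plain `cmp -x`); the function names `diff_bytes` and `describe` are made up for illustration, and it assumes the first file is the good copy and the second the possibly corrupted one. It lists every differing byte with a zero-based hex offset and reports whether the change only cleared bits, which is what all ten cases above show.

```python
import sys


def diff_bytes(path_a, path_b, chunk_size=1 << 20):
    """Yield (offset, byte_a, byte_b) for each differing byte.

    Offsets are zero-based. Comparison stops at the end of the
    shorter file, so a pure length difference is not reported.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a or not b:
                break
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        yield offset + i, x, y
            offset += min(len(a), len(b))


def describe(orig, bad):
    """Summarise how one byte changed (hypothetical helper)."""
    cleared = orig & ~bad & 0xFF   # bits that went from 1 to 0
    raised = bad & ~orig & 0xFF    # bits that went from 0 to 1
    return {
        "only_clears": raised == 0,
        "cleared_mask": cleared,
        "low_nibble_zeroed": (orig & 0x0F) != 0 and (bad & 0x0F) == 0,
        "whole_byte_zeroed": bad == 0,
    }


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: bytediff.py GOOD_FILE SUSPECT_FILE")
    for off, x, y in diff_bytes(sys.argv[1], sys.argv[2]):
        info = describe(x, y)
        print(
            f"{off:08x} {x:02x} {y:02x}"
            f"  cleared={info['cleared_mask']:02x}"
            f"  only_clears={info['only_clears']}"
            f"  low_nibble_zeroed={info['low_nibble_zeroed']}"
        )
```

Run it as, e.g., `python3 bytediff.py $i/BIG $i/BIG2` (the script name is arbitrary). A line with `only_clears=False` would mean the corruption does not match the pattern described here.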
> I was suggested by Andrzej Tobola to try disabling txcsum on a
> network interface. I've disabled both rxcsum and txcsum, and that
> solved a problem.
>
> Judging from that this helped Andrzej with sk(4) and me with ale(4)
> driver, that's not a single driver problem. Does this mean that we
> have global problems with checksum offloading?
I could reproduce it with nfe(4) and re(4) ...
interestingly enough, I could *not* reproduce it when disabling
cpu frequency control ...
for what it's worth
Best, Arno
