Date:      21 Apr 2008 16:52:55 +0200
From:      "Arno J. Klaassen" <arno@heho.snv.jussieu.fr>
To:        Kris Kennaway <kris@FreeBSD.ORG>
Cc:        stable@FreeBSD.ORG, David Wolfskill <david@catwhisker.org>, Clayton Milos <clay@milos.co.za>, net@FreeBSD.ORG
Subject:   Re: nfs-server silent data corruption
Message-ID:  <wp63ubp8e0.fsf@heho.snv.jussieu.fr>
In-Reply-To: <20080421094718.GY25623@hub.freebsd.org>
References:  <wpmyno2kqe.fsf@heho.snv.jussieu.fr> <20080421094718.GY25623@hub.freebsd.org>

Kris Kennaway <kris@FreeBSD.ORG> writes:

> On Mon, Apr 21, 2008 at 01:02:33AM +0200, Arno J. Klaassen wrote:
> 
> > I didn't stress-test this MB for a while, but last time I did was
> > with 7-PRELEASE/RC?/CANTremember-exactly-but-close-to-release
> > and all worked great
> > 
> > I did add 2G ECC to the 2nd CPU since, though I doubt that interferes
> > with NFS.
> 
> Uh, you're getting server-side data corruption, it could definitely be
> because of the memory you added.

Yep, though I'm still not convinced the memory is bad (it is the very
same Kingston ECC as the 2*1G that has been in use for about half a year
already):

I added it directly to the 2nd CPU (diagram on page 9 of
 http://www.tyan.com/manuals/m_s2895_101.pdf) and the problem
seems to be the interaction between nfe0 and powerd:

 - if I stop powerd, the problems go away
 - if I leave powerd running but turn off txcsum and tso4 on the
   interface, the problem is a lot harder to reproduce (in case this
   gives a hint to anyone)
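
For anyone wanting to try the same two workarounds, they correspond
roughly to the commands below. This is only a sketch: it assumes the
stock /etc/rc.d/powerd script and the standard ifconfig capability
flags, and the helper names (run, stop_powerd, disable_offload) and the
DRY_RUN variable are made up for illustration. By default it only prints
the commands; set DRY_RUN=0 and run it as root on the NFS server to
apply them.

```shell
#!/bin/sh
# Sketch only. DRY_RUN defaults to 1, so commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Workaround 1: stop powerd altogether.
stop_powerd() {
    run /etc/rc.d/powerd stop
}

# Workaround 2: leave powerd running, but turn off transmit checksum
# offload and IPv4 TCP segmentation offload on the given interface.
disable_offload() {
    run ifconfig "${1:-nfe0}" -txcsum -tso4
}

# The two workarounds are alternatives; both are shown here for reference.
stop_powerd
disable_offload nfe0
```

The current interface capabilities can be checked afterwards with
"ifconfig nfe0", whose options line should no longer list TXCSUM and
TSO4 once the second workaround has been applied.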

The device is:

nfe0@pci0:0:10:0:       class=0x068000 card=0x289510f1 chip=0x005710de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'nForce4 Ultra NVidia Network Bus Enumerator'
    class      = bridge
    cap 01[44] = powerspec 2  supports D0 D1 D2 D3  current D0

(This is with the default BIOS setting "LAN Bridge Enabled"; disabling
 that setting makes pciconf report "class = network" but does not affect
 my problem.)

I will now restart my tests with all 4G populated on CPU1 only and
report whether that matters.

Best, Arno
 


