From: "Arno J. Klaassen"
To: Mike Tancsa
Cc: stable@freebsd.org
Subject: Re: nfs-server silent data corruption
Date: 21 Apr 2008 23:57:35 +0200
In-Reply-To: <200804211537.m3LFbaZA086977@lava.sentex.ca>
References: <20080421094718.GY25623@hub.freebsd.org> <200804211537.m3LFbaZA086977@lava.sentex.ca>

Hello,

Mike Tancsa writes:

> At 10:52 AM 4/21/2008, Arno J. Klaassen wrote:
>
> >Device is :
> >
> > nfe0@pci0:0:10:0: class=0x068000 card=0x289510f1
> > chip=0x005710de rev=0xa3 hdr=0x00
> >     vendor = 'Nvidia Corp'
> >     device = 'nForce4 Ultra NVidia Network Bus Enumerator'
> >     class = bridge
> >     cap 01[44] = powerspec 2 supports D0 D1 D2 D3 current D0
> >
> >(this is with the default BIOS setting " LAN Bridge Enabled", disabling
> > that setting makes pciconf say "class = network" but does not influence
> > my problem)
> >
> >I will restart my tests now by populating all 4G to only CPU1 and
> >say whether that matters.
>
> Hi,
> How long does it take for the problem to show up ?

Less than an hour in general (running the same client script
simultaneously on a 100Mbps linux box and 1Gbps bsd6-x86)

> I have what appears
> to be a very similar Tyan board (I have an Socket 939 X2 cpu) with the
> same NIC, but this one is running RELENG_7 from April 17th.
> There have been a few fixes for the nfe driver since 7.0
>
> I am running this small script below on a nfs client (em nic) against
> the server (nfe) ( mount options on the client 192.168.245.1:/backup
> /backup nfs rw,-r=32768,-w=32768,tcp,noauto )
>
> #!/bin/sh
> i=0
> while true
> do
>     i=`expr $i + 1`
>     dd if=/dev/urandom of=/tmp/junk.txt bs=1024 count=81920 > /dev/null 2>&1
>     cp -p /tmp/junk.txt /backup/
>     orig=`md5 -q /tmp/junk.txt`
>     umount /backup
>     sleep 2
>     mount /backup
>     copy=`md5 -q /backup/junk.txt`
>     echo "$orig and $copy on $i"
>     if [ $orig != $copy ]; then
>         echo "\a copy not ok on $i"
>         exit 255
>     fi
> done

quite the same as what I do (apart from the umount/sleep/mount, and I use
the same partition for write and copy) :

SIZE=$1
COUNTER=${2:-20}

until [ $COUNTER -lt 1 ]; do
    echo "**** Still $COUNTER iterations to go *** "
    echo
    echo -n Creating random file of $SIZE MBytes ...
    dd if=/dev/random of=BIG bs=1048576 count=${SIZE} > /dev/null 2>&1
    echo Done
    echo -n Calculating md5 checksum ...
    CS1=`md5 -q BIG`
    echo Done
    echo -n Copying file ...
    cp -fp BIG BIG2
    echo Done
    echo -n Calculating md5 checksum ...
    CS2=`md5 -q BIG2`
    echo Done
    if [ ${CS1} != ${CS2} ]; then
        echo CHECKSUM MISMATCH
        exit -1
    else
        echo
    fi
    let COUNTER-=1
done

For info, I test with args '38 999' (38M, try 999 times) on linux
(slightly adapted script BTW) and '138 999' on bsd. The best 'score'
I got was 'still 871 iterations to go'.

> On the server, I have
>
> nfe0@pci0:0:10:0: class=0x068000 card=0x286510f1 chip=0x005710de
> rev=0xa3 hdr=0x00
>     vendor = 'Nvidia Corp'
>     device = 'nForce4 Ultra NVidia Network Bus Enumerator'
>     class = bridge
>     cap 01[44] = powerspec 2 supports D0 D1 D2 D3 current D0

idem

> # ifconfig nfe0
> nfe0: flags=8843 metric 0 mtu 1500
>         options=10b
>         ether 00:e0:81:58:91:6a
>         inet 192.168.245.1 netmask 0xffffff00 broadcast 192.168.245.255
>         media: Ethernet autoselect (1000baseTX )
>         status: active

idem

> How long does it take for the problem to come up ?
as said : approximately half an hour; never more than 4 hours

Best,

Arno
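
A possible refinement of the compare step in the test loops in this message, as a minimal sketch only: when the two files differ, cmp(1) can report how many bytes differ and where the first difference sits, which md5 alone does not show. The function name report_diff is hypothetical and is not part of either script; it assumes a POSIX sh with cmp, wc, tr and awk available.

```shell
#!/bin/sh
# report_diff: hypothetical helper, not taken from the scripts in this thread.
# Compares two existing files byte by byte.
#   identical -> prints "ok", returns 0
#   different -> prints the count of differing bytes and the offset of the
#                first one (1-based, as printed by cmp -l), returns 1
# Caveat: if one file is only a truncated prefix of the other, cmp -l lists
# no differing bytes, so the count is 0 and the offset is empty.
report_diff() {
    if cmp -s "$1" "$2"; then
        echo "ok"
        return 0
    fi
    # cmp -l prints one line per differing byte: offset, then both values in octal.
    diffs=$(cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' ')
    first=$(cmp -l "$1" "$2" 2>/dev/null | awk 'NR == 1 { print $1 }')
    echo "MISMATCH: $diffs differing bytes, first at offset $first"
    return 1
}
```

In the loop, the line that prints CHECKSUM MISMATCH could be followed by "report_diff BIG BIG2" before exiting, so that the size and position of the damaged region are logged along with the failure.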