Date: Wed, 23 Apr 2008 13:52:08 +0400
From: pluknet <pluknet@gmail.com>
To: pyunyh@gmail.com
Cc: stable@freebsd.org
Subject: Re: nfs-server silent data corruption
Message-ID: <a31046fc0804230252n34f09235wc47d1d1e25191280@mail.gmail.com>
In-Reply-To: <20080423045347.GE54715@cdnetworks.co.kr>
References: <20080421094718.GY25623@hub.freebsd.org> <200804211537.m3LFbaZA086977@lava.sentex.ca> <wpy77650s0.fsf@heho.snv.jussieu.fr> <200804221501.m3MF1guW092221@lava.sentex.ca> <wpzlrlu6w7.fsf@heho.snv.jussieu.fr> <200804221741.m3MHfYjO092795@lava.sentex.ca> <wpabjln518.fsf@heho.snv.jussieu.fr> <200804221807.m3MI73bN092981@lava.sentex.ca> <a31046fc0804221313k5bf5e2f3h106a342a5644e152@mail.gmail.com> <20080423045347.GE54715@cdnetworks.co.kr>

2008/4/23 Pyun YongHyeon <pyunyh@gmail.com>:
>
> On Wed, Apr 23, 2008 at 12:13:44AM +0400, pluknet wrote:
> > On 22/04/2008, Mike Tancsa <mike@sentex.net> wrote:
> > > At 02:00 PM 4/22/2008, Arno J. Klaassen wrote:
> > > >
> > > > > Are you using the latest RELENG_7, or at least the latest version of
> > > > > nfe thats in RELENG_7 ?
> > > >
> > > > Think so :
> > > >
> > >
> > > OK, and it is the latest RELENG_7 ? Or just the if_nfe.c file has been
> > > manually updated ? Also, you are using ULE or the 4BSD scheduler ? I still
> > > have 4BSD on the box I am testing on.
> >
> > Hi, I have the same problem with data corruption (with nfe on nfs server side),
> > particularly when transferring large files.
> > Maybe this is somehow associated with the topic.
> >
> > My simple test case:
> > truncate -s 1000m bigfile
> > ^^ here I get zero-filed file
> > cp bigfile /nfs/mounted
> > ^^ here I get not-at-all-zero-filed file, after uploading to nfs server
> >
> > I looked at the corrupted file. It contains a few ranges, filed with
> > non-zero bytes:
> > equal to zero?  real 4-byte value  offset
> > ======================================
> > not equal       1200355616         at pos=38797316
> > ... <-- this range contains per-4bytes garbage, omit
> > not equal       3879749905         at pos=38813696
> >
> > not equal       161160732          at pos=45613060
> > ... <-- ditto
> > not equal       575257183          at pos=45629440
> >
> > not equal       1943682165         at pos=59768836
> > ... <-- ditto
> > not equal       2843639625         at pos=59785216
> >
> > not equal       2653910121         at pos=60293124
> > ... <-- ditto
> > not equal       3462830780         at pos=60309504
> >
> > Some info:
> >
> > nfs server on 8-CURRENT as of Apr 17
> > nfs client on 7.0-STABLE as of Apr 12
> >
> > dmesg | grep nfe
> > nfe0: <NVIDIA nForce2 MCP2 Networking Adapter> port 0xe000-0xe007 mem
> > 0xe2001000-0xe2001fff irq 20 at device 4.0 on pci0
> > miibus0: <MII bus> on nfe0
> > nfe0: Ethernet address: 00:04:61:6c:76:b1
> > nfe0: [FILTER]
> > nfe0: tx v1 error 0x6001
> > nfe0: tx v1 error 0x6001
> > nfe0: tx v1 error 0x6001
> > nfe0: tx v1 error 0x6001
> > nfe0: tx v1 error 0x6001
> > nfe0: tx v1 error 0x6001
> > nfe0: tx v1 error 0x6001
> > nfe0: tx v1 error 0x6001
> > nfe0: tx v1 error 0x6001
> > nfe0: tx v1 error 0x6001
> > nfe0: tx v1 error 0x6001
> > nfe0: tx v1 error 0x6001
> > ^^^
>
> I'm not sure it's related with data corruption issue but 0x6001
> would mean Tx underflow error. I recall these Tx errors were seen
> on nfe(4) if negotiated speed/duplex does not match with link
> partner or MACs.
> Does link partner also agree on speed/duplex settings of nfe(4)?

One unmanaged 10/100 switch is between them (which are both 100baseTX),
so I cannot say exactly :(
Though I can achieve speed upto 100mbps.
I can test later directly on demand.

> What PHY driver nfe(4) use?
>

$ kldload if_nfe
nfe0: <NVIDIA nForce2 MCP2 Networking Adapter> port 0xe000-0xe007 mem
0xe2001000-0xe2001fff irq 20 at device 4.0 on pci0
nfe0: Ethernet address: 00:04:61:6c:76:b1
nfe0: [FILTER]
miibus0: <MII bus> on nfe0
rlphy0: <RTL8201L 10/100 media interface> PHY 1 on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
nfe0: link state changed to DOWN
nfe0: link state changed to UP

So, it seems to be rlphy.

>
> > This appears while cp'ing file to server.
> > (btw they do not appear with disabled polling, probably it's an another issue)
> >
> > vmstat -i | grep nfe
> > irq20: nfe0 ohci0                      1          0
> >
> > nfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> >         options=48<VLAN_MTU,POLLING>
> >         ether 00:04:61:6c:76:b1
> >         inet 192.168.200.137 netmask 0xffffff00 broadcast 192.168.200.255
> >         media: Ethernet autoselect (100baseTX <full-duplex>)
> >         status: active
> > I can reproduce it regardless polling presence.
> >
> > nfe0@pci0:0:4:0: class=0x020000 card=0x10001695 chip=0x006610de
> > rev=0xa1 hdr=0x00
>

wbr, pluknet
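
For anyone who wants to repeat the zero-check from the test case above, here is a minimal sketch of one way to do it. It is not the tool that produced the table in this message. The function name nonzero_runs, the 64 KiB read size, and the little-endian unsigned reading of each 4-byte value are all assumptions made for illustration.

```python
import struct

WORD = 4  # the report lists 4-byte values


def nonzero_runs(path, chunk_size=64 * 1024):
    """Return (first_pos, last_pos, first_val, last_val) for every run of
    consecutive non-zero 4-byte words in the file at `path`.

    Assumption: words are read as little-endian unsigned 32-bit integers.
    chunk_size must be a multiple of WORD.
    """
    runs = []
    run = None   # [first_pos, last_pos, first_val, last_val] of the open run
    base = 0     # file offset of the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Pad a trailing partial word so unpacking never fails.
            if len(chunk) % WORD:
                chunk += b"\0" * (WORD - len(chunk) % WORD)
            if chunk.count(0) == len(chunk):
                # Fast path: the whole chunk is zero, so an open run ends.
                if run is not None:
                    runs.append(tuple(run))
                    run = None
            else:
                for i, (val,) in enumerate(struct.iter_unpack("<I", chunk)):
                    pos = base + i * WORD
                    if val:
                        if run is None:
                            run = [pos, pos, val, val]
                        else:
                            run[1], run[3] = pos, val
                    elif run is not None:
                        runs.append(tuple(run))
                        run = None
            base += len(chunk)
    if run is not None:
        runs.append(tuple(run))
    return runs
```

Calling nonzero_runs() on the copy stored on the server and printing last_pos - first_pos + 4 for each tuple gives the span of each run in bytes. A zero word inside a damaged region splits it into several runs, so the output may be finer grained than the table. For the offsets listed in the table, every range starts and ends 16380 bytes apart (for example 38813696 - 38797316), so with the last 4-byte word each range spans 16384 bytes.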