From: "Bigbrother"
To: "'Gary Dunn'"
Cc: freebsd-questions@freebsd.org
Date: Mon, 18 Oct 2004 18:03:21 +0300
Subject: RE: Local Lan Transfers data integrity failure (was: NFS data integrity failure)

> -----Original Message-----
> From: Gary Dunn [mailto:knowtree@aloha.com]
> Subject: Re: NFS data integrity failure
>
> On Thu, 2004-10-14 at 07:58, Bigbrother wrote:
> >
> > MachineB mounts machineA:/disk and puts 1.2 GB of data from its disk
> > onto the machineA disk. A CRC check performed on the copied files shows
> > that everything is correct. (always!)
>
> Then do it this way :-)
>
> Seriously, though, to isolate NFS you need to exercise the network and
> file systems using other methods. How about transferring the same files
> using a) ftp and b) scp? If the problem is dropped packets or
> fragmentation or stuck bits in the NIC, those methods will be equally
> unsuccessful.
>
> Does either machine ever display an error message about nfs going down
> then coming back? I can't remember the exact words, something like
> connection lost then restored. When this happens to me at work it is due
> to the ethernet switch port one system is connected to coming up in half
> duplex instead of full duplex. Once it was a bad cat5 cable.
>
> Are the file sizes different?
> --
> Gary Dunn

Hi,

Thanks, Gary, for your useful suggestions. So far, my analysis of the problem shows the following:

A) A thorough multi-pass memory test (using memtest86) showed NO memory faults.

Transfers of 1.2 GB of data between machineA and machineB:

B) SCP get, run on machineA: I always get some CRC errors (target != source).
C) FTP get, run on machineA: I always get some CRC errors (target != source).

D) SCP get, run on machineB: no CRC errors (target == source).
E) FTP get, run on machineB: no CRC errors (target == source).
F) FTP put, run on machineB: CRC errors are produced.
G) NFS put, run on machineB: CRC errors are NEVER produced.
H) SCP put, run on machineB: CRC errors are produced.

The file sizes always match (15,000,000); even the corrupted files have the same size as the originals. The syslog does not record anything related to NFS or disk problems, and NFS never goes down and comes back. So this is not an NFS-only problem.
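The source/target comparison behind results B) to H) can be scripted so that every run is checked the same way. What follows is a minimal Python sketch, not the tool actually used in these tests: it assumes both copies are visible as local directory trees on the machine running it, zlib's CRC-32 stands in for whatever CRC program was used, and the function names are hypothetical.

```python
import os
import sys
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file, reading it in chunks to bound memory use."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            crc = zlib.crc32(block, crc)  # running CRC across chunks
    return crc & 0xFFFFFFFF


def mismatched_files(source_dir, target_dir):
    """Return sorted relative paths whose copy under target_dir is missing
    or has a CRC-32 different from the original under source_dir."""
    bad = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(target_dir, rel)
            if not os.path.isfile(dst) or crc32_of(src) != crc32_of(dst):
                bad.append(rel)
    return sorted(bad)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python crccheck.py SOURCE_DIR TARGET_DIR
    for rel in mismatched_files(sys.argv[1], sys.argv[2]):
        print("CRC mismatch:", rel)
```

It prints one line per corrupted or missing file, so each transfer method can be rerun and compared with the same check.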
I have also tuned FreeBSD with the following sysctl settings (I do not know whether they interfere with anything on my machines):

/sbin/sysctl -w kern.ipc.somaxconn=4096
/sbin/sysctl -w kern.maxfiles=65536
/sbin/sysctl -w kern.maxfilesperproc=10050
/sbin/sysctl -w net.inet.tcp.sendspace=3217968
/sbin/sysctl -w net.inet.tcp.recvspace=3217968
/sbin/sysctl -w kern.ipc.maxsockbuf=8388608
/sbin/sysctl -w net.inet.udp.recvspace=3217968
/sbin/sysctl -w net.inet.raw.recvspace=3217968

What do you think is causing this? What further tests can I run to investigate the problem?

Thanks again, everyone!
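On the question of further tests: because the corrupted copies keep their original size, comparing one bad copy against its source byte by byte can show where the damage lies and what it looks like. For example, it can distinguish isolated single-bit flips (and at which bit position they occur) from runs of wrong bytes, which may help narrow down the faulty component. What follows is a minimal Python sketch; differing_bytes and summarize are hypothetical helper names, not part of any tool mentioned in this thread.

```python
import sys


def differing_bytes(source_path, copy_path, chunk_size=1 << 20):
    """Yield (offset, source_byte, copy_byte) for every byte that differs.

    Assumes both files have the same length, as observed for the corrupted
    copies; bytes beyond the end of the shorter file are ignored.
    """
    offset = 0
    with open(source_path, "rb") as a, open(copy_path, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if not block_a and not block_b:
                break
            for i, (x, y) in enumerate(zip(block_a, block_b)):
                if x != y:
                    yield offset + i, x, y
            offset += len(block_a)


def summarize(diffs):
    """Count differing bytes and how many of them are single-bit flips.

    bit_flips_by_position[k] counts single-bit flips at bit k (0 = least significant).
    """
    total = 0
    single_bit = 0
    bit_flips = [0] * 8
    first_offset = None
    for offset, x, y in diffs:
        total += 1
        if first_offset is None:
            first_offset = offset
        flipped = x ^ y  # nonzero, since x != y
        if (flipped & (flipped - 1)) == 0:  # exactly one bit set
            single_bit += 1
            bit_flips[flipped.bit_length() - 1] += 1
    return {
        "differing_bytes": total,
        "single_bit_flips": single_bit,
        "bit_flips_by_position": bit_flips,
        "first_offset": first_offset,
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python bytediff.py ORIGINAL_FILE CORRUPTED_COPY
    print(summarize(differing_bytes(sys.argv[1], sys.argv[2])))
```

Run it on a file pair that failed the CRC check. If the damage is always the same bit in scattered bytes, the result looks very different from corruption that arrives as whole blocks of wrong data.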