From: "Arno J. Klaassen"
To: stable@freebsd.org
Cc: net@freebsd.org
Date: 21 Apr 2008 01:02:33 +0200
Subject: nfs-server silent data corruption
List-Id: Production branch of FreeBSD source code

Hello,

I have a strange problem with a box I'm setting up as an NFS server
under 7-stable:

 - Tyan S2895 MB, 2 x dual-core Opteron 285, 4G ECC, ahd-scsi, nfe-network
 - stripped GENERIC as kernel
 - sources as of last Saturday afternoon (European time)

I removed everything from /boot/loader.conf and /etc/sysctl.conf, but I
still "easily" get data corruption when exporting ahd-scsi over NFS
(NB: exporting geom_raid5 gives the same data corruption).

Testing with the following pseudo code:

  while checksum1 == checksum2
  do
    create random file of $1 MBytes
    calculate md5 checksum1
    copy
    calculate md5 checksum2 on copy

Tested with two different NFS clients, a 6-stable-i386 from a couple of
weeks ago and a Linux 2.6.15-gentoo-r1 from about two years ago: within
half an hour the copy differs from the original ;(

I played with the NFS options on the client side (nfs[23], conn, intr,
[udp|tcp], -r=, -w=) but none seems to matter.

Starting/stopping rpc.lockd/rpc.statd on server/client just provoked some:

  cp: utimes: BIG2: No such file or directory
  cp: chown: BIG2: Stale NFS file handle
  cp: chmod: BIG2: Stale NFS file handle
  cp: chflags: BIG2: Operation not supported
  cp: BIG2: Stale NFS file handle
  cp: setting permissions for `BIG2': Stale NFS file handle
  cp: closing `BIG2': Stale NFS file handle

[and then the while loop continued ... as if the NFS handle were not
that stale ...]

Anyway, I'll try to nail this down more (e.g. nfs-write performance is
horrible ...
(nfsd drops to 0% CPU and then, after a while, "wakes up" and sits at
around 3-6% again)).

I haven't stress-tested this MB for a while, but the last time I did was
with 7-PRERELEASE/RC?/can't-remember-exactly-but-close-to-release, and
all worked great. I have added 2G ECC to the 2nd CPU since, though I
doubt that interferes with NFS.

In short, does anyone have a suggestion?

(I will try downgrading to RELENG_7_0 if no one has a suggestion for
RELENG_7, but I'd rather go forward and test a possibly suspect recent
MFC or any other suggestion.)

Thanks in advance,

best, Arno
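
A minimal sketch of the test loop from the pseudo code above, written in Python so that the same file runs on both the FreeBSD and the Linux client. It is not the script actually used for the tests. The function names, the command-line layout, and the source file name BIG are assumptions for illustration (only the copy name BIG2 appears in the cp errors), and the pseudo code does not say in which direction the copy goes, so both directories are left as parameters.

```python
import hashlib
import os
import shutil
import sys


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_compare(src_dir, dst_dir, size_mb, name="BIG"):
    """Write size_mb MB of random data, copy it, return both MD5 digests."""
    src = os.path.join(src_dir, name)        # hypothetical source name
    dst = os.path.join(dst_dir, name + "2")  # copy, e.g. BIG2
    with open(src, "wb") as fh:
        for _ in range(size_mb):
            fh.write(os.urandom(1 << 20))
    checksum1 = md5_of(src)
    shutil.copyfile(src, dst)
    checksum2 = md5_of(dst)
    return checksum1, checksum2


def main(argv):
    # Assumed arguments: size in MB, source directory, destination directory.
    size_mb, src_dir, dst_dir = int(argv[1]), argv[2], argv[3]
    good_passes = 0
    while True:
        checksum1, checksum2 = copy_and_compare(src_dir, dst_dir, size_mb)
        if checksum1 != checksum2:
            print("mismatch after %d good passes: %s != %s"
                  % (good_passes, checksum1, checksum2))
            return 1
        good_passes += 1


if __name__ == "__main__":
    if len(sys.argv) == 4:
        sys.exit(main(sys.argv))
    sys.stderr.write("usage: nfscheck.py SIZE_MB SRC_DIR DST_DIR\n")
```

With one of the two directories on the NFS mount, the loop stops at the first pass where the checksums differ and leaves both files in place for comparison. Note that the client may serve the read-back of the copy from its own cache, so a clean pass does not prove that the data on the server is intact.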