Date: Thu, 12 Jan 2012 00:06:03 -0600
From: Dan Nelson
To: Martin Cracauer
Cc: Rick Macklem, Stefan Bethke, freebsd-current@freebsd.org
Subject: Re: Data corruption over NFS in -current
Message-ID: <20120112060603.GH91606@dan.emsphone.com>
In-Reply-To: <20120112015839.GA23012@cons.org>

In the last episode (Jan 11), Martin Cracauer said:
> Rick Macklem wrote on Wed, Jan 11, 2012 at 08:42:25PM -0500:
> > Also, if you can reproduce the problem fairly easily, capture a packet
> > trace via
> > # tcpdump -s 0 -w xxx host
> > running on the client (or similar). Then email me "xxx" as an attachment
> > and I can look at it in wireshark. (If you choose to look at it in
> > wireshark, I would suggest you look for Create RPCs to see if they are
> > Exclusive Creates, plus try and see where the data for the corrupt file
> > is written.)
> >
> > Even if the capture is pretty large, it should be easy to find the
> > interesting part, so long as you know the name of the corrupt file and
> > search for that.
>
> That's probably not practical, we are talking about hammering the NFS
> server with several CPU hours worth of parallel activity in a shellscript
> but I'll do my best :-)

The tcpdump options -C and -W can help here.  For example, -C 1000 -W 10
will keep the most recent 10-GB of traffic by circularly writing to 10 1-GB
capture files.  All you need to do is kill the tcpdump when you discover the
corruption, and work backwards through the logs until you find your file.

-- 
	Dan Nelson
	dnelson@allantgroup.com
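
The ring-buffer capture described above could be scripted roughly as follows. This is only a sketch: the server name, the capture path, and the three function names are hypothetical placeholders, not anything from the thread. The search step also assumes the file name appears as plain bytes inside a single captured packet (as it normally does in NFS Lookup/Create RPCs); if it does not, opening the files in wireshark is the fallback.

```shell
#!/bin/sh
# Hypothetical placeholders; adjust for your setup.
NFS_SERVER="${NFS_SERVER:-nfs-server.example.org}"
CAPTURE_BASE="${CAPTURE_BASE:-/var/tmp/nfs-trace.pcap}"

# Start a rotating capture (run as root on the NFS client).
# -s 0    : capture full packets
# -C 1000 : start a new file after about 1000 million bytes (~1 GB)
# -W 10   : keep at most 10 files, overwriting the oldest
# tcpdump appends a number to the -w name for each file in the ring.
start_capture() {
    tcpdump -s 0 -C 1000 -W 10 -w "$CAPTURE_BASE" host "$NFS_SERVER"
}

# List the capture files, most recently written first.
captures_newest_first() {
    ls -t "$CAPTURE_BASE"* 2>/dev/null
}

# Work backwards through the ring and print the newest capture file
# whose raw bytes contain the given file name.
find_capture_with_name() {
    name=$1
    captures_newest_first | while IFS= read -r f; do
        if grep -q -F -- "$name" "$f"; then
            printf '%s\n' "$f"
            break
        fi
    done
}
```

Usage would be along the lines of: run start_capture before kicking off the test load, interrupt it as soon as a corrupt file shows up, then call find_capture_with_name with the name of that file to see which capture file to open first.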