Date: Thu, 16 Apr 2009 10:42:51 -0400
From: Damian Gerow
To: freebsd-current@freebsd.org
Subject: Re: ZFS checksum errors on umass(4) insertion
Message-ID: <20090416144251.GA1605@plebeian.afflictions.org>
In-Reply-To: <200904131304.43585.jhb@freebsd.org>

John Baldwin wrote:
: I have no idea how this would break what you are seeing. The
: zfs_get_xattrdir() function is only called from zfs_lookup() when
: LOOKUP_XATTR is specified, and that only happens from the extended attribute
: VOP routines.
: Are you using extended attributes at all? Also, have you
: tried running with INVARIANTS and DEBUG_VFS_LOCKS to catch missing locks?

I've spent most of the past week running tests, in various combinations,
against sources from about two weeks ago. I've been using a standard
GENERIC kernel with two modifications: I've removed umass, and added
DEBUG_VFS_LOCKS. I also set vfs.zfs.debug=1; debug.vfs_* and
debug.mpsafevfs are all left at their defaults of 1.

What I've found:

1) Reverting the extended attribute locking change (r189967) does not
   change the situation for me. I still experience checksum issues and
   data loss. (Unsurprisingly.)

2) Without umass loaded, I have been completely unable to trigger the
   issue.

3) Once umass is loaded and the symptoms start cropping up, unloading
   umass does not make them go away (again, unsurprisingly). What I
   haven't yet tested, but am currently working towards, is whether
   unloading umass stops further checksum errors from occurring.

4) r189967 does remove some LORs for me, even though, as far as I know,
   I don't use extended attributes.

5) It seems that so long as umass is used at all, the symptoms will
   eventually show up. I've been able to trigger the symptoms by
   inserting and then removing a umass device immediately after boot,
   then ramping up the workload.

6) The only difference made by vfs.zfs.debug=1 is that zfs reclaims are
   logged.

I'm at a bit of a loss as to what to test next, other than checking
whether the number of checksum errors keeps increasing after unloading
umass. However, I'm not convinced this is going to highlight the actual
problem.

I'm all ears as to what to test for at this point, as I'm running out of
ideas. A little less wordy: help?

  - Damian