From: Dan Pritts
Date: Thu, 15 Dec 2011 10:42:20 -0500
To: freebsd-fs@freebsd.org
Subject: Re: ZFS hangs with 8.2-release
Message-ID: <4EEA155C.5050305@internet2.edu>
In-Reply-To: <4EE21936.6020502@egr.msu.edu>
References: <4EE118C7.8030803@internet2.edu> <4EE12632.4070309@internet2.edu> <4EE21936.6020502@egr.msu.edu>

Hi all,

This is a followup to my notes from last week.

Short answer: I have followed most or all of the list's suggestions and I still get crashes when scrubbing. In fact, it is now reliably crashing after <10 minutes.

Does anyone have any other suggestions? Are the ZFS devs here, and would crash dumps be useful?

Below are my responses to specific things that folks suggested.

> do a memory test

My colleague reminded me that we have run a test in the last month or two, since we started troubleshooting this: 24 hours with memtest86+ with no errors reported. FWIW, this system was stable running Solaris for several years.

> Recommendations to upgrade to 8.2-STABLE and then polite explanations
> after i did it wrong

We've upgraded to 8.2-STABLE and applied the 1-line patch suggested by Adam McDougall.

> FreeBSD netflow3.internet2.edu 8.2-STABLE FreeBSD 8.2-STABLE #1: Mon
> Dec 12 15:45:06 UTC 2011
> root@netflow3.internet2.edu:/usr/obj/usr/src/sys/GENERIC amd64

And many recommendations from Adam McDougall that resulted in the following /boot/loader.conf. I also tried removing all of the zfs and vm lines, with the same problems. I think that something in here is causing the lockups - with the empty loader.conf it reboots instead of locking.

> verbose_loading="YES"
> rootdev="disk16s1a"
>
> #I have 16G of Ram
>
> vfs.zfs.prefetch_disable=1
> vfs.zfs.txg.timeout="5"
> vfs.zfs.arc_min="512M"
> vfs.zfs.arc_max="4G"
> vm.kmem_size="32G"

Specifics from Adam:

>>
>> - In my experience running with prefetch disabled is a significant
>> impact to speed, once you are comfortable with doing some performance
>> testing I would evaluate that and decide for yourself about "some
>> discussion suggests that the prefetch sucks"

Just to confirm, is there any STABILITY reason not to disable prefetch? The notes I saw suggested that prefetch hurt stability.

>> - Be wary of using dedupe in v28, it seems to have a huge performance
>> drag when working with files that were written while dedupe was
>> enabled; I won't comment more on that except to suggest not adding
>> that variable to your issue

Good to know. Not appropriate for our data set anyway.

>> - These comments mostly relate to speed, but I had to give the ARC
>> enough room to work without deadlocking the system so they may help
>> you there.

"enough to work" meaning along the lines of 2-4G as suggested above?

thanks!

danno
-- 
Dan Pritts, Sr. Systems Engineer
Internet2
office: +1-734-352-4953 | mobile: +1-734-834-7224