Date: Wed, 20 Jun 2007 12:28:54 -0400
From: Kris Kennaway
To: Joe Marcus Clarke
Cc: Current@FreeBSD.org, Kris Kennaway
Subject: Re: ZFS and deadlock with {nullfs,NFS}
Message-ID: <20070620162854.GA35000@rot26.obsecurity.org>
In-Reply-To: <1182356274.6504.30.camel@shumai.marcuscom.com>
References: <1182354823.6504.23.camel@shumai.marcuscom.com> <20070620160306.GA74674@rot26.obsecurity.org> <1182356274.6504.30.camel@shumai.marcuscom.com>
List-Id: Discussions about the use of FreeBSD-current

On Wed, Jun 20, 2007 at 12:17:54PM -0400, Joe Marcus Clarke wrote:
> On Wed, 2007-06-20 at 12:03 -0400, Kris Kennaway wrote:
> > On Wed, Jun 20, 2007 at 11:53:43AM -0400, Joe Marcus Clarke wrote:
> > > I've resurrected my amd64 Tinderbox with a ZFS base, and I've been
> > > seeing a 100% reproducible deadlock when I use it with either localhost
> > > NFS or
> > > nullfs. When this occurs, the CPU is 100% idle, but I can no
> > > longer connect via SSH, and the box will only reboot from the debugger.
> > > I know there are some tuning bits I can tweak, but all I've run across
> > > is for memory consumption. Any pointers would be helpful. I'm also at
> > > the debugger, so if there is anything I can do to help troubleshoot why
> > > this is happening, please let me know.
> > >
> > > This box is -CURRENT as of June 19, 2007. It has a GENERIC kernel minus
> > > devices I do not have (i.e. SMP kernel). I am currently using nullfs
> > > for the Tinderbox. The process that most regularly locks up is mtree.
> > > Here is the trace:
> > >
> > > A full process list from the debugger can be found at
> > > http://www.marcuscom.com/downloads/cobbler_proc.txt .
> >
> > 404 at the moment, but look for processes involving zil* in the
> > backtrace. I had to disable zil (vfs.zfs.zil_disable=1 tunable) to
> > prevent low-memory deadlocks on my machines. Since then it's been
> > fine.
>
> Fixed, sorry.
>
> >
> > You may also wish to use my patches (see the archives) to improve
> > performance and low-memory behaviour.
>
> Thanks for the advice. I'll check. I didn't think low memory since it
> didn't look like I was using much. Even now with the box locked, I have
> 1035 MB free with no swap in use (this box has 2 GB total).

By default there is only a 320 MB kmem_map into which all of zfs
(including its buffer cache and I/O buffers) has to cram itself, so
that is where the low memory condition may be happening. This is one
of the things that should be tuned to give non-terrible performance by
actually allowing some caching to occur.

Kris
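
The tuning described above could be sketched as a /boot/loader.conf
fragment along the following lines. Only vfs.zfs.zil_disable=1 is taken
from this thread. vm.kmem_size and vm.kmem_size_max are the loader
tunables that size kmem_map, but the 512M figures are illustrative
assumptions rather than values recommended in this message, and the
usable maximum may depend on the architecture and kernel configuration.

```
# /boot/loader.conf (sketch; takes effect at the next boot)

# From the thread: disable the ZFS intent log to avoid the
# low-memory deadlocks described above.
vfs.zfs.zil_disable="1"

# Assumption: enlarge kmem_map beyond the 320 MB default so that
# the ZFS buffer cache and I/O buffers have room. 512M is a
# placeholder value, not a figure given in this thread.
vm.kmem_size="512M"
vm.kmem_size_max="512M"
```

After a reboot, running sysctl vm.kmem_size should report the new size,
which gives a quick check that the setting was picked up.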