From: Karl Denninger
Date: Mon, 04 Mar 2013 21:39:26 -0600
CC: freebsd-stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Message-ID: <513568EE.80006@denninger.net>

On 3/4/2013 9:25 PM, Steven Hartland wrote:
> ----- Original Message ----- From: "Karl Denninger"
>
>> Stick this in /boot/loader.conf and see if your lockups goes away:
>>
>> vfs.zfs.write_limit_override=1024000000
> ...
>
>> If it turns out that the write_limit_override tunable is the one
>> responsible for stopping the hangs I can drop the ARC limit tunable
>> although I'm not sure I want to; I don't see much if any performance
>> penalty from leaving it where it is and if the larger cache isn't
>> helping anything then why use it? I'm inclined to stick an SSD in the
>> cabinet as a cache drive instead of dedicating RAM to this -- even
>> though it's not AS fast as RAM it's still MASSIVELY quicker than getting
>> data off a rotating plate of rust.
>
> Now interesting you should say that I've seen a stall recently on ZFS
> only box running on 6 x SSD RAIDZ2.
>
> The stall was caused by fairly large mysql import, with nothing else
> running.
>
> Then it happened I thought the machine had wedged, but minutes (not
> seconds) later, everything sprung into action again.

That's exactly what I can reproduce here; the stalls are anywhere from a
few seconds to well north of a half-minute. It looks like the machine
is hung -- but it is not.

The machine in question normally runs with zero swap allocated but it
always has 1.5Gb of shared memory allocated to Postgres ("shared_buffers
= 1500MB" in its config file).

I wonder if the ARC cache management code is misbehaving when shared
segments are in use?
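
For reference, the /boot/loader.conf fragment under discussion would look
roughly like the sketch below. Only the write_limit_override line comes
from this thread; the ARC limit value is not stated in this message, so
the arc_max line is a commented-out placeholder with an assumed value,
not the setting actually in use here.

```
# /boot/loader.conf (sketch)

# Cap the ZFS write limit, as suggested earlier in this thread.
vfs.zfs.write_limit_override=1024000000

# ARC size limit. The value below is a hypothetical placeholder;
# the real limit on this machine is not given in this message.
#vfs.zfs.arc_max="2G"
```

Loader tunables are read at boot, so a change here needs a reboot before
it can be judged against the stalls.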
-- 
Karl Denninger
/The Market Ticker ®/
Cuda Systems LLC