From: Levent Serinol <lserinol@gmail.com>
Date: Wed, 27 Jun 2012 20:02:17 +0300
To: Andreas Nilsson
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS stalls on Heavy I/O
List-Id: Filesystems
Hi,

On 27 Jun 2012, at 19:34, Andreas Nilsson wrote:
>
> On Wed, Jun 27, 2012 at 5:50 PM, Dean Jones wrote:
> On Wed, Jun 27, 2012 at 2:15 AM, Levent Serinol wrote:
> > Hi,
> >
> > Under heavy I/O load we face freeze problems on ZFS volumes on both
> > FreeBSD 9-RELEASE and 10-CURRENT. The machines are HP servers (64-bit)
> > with HP Smart Array 6400 RAID controllers (with battery units). Every da
> > device is a hardware RAID5, each consisting of 9x300GB 10K SCSI hard
> > drives. Most of the I/O is local to the system, apart from some small NFS
> > traffic from a few other servers (NFS lookup/getattr/etc.). These servers are
> > mail servers (qmail) with small I/O patterns (64K reads/writes). Below you
> > can find procstat output taken at freeze time. write_limit is set to 200MB
> > because of the huge number of txg_wait_opens observed before. Every process
> > stops in D state, I think because the txg queue and the other two queues are
> > full. Is there any suggestion to fix the problem?
> >
> > btw, inject_compress is the main process injecting mail into user inboxes
> > (databases). Also, these machines ran without problems on
> > Linux/XFS. A while ago we started migrating from Linux to
> > FreeBSD.
> >
> > http://pastebin.com/raw.php?i=ic3YepWQ
> > _______________________________________________
>
> Looks like you are running dedup with only 12 GB of RAM?
>
> Dedup is very RAM hungry, and the dedup tables are probably no longer
> fitting entirely in memory, so the system is swapping and
> thrashing during writes.
>
> Also, ZFS really prefers to address drives directly rather than through RAID
> controllers. It cannot guarantee or know what the controller is
> doing behind the scenes.
> You might want to read http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe and see if you need more RAM.
>
> And yes, having RAID below ZFS somewhat defeats the point of ZFS.
>
> Regards
> Andreas

That was one of the machines; I'm running several similar machines with a few differences. For example, some of them have 50 GB or 20 GB of RAM, and some have direct access to every disk in the pool, as you suggested (pools of 24 disks). Some of the machines also run HP P812 RAID cards (1 GB cache), and every HP card has a battery unit. Every machine, whether it has 50 GB of RAM or pools with lots of individual disks, shows the same stall problem, except one that uses an HP P6300 SAN over an FC connection; it runs ZFS without problems. Should I suspect the ciss driver, which is common to all the machines where the problem occurs? Whether they use 6400 or P812 RAID cards, all of them use the same ciss driver, except the one connected via FC to the SAN.

Btw, when ZFS stalls, it continues to read and write as usual after 1-2 minutes.

Do you suspect any problem in the procstat output that I provided?

Thanks,
Levent
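[Archive editor's note: the dedup RAM rule of thumb referenced in the blog post linked above can be sketched as a quick back-of-the-envelope calculation. The ~320 bytes of ARC per unique block is an assumed figure taken from such sizing guides, not from this thread; actual in-core DDT entry sizes vary by pool and ZFS release.]

```python
# Rough dedup-table (DDT) RAM estimate: one DDT entry per unique block,
# with an assumed in-core cost of ~320 bytes per entry (a commonly cited
# rule-of-thumb figure, not measured on these machines).

def ddt_ram_bytes(pool_bytes, avg_block_bytes, bytes_per_entry=320):
    """Estimate ARC needed to keep the whole DDT in memory."""
    unique_blocks = pool_bytes // avg_block_bytes  # worst case: no duplicates
    return unique_blocks * bytes_per_entry

# Example in the spirit of the thread: a ~2.4 TB LUN of 64K records,
# similar to the 9x300GB RAID5 mail-spool volumes described above.
pool_size = 2_400_000_000_000
estimate = ddt_ram_bytes(pool_size, 64 * 1024)
print(f"~{estimate / 2**30:.1f} GiB of ARC just for the DDT")  # ~10.9 GiB
```

An estimate of that order already rivals the 12 GB of total RAM Dean points out, which is consistent with his swapping/thrashing diagnosis; `zdb -DD <pool>` reports the actual DDT entry counts for a given pool.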