From: Fabian Keil
To: Levent Serinol
Cc: freebsd-fs@freebsd.org
Date: Wed, 27 Jun 2012 19:28:43 +0200
Subject: Re: ZFS stalls on Heavy I/O
Message-ID: <20120627192843.69214ea0@fabiankeil.de>

Levent Serinol wrote:

> On 27 Jun 2012, at 19:34, Andreas Nilsson wrote:
> > On Wed, Jun 27, 2012 at 5:50 PM, Dean Jones wrote:
> > On Wed, Jun 27, 2012 at 2:15 AM, Levent Serinol wrote:
> > > Hi,
> > >
> > > Under heavy I/O load we face freeze problems on ZFS volumes on both
> > > FreeBSD 9 Release and 10 Current versions. Machines are HP servers (64bit)
> > > with HP Smart Array 6400 RAID controllers (with battery units). Every da
> > > device is a hardware RAID5 where each one includes 9x300GB 10K SCSI hard
> > > drives.
> > > Most of the I/O happens on the local system, except for some small NFS
> > > I/O from other servers (NFS lookup/getattr etc.). These servers are
> > > mail servers (qmail) with small I/O patterns (64K read/write). Below you
> > > can find procstat output from the time of a freeze. write_limit is set to
> > > 200MB because of the huge number of txg_wait_opens observed before. Every
> > > process stops in D state, I think because the txg queue and the other 2
> > > queues are full. Is there any suggestion to fix the problem?
> > >
> > > btw inject_compress is the main process injecting emails into user inboxes
> > > (databases). Also, those machines were running without problems on
> > > Linux/XFS. A while ago we started migrating from Linux to FreeBSD.
> > >
> > > http://pastebin.com/raw.php?i=ic3YepWQ
> >
> > Looks like you are running dedup with only 12 gigs of ram?
> >
> > Dedup is very ram hungry and the dedup tables are probably no longer
> > fitting entirely in memory and therefore the system is swapping and
> > thrashing about during writes.
> >
> > Also ZFS really prefers to directly address drives instead of RAID
> > controllers. It can not guarantee or know what the controller is
> > doing behind the scenes.
>
> > You might want to read
> > http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe
> > and see if you need more ram.
> >
> > And yes, having raid below zfs somewhat defeats the point of zfs.
>
> That was one of the machines; I'm running several similar machines with a
> few differences. For example, some of them have 50GB or 20GB of RAM, and
> some of them access every disk directly in the pool, as you suggested
> (pools including 24 disks). Some of the machines are also running P812 HP
> RAID cards (1GB cache); every HP card has a battery unit.
> Every machine, whether running with 50GB of RAM or with pools of many
> disks, has the same stall problem, except one which is using an HP P6300
> SAN with an FC connection. It's running ZFS without problems. Should I
> suspect the ciss driver, which is common to all machines where the problems
> occur? Whether they use 6400 or P812 RAID cards, all of them use the same
> ciss driver, except the one connected via FC to the SAN.
>
> Btw when ZFS stalls, after 1-2 minutes it continues to write and read
> as usual.

Do the stalls get shorter if you decrease kern.cam.da.default_timeout?

Fabian
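
To tell whether the stalls actually get shorter after changing the timeout, it may help to time small synchronous writes on the affected pool before and after. The following is only a minimal sketch, assuming Python is available on the machine. The function name, the 1-second stall threshold and the number of writes are illustrative choices, not something from this thread; the 64K block size is taken from the I/O pattern described above.

```python
import os
import tempfile
import time


def measure_write_stalls(directory, writes=50, block_size=64 * 1024, threshold=1.0):
    """Time `writes` write+fsync calls in `directory`.

    Returns (latencies, stalls): every per-write duration in seconds, and
    the subset that took at least `threshold` seconds (hypothetical cutoff).
    """
    block = b"\0" * block_size
    latencies = []
    # Scratch file on the filesystem under test; removed afterwards.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.monotonic()
            os.write(fd, block)
            os.fsync(fd)  # force the data out so a stall shows up in the timing
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    stalls = [t for t in latencies if t >= threshold]
    return latencies, stalls


if __name__ == "__main__":
    # Assumption: replace the temp dir with a directory on the ZFS pool.
    latencies, stalls = measure_write_stalls(tempfile.gettempdir())
    print(f"writes: {len(latencies)}, longest: {max(latencies):.3f}s, "
          f"stalls: {len(stalls)}")
```

Running it once with the current timeout and once with a lower value, against a directory on the pool that stalls, should give comparable numbers for the longest write.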