Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 29 Jun 2012 11:57:04 +0300
From:      Levent Serinol <lserinol@gmail.com>
To:        Fabian Keil <freebsd-listen@fabiankeil.de>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS stalls on Heavy I/O
Message-ID:  <CACqg54TRKjzGkxMmvOJqoWyUoEW=UuCpb%2Bwvqe4qPz0-0TV8rg@mail.gmail.com>
In-Reply-To: <20120627192843.69214ea0@fabiankeil.de>
References:  <CACqg54Si-vHFAVkjpTS40MZt4E1Kn14kDUFKmVb8vx449fCnFw@mail.gmail.com> <CAMXYB4K%2B9EKPyqdCRZZgLvDQuwK=AAGSZi8%2B-AfkOrnJQzwdUA@mail.gmail.com> <CAPS9%2BSs1oCh=Szhf_qCam85hPh%2BMFu-XRjTEJZT5hYt12qMhXw@mail.gmail.com> <DD521FBB-CFAD-4D51-8D0A-D21240FB30FE@gmail.com> <20120627192843.69214ea0@fabiankeil.de>

next in thread | previous in thread | raw e-mail | index | archive | help
Hi,

On Wed, Jun 27, 2012 at 8:28 PM, Fabian Keil
<freebsd-listen@fabiankeil.de>wrote:

> Levent Serinol <lserinol@gmail.com> wrote:
>
> > On 27 Haz 2012, at 19:34, Andreas Nilsson <andrnils@gmail.com> wrote:
>
> > > On Wed, Jun 27, 2012 at 5:50 PM, Dean Jones <
> dean.jones@oregonstate.edu> wrote:
> > > On Wed, Jun 27, 2012 at 2:15 AM, Levent Serinol <lserinol@gmail.com>
> wrote:
> > > > Hi,
> > > >
> > > >  Under heavy I/O load we face freeze problems on ZFS volumes on both
> > > > Freebsd 9 Release and 10 Current versions. Machines are HP servers
> (64bit)
> > > > with HP Smart array 6400 raid controllers (with battery units).
> Every da
> > > > device is a hardware raid5 where each one includes 9x300GB 10K SCSI
> hard
> > > > drivers. Main of I/O pattern happens on local system except some
> small NFS
> > > > I/O from some other servers (NFS lookup/getattr/ etc.). These
> servers are
> > > > mail servers (qmail) with small I/O patterns (64K Read/Write).
>  Below you
> > > > can  find procstat output on freeze time. write_limit is set to 200MB
> > > > because of the huge amount of txg_wait_opens observed before. Every
> process
> > > > stops on D state I think due to txg queue and other 2 queues are
> full. Is
> > > > there any suggestion to fix the problem ?
> > > >
> > > > btw inject_compress is the main process injecting emails to user
> inboxes
> > > > (databases). Also, those machines were running without problems on
> > > > Linux/XFS filesystem. For a while ago, we started  migration from
> Linux to
> > > > Freebsd
> > > >
> > > >
> > > > http://pastebin.com/raw.php?i=ic3YepWQ
> > > > _______________________________________________
> > >
> > > Looks like you are running dedup with only 12 gigs of ram?
> > >
> > > Dedup is very ram hungry and the dedup tables are probably no longer
> > > fitting entirely in memory and therefore the system is swapping and
> > > thrashing about during writes.
> > >
> > > Also ZFS really prefers to directly address drives instead of RAID
> > > controllers.  It can not guarantee or know what the controller is
> > > doing behind the scenes.
> > > You might want to read
> http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe and see
> if you need more ram.
> > >
> > > And yes, having raid below zfs somewhat defeats the point of zfs.
>
> > That was one the machines, i'm running several similar machines except
> few changes. For examplw some of them have 50gb and 20gbs of ram and some
> of them has direct access every disk itself on poil as you suggested (
> pools including 24 disks) some of the machines also running p812 hp raid
> cards (1gb cache ) , every hp card has battery unit. Every machine wheter
> rumning on 50gb ram or pools with lots of disks have the same stall problem
> except one of them which is using hp p6300 san with fc connection . It's
> running zfs without problems. Do i have to suspect on ciss driver which is
> common on all machines where problems occur ? Wheter they use 6400 or p812
> raid cards all of them  is using same ciss driver except the one connected
> via fc to san.
> >
> > Btw when zfs stalls after 1-2 minutes later it contiunes to write and
> read as usual.
>
> Do the stalls get shorter if you decrease kern.cam.da.default_timeout?
>
> Fabian
>

 I set kern.cam.da.default_timeout to 30 later tried 10, nothing happened.
I just tried this  on a machine where there are plenty of ram (50GB) , hp
p812 raid controller (1gb cache+battery) and zpool configuration as below.

 state: ONLINE
 scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
mail213     ONLINE       0     0     0
  raidz1-0  ONLINE       0     0     0
    da1     ONLINE       0     0     0
    da2     ONLINE       0     0     0
    da3     ONLINE       0     0     0
    da4     ONLINE       0     0     0
    da5     ONLINE       0     0     0
    da6     ONLINE       0     0     0
    da7     ONLINE       0     0     0
    da8     ONLINE       0     0     0
    da9     ONLINE       0     0     0
    da10    ONLINE       0     0     0
    da11    ONLINE       0     0     0
  raidz1-1  ONLINE       0     0     0
    da13    ONLINE       0     0     0
    da14    ONLINE       0     0     0
    da15    ONLINE       0     0     0
    da16    ONLINE       0     0     0
    da17    ONLINE       0     0     0
    da18    ONLINE       0     0     0
    da19    ONLINE       0     0     0
    da20    ONLINE       0     0     0
    da21    ONLINE       0     0     0
    da22    ONLINE       0     0     0
    da24    ONLINE       0     0     0
cache
  da23      ONLINE       0     0     0
spares
  da12      AVAIL

zpool list
NAME      SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
mail213     6T  4.32T  1.68T    71%  2.25x  ONLINE  -

every process where trying to write or read on this pool is in "D" state.
here is a small trace of a process. Every one of them, strangely waits too
long on lseek syscall. I have also replaced the lseek with pread/pwrite
 but same delay occures.

20338: 0.008389433 lseek(3,0x51dc00,SEEK_SET) = 5364736 (0x51dc00)
20338: 0.469858534 read(3,"\^B\0\0\0\r\0Y\0\0\0Yd\^C\M-/\^C"...,1024) =
1024 (0x400)
20338: 0.470058003 lseek(3,0xf1000,SEEK_SET) = 987136 (0xf1000)
20338: 0.492391062 read(3,"\n\0\0\0\^N\0007\0\0007\0z\0\M-B"...,1024) =
1024 (0x400)
20338: 0.492525369 lseek(4,0x2a50,SEEK_SET) = 10832 (0x2a50)
20338: 0.492680628 write(4,"\0\0\^Tx",4) = 4 (0x4)
20338: 0.492759340 lseek(4,0x2a54,SEEK_SET) = 10836 (0x2a54)
20338: 0.492950148 write(4,"\^B\0\0\0\r\0Y\0\0\0Yd\^C\M-/\^C"...,1024) =
1024 (0x400)
20338: 0.493027673 lseek(4,0x2e54,SEEK_SET) = 11860 (0x2e54)
20338: 0.493124404 write(4,"\M-&"\t\M-K",4) = 4 (0x4)
20338: 0.493204793 lseek(3,0xb3e800,SEEK_SET) = 11790336 (0xb3e800)
20338: 0.555674360 read(3,"\n\0\0\0\r\0}\0\0\M-B\^A\^F\^AV"...,1024) = 1024
(0x400)
20338: 0.555823752 lseek(3,0x2c7800,SEEK_SET) = 2914304 (0x2c7800)
20338: 0.568206547 read(3,"\n\0\0\0\r\0K\0\0\M^V\0\M-W\^A'"...,1024) = 1024
(0x400)
20338: 0.568324720 lseek(4,0x2e58,SEEK_SET) = 11864 (0x2e58)
20338: 0.568437026 write(4,"\0\0\^C\M-E",4) = 4 (0x4)
20338: 0.568511757 lseek(4,0x2e5c,SEEK_SET) = 11868 (0x2e5c)
20338: 0.568666527 write(4,"\n\0\0\0\^N\0007\0\0007\0z\0\M-B"...,1024) =
1024 (0x400)
20338: 0.568742795 lseek(4,0x325c,SEEK_SET) = 12892 (0x325c)
20338: 0.568852168 write(4,"\M-&"\no",4) = 4 (0x4)
20338: 0.568930600 lseek(4,0x3260,SEEK_SET) = 12896 (0x3260)
20338: 0.569028379 write(4,"\0\0\v\^_",4) = 4 (0x4)
20338: 0.569094659 lseek(4,0x3264,SEEK_SET) = 12900 (0x3264)
20338: 0.569235950 write(4,"\n\0\0\0\r\0K\0\0\M^V\0\M-W\^A'"...,1024) =
1024 (0x400)
20338: 0.569302439 lseek(4,0x3664,SEEK_SET) = 13924 (0x3664)
20338: 0.569390022 write(4,"\M-&"\n\^[",4) = 4 (0x4)
20338: 0.569451692 lseek(4,0x3668,SEEK_SET) = 13928 (0x3668)
20338: 0.569534176 write(4,"\0\0,\M-{",4) = 4 (0x4)
20338: 0.569594170 lseek(4,0x366c,SEEK_SET) = 13932 (0x366c)
20338: 0.569721771 write(4,"\n\0\0\0\r\0}\0\0\M-B\^A\^F\^AV"...,1024) =
1024 (0x400)
20338: 0.569782604 lseek(4,0x3a6c,SEEK_SET) = 14956 (0x3a6c)
20338: 0.569873818 write(4,"\M-&"\nW",4) = 4 (0x4)
20338: 0.569953368 lseek(3,0x9000,SEEK_SET) = 36864 (0x9000)
20338: 0.570027610 read(3,"\^B\0\0\0\^S\0\M^D\0\0\0\M-(\M-X"...,1024) =
1024 (0x400)
20338: 0.570122595 lseek(3,0x2a35c00,SEEK_SET) = 44260352 (0x2a35c00)
20338: 0.728753280 read(3,"\^B\0\0\0\^S\0\M^D\^A\0\0\M-/7"...,1024) = 1024
(0x400)
20338: 0.728882977 lseek(3,0x2bcd800,SEEK_SET) = 45930496 (0x2bcd800)
20338: 0.728959175 read(3,"\n\0\0\0\^R\0\M-~\0\0\M-~\^A)\^A"...,1024) =
1024 (0x400)
20338: 0.729059188 lseek(4,0x3a70,SEEK_SET) = 14960 (0x3a70)
20338: 0.729143278 write(4,"\0\0\M-/7",4) = 4 (0x4)
20338: 0.729203063 lseek(4,0x3a74,SEEK_SET) = 14964 (0x3a74)
20338: 0.729328011 write(4,"\n\0\0\0\^R\0\M-~\0\0\M-~\^A)\^A"...,1024) =
1024 (0x400)
20338: 0.729385351 lseek(4,0x3e74,SEEK_SET) = 15988 (0x3e74)
20338: 0.729473212 write(4,"\M-&"\t\M^T",4) = 4 (0x4)
20338: 0.729539772 lseek(3,0x9800,SEEK_SET) = 38912 (0x9800)
20338: 0.729608776 read(3,"\^B\0\0\0\^A\^C\M-G\0\0\0b\^\\^C"...,1024) =
1024 (0x400)
20338: 0.729702783 lseek(3,0x1886c00,SEEK_SET) = 25717760 (0x1886c00)
20338: 0.876752391 read(3,"\^B\^A\M^P\0\^P\09\^A\0\0.I\^A"...,1024) = 1024
(0x400)
20338: 0.876904926 lseek(3,0x2999000,SEEK_SET) = 43618304 (0x2999000)
20338: 1.088693201 read(3,"\^B\0\0\0\^O\0\M-*\0\0\0$l\0\M-*"...,1024) =
1024 (0x400)
20338: 1.088931153 lseek(3,0x2bc1400,SEEK_SET) = 45880320 (0x2bc1400)
20338: 1.089040037 read(3,"\n\0\0\0\^N\^A\^Z\0\^A\^Z\^AN\^A"...,1024) =
1024 (0x400)
20338: 1.089176648 lseek(4,0x3e78,SEEK_SET) = 15992 (0x3e78)
20338: 1.089300199 write(4,"\0\0\M-/\^F",4) = 4 (0x4)
20338: 1.089384009 lseek(4,0x3e7c,SEEK_SET) = 15996 (0x3e7c)
20338: 1.089572373 write(4,"\n\0\0\0\^N\^A\^Z\0\^A\^Z\^AN\^A"...,1024) =
1024 (0x400)
20338: 1.089654927 lseek(4,0x427c,SEEK_SET) = 17020 (0x427c)
20338: 1.089771773 write(4,"\M-&"\n\^D",4) = 4 (0x4)
20338: 1.089882891 lseek(4,0x4280,SEEK_SET) = 17024 (0x4280)
20338: 1.089998899 write(4,"\0\0\0\^A",4) = 4 (0x4)
20338: 1.090082290 lseek(4,0x4284,SEEK_SET) = 17028 (0x4284)
20338: 1.090246000 write(4,"SQLite format 3\0\^D\0\^A\^A\0@ "...,1024) =
1024 (0x400)
20338: 1.090328135 lseek(4,0x4684,SEEK_SET) = 18052 (0x4684)
20338: 1.090441349 write(4,"\M-&"\t3",4) = 4 (0x4)
20338: 1.090547439 lseek(4,0x4688,SEEK_SET) = 18056 (0x4688)
20338: 1.090660234 write(4,"\0\0\M-.'",4) = 4 (0x4)
20338: 1.090750959 lseek(4,0x468c,SEEK_SET) = 18060 (0x468c)
20338: 1.090939462 write(4,"\^E\0\0\0]\^Au\0\0\0\M-/\^?\^A"...,1024) = 1024
(0x400)
20338: 1.091030327 lseek(4,0x4a8c,SEEK_SET) = 19084 (0x4a8c)
20338: 1.091149966 write(4,"\M-&"\n1",4) = 4 (0x4)
20338: 1.091271003 lseek(4,0x4a90,SEEK_SET) = 19088 (0x4a90)
20338: 1.091398535 write(4,"\0\0\0\^F",4) = 4 (0x4)
20338: 1.091485069 lseek(4,0x4a94,SEEK_SET) = 19092 (0x4a94)
20338: 1.091673573 write(4,"\r\0\0\0\^B\^C\M-g\0\^C\M-s\^C"...,1024) = 1024
(0x400)
20338: 1.091756336 lseek(4,0x4e94,SEEK_SET) = 20116 (0x4e94)
20338: 1.091875836 write(4,"\M-&"\b\M-)",4) = 4 (0x4)
20338: 1.091975011 fcntl(3,F_SETLK,0x7fffffffcf80) = 0 (0x0)
20338: 1.092062384 fcntl(3,F_SETLK,0x7fffffffcf80) = 0 (0x0)
20338: 1.092152550 lseek(4,0x5000,SEEK_SET) = 20480 (0x5000)
20338: 1.092241040 read(4,0x7fffffffcf10,8) = 0 (0x0)
20338: 1.092337352 fsync(0x4,0x0,0x0,0x0,0x1,0x0) = 0 (0x0)

Here is a stack trace on problem time http://pastebin.com/raw.php?i=T1iwhuEG


Thanks,
Levent



-- 

http://lserinol.blogspot.com/



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CACqg54TRKjzGkxMmvOJqoWyUoEW=UuCpb%2Bwvqe4qPz0-0TV8rg>