Date: Mon, 23 Apr 2012 15:38:10 +0100
From: Tim Bishop <tim-lists@bishnet.net>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS: processes hanging when trying to access filesystems
Message-ID: <20120423143810.GA95448@carrick-users.bishnet.net>
In-Reply-To: <20120327181457.GC24787@carrick-users.bishnet.net>
References: <20120327181457.GC24787@carrick-users.bishnet.net>
Here's a comparison of top output. This shows the higher context
switching. I'm not sure if this is part of the cause of the problems, or
just an effect:

"top -Sj -m io"

last pid: 95277;  load averages:  0.04,  0.11,  0.13  up 20+05:31:54  15:29:52
186 processes: 2 running, 182 sleeping, 1 stopped, 1 waiting
CPU:  4.1% user,  0.0% nice,  3.6% system,  0.0% interrupt, 92.3% idle
Mem: 412M Active, 488M Inact, 4685M Wired, 52M Cache, 551M Buf, 288M Free
Swap: 6144M Total, 316M Used, 5828M Free, 5% Inuse

  PID JID USERNAME   VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
   12   0 root        617      1      0      0      0      0   0.00% intr
   11   0 root        584   1212      0      0      0      0   0.00% idle
    0   0 root        322     46      0      0      0      0   0.00% kernel
    3   0 root        257      1      0      0      0      0   0.00% g_up
    4   0 root        175      3      0      0      0      0   0.00% g_down
   13   0 root         20      0      0      0      0      0   0.00% yarrow
    5   0 root         17      0      0     16      0     16  88.89% zfskern
  641   0 _pflogd       4      0      0      0      0      0   0.00% pflogd

last pid: 92079;  load averages:  0.39,  0.22,  0.18  up 20+05:22:39  15:20:37
197 processes: 2 running, 192 sleeping, 1 stopped, 1 zombie, 1 waiting
CPU:  0.0% user,  0.0% nice,  5.3% system,  1.5% interrupt, 93.2% idle
Mem: 484M Active, 478M Inact, 4655M Wired, 52M Cache, 551M Buf, 257M Free
Swap: 6144M Total, 316M Used, 5828M Free, 5% Inuse

  PID JID USERNAME   VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
   11   0 root       3945   6837      0      0      0      0   0.00% idle
   12   0 root       2130      1      0      0      0      0   0.00% intr
    0   0 root       2008     99      0      0      0      0   0.00% kernel
    3   0 root       1810      0      0      0      0      0   0.00% g_up
    4   0 root       1486     12      0      0      0      0   0.00% g_down
   13   0 root         20      2      0      0      0      0   0.00% yarrow
    5   0 root         19      0      2     66      0     68  95.77% zfskern
   20   0 root          9      0      0      0      0      0   0.00% g_mirror r

The latter shows the machine when it's unresponsive and processes are
starting to hang.

Tim.

On Tue, Mar 27, 2012 at 07:14:57PM +0100, Tim Bishop wrote:
> I have a machine running 8-STABLE amd64 from the end of last week. I
> have a problem where the machine starts to freeze up. Any process
> accessing the ZFS filesystems hangs, which eventually causes more and
> more processes to be spawned (cronjobs, etc, never complete).
> Although the root filesystem is on UFS (the machine hosts jails on ZFS),
> eventually I can't log in anymore.
>
> The problem occurs when the frequently used part of the ARC gets too
> large. See this graph:
>
> http://dl.dropbox.com/u/318044/zfs_arc_utilization-day.png
>
> At the right of the graph things started to hang.
>
> At the same time I see a high amount of context switching.
>
> I picked a hanging process and procstat showed the following:
>
>   PID    TID COMM             TDNAME           KSTACK
> 24787 100303 mutt             -                mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 dmu_tx_assign+0x170 zfs_inactive+0xf1 zfs_freebsd_inactive+0x1a vinactive+0x71 vputx+0x2d8 null_reclaim+0xb3 vgonel+0x119 vrecycle+0x7b null_inactive+0x1f vinactive+0x71 vputx+0x2d8 vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23
>
> I'm running a reduced number of jails on the machine at the moment, which
> is limiting the speed at which the machine freezes up completely. I'd
> like to debug this problem further, so any advice on useful information
> to collect would be appreciated.
>
> I've had this problem on the machine before[1] but adding more RAM
> alleviated the issue.
>
> Tim.
>
> [1] http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058541.html

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
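To put a number on "higher context switching" in the two top snapshots, here is a minimal Python sketch (not part of the original report) that tabulates only the VCSW and IVCSW columns exactly as displayed by "top -Sj -m io". The row values are copied from the snapshots; the helper names `totals` and `vcsw_ratio` are hypothetical, chosen for illustration. It assumes both snapshots cover comparable display intervals, which the report does not state.

```python
from typing import Dict, List, Tuple

# (COMMAND, VCSW, IVCSW) rows copied from the two "top -Sj -m io" snapshots.
Row = Tuple[str, int, int]

NORMAL: List[Row] = [
    ("intr", 617, 1), ("idle", 584, 1212), ("kernel", 322, 46),
    ("g_up", 257, 1), ("g_down", 175, 3), ("yarrow", 20, 0),
    ("zfskern", 17, 0), ("pflogd", 4, 0),
]
HANGING: List[Row] = [
    ("idle", 3945, 6837), ("intr", 2130, 1), ("kernel", 2008, 99),
    ("g_up", 1810, 0), ("g_down", 1486, 12), ("yarrow", 20, 2),
    ("zfskern", 19, 0), ("g_mirror r", 9, 0),
]


def totals(rows: List[Row]) -> Tuple[int, int]:
    """Sum voluntary (VCSW) and involuntary (IVCSW) context switches."""
    return sum(r[1] for r in rows), sum(r[2] for r in rows)


def vcsw_ratio(before: List[Row], after: List[Row]) -> Dict[str, float]:
    """VCSW(after) / VCSW(before) for commands listed in both snapshots."""
    base = {name: v for name, v, _ in before}
    return {
        name: v / base[name]
        for name, v, _ in after
        if name in base and base[name] > 0
    }


if __name__ == "__main__":
    print("normal :", totals(NORMAL))
    print("hanging:", totals(HANGING))
    ratios = vcsw_ratio(NORMAL, HANGING)
    # Largest increase first; commands missing from either snapshot are skipped.
    for name, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{name:8s} {r:5.2f}x")
```

Run against these two snapshots, the voluntary switch total rises from 1996 to 11427, with g_up and g_down each switching roughly seven to eight times as often, while zfskern itself barely changes.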