Date:      Tue, 31 Aug 2010 14:35:56 +0100
From:      Tim Bishop <tim@bishnet.net>
To:        Dan Nelson <dnelson@allantgroup.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: 8.1R ZFS almost locking up system
Message-ID:  <20100831133556.GB45316@carrick-users.bishnet.net>
In-Reply-To: <20100821222429.GB73221@dan.emsphone.com>
References:  <20100821220435.GA6208@carrick-users.bishnet.net> <20100821222429.GB73221@dan.emsphone.com>

On Sat, Aug 21, 2010 at 05:24:29PM -0500, Dan Nelson wrote:
> In the last episode (Aug 21), Tim Bishop said:
> > A few items from top, including zfskern:
> > 
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> >     5 root        4  -8    -     0K    60K zio->i  0  54:38  3.47% zfskern
> > 91775     70      1  44    0 53040K 31144K tx->tx  1   2:11  0.00% postgres
> > 39661 tdb         1  44    0 55776K 32968K tx->tx  0   0:39  0.00% mutt
> > 14828 root        1  47    0 14636K  1572K tx->tx  1   0:03  0.00% zfs
> > 11188 root        1  51    0 14636K  1572K tx->tx  0   0:03  0.00% zfs
> > 
> > At some point during this process my zfs snapshots have been failing to
> > complete:
> > 
> > root        5  0.8  0.0     0    60  ??  DL    7Aug10  54:43.83 [zfskern]
> > root     8265  0.0  0.0 14636  1528  ??  D    10:00AM   0:03.12 zfs snapshot -r pool0@2010-08-21_10:00:01--1d
> > root    11188  0.0  0.1 14636  1572  ??  D    11:00AM   0:02.93 zfs snapshot -r pool0@2010-08-21_11:00:01--1d
> > root    14828  0.0  0.1 14636  1572  ??  D    12:00PM   0:03.04 zfs snapshot -r pool0@2010-08-21_12:00:00--1d
> > root    17862  0.0  0.1 14636  1572  ??  D     1:00PM   0:01.96 zfs snapshot -r pool0@2010-08-21_13:00:01--1d
> > root    20986  0.0  0.1 14636  1572  ??  D     2:00PM   0:02.07 zfs snapshot -r pool0@2010-08-21_14:00:01--1d
> 
> procstat -k on some of these processes might help to pinpoint what part of
> the zfs code they're all waiting in.

It happened again this Saturday, so something in the weekly periodic run
is clearly triggering the issue. procstat -kk shows the following for
every process doing something ZFS-related (that is, any process with the
string 'zfs' somewhere in its procstat -kk output):

    0 100084 kernel           zfs_vn_rele_task mi_switch+0x16f sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 fork_trampoline+0xe 
    5 100031 zfskern          arc_reclaim_thre mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 fork_exit+0x118 fork_trampoline+0xe 
    5 100032 zfskern          l2arc_feed_threa mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be fork_exit+0x118 fork_trampoline+0xe 
    5 100085 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe 
    5 100086 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea spa_sync+0x355 txg_sync_thread+0x195 fork_exit+0x118 fork_trampoline+0xe 
   17 100040 syncer           -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_synced+0x7c zil_commit+0x416 zfs_sync+0xa6 sync_fsync+0x184 sync_vnode+0x16b sched_sync+0x1c9 fork_exit+0x118 fork_trampoline+0xe 
 2210 100156 syslogd          -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1 
 3500 100177 syslogd          -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1 
 3783 100056 syslogd          -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1 
 4064 100165 mysqld           initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 closef+0x3b kern_close+0x14d syscall+0x1e7 Xfast_syscall+0xe1 
 4441 100224 python2.6        initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
 4444 100227 python2.6        initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
 4445 100228 python2.6        initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
 4446 100229 python2.6        initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
 4447 100089 python2.6        initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
 5352 100270 mutt             -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_synced+0x7c zil_commit+0x416 zfs_freebsd_fsync+0xd7 null_bypass+0xd3 fsync+0x161 syscall+0x1e7 Xfast_syscall+0xe1 
52686 100200 tarsnap          -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 closef+0x3b kern_close+0x14d syscall+0x1e7 Xfast_syscall+0xe1 
59049 100207 webalizer        initial thread   mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 null_bypass+0xd3 VOP_WRITE_APV+0x141 vn_write+0x2d7 dofilewrite+0x85 kern_pwritev+0x63 pwrite+0x59 syscall+0x1e7 Xfast_syscall+0xe1 
77573 100479 perl             -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 null_bypass+0xd3 VOP_WRITE_APV+0x141 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 write+0x55 syscall+0x1e7 Xfast_syscall+0xe1 
78595 100275 zfs              -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_synced+0x7c dsl_sync_task_group_wait+0x11c dmu_objset_snapshot+0x1b8 zfs_ioc_snapshot+0x7c zfsdev_ioctl+0x8d devfs_ioctl_f+0x77 kern_ioctl+0xf6 ioctl+0xfd syscall+0x1e7 Xfast_syscall+0xe1 
81989 100596 zfs              -                mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_wait_synced+0x7c dsl_sync_task_group_wait+0x11c dmu_objset_snapshot+0x1b8 zfs_ioc_snapshot+0x7c zfsdev_ioctl+0x8d devfs_ioctl_f+0x77 kern_ioctl+0xf6 ioctl+0xfd syscall+0x1e7 Xfast_syscall+0xe1 
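
For reference, this is roughly how I collected those stacks (a one-liner
from memory, so treat the exact invocation as approximate):

    procstat -kk -a | grep zfs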

Most of the processes are blocked in txg_wait_open() or txg_wait_synced(),
and the txg sync thread itself is sitting in zio_wait(), but I'm not sure
whether that tells us anything useful?
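
Next time it happens I'll try sampling the sync thread to see whether
it's making progress or is stuck in the same place, with something along
these lines (zfskern is PID 5 here):

    while :; do date; procstat -kk 5; sleep 10; done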

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984