From: Tim Bishop
To: freebsd-fs@freebsd.org
Date: Sun, 13 Mar 2011 16:44:30 +0000
Subject: Re: ZFS system unresponsive
Message-ID: <20110313164430.GA977@carrick-users.bishnet.net>

On Sun, Feb 27, 2011 at 12:32:17PM +0000, Tim Bishop wrote:
> I have a ZFS system that has become unresponsive. It's running amd64
> 8-STABLE as of approximately 20 Dec. It has a UFS-based root file
> system and then a ZFS mirror for a handful of jails.
>
> It seems to get into this state occasionally, but eventually it can
> unblock itself. That may take hours, though.
>
> top -HSj shows the following processes active:
>
>   PID  JID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>     0    0 root      -16    0     0K  1456K zio->i  0  28.9H  7.08% {zio_write_issue}
>     5    0 root       -8    -     0K    60K zio->i  0 776:59  0.29% {txg_thread_enter}
>
> A procstat on those processes shows:
>
>     0 100068 kernel   zio_write_issue  mi_switch sleepq_wait _cv_wait
>       zio_wait dmu_buf_hold_array_by_dnode dmu_read space_map_load
>       metaslab_activate metaslab_alloc zio_dva_allocate zio_execute
>       taskq_run_safe taskqueue_run_locked taskqueue_thread_loop
>       fork_exit fork_trampoline
>     5 100094 zfskern  txg_thread_enter mi_switch sleepq_wait _cv_wait
>       txg_thread_wait txg_quiesce_thread fork_exit fork_trampoline
>     5 100095 zfskern  txg_thread_enter mi_switch sleepq_wait _cv_wait
>       zio_wait dsl_pool_sync spa_sync txg_sync_thread fork_exit
>       fork_trampoline
>
> (I have the full procstat -k output for those PIDs if needed.)
>
> Other processes, such as my hourly zfs snapshots, appear to be wedged:
>
> root  7407 0.0 0.0 14672 1352 ?? D 10:00AM 0:00.46 /sbin/zfs snapshot -r pool0@2011-02-27_10.00.01--1d
> root 10184 0.0 0.0 14672 1444 ?? D 11:00AM 0:00.36 /sbin/zfs snapshot -r pool0@2011-02-27_11.00.00--1d
> root 12938 0.0 0.0 14672 1516 ?? D 12:00PM 0:00.11 /sbin/zfs snapshot -r pool0@2011-02-27_12.00.01--1d
>
>   PID    TID COMM  TDNAME  KSTACK
>  7407 100563 zfs   -       mi_switch sleepq_wait _cv_wait txg_wait_synced
>       dsl_sync_task_group_wait dmu_objset_snapshot zfs_ioc_snapshot
>       zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscallenter syscall
>       Xfast_syscall
> 10184 100707 zfs   -       mi_switch sleepq_wait _cv_wait txg_wait_synced
>       dsl_sync_task_group_wait dmu_objset_snapshot zfs_ioc_snapshot
>       zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscallenter syscall
>       Xfast_syscall
> 12938 100159 zfs   -       mi_switch sleepq_wait _cv_wait txg_wait_synced
>       dsl_sync_task_group_wait dmu_objset_snapshot zfs_ioc_snapshot
>       zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscallenter syscall
>       Xfast_syscall
>
> zfs-stats output as follows:
>
> ------------------------------------------------------------------------
> ZFS Subsystem Report                            Sun Feb 27 12:20:20 2011
> ------------------------------------------------------------------------
> System Information:
>
>         Kernel Version:                 801501 (osreldate)
>         Hardware Platform:              amd64
>         Processor Architecture:         amd64
>
> FreeBSD 8.2-PRERELEASE #3: Mon Dec 20 20:54:55 GMT 2010 tdb
> 12:23pm up 68 days, 14:07, 2 users, load averages: 0.35, 0.39, 0.35
> ------------------------------------------------------------------------
> System Memory Statistics:
>         Physical Memory:                3061.63M
>         Kernel Memory:                  1077.46M
>         DATA:                   99.12%  1067.93M
>         TEXT:                    0.88%  9.53M
> ------------------------------------------------------------------------
> ZFS pool information:
>         Storage pool Version (spa):     15
>         Filesystem Version (zpl):       4
> ------------------------------------------------------------------------
> ARC Misc:
>         Deleted:                        148418216
>         Recycle Misses:                 51095797
>         Mutex Misses:                   370820
>         Evict Skips:                    370820
>
> ARC Size:
>         Current Size (arcsize):         55.86%  1087.64M
>         Target Size (Adaptive, c):      56.50%  1100.22M
>         Min Size (Hard Limit, c_min):   12.50%  243.40M
>         Max Size (High Water, c_max):   ~8:1    1947.20M
>
> ARC Size Breakdown:
>         Recently Used Cache Size (p):   6.25%   68.77M
>         Freq. Used Cache Size (c-p):    93.75%  1031.45M
>
> ARC Hash Breakdown:
>         Elements Max:                   398079
>         Elements Current:       38.65%  153870
>         Collisions:                     230805591
>         Chain Max:                      34
>         Chains:                         24344
>
> ARC Eviction Statistics:
>         Evicts Total:                   4560897494528
>         Evicts Eligible for L2: 99.99%  4560573588992
>         Evicts Ineligible for L2: 0.01% 323905536
>         Evicts Cached to L2:            0
>
> ARC Efficiency:
>         Cache Access Total:             1761824967
>         Cache Hit Ratio:        84.82%  1494437389
>         Cache Miss Ratio:       15.18%  267387578
>         Actual Hit Ratio:       84.82%  1494411236
>
>         Data Demand Efficiency: 83.35%
>
>         CACHE HITS BY CACHE LIST:
>           Most Recently Used (mru):     7.86%   117410213
>           Most Frequently Used (mfu):   92.14%  1377001023
>           MRU Ghost (mru_ghost):        0.63%   9445180
>           MFU Ghost (mfu_ghost):        7.99%   119349696
>
>         CACHE HITS BY DATA TYPE:
>           Demand Data:                  35.75%  534254771
>           Prefetch Data:                0.00%   0
>           Demand Metadata:              64.25%  960153880
>           Prefetch Metadata:            0.00%   28738
>
>         CACHE MISSES BY DATA TYPE:
>           Demand Data:                  39.91%  106712177
>           Prefetch Data:                0.00%   0
>           Demand Metadata:              60.01%  160446249
>           Prefetch Metadata:            0.09%   229152
> ------------------------------------------------------------------------
> VDEV Cache Summary:
>         Access Total:                   155663083
>         Hits Ratio:             70.91%  110387854
>         Miss Ratio:             29.09%  45275229
>         Delegations:                    91183
> ------------------------------------------------------------------------
> ZFS Tunable (sysctl):
>         kern.maxusers=384
>         vfs.zfs.l2c_only_size=0
>         vfs.zfs.mfu_ghost_data_lsize=23343104
>         vfs.zfs.mfu_ghost_metadata_lsize=302204928
>         vfs.zfs.mfu_ghost_size=325548032
>         vfs.zfs.mfu_data_lsize=524091904
>         vfs.zfs.mfu_metadata_lsize=52224
>         vfs.zfs.mfu_size=533595136
>         vfs.zfs.mru_ghost_data_lsize=30208
>         vfs.zfs.mru_ghost_metadata_lsize=727952896
>         vfs.zfs.mru_ghost_size=727983104
>         vfs.zfs.mru_data_lsize=100169216
>         vfs.zfs.mru_metadata_lsize=0
>         vfs.zfs.mru_size=339522048
>         vfs.zfs.anon_data_lsize=0
>         vfs.zfs.anon_metadata_lsize=0
>         vfs.zfs.anon_size=10959360
>         vfs.zfs.l2arc_norw=1
>         vfs.zfs.l2arc_feed_again=1
>         vfs.zfs.l2arc_noprefetch=0
>         vfs.zfs.l2arc_feed_min_ms=200
>         vfs.zfs.l2arc_feed_secs=1
>         vfs.zfs.l2arc_headroom=2
>         vfs.zfs.l2arc_write_boost=8388608
>         vfs.zfs.l2arc_write_max=8388608
>         vfs.zfs.arc_meta_limit=510447616
>         vfs.zfs.arc_meta_used=513363680
>         vfs.zfs.mdcomp_disable=0
>         vfs.zfs.arc_min=255223808
>         vfs.zfs.arc_max=2041790464
>         vfs.zfs.zfetch.array_rd_sz=1048576
>         vfs.zfs.zfetch.block_cap=256
>         vfs.zfs.zfetch.min_sec_reap=2
>         vfs.zfs.zfetch.max_streams=8
>         vfs.zfs.prefetch_disable=1
>         vfs.zfs.check_hostid=1
>         vfs.zfs.recover=0
>         vfs.zfs.txg.write_limit_override=0
>         vfs.zfs.txg.synctime=5
>         vfs.zfs.txg.timeout=30
>         vfs.zfs.scrub_limit=10
>         vfs.zfs.vdev.cache.bshift=16
>         vfs.zfs.vdev.cache.size=10485760
>         vfs.zfs.vdev.cache.max=16384
>         vfs.zfs.vdev.aggregation_limit=131072
>         vfs.zfs.vdev.ramp_rate=2
>         vfs.zfs.vdev.time_shift=6
>         vfs.zfs.vdev.min_pending=4
>         vfs.zfs.vdev.max_pending=10
>         vfs.zfs.cache_flush_disable=0
>         vfs.zfs.zil_disable=0
>         vfs.zfs.zio.use_uma=0
>         vfs.zfs.version.zpl=4
>         vfs.zfs.version.spa=15
>         vfs.zfs.version.dmu_backup_stream=1
>         vfs.zfs.version.dmu_backup_header=2
>         vfs.zfs.version.acl=1
>         vfs.zfs.debug=0
>         vfs.zfs.super_owner=0
>         vm.kmem_size=3115532288
>         vm.kmem_size_scale=1
>         vm.kmem_size_min=0
>         vm.kmem_size_max=329853485875
> ------------------------------------------------------------------------
>
> I hope somebody can give me some pointers on where to go with this.
>
> I'm just about to reboot (when it unwedges) and upgrade to the latest
> 8-STABLE to see if that helps.
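(An aside for anyone wanting to collect the same diagnostics on their
own machine: a rough sketch of a capture loop that could be left running
while the box is wedged. The PIDs and the log path are illustrative, not
from the thread; 0 and 5 are the kernel zio/txg threads shown in the top
output above.)

#!/bin/sh
# Sketch: log kernel thread stacks and any stuck snapshot processes
# once a minute, so the wedged period can be examined afterwards.
while :; do
    date
    procstat -k 0 5                   # kernel zio_write_issue/txg threads
    ps auxww | grep '[z]fs snapshot'  # wedged snapshot processes, if any
    sleep 60
done >> /var/tmp/zfs-wedge.log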
I did the upgrade to 8-STABLE and it didn't help; I'm still seeing the
same issue.

Someone else in another thread mentioned graphing the zfs-stats output
with munin, so I thought I'd give that a whirl (a sketch of the sort of
plugin involved is at the end of this mail). Here are the results, as a
static snapshot of the data so it doesn't change between when I write
this and when somebody reads it:

http://www.bishnet.net/tim/tmp/munin/carrick/carrick/index.html

This slowdown is triggered by the tarsnap backups I run each day. They
start at about 01:15 (in the same timezone as the graphs) and finished
at around 11:15 after I poked it. The backup slows down when processing
my Maildir folders, but doesn't actually stop - it processes a single
file every couple of seconds.

Here's a graph showing the disk I/O:

http://www.bishnet.net/tim/tmp/statgrab/disk.ad4.read_bytes-day.png

The flat period from about 06:30 until 11:00 is where it appears to get
stuck. As the graph shows, disk I/O stays constant throughout.

Killing the tarsnap process unjams things, letting it go on to back up
the next tree of files (I run a handful of tarsnap processes one after
another). zpool iostat looks something like this during that period:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool0        111G  23.2G    248    114   865K   269K
  mirror     111G  23.2G    248    114   865K   269K
    ad4s3       -      -     43     56  2.47M   269K
    ad6s3       -      -     39     56  2.41M   269K
----------  -----  -----  -----  -----  -----  -----

So it's showing more I/O on the vdevs than on the pool itself. Taken
together with my earlier finding that the ZFS kernel threads are using
a lot of CPU time, this suggests to me that ZFS is doing something
internally that slows down all external operations on the filesystem.

In particular, I find this graph quite interesting:

http://www.bishnet.net/tim/tmp/munin/carrick/carrick/zfs_arc_utilization.html

The ARC usage jumps around all over the place while the backups are
running.

Thanks for taking the time to read this; I appreciate any input.

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
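P.S. The munin plugin sketch mentioned above: something roughly like the
following will graph the current ARC size. This is an illustration, not
the actual plugin set from the other thread; the "arcsize" field name is
made up, though kstat.zfs.misc.arcstats.size is the stock FreeBSD ARC
size counter.

#!/bin/sh
# Sketch of a munin plugin: graph the current ZFS ARC size.
case "$1" in
config)
    # Munin invokes the plugin with "config" to learn graph metadata.
    cat <<'EOF'
graph_title ZFS ARC size
graph_vlabel bytes
graph_category zfs
arcsize.label ARC size
EOF
    exit 0
    ;;
esac
# On a normal fetch run, print the current value.
printf 'arcsize.value %s\n' "$(sysctl -n kstat.zfs.misc.arcstats.size)"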