Date: Sat, 26 Jan 2013 03:10:01 GMT
From: Harry Coin <hgcoin@gmail.com>
To: freebsd-fs@FreeBSD.org
Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O
Message-ID: <201301260310.r0Q3A1fe051107@freefall.freebsd.org>
The following reply was made to PR kern/169480; it has been noted by GNATS.

From: Harry Coin <hgcoin@gmail.com>
To: bug-followup@FreeBSD.org, levent.serinol@mynet.com, Jeremy Chadwick <jdc@koitsu.org>
Cc:
Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O
Date: Fri, 25 Jan 2013 21:04:05 -0600

Jeremy,

Thanks.

re 1: Not only have I removed the ZIL, I removed the cache device as well before running this test, thinking as you did that the whole L2ARC business was adding too much complexity. The pool was nothing more than the single raidz when I ran the test that produced the failure noted.

re 2: The raw benchmarks you saw for ada2 were identically duplicated for ada3, ada4 and ada5. They are all exactly the same drive make and model, purchased at the same time.

re 3: These are 512-byte-sector drives. smartctl 6.0 (-a) reports 512 bytes/sector, both logical and physical.

re 4: The drives are partitioned so that each of them can hold boot information should I ever decide to boot off the array, so that sector 0 of the data partition starts on the previously noted 4K boundary, and so that there are a couple of gigabytes at the end for swap if I decide I want that. Partition 1 is just big enough for the boot code, partition 2 is the bulk of the storage, and partition 3 is the last 2 GB.

So: no ZIL or cache was used in the test, everything is formatted ZFS v28, and a scrub reported zero errors on the array two days ago (though the scrub speed was only around 14 MB/s). Something about maximum-speed writes being interrupted by another read is what kills zfs/nas4free.

re 5: The ATA drives are indeed on the ICH7; the 300 GB drive isn't an issue. The SSDs are on the higher-speed controller and, as noted before, were unused in this test.

re 6: You'll note that 'ls -l' didn't return; the only thing still responsive was the 'top' that had been launched before the dd was started. Here are the procstat -kk captures for the dd and the ls during their run (the ls never returning):

nas4free:~# ps axu | grep dd
root   4006 47.1  0.1  9916  1752  v0  R+    2:56AM   0:07.92 dd if=/dev/urandom of=foo bs=512 count=20000000
root   4009  0.0  0.1 16280  2092   0  S+    2:56AM   0:00.00 grep dd
nas4free:~# procstat -kk 4006
  PID    TID COMM             TDNAME           KSTACK
 4006 100114 dd               -                mi_switch+0x186 sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_delay+0x137 dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
  PID    TID COMM             TDNAME           KSTACK
 4006 100114 dd               -                mi_switch+0x186 sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_delay+0x137 dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
  PID    TID COMM             TDNAME           KSTACK
 4006 100114 dd               -                mi_switch+0x186 sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_delay+0x137 dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
  PID    TID COMM             TDNAME           KSTACK
 4006 100114 dd               -                mi_switch+0x186 sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_delay+0x137 dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# ps axu | grep ls
root   4016  0.0  0.1 14420  2196  v2  D+    2:57AM   0:00.00 ls -l
root   4018  0.0  0.1 16280  2084   0  RL+   2:57AM   0:00.00 grep ls
nas4free:~# procstat -kk 4016
  PID    TID COMM             TDNAME           KSTACK
 4016 100085 ls               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
  PID    TID COMM             TDNAME           KSTACK
 4016 100085 ls               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
  PID    TID COMM             TDNAME           KSTACK
 4016 100085 ls               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
  PID    TID COMM             TDNAME           KSTACK
 4016 100085 ls               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
  PID    TID COMM             TDNAME           KSTACK
 4006 100114 dd               -                mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
  PID    TID COMM             TDNAME           KSTACK
 4006 100114 dd               -                mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
  PID    TID COMM             TDNAME           KSTACK
 4006 100114 dd               -                mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
  PID    TID COMM             TDNAME           KSTACK
 4016 100085 ls               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
  PID    TID COMM             TDNAME           KSTACK
 4016 100085 ls               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
  PID    TID COMM             TDNAME           KSTACK
 4016 100085 ls               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
  PID    TID COMM             TDNAME           KSTACK
 4006 100114 dd               -                mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
  PID    TID COMM             TDNAME           KSTACK
 4006 100114 dd               -                mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
  PID    TID COMM             TDNAME           KSTACK
 4006 100114 dd               -                mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
  PID    TID COMM             TDNAME           KSTACK
 4006 100114 dd               -                mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
  PID    TID COMM             TDNAME           KSTACK
 4016 100085 ls               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
  PID    TID COMM             TDNAME           KSTACK
 4016 100085 ls               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
  PID    TID COMM             TDNAME           KSTACK
 4016 100085 ls               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
  PID    TID COMM             TDNAME           KSTACK
 4016 100085 ls               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
  PID    TID COMM             TDNAME           KSTACK
 4016 100085 ls               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~#

re 7: The 'submit a problem' form doesn't list 'fs' as an option, only 'kern'. If there's another place for filesystem bugs, I trust the FreeBSD folks will know where to route it. Anyhow, I do think it's about write completions getting lost. And nas4free is a pretty popular project whose major virtue here is that anyone, anywhere who wants to chase this problem can download a live-CD ISO and reproduce it; the exact sequence I used is sketched just below.
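In case it saves anyone some typing, this is roughly how I drive the reproduction. Only the dd command itself is exactly what appears in the captures above; the working directory, the backgrounding, and the polling loop are my own choices for illustration, so adjust to your setup:

    # run from a directory on the pool, e.g. a dataset mounted under /mnt/pool1
    cd /mnt/pool1/videos

    # writer: flat-out small-block writes (the same dd as in the captures above)
    dd if=/dev/urandom of=foo bs=512 count=20000000 &
    DD_PID=$!

    # reader: any metadata lookup on the same pool (run in a second terminal
    # normally; backgrounded here only for brevity)
    ls -l . &
    LS_PID=$!

    # sample both kernel stacks every few seconds; once the stall hits, dd sits
    # in txg_delay / txg_wait_open and ls never gets past the vnode lock
    while true; do
        procstat -kk $DD_PID
        procstat -kk $LS_PID
        sleep 5
    done

On my box the stacks stop changing within a minute or two of starting the dd, exactly as shown above.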
This is not my first rodeo: there are several bug reports out there with various black-magic tunable tweaks which, I think, avoid the bug only because the reporters' usage patterns combined with the tweaks happen not to hit it (I list the tunables I mean at the very end of this message). This particular FreeBSD-based ISO download has the virtue of making the problem completely reproducible. Does any of the above help?

Also, here's an updated look at the pool:

nas4free:~# zpool status pool1
  pool: pool1
 state: ONLINE
  scan: scrub canceled on Fri Jan 25 22:07:51 2013
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0
            ada4p2  ONLINE       0     0     0
            ada5p2  ONLINE       0     0     0

errors: No known data errors
nas4free:~# zpool iostat -v pool1
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       3.01T  2.43T    168      0   360K  3.42K
  raidz1    3.01T  2.43T    168      0   360K  3.42K
    ada2p2      -      -    124      0   104K  2.34K
    ada3p2      -      -    123      0   104K  2.28K
    ada4p2      -      -    124      0   103K  2.34K
    ada5p2      -      -    126      0  98.1K  2.29K
----------  -----  -----  -----  -----  -----  -----
nas4free:~# zpool get all pool1
NAME   PROPERTY       VALUE                SOURCE
pool1  size           5.44T                -
pool1  capacity       55%                  -
pool1  altroot        -                    default
pool1  health         ONLINE               -
pool1  guid           1701438519865110975  default
pool1  version        28                   default
pool1  bootfs         -                    default
pool1  delegation     on                   default
pool1  autoreplace    off                  default
pool1  cachefile      -                    default
pool1  failmode       wait                 default
pool1  listsnapshots  off                  default
pool1  autoexpand     off                  default
pool1  dedupditto     0                    default
pool1  dedupratio     1.46x                -
pool1  free           2.43T                -
pool1  allocated      3.01T                -
pool1  readonly       off                  -
pool1  comment        -                    default
pool1  expandsize     0                    -
nas4free:~# zfs get all pool1/videos
NAME          PROPERTY              VALUE                  SOURCE
pool1/videos  type                  filesystem             -
pool1/videos  creation              Sat Jan 19  5:12 2013  -
pool1/videos  used                  526G                   -
pool1/videos  available             1.74T                  -
pool1/videos  referenced            526G                   -
pool1/videos  compressratio         1.00x                  -
pool1/videos  mounted               yes                    -
pool1/videos  quota                 none                   local
pool1/videos  reservation           none                   local
pool1/videos  recordsize            128K                   default
pool1/videos  mountpoint            /mnt/pool1/videos      inherited from pool1
pool1/videos  sharenfs              off                    default
pool1/videos  checksum              on                     default
pool1/videos  compression           off                    local
pool1/videos  atime                 off                    local
pool1/videos  devices               on                     default
pool1/videos  exec                  on                     default
pool1/videos  setuid                on                     default
pool1/videos  readonly              off                    local
pool1/videos  jailed                off                    default
pool1/videos  snapdir               hidden                 local
pool1/videos  aclmode               discard                default
pool1/videos  aclinherit            restricted             default
pool1/videos  canmount              on                     local
pool1/videos  xattr                 off                    temporary
pool1/videos  copies                1                      default
pool1/videos  version               5                      -
pool1/videos  utf8only              off                    -
pool1/videos  normalization         none                   -
pool1/videos  casesensitivity       sensitive              -
pool1/videos  vscan                 off                    default
pool1/videos  nbmand                off                    default
pool1/videos  sharesmb              off                    default
pool1/videos  refquota              none                   default
pool1/videos  refreservation        none                   default
pool1/videos  primarycache          all                    default
pool1/videos  secondarycache        all                    default
pool1/videos  usedbysnapshots       0                      -
pool1/videos  usedbydataset         526G                   -
pool1/videos  usedbychildren        0                      -
pool1/videos  usedbyrefreservation  0                      -
pool1/videos  logbias               latency                default
pool1/videos  dedup                 off                    local
pool1/videos  mlslabel              -                      -
pool1/videos  sync                  standard               local
pool1/videos  refcompressratio      1.00x                  -
pool1/videos  written               526G                   -
nas4free:~#
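Finally, since the other reports I mentioned revolve around tunables: nothing here has been tweaked. This is just how I list the relevant knobs to confirm they are still at their defaults; the exact set of vfs.zfs sysctls varies between releases, so treat the grep pattern as approximate rather than a definitive list:

    # ZFS transaction-group / write-throttle knobs -- all at defaults on this box
    sysctl vfs.zfs | egrep 'txg|write_limit'

    # ARC counters, useful to watch alongside 'zpool iostat pool1 1' while the
    # dd above is running
    sysctl kstat.zfs.misc.arcstats | head -25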