Date: Fri, 15 Aug 2014 16:34:11 +0200
From: Bengt Ahlgren <bengta@sics.se>
To: stable@freebsd.org
Subject: ZFS deadlock?
Message-ID: <uh7zjf54yak.fsf@P142s.sics.se>

Hi!

During a copy (zfs send/recv) of a ~1TB dataset from one zpool to another, my system seems to run into some issues. A simultaneous "find" on the source dataset deadlocks. This is the kernel stack:

$ procstat -kk 1786
  PID    TID COMM             TDNAME           KSTACK
 1786 101344 find             -                mi_switch+0x194 sleepq_wait+0x42 _cv_wait+0x112 zio_wait+0x61 dbuf_read+0x619 dmu_buf_hold+0xe0 zap_get_leaf_byblk+0x4a zap_deref_leaf+0x68 fzap_cursor_retrieve+0xe7 zap_cursor_retrieve+0x155 zfs_freebsd_readdir+0x2d8 VOP_READDIR_APV+0x78 kern_getdirentries+0x212 sys_getdirentries+0x23 amd64_syscall+0x5ea Xfast_syscall+0xf7

The zfs send/recv has gotten very slow, although it still seems to be making some progress (the copy is, as the output shows, from p0 to p2):

p0          15.9T  2.20T    318      0  10.2M      0
p1          11.1T  7.00T      0      0      0      0
p2          2.55T  41.0T      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
p0          15.9T  2.20T    294      0  9.29M      0
p1          11.1T  7.00T      0      0      0      0
p2          2.55T  41.0T      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
p0          15.9T  2.20T    307      0  9.12M      0
p1          11.1T  7.00T      0      0      0      0
p2          2.55T  41.0T      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
p0          15.9T  2.20T    293      0  8.69M      0
p1          11.1T  7.00T      0      0      0      0
p2          2.55T  41.0T      0     58      0  1.61M
----------  -----  -----  -----  -----  -----  -----
p0          15.9T  2.20T    301      0  10.9M      0
p1          11.1T  7.00T      0      0      0      0
p2          2.55T  41.0T      0  1.62K      0  49.6M
----------  -----  -----  -----  -----  -----  -----

The machine is otherwise quite idle. When the copy started, I got around 200MB/s; now it's around 10MB/s.

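A minimal sketch of how the slowdown could be quantified from the five samples above. The helper name `parse_size` is hypothetical, the suffixes are assumed to be powers of 1024, and the starting rate is the approximate 200MB/s quoted in the message rather than a measured value.

```python
# Suffix multipliers; assumed here to be powers of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert an abbreviated value such as '10.2M' or '0' to bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


# Read-bandwidth column for p0, copied from the five samples above.
p0_read = ["10.2M", "9.29M", "9.12M", "8.69M", "10.9M"]

mean_bytes = sum(parse_size(s) for s in p0_read) / len(p0_read)
mean_mb = mean_bytes / UNITS["M"]

initial_mb = 200.0  # approximate rate at the start of the copy (from the message)

print(f"mean p0 read rate: {mean_mb:.2f} MB/s")
print(f"slowdown factor:   {initial_mb / mean_mb:.1f}x")
```

With these samples the mean comes out at about 9.6 MB/s, i.e. roughly a 20-fold drop from the initial rate.
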
The ARC has gotten large, but that is likely normal:

last pid:  1863;  load averages:  0.20,  0.33,  0.63    up 0+02:27:44  16:31:52
50 processes:  1 running, 49 sleeping
CPU:  0.0% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.8% idle
Mem: 1688M Active, 61M Inact, 107G Wired, 3288K Cache, 126M Buf, 15G Free
ARC: 99G Total, 2483M MFU, 89G MRU, 33M Anon, 888M Header, 7427M Other
Swap: 128G Total, 128G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1229 root          1  20    0 39700K  3292K piperd  7  24:27  1.07% zfs
 1228 root          2  20    0 39832K  3420K nanslp  5  17:02  0.39% zfs
...

The source pool is fairly full. Could that be an issue?

$ zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
p0    18.1T  15.9T  2.20T    87%  1.00x  ONLINE  -
p1    18.1T  11.1T  7.00T    61%  1.00x  ONLINE  -
p2    43.5T  2.53T  41.0T     5%  1.00x  ONLINE  -

The machine is running 9.3-REL and has two mps controllers.

Any ideas?

Bengt