Date: Mon, 18 Aug 2014 10:20:47 +0200
From: Bengt Ahlgren <bengta@sics.se>
To: stable@freebsd.org
Subject: Re: ZFS deadlock?
Message-ID: <uh7wqa6us2o.fsf@P142s.sics.se>
In-Reply-To: <uh7zjf54yak.fsf@P142s.sics.se> (Bengt Ahlgren's message of "Fri, 15 Aug 2014 16:34:11 +0200")
References: <uh7zjf54yak.fsf@P142s.sics.se>

Bengt Ahlgren <bengta@sics.se> writes:

> During a copy (zfs send/recv) of a ~1TB dataset from one zpool to
> another, my system seems to run into some issues.  A simultaneous "find"
> on the source data set deadlocks.  This is the kernel stack:
>
> $ procstat -kk 1786
>   PID    TID COMM             TDNAME           KSTACK
>  1786 101344 find             -                mi_switch+0x194 sleepq_wait+0x42 _cv_wait+0x112 zio_wait+0x61 dbuf_read+0x619 dmu_buf_hold+0xe0 zap_get_leaf_byblk+0x4a zap_deref_leaf+0x68 fzap_cursor_retrieve+0xe7 zap_cursor_retrieve+0x155 zfs_freebsd_readdir+0x2d8 VOP_READDIR_APV+0x78 kern_getdirentries+0x212 sys_getdirentries+0x23 amd64_syscall+0x5ea Xfast_syscall+0xf7
>
> The zfs send/recv has gotten very slow, albeit seems to make very slow
> progress (copy is, as obvious, from p0 to p2):
>
> p0          15.9T  2.20T    318      0  10.2M      0
> p1          11.1T  7.00T      0      0      0      0
> p2          2.55T  41.0T      0      0      0      0
> ----------  -----  -----  -----  -----  -----  -----
> p0          15.9T  2.20T    294      0  9.29M      0
> p1          11.1T  7.00T      0      0      0      0
> p2          2.55T  41.0T      0      0      0      0
> ----------  -----  -----  -----  -----  -----  -----
> p0          15.9T  2.20T    307      0  9.12M      0
> p1          11.1T  7.00T      0      0      0      0
> p2          2.55T  41.0T      0      0      0      0
> ----------  -----  -----  -----  -----  -----  -----
> p0          15.9T  2.20T    293      0  8.69M      0
> p1          11.1T  7.00T      0      0      0      0
> p2          2.55T  41.0T      0     58      0  1.61M
> ----------  -----  -----  -----  -----  -----  -----
> p0          15.9T  2.20T    301      0  10.9M      0
> p1          11.1T  7.00T      0      0      0      0
> p2          2.55T  41.0T      0  1.62K      0  49.6M
> ----------  -----  -----  -----  -----  -----  -----
>
> The machine is otherwise quite idle.  When the copy started, I got
> around 200MB/s, now it's around 10MB/s.
>
> The ARC has gotten large, but that is likely normal:
>
> last pid:  1863;  load averages:  0.20,  0.33,  0.63  up 0+02:27:44  16:31:52
> 50 processes:  1 running, 49 sleeping
> CPU:  0.0% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.8% idle
> Mem: 1688M Active, 61M Inact, 107G Wired, 3288K Cache, 126M Buf, 15G Free
> ARC: 99G Total, 2483M MFU, 89G MRU, 33M Anon, 888M Header, 7427M Other
> Swap: 128G Total, 128G Free
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>  1229 root        1  20    0 39700K  3292K piperd  7  24:27  1.07% zfs
>  1228 root        2  20    0 39832K  3420K nanslp  5  17:02  0.39% zfs
> ...
>
> The source pool is pretty filled up, can that be an issue?
>
> $ zpool list
> NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> p0    18.1T  15.9T  2.20T    87%  1.00x  ONLINE  -
> p1    18.1T  11.1T  7.00T    61%  1.00x  ONLINE  -
> p2    43.5T  2.53T  41.0T     5%  1.00x  ONLINE  -
>
> The machine is running 9.3-REL and has two mps controllers.
>
> Any ideas?

Just for the record: there was no deadlock after all.  It turned out to
be caused by a directory with ~4.5M entries.

Bengt