Date:      Fri, 15 Aug 2014 16:34:11 +0200
From:      Bengt Ahlgren <bengta@sics.se>
To:        stable@freebsd.org
Subject:   ZFS deadlock?
Message-ID:  <uh7zjf54yak.fsf@P142s.sics.se>

Hi!

During a copy (zfs send/recv) of a ~1TB dataset from one zpool to
another, my system seems to run into some issues.  A simultaneous "find"
on the source dataset deadlocks.  This is its kernel stack:

$ procstat -kk 1786
  PID    TID COMM             TDNAME           KSTACK                       
 1786 101344 find             -                mi_switch+0x194 sleepq_wait+0x42 _cv_wait+0x112 zio_wait+0x61 dbuf_read+0x619 dmu_buf_hold+0xe0 zap_get_leaf_byblk+0x4a zap_deref_leaf+0x68 fzap_cursor_retrieve+0xe7 zap_cursor_retrieve+0x155 zfs_freebsd_readdir+0x2d8 VOP_READDIR_APV+0x78 kern_getdirentries+0x212 sys_getdirentries+0x23 amd64_syscall+0x5ea Xfast_syscall+0xf7 

The zfs send/recv has become very slow, although it still seems to make
some progress (the copy is, as the output shows, from p0 to p2):

p0          15.9T  2.20T    318      0  10.2M      0
p1          11.1T  7.00T      0      0      0      0
p2          2.55T  41.0T      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
p0          15.9T  2.20T    294      0  9.29M      0
p1          11.1T  7.00T      0      0      0      0
p2          2.55T  41.0T      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
p0          15.9T  2.20T    307      0  9.12M      0
p1          11.1T  7.00T      0      0      0      0
p2          2.55T  41.0T      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
p0          15.9T  2.20T    293      0  8.69M      0
p1          11.1T  7.00T      0      0      0      0
p2          2.55T  41.0T      0     58      0  1.61M
----------  -----  -----  -----  -----  -----  -----
p0          15.9T  2.20T    301      0  10.9M      0
p1          11.1T  7.00T      0      0      0      0
p2          2.55T  41.0T      0  1.62K      0  49.6M
----------  -----  -----  -----  -----  -----  -----

The machine is otherwise quite idle.  When the copy started, I got
around 200MB/s; now it's around 10MB/s.
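
To put a number on the slowdown, here is a rough sketch that averages the
p0 read-bandwidth column from the five samples quoted above and compares
it with the initial rate.  The 200 figure is the approximate starting
rate mentioned in the text; treating the "M" suffix as plain MB/s and the
helper names are assumptions made for illustration only.

```python
# p0 read bandwidth from the five samples quoted above, taken as MB/s
# (assumption: the "M" suffix is treated as plain megabytes per second).
p0_read_mb_s = [10.2, 9.29, 9.12, 8.69, 10.9]

# Approximate rate observed when the copy started, as stated in the text.
initial_mb_s = 200.0


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def slowdown(initial, current):
    """How many times slower the current rate is than the initial one."""
    return initial / current


avg = mean(p0_read_mb_s)
factor = slowdown(initial_mb_s, avg)
print(f"mean p0 read: {avg:.2f} MB/s, about {factor:.1f}x slower than at start")
```

This only summarises the numbers already shown; it says nothing about the
cause of the slowdown.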

The ARC has gotten large, but that is likely normal:

last pid:  1863;  load averages:  0.20,  0.33,  0.63    up 0+02:27:44  16:31:52
50 processes:  1 running, 49 sleeping
CPU:  0.0% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.8% idle
Mem: 1688M Active, 61M Inact, 107G Wired, 3288K Cache, 126M Buf, 15G Free
ARC: 99G Total, 2483M MFU, 89G MRU, 33M Anon, 888M Header, 7427M Other
Swap: 128G Total, 128G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 1229 root          1  20    0 39700K  3292K piperd  7  24:27   1.07% zfs
 1228 root          2  20    0 39832K  3420K nanslp  5  17:02   0.39% zfs
...

The source pool is fairly full; could that be an issue?

$ zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
p0    18.1T  15.9T  2.20T    87%  1.00x  ONLINE  -
p1    18.1T  11.1T  7.00T    61%  1.00x  ONLINE  -
p2    43.5T  2.53T  41.0T     5%  1.00x  ONLINE  -

The machine is running 9.3-REL and has two mps controllers.

Any ideas?

Bengt


