Date: Mon, 12 Nov 2018 08:42:22 +0000 From: bugzilla-noreply@freebsd.org To: fs@FreeBSD.org Subject: [Bug 229614] ZFS lockup in zil_commit_impl Message-ID: <bug-229614-3630-FAIh6q6eCO@https.bugs.freebsd.org/bugzilla/> In-Reply-To: <bug-229614-3630@https.bugs.freebsd.org/bugzilla/> References: <bug-229614-3630@https.bugs.freebsd.org/bugzilla/>
next in thread | previous in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D229614 --- Comment #41 from Andreas Sommer <andreas.sommer87@googlemail.com> --- (In reply to Michael Gmelin from comment #40) Here are the things I tried to capture the latest issue I found. As Michael noted, this happened with the patched kernel (on 11.2), after days of stomp= ing the system with I/O-heavy infinite background loops. Interesting fact: `top` does not update anymore once per second, but can be updated with the Space key as usual. Also, my `screen` processes seem gone = =E2=80=93 the ones I used to run the background loops. Please note that we now have two processes in zil_commit_impl, and possibly those two are interlocked?! I'm keeping the system in this state for now, j= ust in case someone has an idea which debug information we should collect apart from the below. ----------------- root@asommer-devpm-ci:/usr/home/asommer # top last pid: 58504; load averages: 2.42, 3.37, 3.19=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20 up 3+10:22:17 02:49:54 61 processes: 1 running, 59 sleeping, 1 zombie CPU: % user, % nice, % system, % interrupt, % idle Mem: 69M Active, 1650M Inact, 10G Wired, 3544M Free ARC: 4243M Total, 1624M MFU, 1829M MRU, 2539K Anon, 111M Header, 775M Other 2517M Compressed, 4824M Uncompressed, 1.92:1 Ratio Swap: 1024M Total, 1024M Free PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 1326 1001 3 44 0 177M 169M select 3 197:02 53.96% python3.6 # <------------ 1777 1001 1 52 0 38216K 33724K select 0 1:09 0.00% python3.6 23006 root 1 20 0 10352K 6592K select 0 0:55 0.00% ssh 833 root 1 20 0 6464K 2396K nanslp 0 0:23 0.00% cr= on 1220 root 1 20 0 6336K 2016K nanslp 1 0:10 0.00% sv= scan 1047 root 1 20 0 6336K 2016K nanslp 3 0:10 0.00% sv= scan 1066 root 1 20 0 10444K 6260K select 1 0:06 0.00% sendmail 1240 root 1 20 0 6464K 2404K nanslp 3 0:04 0.00% cr= on 816 root 1 20 0 10452K 6300K select 0 0:04 0.00% sendmail 1233 root 1 20 0 10444K 6264K select 0 0:04 0.00% sendmail 1073 root 1 20 0 6464K 2400K nanslp 2 0:03 0.00% cr= on 513 root 1 20 0 9180K 5120K select 2 0:02 0.00% de= vd 596 root 1 20 0 6412K 2420K select 0 0:01 0.00% syslogd 999 root 1 20 0 6420K 2464K select 0 0:01 0.00% syslogd 1167 root 1 20 0 6420K 2524K select 1 0:01 0.00% syslogd 1236 smmsp 1 20 0 10444K 5960K pause 3 0:00 0.00% sendmail 778 www 1 20 0 13332K 8668K kqread 0 0:00 0.00% ng= inx 461 root 1 42 0 6544K 2380K select 0 0:00 0.00% dhclient 23003 root 1 44 0 7596K 3616K select 0 0:00 0.00% su= do 1069 smmsp 1 20 0 10444K 5556K pause 2 0:00 0.00% sendmail 819 smmsp 1 20 0 10452K 5912K pause 2 0:00 0.00% sendmail 885 root 1 52 0 6408K 2124K ttyin 1 0:00 0.00% ge= tty 507 _dhcp 1 20 0 6544K 2512K select 2 0:00 0.00% dhclient 58428 asommer 1 23 0 13160K 8012K select 0 0:00 0.00% ss= hd 58433 root 1 52 0 7412K 4048K pause 3 0:00 0.00% csh 58494 root 1 23 0 13160K 7956K select 3 0:00 0.00% ss= hd 58426 root 1 23 0 13160K 7956K select 0 0:00 0.00% ss= hd 1224 root 1 20 0 4244K 1976K select 0 0:00 0.00% supervise 1052 root 1 20 0 4244K 1976K select 0 0:00 0.00% supervise 58501 root 1 47 0 7412K 4068K pause 0 0:00 0.00% csh 1225 root 1 20 0 4244K 1976K select 2 0:00 0.00% supervise 1053 root 1 20 0 4244K 1976K select 3 0:00 0.00% supervise 58482 root 1 52 0 7064K 2868K wait 2 0:00 0.00% sh 58496 asommer 1 23 0 13160K 7968K select 2 0:00 0.00% ss= hd 58378 asommer 1 21 0 0K 16K vmo_de 2 0:00 0.00% ss= hd 797 root 1 21 0 12848K 7108K select 0 0:00 0.00% ss= hd 1776 root 1 20 0 7584K 3624K select 2 0:00 0.00% su= do 58497 asommer 1 26 0 7796K 3784K wait 2 0:00 0.00% ba= sh 58499 root 1 26 0 7596K 3616K select 1 0:00 0.00% su= do 58431 root 1 26 0 7596K 3616K select 3 0:00 0.00% su= do 58429 asommer 1 26 0 7796K 3784K wait 0 0:00 0.00% ba= sh 58381 asommer 1 20 0 4232K 1928K zcw->z 2 0:00 0.00% sy= nc # <------------ 1230 1001 1 20 0 4260K 1988K piperd 3 0:00 0.00% multilog 58432 root 1 26 0 6932K 2900K wait 2 0:00 0.00% su 23002 asommer 1 20 0 7600K 4132K select 1 0:00 0.00% sc= reen 58500 root 1 26 0 6932K 2900K wait 0 0:00 0.00% su 1059 1001 1 52 0 4260K 1984K piperd 1 0:00 0.00% multilog 58504 root 1 46 0 7916K 3140K CPU3 3 0:00 0.00% top 882 root 1 52 0 6408K 2128K ttyin 0 0:00 0.00% ge= tty 877 root 1 52 0 6408K 2128K ttyin 2 0:00 0.00% ge= tty 879 root 1 52 0 6408K 2128K ttyin 2 0:00 0.00% ge= tty 878 root 1 52 0 6408K 2128K ttyin 3 0:00 0.00% ge= tty 881 root 1 52 0 6408K 2128K ttyin 3 0:00 0.00% ge= tty 884 root 1 52 0 6408K 2128K ttyin 0 0:00 0.00% ge= tty 880 root 1 52 0 6408K 2128K ttyin 1 0:00 0.00% ge= tty 883 root 1 52 0 6408K 2128K ttyin 3 0:00 0.00% ge= tty 1048 root 1 52 0 4232K 1960K piperd 1 0:00 0.00% readproctitle root@asommer-devpm-ci:/usr/home/asommer # procstat -kk 1326 PID TID COMM TDNAME KSTACK 1326 100422 python3.6 - mi_switch+0xe6 sleepq_catch_signals+0x40c sleepq_timedwait_sig+0x14 _cv_timedwait_sig_sbt+0x18c seltdwait+0x7b kern_poll+0x3fd sys_poll+0x50 amd64_syscall+0xa38 fast_syscall_common+0x101 1326 100433 python3.6 - mi_switch+0xe6 sleepq_catch_signals+0x40c sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x1= 43 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa38 fast_syscall_common+0x101 1326 100436 python3.6 - mi_switch+0xe6 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x18c zil_commit_impl+0xcd3 zfs_freebsd_putpages+0x635 VOP_PUTPAGES_APV+0x82 vnode_pager_putpages+0x8e vm_pageout_flush+0xea vm_object_page_collect_flush+0x213 vm_object_page_clean+0x146 vm_object_terminate+0x93 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x82 vgonel+0x208 vrecycle+0x4a zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x82 vinactive+0xfc root@asommer-devpm-ci:/usr/home/asommer # procstat 58381 PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM 58381 1 58381 58379 58379 1 asommer zcw->zcw FreeBSD ELF64 sync root@asommer-devpm-ci:/usr/home/asommer # procstat -f 58381 PID COMM FD T V FLAGS REF OFFSET PRO NAME 58381 sync text v r r------- - - - /bin/sync 58381 sync cwd v d r------- - - - /usr/home/asommer 58381 sync root v d r------- - - - / 58381 sync 0 v x rw------ 3 40 - - 58381 sync 1 v x rw------ 3 40 - - 58381 sync 2 v x rw------ 3 40 - - root@asommer-devpm-ci:/usr/home/asommer # procstat -r 58381 PID COMM RESOURCE VALUE 58381 sync user time 00:00:00.000000 58381 sync system time 00:00:00.003656 58381 sync maximum RSS 0.0 KB 58381 sync integral shared memory 0.0 KB 58381 sync integral unshared data 0.0 KB 58381 sync integral unshared stack 0.0 KB 58381 sync page reclaims 89 58381 sync page faults 0 58381 sync swaps 0 58381 sync block reads 3 58381 sync block writes 0 58381 sync messages sent 0 58381 sync messages received 0 58381 sync signals received 0 58381 sync voluntary context switches 6 58381 sync involuntary context switches 0 root@asommer-devpm-ci:/usr/home/asommer # procstat -t 58381 PID TID COMM TDNAME CPU PRI STATE WCHAN 58381 101315 sync - -1 120 sleep zcw->= zcw root@asommer-devpm-ci:/usr/home/asommer # procstat -k 58381 PID TID COMM TDNAME KSTACK 58381 101315 sync - mi_switch sleepq_timed= wait _cv_timedwait_sbt zil_commit_impl zfs_sync sys_sync amd64_syscall fast_syscall_common root@asommer-devpm-ci:/usr/home/asommer # procstat -kk 58381 PID TID COMM TDNAME KSTACK 58381 101315 sync - mi_switch+0xe6 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x18c zil_commit_impl+0xcd3 zfs_sync+0xa6 sys_sync+0x116 amd64_syscall+0xa38 fast_syscall_common+0x101 --=20 You are receiving this mail because: You are on the CC list for the bug.=
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-229614-3630-FAIh6q6eCO>