Date: Sun, 08 Jul 2018 20:42:53 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 229614] ZFS lockup in zil_commit_impl
Message-ID: <bug-229614-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229614

            Bug ID: 229614
           Summary: ZFS lockup in zil_commit_impl
           Product: Base System
           Version: 11.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: andreas.sommer87@googlemail.com
                CC: avg@FreeBSD.org, grembo@FreeBSD.org

Created attachment 194962
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=194962&action=edit
Debugging attempts (command line output)

Relevant part of my research thus far (see attached file for some more
commands I've tried to debug a little):

    # procstat -kk 69994
      PID    TID COMM                TDNAME              KSTACK
    [...]
    69994 101224 python3.6           -                   mi_switch+0xe6 sleepq_wait+0x2c _sx_xlock_hard+0x306 zil_commit_impl+0x11d zfs_freebsd_putpages+0x635 VOP_PUTPAGES_APV+0x82 vnode_pager_putpages+0x8e vm_pageout_flush+0xea vm_object_page_collect_flush+0x213 vm_object_page_clean+0x146 vm_object_terminate+0x93 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x82 vgonel+0x208 vrecycle+0x4a zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x82 vinactive+0xfc

This is luckily on a CI instance in AWS EC2, not a production machine. This
happened *multiple* times to me in the last weeks, roughly once per week. So
probably I'll reset the machine very soon but will run into it again if you
want me to debug something hands-on. The earliest occurrence which I can still
see in monitoring graphs was 2018-06-24, i.e. two days before I upgraded to
11.2. Before that, I had run 10.3 until the upgrade to 11.1 on 2018-06-13.
Honestly, I don't recall this happening while we were still on 10.3, but I'm
human and could be mistaken.

Hard restart resolves the problem. In my specific case, I noticed it because
builders/workers in my Buildbot web interface were not showing anymore and on
quick look, the buildbot master process was hanging to that extent. Other
things like SSH and the web interface were still working.
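
To check whether other threads on an affected host are stuck the same way, the `procstat -kk` output above can be filtered for a given kernel frame. The following is a minimal Python sketch under stated assumptions: the helper name `threads_in_frame` and the shortened sample text are illustrative only, and the parser assumes that PID and TID are the first two columns and that stack frames look like `name+0xoffset`, as in the trace above.

```python
import re

# Assumption: each kernel stack frame is printed as "symbol+0xOFFSET".
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+")


def threads_in_frame(procstat_output, frame="zil_commit_impl"):
    """Return (pid, tid, frames) for every thread whose stack contains `frame`.

    `procstat_output` is the captured text of `procstat -kk`.
    Header lines and elisions such as "[...]" are skipped.
    """
    hits = []
    for line in procstat_output.splitlines():
        fields = line.split()
        # Data rows start with two numeric columns: PID and TID.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        frames = FRAME_RE.findall(line)
        if frame in frames:
            hits.append((int(fields[0]), int(fields[1]), frames))
    return hits


# Shortened sample taken from the trace in this report (illustrative only).
SAMPLE = """\
  PID    TID COMM                TDNAME              KSTACK
[...]
69994 101224 python3.6           -                   mi_switch+0xe6 sleepq_wait+0x2c _sx_xlock_hard+0x306 zil_commit_impl+0x11d zfs_freebsd_putpages+0x635
"""

if __name__ == "__main__":
    for pid, tid, frames in threads_in_frame(SAMPLE):
        print(pid, tid, " <- ".join(frames[:4]))
```

On a live system the same function could be fed the captured output of `procstat -kk -a` to list every thread blocked in `zil_commit_impl`, which may help tell a single stuck process from a system-wide ZIL stall.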
Running `sync` manually hangs, see attached command line output.

I've found these possibly related issues:

* Sporadic system hang -
  https://github.com/zfsonlinux/zfs/issues/7425#issuecomment-403312992
* Process hang in state “zilog->zl_writer_lock” on Unstable -
  https://discourse.trueos.org/t/process-hang-in-state-zilog-zl-writer-lock-on-unstable/2193/20

-- 
You are receiving this mail because:
You are the assignee for the bug.