Date: Thu, 06 Apr 2017 16:33:51 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-geom@FreeBSD.org
Subject: [Bug 218337] panic: Journal overflow with g_journal_switcher waiting on wswbuf0
Message-ID: <bug-218337-14739-mo5jI0EHNI@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-218337-14739@https.bugs.freebsd.org/bugzilla/>
References: <bug-218337-14739@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218337

longwitz@incore.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |longwitz@incore.de

--- Comment #2 from longwitz@incore.de ---
Thanks for the extensive example; the information from your kernel dump looks
nearly identical to mine.

I think the problem with the u_int/u_long variables in g_journal.c is not the
reason for the panic; there is already PR kern/198500 for that issue. The
panic has to do with the extra physical memory described in pbuf(9). I see
the following users of pbufs (I don't have nfs and fuse):

nbuf   = 105931
nswbuf = min(nbuf / 4, 256) = 256

User of pbufs:             boottime    kerneldump
md_vnode_pbuf_freecnt            25            25
smbfs_pbuf_freecnt              129           129
ncl_pbuf_freecnt                129           129
cluster_pbuf_freecnt            128             0
vnode_pbuf_freecnt              129           129
nsw_rcount                      128           128
nsw_wcount_sync                  64            64

The g_journal_switcher (pid 7) is waiting on channel "wswbuf0" because all
the cluster pbufs are in use. Looking at the 256 swbufs, I found that 128 are
free and 128 are in use by the g_journal_switcher itself. All these buffers
have the same b_bufobj and the same b_iodone address "cluster_callback". One
example:

(kgdb) p swbuf[126]
$352 = {b_bufobj = 0xfffff80b7e1acbe0, b_bcount = 131072, b_caller1 = 0x0,
  b_data = 0xfffffe0c1c2ec000 "", b_error = 0, b_iocmd = 2 '\002',
  b_ioflags = 0 '\0', b_iooffset = 472973312, b_resid = 0,
  b_iodone = 0xffffffff8073fcb0 <cluster_callback>, b_blkno = 923776,
  b_offset = 2841903104, b_bobufs = {tqe_next = 0x0, tqe_prev = 0x0},
  b_vflags = 0, b_freelist = {tqe_next = 0xfffffe0bafe182b8,
    tqe_prev = 0xffffffff80eaa460}, b_qindex = 0, b_flags = 1677721636,
  b_xflags = 0 '\0', b_lock = {lock_object = {
      lo_name = 0xffffffff80a829a7 "bufwait", lo_flags = 108199936,
      lo_data = 0, lo_witness = 0x0}, lk_lock = 18446744073709551600,
    lk_exslpfail = 0, lk_timo = 0, lk_pri = 96}, b_bufsize = 131072,
  b_runningbufspace = 131072, b_kvabase = 0xfffffe0c1c2ec000 "",
  b_kvaalloc = 0x0, b_kvasize = 131072, b_lblkno = 86728,
  b_vp = 0xfffff80b7e1acb10, b_dirtyoff = 0, b_dirtyend = 131072,
  b_rcred = 0x0, b_wcred = 0x0, b_saveaddr = 0xfffffe0c1c2ec000,
  b_pager = {pg_reqpage = 0}, b_cluster = {cluster_head = {
      tqh_first = 0xfffffe0bb1bec410, tqh_last = 0xfffffe0bb1bebe20},
    cluster_entry = {tqe_next = 0xfffffe0bb1bec410,
      tqe_prev = 0xfffffe0bb1bebe20}}, b_pages = {0xfffff80c097f97c0,
    0xfffff80c097f9828, 0xfffff80c097f9890, 0xfffff80c097f98f8,
    0xfffff80c097f9960, 0xfffff80c097f99c8, 0xfffff80c097f9a30,
    0xfffff80c097f9a98, 0xfffff80c097f9b00, 0xfffff80c097f9b68,
    0xfffff80c097f9bd0, 0xfffff80c097f9c38, 0xfffff80c097f9ca0,
    0xfffff80c097f9d08, 0xfffff80c097f9d70, 0xfffff80c097f9dd8,
    0xfffff80c097f9e40, 0xfffff80c097f9ea8, 0xfffff80c097f9f10,
    0xfffff80c097f9f78, 0xfffff80c097f9fe0, 0xfffff80c097fa048,
    0xfffff80c097fa0b0, 0xfffff80c097fa118, 0xfffff80c11547980,
    0xfffff80c115479e8, 0xfffff80c11547a50, 0xfffff80c11547ab8,
    0xfffff80c11547b20, 0xfffff80c11547b88, 0xfffff80c11547bf0,
    0xfffff80c11547c58}, b_npages = 32, b_dep = {lh_first = 0x0},
  b_fsprivate1 = 0x0, b_fsprivate2 = 0x0, b_fsprivate3 = 0x0,
  b_pin_count = 0}

Therefore the g_journal_switcher has all of its cluster pbufs in use and
waits forever for another one, so the worker thread must panic with "Journal
overflow". In cluster_wbuild() I can't see a check for
"cluster_pbuf_freecnt > 0" to avoid the hang on "wswbuf0".
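To make the point concrete, here is a rough sketch of the kind of guard I
mean. This is not a tested patch and not the actual vfs_cluster.c code; it
assumes the getpbuf()/trypbuf()/relpbuf() interfaces from <vm/vm_pager.h> of
that era, and the function name cluster_write_guarded() is only for
illustration:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/buf.h>
    #include <vm/vm_pager.h>    /* getpbuf(), trypbuf(), relpbuf() */

    /* pbuf quota for clustered writes (declared in sys/buf.h in that era) */
    extern int cluster_pbuf_freecnt;

    /*
     * Hypothetical guarded allocation of a cluster pbuf for writing tbp.
     */
    static int
    cluster_write_guarded(struct buf *tbp)
    {
            struct buf *bp;

            /*
             * Fail fast when the cluster pbuf pool is empty.  getpbuf()
             * would msleep() on "wswbuf0" until some thread calls
             * relpbuf(); if the sleeping thread itself holds all 128
             * pbufs of the pool, as the g_journal_switcher does in the
             * dump above, that wakeup never arrives.  trypbuf() returns
             * NULL instead of sleeping.
             */
            bp = trypbuf(&cluster_pbuf_freecnt);
            if (bp == NULL) {
                    /* No cluster buffer available: write tbp unclustered. */
                    bawrite(tbp);
                    return (ENOBUFS);
            }

            /*
             * ... build the cluster around bp and start the write; the
             * pbuf is given back in the completion path, where
             * cluster_callback() ends with relpbuf(bp,
             * &cluster_pbuf_freecnt) ...
             */
            return (0);
    }

Whether falling back to an unclustered bawrite() is the right recovery is a
separate question; the point is only that a thread which may already hold
every pbuf of a pool must not sleep in getpbuf() waiting on that same pool.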
I wonder why this seems to be a problem only with gjournal; other components
in the kernel also use VFS_SYNC. I would like to know if this problem can be
fixed.

-- 
You are receiving this mail because:
You are the assignee for the bug.