Date: Mon, 03 Apr 2017 16:37:05 +0000 From: bugzilla-noreply@freebsd.org To: freebsd-bugs@FreeBSD.org Subject: [Bug 218337] panic: Journal overflow with g_journal_switcher waiting on wswbuf0 Message-ID: <bug-218337-8@https.bugs.freebsd.org/bugzilla/>
next in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D218337 Bug ID: 218337 Summary: panic: Journal overflow with g_journal_switcher waiting on wswbuf0 Product: Base System Version: 10.3-STABLE Hardware: Any OS: Any Status: New Severity: Affects Some People Priority: --- Component: kern Assignee: freebsd-bugs@FreeBSD.org Reporter: longwitz@incore.de I am running several servers FreeBSD 10.3-STABLE #3 r307955 using gjournal/gmirror: --> gjournal status Name Status Components mirror/gmbkp4p7.journal N/A mirror/gmbkp4p3 mirror/gmbkp4p7 mirror/gmbkp4p8.journal N/A mirror/gmbkp4p4 mirror/gmbkp4p8 mirror/gmbkp4p9.journal N/A mirror/gmbkp4p5 mirror/gmbkp4p9=3D=3D root@dssbkp4 (pts/2= ) ->=20=20 --> gmirror status Name Status Components mirror/gmbkp4 COMPLETE da10 (ACTIVE) da11 (ACTIVE) mirror/gmbkp4p10.journal N/A mirror/gmbkp4p6 mirror/gmbkp4p10 I do not use gjournal on other disks because I am aware of the trouble described in PR kern/164252. This configuration runs correct for many years, all journal partitions have a size of 2 GB. Now a have got a panic on a test machine with 48 GB RAM: panic: Journal overflow (id =3D 1510160743 joffset=3D1950095360 active=3D86= 3204864 inactive=3D1948974592) cpuid =3D 4 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0c4ea36= 410 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe0c4ea364c0 vpanic() at vpanic+0x126/frame 0xfffffe0c4ea36500 panic() at panic+0x43/frame 0xfffffe0c4ea36560 g_journal_flush() at g_journal_flush+0x8f9/frame 0xfffffe0c4ea36840 g_journal_flush_send() at g_journal_flush_send+0xd1/frame 0xfffffe0c4ea36880 g_journal_worker() at g_journal_worker+0x170a/frame 0xfffffe0c4ea36bb0 fork_exit() at fork_exit+0x9a/frame 0xfffffe0c4ea36bf0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0c4ea36bf0 --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 --- KDB: enter: panic [ thread pid 27 tid 100182 ] Stopped at kdb_enter+0x3e: movq $0,kdb_why db:0:kdb.enter.panic> watchdog No argument provided, disabling watchdog db:0:kdb.enter.panic> run ddbinfo db:1:ddbinfo> capture on db:1:on> run lockinfo db:2:lockinfo> show lock Giant class: sleep mutex name: Giant flags: {DEF, RECURSE} state: {UNOWNED} db:2:Giant> show lockedvnods Locked vnodes 0xfffff80b7e1acb10: tag ufs, type VREG usecount 2, writecount 1, refcount 3781 mountedhere 0 flags (VI_ACTIVE) v_object 0xfffff8085348d600 ref 0 pages 30224 cleanbuf 548 dirtybuf 3230 lock type ufs: EXCL by thread 0xfffff800117c1960 (pid 7, g_journal switcher, tid 100169) ino 18, on dev mirror/gmbkp4p8.journal db:2:lockedvnods> show lockchain thread 100182 (pid 27, g_journal mirror/gm) running on CPU 4 db:2:lockchain> show sleepchain thread 100182 (pid 27, g_journal mirror/gm) running on CPU 4 db:1:sleepchain> show pcpu cpuid =3D 4 dynamic pcpu =3D 0xfffffe0c9c5dfe00 curthread =3D 0xfffff80011e014b0: pid 27 "g_journal mirror/gm" curpcb =3D 0xfffffe0c4ea36cc0 fpcurthread =3D none idlethread =3D 0xfffff8000b343960: tid 100007 "idle: cpu4" curpmap =3D 0xffffffff80efa9d8 tssp =3D 0xffffffff80f11ab0 commontssp =3D 0xffffffff80f11ab0 >From kerneldump: kgdb) bt #0 doadump (textdump=3D-1319329552) at pcpu.h:219 #1 0xffffffff80340f05 in db_fncall (dummy1=3D<value optimized out>, dummy2=3D<value optimized out>, dummy3=3D<value optimized out>, dummy4=3D<v= alue optimized out>) at /usr/src/sys/ddb/db_command.c:568 #2 0xffffffff80340bed in db_command (cmd_table=3D0x0) at /usr/src/sys/ddb/db_command.c:440 #3 0xffffffff80344e7f in db_script_exec (scriptname=3D0xfffffe0c4ea360c0 "kdb.enter.panic", warnifnotfound=3D<value optimized out>) at /usr/src/sys/ddb/db_script.c:302 #4 0xffffffff80344cc3 in db_script_kdbenter (eventname=3D0x0) at /usr/src/sys/ddb/db_script.c:324 #5 0xffffffff803432eb in db_trap (type=3D<value optimized out>, code=3D0) = at /usr/src/sys/ddb/db_main.c:230 #6 0xffffffff806df959 in kdb_trap (type=3D3, code=3D0, tf=3D<value optimiz= ed out>) at /usr/src/sys/kern/subr_kdb.c:656 #7 0xffffffff809391d3 in trap (frame=3D0xfffffe0c4ea363f0) at /usr/src/sys/amd64/amd64/trap.c:561 #8 0xffffffff8091eb6c in calltrap () at /usr/src/sys/amd64/amd64/exception.S:238 #9 0xffffffff806df0be in kdb_enter (why=3D0xffffffff80a79f84 "panic", msg= =3D<value optimized out>) at cpufunc.h:63 #10 0xffffffff806a12e6 in vpanic (fmt=3D<value optimized out>, ap=3D<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:882 #11 0xffffffff806a1193 in panic (fmt=3D0xffffffff80d24350 "P=C3=A8=C5=A0\20= 0=C3=BF=C3=BF=C3=BF=C3=BF\001") at /usr/src/sys/kern/kern_shutdown.c:818 #12 0xffffffff8112bbc9 in g_journal_flush (sc=3D<value optimized out>) at /usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:344 #13 0xffffffff8112bca1 in g_journal_flush_send (sc=3D0xfffff80011dee400) at /usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:14= 43 #14 0xffffffff8112f8ea in g_journal_worker (arg=3D0xfffff80011dee400) at /usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:22= 11 #15 0xffffffff8066f99a in fork_exit (callout=3D0xffffffff8112e1e0 <g_journal_worker>, arg=3D0xfffff80011dee400, frame=3D0xfffffe0c4ea36c00) at /usr/src/sys/kern/kern_fork.c:1030 #16 0xffffffff8091f0ae in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:613 (kgdb) f 13 #13 0xffffffff8112bca1 in g_journal_flush_send (sc=3D0xfffff80011dee400) at /usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:14= 43 1443 g_journal_add_current(sc, bp); (kgdb) p *sc $21 =3D {sc_id =3D 1510160743, sc_type =3D 3 '\003', sc_orig_type =3D 1 '\0= 01', sc_geom =3D 0xfffff80011de6b00, sc_flags =3D 64, sc_mtx =3D {lock_object =3D {lo_name =3D 0xffffffff81131= 9f4 "gjournal", lo_flags =3D 16973824, lo_data =3D 0, lo_witness =3D 0x0}, mtx_lock =3D 4}, sc_mediasize =3D= 4294966784, sc_sectorsize =3D 512, sc_bio_flush =3D 3, sc_journal_id =3D 114030489, sc_journal_next_id =3D 3= 023870498, sc_journal_copying =3D 0, sc_journal_offset =3D 1950095360, sc_journal_previous_id =3D 2305361505, sc_back_queue =3D {queue =3D { tqh_first =3D 0xfffff8094ce6e1f0, tqh_last =3D 0xfffff8065935cd10}, last_offset =3D 1945507328, insert_point =3D 0xfffff8065935cc98}, sc_regular_queue =3D {queue =3D {= tqh_first =3D 0xfffff808536333e0, tqh_last =3D 0xfffff80836392740}, last_offset =3D 478609408, insert_p= oint =3D 0xfffff808363926c8}, sc_delayed_queue =3D {queue =3D {tqh_first =3D 0xfffff805bf282d90, tqh_la= st =3D 0xfffff805f2023360}, last_offset =3D 475987968, insert_point =3D 0xfffff805f20232e8}, sc_delayed_count =3D 108, sc_current_queue =3D 0x0, sc_current_count =3D 0, sc_flush_queue =3D 0xfffff805f14fc3e0, sc_flush_count =3D 84, sc_flush_in_progress =3D 16, sc_copy_queue =3D 0x0, sc_copy_in_progress = =3D 0, sc_dconsumer =3D 0xfffff80011dea880, sc_jconsumer =3D 0xfffff80011deb600, sc_inactive =3D {jj_queue =3D 0x0, jj_offset =3D 1948974592}, sc_active =3D {jj_queue =3D 0xfffff80a4748ba= a8, jj_offset =3D 863204864}, sc_jstart =3D 0, sc_jend =3D 2147483136, sc_callout =3D {c_links =3D {le = =3D {le_next =3D 0x0, le_prev =3D 0xfffffe0000806768}, sle =3D {sle_next =3D 0x0}, tqe = =3D {tqe_next =3D 0x0, tqe_prev =3D 0xfffffe0000806768}}, c_time =3D 55532158374, c_precis= ion =3D 1342177187, c_arg =3D 0xfffff80011dee400, c_func =3D 0xffffffff8112e150 <g_journal_timeout>, c_lock =3D 0x0, c_flags =3D 0, c_iflags =3D 16, c_cpu =3D 0}, sc_worker =3D 0xfffff8001= 1df7500, sc_rootmount =3D 0x0} (kgdb) p vm_kmem_size $1 =3D 50189914112 (kgdb) p g_journal_cache_divisor $2 =3D 2 (kgdb) p g_journal_cache_low $3 =3D 3258108450 (kgdb) p g_journal_cache_limit $4 =3D 3620120576 (kgdb) p g_journal_cache_switch $5 =3D 90 (kgdb) p g_journal_force_switch $6 =3D 70 (kgdb) p g_journal_cache_used $7 =3D 1098408960 (kgdb) p g_journal_switch_time $8 =3D 10 (kgdb) p g_journal_switcher_wokenup $9 =3D 1 (kgdb) p g_journal_cache_alloc_failures $10 =3D 0 (kgdb) p g_journal_cache_misses $11 =3D 4 (kgdb) p g_journal_switcher_state $12 =3D 0 (kgdb) p g_journal_sync_requested $13 =3D 0 (kgdb) p g_journal_stats_journal_full $14 =3D 2 (kgdb) p g_journal_switch_time $15 =3D 10 (kgdb) p sc->sc_id $22 =3D 1510160743 (kgdb) p sc->sc_journal_offset $23 =3D 1950095360 (kgdb) p sc->sc_active.jj_offset $24 =3D 863204864 (kgdb) p sc->sc_inactive.jj_offset $25 =3D 1948974592 (kgdb) p g_journal_stats_wait_for_copy $28 =3D 0 (kgdb) thread 465 [Switching to thread 465 (Thread 100169)]#0 sched_switch (td=3D0xfffff800117c1960, newtd=3D<value optimized out>, flags=3D<value opt= imized out>) at /usr/src/sys/kern/sched_ule.c:1945 1945 cpuid =3D PCPU_GET(cpuid); (kgdb) where #0 sched_switch (td=3D0xfffff800117c1960, newtd=3D<value optimized out>, flags=3D<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945 #1 0xffffffff806aabc1 in mi_switch (flags=3D260, newtd=3D0x0) at /usr/src/sys/kern/kern_synch.c:491 #2 0xffffffff806ee73a in sleepq_wait (wchan=3D0x0, pri=3D0) at /usr/src/sys/kern/subr_sleepqueue.c:624 #3 0xffffffff806aa5d7 in _sleep (ident=3D<value optimized out>, lock=3D<va= lue optimized out>, priority=3D<value optimized out>, wmesg=3D<value optimized = out>, sbt=3D<value optimized out>, pr=3D<value optimized out>, flags=3D<value= optimized out>) at /usr/src/sys/kern/kern_synch.c:255 #4 0xffffffff80910c58 in getpbuf (pfreecnt=3D<value optimized out>) at /usr/src/sys/vm/vm_pager.c:336 #5 0xffffffff8073f453 in cluster_wbuild (vp=3D0xfffff80b7e1acb10, size=3D3= 2768, start_lbn=3D<value optimized out>, len=3D<value optimized out>, gbflags=3D<= value optimized out>) at /usr/src/sys/kern/vfs_cluster.c:840 #6 0xffffffff808d8748 in ffs_syncvnode (vp=3D0xfffff80b7e1acb10, waitfor= =3D1, flags=3D<value optimized out>) at /usr/src/sys/ufs/ffs/ffs_vnops.c:298 #7 0xffffffff808d5f6d in ffs_sync (mp=3D0xfffff8002ba4b000, waitfor=3DCann= ot access memory at address 0x1 ) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1591 #8 0xffffffff807618fc in vfs_write_suspend (mp=3D0xfffff8002ba4b000, flags=3D<value optimized out>) at /usr/src/sys/kern/vfs_vnops.c:1877 #9 0xffffffff8112a3e4 in g_journal_switcher (arg=3D0xffffffff81132f28) at /usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:29= 64 #10 0xffffffff8066f99a in fork_exit (callout=3D0xffffffff81129a40 <g_journal_switcher>, arg=3D0xffffffff81132f28, frame=3D0xfffffe0c4e49cc00)= at /usr/src/sys/kern/kern_fork.c:1030 #11 0xffffffff8091f0ae in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:613 #12 0x0000000000000000 in ?? () (kgdb) f 4 #4 0xffffffff80910c58 in getpbuf (pfreecnt=3D<value optimized out>) at /usr/src/sys/vm/vm_pager.c:336 336 msleep(pfreecnt, &pbuf_mtx, PVM, "wswbuf0", 0); (kgdb) list 331 mtx_lock(&pbuf_mtx); 332 333 for (;;) { 334 if (pfreecnt) { 335 while (*pfreecnt =3D=3D 0) { 336 msleep(pfreecnt, &pbuf_mtx, PVM, "wswbuf0", 0); 337 } 338 } The panic was caused by a vi-Problem: I tried to insert some bytes per cut/paste into an opened file and vi started (I do not know why) to loop and wrote my copystring over and over again in his address space until vi dumpe= d. The kernel wrote the core dump of vi to /var/core and doing this issued the gjournal panic. (kgdb) thread 541 (vi) [Switching to thread 541 (Thread 101086)]#0 sched_switch (td=3D0xfffff805564a7960, newtd=3D<value optimized out>, flags=3D<value opt= imized out>) at /usr/src/sys/kern/sched_ule.c:1945 1945 cpuid =3D PCPU_GET(cpuid); (kgdb) bt #0 sched_switch (td=3D0xfffff805564a7960, newtd=3D<value optimized out>, flags=3D<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945 #1 0xffffffff806aabc1 in mi_switch (flags=3D260, newtd=3D0x0) at /usr/src/sys/kern/kern_synch.c:491 #2 0xffffffff806ee73a in sleepq_wait (wchan=3D0x0, pri=3D0) at /usr/src/sys/kern/subr_sleepqueue.c:624 #3 0xffffffff806aa5d7 in _sleep (ident=3D<value optimized out>, lock=3D<va= lue optimized out>, priority=3D<value optimized out>, wmesg=3D<value optimized = out>, sbt=3D<value optimized out>, pr=3D<value optimized out>, flags=3D<value= optimized out>) at /usr/src/sys/kern/kern_synch.c:255 #4 0xffffffff80761487 in vn_start_write_locked (mp=3D0xfffff8002ba4b000, flags=3D257) at /usr/src/sys/kern/vfs_vnops.c:1673 #5 0xffffffff8075fc23 in vn_start_write (vp=3D<value optimized out>, mpp= =3D<value optimized out>, flags=3D<value optimized out>) at /usr/src/sys/kern/vfs_vnops.c:1725 #6 0xffffffff807605db in vn_rdwr (rw=3DUIO_WRITE, vp=3D0xfffff80b7e1acb10, base=3D<value optimized out>, len=3D<value optimized out>, offset=3D<value = optimized out>, segflg=3D<value optimized out>, ioflg=3D<value optimized out>, active_cred=3D0xfffff8055623b280, file_cred=3D0x1, aresid=3D<value optimize= d out>, td=3D<value optimized out>) at /usr/src/sys/kern/vfs_vnops.c:530 #7 0xffffffff80760d47 in vn_rdwr_inchunks (rw=3DUIO_WRITE, vp=3D0xfffff80b7e1acb10, base=3D<value optimized out>, len=3D<value optimiz= ed out>, offset=3D<value optimized out>, segflg=3DUIO_USERSPACE, aresid=3D<value optimized out>) at /usr/src/sys/kern/vfs_vnops.c:630 #8 0xffffffff80645e14 in elf64_coredump (td=3D<value optimized out>, vp=3D0xfffff80b7e1acb10, limit=3D<value optimized out>, flags=3D<value opti= mized out>) at /usr/src/sys/kern/imgact_elf.c:1179 #9 0xffffffff806a4a6c in sigexit (td=3D0xfffff805564a7960, sig=3D11) at /usr/src/sys/kern/kern_sig.c:3366 #10 0xffffffff806a50c6 in postsig (sig=3D<value optimized out>) at /usr/src/sys/kern/kern_sig.c:2911 #11 0xffffffff806f40f7 in ast (framep=3D<value optimized out>) at /usr/src/sys/kern/subr_trap.c:316 #12 0xffffffff8091fe89 in doreti_ast () at /usr/src/sys/amd64/amd64/exception.S:685 #13 0x0000000801853c00 in ?? () #14 0x0000000801853c80 in ?? () #15 0x0000062e00000008 in ?? () #16 0x0000000000000604 in ?? () #17 0x0000000000000002 in ?? () #18 0x0000000000000001 in ?? () #19 0x0000000801815400 in ?? () #20 0x0000000000000008 in ?? () #21 0x00007fffffffe190 in ?? () #22 0x0000000000000000 in ?? () (kgdb) f 4 #4 0xffffffff80761487 in vn_start_write_locked (mp=3D0xfffff8002ba4b000, flags=3D257) at /usr/src/sys/kern/vfs_vnops.c:1673 1673 error =3D msleep(&mp->mnt_flag, MNT_MTX(mp), mflags, (kgdb) list 1668 while ((mp->mnt_kern_flag & MNTK_SUSPEND) !=3D 0) { 1669 if (flags & V_NOWAIT) { 1670 error =3D EWOULDBLOCK; 1671 goto unlock; 1672 } 1673 error =3D msleep(&mp->mnt_flag, MNT_MTX(mp), mflags, 1674 "suspfs", 0); 1675 if (error) 1676 goto unlock; 1677 } After reboot I saw (inode 18 is the inode the gjournal switcher had locked): =3D=3D root@dssbkp4 (pts/1) -> ls -li /var/core total 1704864 18 -rw------- 1 root wheel 1745289216 29 M=C3=A4r 22:34 vi-70628.core =3D=3D=3D root@dssbkp4 (pts/1) -> ls -li /var/tmp/vi.recover/ total 395232 80384 -rw------- 1 root wheel 404533248 29 M=C3=A4r 22:34 vi.CqtHml This server has faster disks than my others, maybe the journal space is too low. But the journal_switcher sits on wswbuf0 (but why ?) and every journal space will be filled bringing the overflow. Looking at the gjournal variables, we have (/sys/geom/journal/g_journal.c) static u_int g_journal_cache_limit =3D 64 * 1024 * 1024; static u_int g_journal_cache_divisor =3D 2; static u_int g_journal_cache_switch =3D 90; static u_int g_journal_cache_low =3D 0; and in g_journal_init()=20=20 if (g_journal_cache_limit > 0) { g_journal_cache_limit =3D vm_kmem_size / g_journal_cache_divisor; g_journal_cache_low =3D (g_journal_cache_limit / 100) * g_journal_cache_switch; } But vm_kmem_size is defined u_long. Therefore we have g_journal_cache_limit =3D 3.620.120.576, but we should have g_journal_cache_limit =3D 50.189.914.112 / 2. This looks to me like a programming error in g_journal.c, relevant for serv= ers with more RAM than 4 GB. Can this be respinsible for the panic ? --=20 You are receiving this mail because: You are the assignee for the bug.=
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-218337-8>