Date:      Mon, 03 Apr 2017 16:37:05 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 218337] panic: Journal overflow with g_journal_switcher waiting on wswbuf0
Message-ID:  <bug-218337-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218337

            Bug ID: 218337
           Summary: panic: Journal overflow with g_journal_switcher
                    waiting on wswbuf0
           Product: Base System
           Version: 10.3-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: longwitz@incore.de

I am running several servers with FreeBSD 10.3-STABLE #3 r307955 using
gjournal/gmirror:

 --> gjournal status
                     Name  Status  Components
  mirror/gmbkp4p7.journal     N/A  mirror/gmbkp4p3
                                   mirror/gmbkp4p7
  mirror/gmbkp4p8.journal     N/A  mirror/gmbkp4p4
                                   mirror/gmbkp4p8
  mirror/gmbkp4p9.journal     N/A  mirror/gmbkp4p5
                                   mirror/gmbkp4p9
 mirror/gmbkp4p10.journal     N/A  mirror/gmbkp4p6
                                   mirror/gmbkp4p10

 --> gmirror status
          Name    Status  Components
 mirror/gmbkp4  COMPLETE  da10 (ACTIVE)
                          da11 (ACTIVE)

I do not use gjournal on other disks because I am aware of the trouble
described in PR kern/164252. This configuration has run correctly for many
years; all journal partitions have a size of 2 GB.

Now I have got a panic on a test machine with 48 GB RAM:

panic: Journal overflow (id = 1510160743 joffset=1950095360 active=863204864
inactive=1948974592)
cpuid = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0c4ea36410
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe0c4ea364c0
vpanic() at vpanic+0x126/frame 0xfffffe0c4ea36500
panic() at panic+0x43/frame 0xfffffe0c4ea36560
g_journal_flush() at g_journal_flush+0x8f9/frame 0xfffffe0c4ea36840
g_journal_flush_send() at g_journal_flush_send+0xd1/frame 0xfffffe0c4ea36880
g_journal_worker() at g_journal_worker+0x170a/frame 0xfffffe0c4ea36bb0
fork_exit() at fork_exit+0x9a/frame 0xfffffe0c4ea36bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0c4ea36bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 27 tid 100182 ]
Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
db:0:kdb.enter.panic> watchdog
No argument provided, disabling watchdog
db:0:kdb.enter.panic>  run ddbinfo
db:1:ddbinfo> capture on
db:1:on>  run lockinfo
db:2:lockinfo> show lock Giant
 class: sleep mutex
 name: Giant
 flags: {DEF, RECURSE}
 state: {UNOWNED}
db:2:Giant>  show lockedvnods
Locked vnodes

0xfffff80b7e1acb10: tag ufs, type VREG
    usecount 2, writecount 1, refcount 3781 mountedhere 0
    flags (VI_ACTIVE)
    v_object 0xfffff8085348d600 ref 0 pages 30224 cleanbuf 548 dirtybuf 3230
    lock type ufs: EXCL by thread 0xfffff800117c1960 (pid 7, g_journal
switcher, tid 100169)
        ino 18, on dev mirror/gmbkp4p8.journal
db:2:lockedvnods>  show lockchain
thread 100182 (pid 27, g_journal mirror/gm) running on CPU 4
db:2:lockchain>  show sleepchain
thread 100182 (pid 27, g_journal mirror/gm) running on CPU 4
db:1:sleepchain>  show pcpu
cpuid        = 4
dynamic pcpu = 0xfffffe0c9c5dfe00
curthread    = 0xfffff80011e014b0: pid 27 "g_journal mirror/gm"
curpcb       = 0xfffffe0c4ea36cc0
fpcurthread  = none
idlethread   = 0xfffff8000b343960: tid 100007 "idle: cpu4"
curpmap      = 0xffffffff80efa9d8
tssp         = 0xffffffff80f11ab0
commontssp   = 0xffffffff80f11ab0

From the kernel dump:

(kgdb) bt
#0  doadump (textdump=-1319329552) at pcpu.h:219
#1  0xffffffff80340f05 in db_fncall (dummy1=<value optimized out>,
dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value
optimized out>)
    at /usr/src/sys/ddb/db_command.c:568
#2  0xffffffff80340bed in db_command (cmd_table=0x0) at
/usr/src/sys/ddb/db_command.c:440
#3  0xffffffff80344e7f in db_script_exec (scriptname=0xfffffe0c4ea360c0
"kdb.enter.panic", warnifnotfound=<value optimized out>) at
/usr/src/sys/ddb/db_script.c:302
#4  0xffffffff80344cc3 in db_script_kdbenter (eventname=0x0) at
/usr/src/sys/ddb/db_script.c:324
#5  0xffffffff803432eb in db_trap (type=<value optimized out>, code=0) at
/usr/src/sys/ddb/db_main.c:230
#6  0xffffffff806df959 in kdb_trap (type=3, code=0, tf=<value optimized out>)
at /usr/src/sys/kern/subr_kdb.c:656
#7  0xffffffff809391d3 in trap (frame=0xfffffe0c4ea363f0) at
/usr/src/sys/amd64/amd64/trap.c:561
#8  0xffffffff8091eb6c in calltrap () at
/usr/src/sys/amd64/amd64/exception.S:238
#9  0xffffffff806df0be in kdb_enter (why=0xffffffff80a79f84 "panic",
msg=<value optimized out>) at cpufunc.h:63
#10 0xffffffff806a12e6 in vpanic (fmt=<value optimized out>, ap=<value
optimized out>) at /usr/src/sys/kern/kern_shutdown.c:882
#11 0xffffffff806a1193 in panic (fmt=0xffffffff80d24350 "PèŠ\200ÿÿÿÿ\001") at
/usr/src/sys/kern/kern_shutdown.c:818
#12 0xffffffff8112bbc9 in g_journal_flush (sc=<value optimized out>) at
/usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:344
#13 0xffffffff8112bca1 in g_journal_flush_send (sc=0xfffff80011dee400) at
/usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:1443
#14 0xffffffff8112f8ea in g_journal_worker (arg=0xfffff80011dee400) at
/usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:2211
#15 0xffffffff8066f99a in fork_exit (callout=0xffffffff8112e1e0
<g_journal_worker>, arg=0xfffff80011dee400, frame=0xfffffe0c4ea36c00) at
/usr/src/sys/kern/kern_fork.c:1030
#16 0xffffffff8091f0ae in fork_trampoline () at
/usr/src/sys/amd64/amd64/exception.S:613

(kgdb) f 13
#13 0xffffffff8112bca1 in g_journal_flush_send (sc=0xfffff80011dee400)
    at
/usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:1443
1443                    g_journal_add_current(sc, bp);
(kgdb) p *sc
$21 = {sc_id = 1510160743, sc_type = 3 '\003', sc_orig_type = 1 '\001',
sc_geom = 0xfffff80011de6b00,
  sc_flags = 64, sc_mtx = {lock_object = {lo_name = 0xffffffff811319f4
"gjournal", lo_flags = 16973824,
      lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, sc_mediasize = 4294966784,
sc_sectorsize = 512,
  sc_bio_flush = 3, sc_journal_id = 114030489, sc_journal_next_id = 3023870498,
sc_journal_copying = 0,
  sc_journal_offset = 1950095360, sc_journal_previous_id = 2305361505,
sc_back_queue = {queue = {
      tqh_first = 0xfffff8094ce6e1f0, tqh_last = 0xfffff8065935cd10},
last_offset = 1945507328,
    insert_point = 0xfffff8065935cc98}, sc_regular_queue = {queue = {tqh_first
= 0xfffff808536333e0,
      tqh_last = 0xfffff80836392740}, last_offset = 478609408, insert_point =
0xfffff808363926c8},
  sc_delayed_queue = {queue = {tqh_first = 0xfffff805bf282d90, tqh_last =
0xfffff805f2023360},
    last_offset = 475987968, insert_point = 0xfffff805f20232e8},
sc_delayed_count = 108,
  sc_current_queue = 0x0, sc_current_count = 0, sc_flush_queue =
0xfffff805f14fc3e0, sc_flush_count = 84,
  sc_flush_in_progress = 16, sc_copy_queue = 0x0, sc_copy_in_progress = 0,
  sc_dconsumer = 0xfffff80011dea880, sc_jconsumer = 0xfffff80011deb600,
sc_inactive = {jj_queue = 0x0,
    jj_offset = 1948974592}, sc_active = {jj_queue = 0xfffff80a4748baa8,
jj_offset = 863204864},
  sc_jstart = 0, sc_jend = 2147483136, sc_callout = {c_links = {le = {le_next =
0x0,
        le_prev = 0xfffffe0000806768}, sle = {sle_next = 0x0}, tqe = {tqe_next
= 0x0,
        tqe_prev = 0xfffffe0000806768}}, c_time = 55532158374, c_precision =
1342177187,
    c_arg = 0xfffff80011dee400, c_func = 0xffffffff8112e150
<g_journal_timeout>, c_lock = 0x0,
    c_flags = 0, c_iflags = 16, c_cpu = 0}, sc_worker = 0xfffff80011df7500,
sc_rootmount = 0x0}

(kgdb) p vm_kmem_size
$1 = 50189914112
(kgdb) p g_journal_cache_divisor
$2 = 2
(kgdb) p g_journal_cache_low
$3 = 3258108450
(kgdb) p g_journal_cache_limit
$4 = 3620120576
(kgdb) p g_journal_cache_switch
$5 = 90
(kgdb) p g_journal_force_switch
$6 = 70
(kgdb) p g_journal_cache_used
$7 = 1098408960
(kgdb) p g_journal_switch_time
$8 = 10
(kgdb) p g_journal_switcher_wokenup
$9 = 1
(kgdb) p g_journal_cache_alloc_failures
$10 = 0
(kgdb) p g_journal_cache_misses
$11 = 4
(kgdb) p g_journal_switcher_state
$12 = 0
(kgdb) p g_journal_sync_requested
$13 = 0
(kgdb) p g_journal_stats_journal_full
$14 = 2
(kgdb) p g_journal_switch_time
$15 = 10
(kgdb) p sc->sc_id
$22 = 1510160743
(kgdb) p sc->sc_journal_offset
$23 = 1950095360
(kgdb) p sc->sc_active.jj_offset
$24 = 863204864
(kgdb) p sc->sc_inactive.jj_offset
$25 = 1948974592
(kgdb) p g_journal_stats_wait_for_copy
$28 = 0

(kgdb) thread 465
[Switching to thread 465 (Thread 100169)]#0  sched_switch
(td=0xfffff800117c1960, newtd=<value optimized out>, flags=<value optimized
out>)
    at /usr/src/sys/kern/sched_ule.c:1945
1945                    cpuid = PCPU_GET(cpuid);
(kgdb) where
#0  sched_switch (td=0xfffff800117c1960, newtd=<value optimized out>,
flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
#1  0xffffffff806aabc1 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:491
#2  0xffffffff806ee73a in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:624
#3  0xffffffff806aa5d7 in _sleep (ident=<value optimized out>, lock=<value
optimized out>, priority=<value optimized out>, wmesg=<value optimized out>,
    sbt=<value optimized out>, pr=<value optimized out>, flags=<value
optimized out>) at /usr/src/sys/kern/kern_synch.c:255
#4  0xffffffff80910c58 in getpbuf (pfreecnt=<value optimized out>) at
/usr/src/sys/vm/vm_pager.c:336
#5  0xffffffff8073f453 in cluster_wbuild (vp=0xfffff80b7e1acb10, size=32768,
start_lbn=<value optimized out>, len=<value optimized out>, gbflags=<value
optimized out>)
    at /usr/src/sys/kern/vfs_cluster.c:840
#6  0xffffffff808d8748 in ffs_syncvnode (vp=0xfffff80b7e1acb10, waitfor=1,
flags=<value optimized out>) at /usr/src/sys/ufs/ffs/ffs_vnops.c:298
#7  0xffffffff808d5f6d in ffs_sync (mp=0xfffff8002ba4b000, waitfor=Cannot
access memory at address 0x1
) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1591
#8  0xffffffff807618fc in vfs_write_suspend (mp=0xfffff8002ba4b000,
flags=<value optimized out>) at /usr/src/sys/kern/vfs_vnops.c:1877
#9  0xffffffff8112a3e4 in g_journal_switcher (arg=0xffffffff81132f28) at
/usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:2964
#10 0xffffffff8066f99a in fork_exit (callout=0xffffffff81129a40
<g_journal_switcher>, arg=0xffffffff81132f28, frame=0xfffffe0c4e49cc00) at
/usr/src/sys/kern/kern_fork.c:1030
#11 0xffffffff8091f0ae in fork_trampoline () at
/usr/src/sys/amd64/amd64/exception.S:613
#12 0x0000000000000000 in ?? ()
(kgdb) f 4
#4  0xffffffff80910c58 in getpbuf (pfreecnt=<value optimized out>) at
/usr/src/sys/vm/vm_pager.c:336
336                                     msleep(pfreecnt, &pbuf_mtx, PVM,
"wswbuf0", 0);
(kgdb) list
331             mtx_lock(&pbuf_mtx);
332
333             for (;;) {
334                     if (pfreecnt) {
335                             while (*pfreecnt == 0) {
336                                     msleep(pfreecnt, &pbuf_mtx, PVM,
"wswbuf0", 0);
337                             }
338                     }

The panic was triggered by a vi problem: I tried to paste some bytes into an
open file, and vi started (I do not know why) to loop, writing my pasted
string over and over into its own address space until it dumped core. The
kernel wrote the vi core dump to /var/core, and doing so triggered the
gjournal panic.

(kgdb) thread 541 (vi)
[Switching to thread 541 (Thread 101086)]#0  sched_switch
(td=0xfffff805564a7960, newtd=<value optimized out>, flags=<value optimized
out>)
    at /usr/src/sys/kern/sched_ule.c:1945
1945                    cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xfffff805564a7960, newtd=<value optimized out>,
flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
#1  0xffffffff806aabc1 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:491
#2  0xffffffff806ee73a in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:624
#3  0xffffffff806aa5d7 in _sleep (ident=<value optimized out>, lock=<value
optimized out>, priority=<value optimized out>, wmesg=<value optimized out>,
    sbt=<value optimized out>, pr=<value optimized out>, flags=<value
optimized out>) at /usr/src/sys/kern/kern_synch.c:255
#4  0xffffffff80761487 in vn_start_write_locked (mp=0xfffff8002ba4b000,
flags=257) at /usr/src/sys/kern/vfs_vnops.c:1673
#5  0xffffffff8075fc23 in vn_start_write (vp=<value optimized out>,
mpp=<value optimized out>, flags=<value optimized out>) at
/usr/src/sys/kern/vfs_vnops.c:1725
#6  0xffffffff807605db in vn_rdwr (rw=UIO_WRITE, vp=0xfffff80b7e1acb10,
base=<value optimized out>, len=<value optimized out>, offset=<value optimized
out>,
    segflg=<value optimized out>, ioflg=<value optimized out>,
active_cred=0xfffff8055623b280, file_cred=0x1, aresid=<value optimized out>,
td=<value optimized out>)
    at /usr/src/sys/kern/vfs_vnops.c:530
#7  0xffffffff80760d47 in vn_rdwr_inchunks (rw=UIO_WRITE,
vp=0xfffff80b7e1acb10, base=<value optimized out>, len=<value optimized out>,
offset=<value optimized out>,
    segflg=UIO_USERSPACE, aresid=<value optimized out>) at
/usr/src/sys/kern/vfs_vnops.c:630
#8  0xffffffff80645e14 in elf64_coredump (td=<value optimized out>,
vp=0xfffff80b7e1acb10, limit=<value optimized out>, flags=<value optimized
out>)
    at /usr/src/sys/kern/imgact_elf.c:1179
#9  0xffffffff806a4a6c in sigexit (td=0xfffff805564a7960, sig=11) at
/usr/src/sys/kern/kern_sig.c:3366
#10 0xffffffff806a50c6 in postsig (sig=<value optimized out>) at
/usr/src/sys/kern/kern_sig.c:2911
#11 0xffffffff806f40f7 in ast (framep=<value optimized out>) at
/usr/src/sys/kern/subr_trap.c:316
#12 0xffffffff8091fe89 in doreti_ast () at
/usr/src/sys/amd64/amd64/exception.S:685
#13 0x0000000801853c00 in ?? ()
#14 0x0000000801853c80 in ?? ()
#15 0x0000062e00000008 in ?? ()
#16 0x0000000000000604 in ?? ()
#17 0x0000000000000002 in ?? ()
#18 0x0000000000000001 in ?? ()
#19 0x0000000801815400 in ?? ()
#20 0x0000000000000008 in ?? ()
#21 0x00007fffffffe190 in ?? ()
#22 0x0000000000000000 in ?? ()

(kgdb) f 4
#4  0xffffffff80761487 in vn_start_write_locked (mp=0xfffff8002ba4b000,
flags=257) at /usr/src/sys/kern/vfs_vnops.c:1673
1673                            error = msleep(&mp->mnt_flag, MNT_MTX(mp),
mflags,
(kgdb) list
1668                    while ((mp->mnt_kern_flag & MNTK_SUSPEND) != 0) {
1669                            if (flags & V_NOWAIT) {
1670                                    error = EWOULDBLOCK;
1671                                    goto unlock;
1672                            }
1673                            error = msleep(&mp->mnt_flag, MNT_MTX(mp),
mflags,
1674                                "suspfs", 0);
1675                            if (error)
1676                                    goto unlock;
1677                    }

After the reboot I saw the following (inode 18 is the inode the gjournal
switcher had locked):

== root@dssbkp4 (pts/1) -> ls -li /var/core
total 1704864
18 -rw-------  1 root  wheel  1745289216 29 Mar 22:34 vi-70628.core

== root@dssbkp4 (pts/1) -> ls -li /var/tmp/vi.recover/
total 395232
80384 -rw-------  1 root  wheel  404533248 29 Mar 22:34 vi.CqtHml

This server has faster disks than my others, so maybe the journal space is
too small. But the journal switcher sits sleeping on wswbuf0 (why?), and
meanwhile all of the journal space fills up, producing the overflow.

Looking at the gjournal variables (/sys/geom/journal/g_journal.c), we have
  static u_int g_journal_cache_limit = 64 * 1024 * 1024;
  static u_int g_journal_cache_divisor = 2;
  static u_int g_journal_cache_switch = 90;
  static u_int g_journal_cache_low = 0;
and in g_journal_init():
  if (g_journal_cache_limit > 0) {
           g_journal_cache_limit = vm_kmem_size / g_journal_cache_divisor;
           g_journal_cache_low =
               (g_journal_cache_limit / 100) * g_journal_cache_switch;
  }
But vm_kmem_size is declared u_long, so the assignment to the u_int
g_journal_cache_limit truncates. We end up with
  g_journal_cache_limit = 3,620,120,576,
but we should have
  g_journal_cache_limit = 50,189,914,112 / 2 = 25,094,957,056.

This looks to me like a programming error in g_journal.c, relevant for
servers with more than 4 GB of RAM.
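One possible fix, sketched here only as an untested illustration (the helper
name g_journal_init_cache and its argument are hypothetical, standing in for
the relevant part of g_journal_init()), would be to do the division in 64
bits and clamp before assigning to the u_int:

```c
#include <limits.h>

typedef unsigned int u_int;
typedef unsigned long u_long;

static u_int g_journal_cache_limit = 64 * 1024 * 1024;
static u_int g_journal_cache_divisor = 2;
static u_int g_journal_cache_switch = 90;
static u_int g_journal_cache_low = 0;

static void
g_journal_init_cache(u_long vm_kmem_size)
{
	if (g_journal_cache_limit > 0) {
		/* Compute in 64 bits, then clamp to UINT_MAX instead of
		 * letting the assignment wrap around silently. */
		u_long limit = vm_kmem_size / g_journal_cache_divisor;
		if (limit > UINT_MAX)
			limit = UINT_MAX;
		g_journal_cache_limit = (u_int)limit;
		g_journal_cache_low =
		    (g_journal_cache_limit / 100) * g_journal_cache_switch;
	}
}
```

Widening the variables themselves to u_long would be the cleaner fix, but
that touches the sysctl definitions as well.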

Could this be responsible for the panic?

-- 
You are receiving this mail because:
You are the assignee for the bug.
