From owner-freebsd-geom@freebsd.org  Thu Apr  6 16:33:51 2017
From: bugzilla-noreply@freebsd.org
To: freebsd-geom@FreeBSD.org
Subject: [Bug 218337] panic: Journal overflow with g_journal_switcher waiting
 on wswbuf0
Date: Thu, 06 Apr 2017 16:33:51 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.3-STABLE
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: longwitz@incore.de
X-Bugzilla-Status: New
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-geom@FreeBSD.org
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-218337-14739-mo5jI0EHNI@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-218337-14739@https.bugs.freebsd.org/bugzilla/>
References: <bug-218337-14739@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218337

longwitz@incore.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |longwitz@incore.de

--- Comment #2 from longwitz@incore.de ---
Thanks for the extensive example; the information from your kernel dump looks
nearly identical to mine.

I don't think the problem with the u_int/u_long variables in g_journal.c is
the reason for the panic; that problem is already covered by PR kern/198500.

The panic has to do with the extra physical memory described in pbuf(9).
I see the following users of pbufs (I don't have NFS or FUSE):

nbuf = 105931
nswbuf = min(nbuf / 4, 256) = 256;

Users of pbufs:                boottime       kerneldump
   md_vnode_pbuf_freecnt          25              25
   smbfs_pbuf_freecnt            129             129
   ncl_pbuf_freecnt              129             129
   cluster_pbuf_freecnt          128               0
   vnode_pbuf_freecnt            129             129
   nsw_rcount                    128             128
   nsw_wcount_sync                64              64
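
The boot-time column is consistent with every consumer being given a fixed
fraction of nswbuf. Below is a minimal sketch of that arithmetic. The divisors
are an assumption inferred from the numbers in the table, not copied from the
kernel source, and the helper and struct names are hypothetical:

```c
/* Hypothetical helper mirroring "nswbuf = min(nbuf / 4, 256)" from above. */
static int
calc_nswbuf(long nbuf)
{
    long n = nbuf / 4;

    return (int)(n < 256 ? n : 256);
}

/* Boot-time pbuf budget per consumer (field names follow the table). */
struct pbuf_shares {
    int md_vnode;
    int smbfs;
    int ncl;
    int cluster;
    int vnode;
    int nsw_rcount;
    int nsw_wcount_sync;
};

/*
 * Assumed split of nswbuf: the divisors below are inferred from the
 * boot-time values in the table and may not match the kernel exactly.
 */
static struct pbuf_shares
calc_shares(int nswbuf)
{
    struct pbuf_shares s;

    s.md_vnode = nswbuf / 10;        /* 256 / 10 = 25 (integer division) */
    s.smbfs = nswbuf / 2 + 1;        /* 129 */
    s.ncl = nswbuf / 2 + 1;          /* 129 */
    s.cluster = nswbuf / 2;          /* 128 */
    s.vnode = nswbuf / 2 + 1;        /* 129 */
    s.nsw_rcount = nswbuf / 2;       /* 128 */
    s.nsw_wcount_sync = nswbuf / 4;  /* 64 */
    return (s);
}
```

With nbuf = 105931, nbuf / 4 is far above 256, so nswbuf is capped at 256 and
the cluster share comes out as 128, the only counter that has dropped to 0 in
the kerneldump column.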

The g_journal_switcher (pid 7) is waiting on channel "wswbuf0" because all
the cluster pbufs are in use. Looking at the 256 swbufs, I found that 128 are
free and 128 are in use by the g_journal_switcher itself. All the buffers in
use have the same b_bufobj and the same b_iodone address, "cluster_callback".
One example:

(kgdb) p swbuf[126]
$352 = {b_bufobj = 0xfffff80b7e1acbe0, b_bcount = 131072, b_caller1 = 0x0,
  b_data = 0xfffffe0c1c2ec000 "", b_error = 0, b_iocmd = 2 '\002',
  b_ioflags = 0 '\0', b_iooffset = 472973312, b_resid = 0,
  b_iodone = 0xffffffff8073fcb0 <cluster_callback>, b_blkno = 923776,
  b_offset = 2841903104, b_bobufs = {tqe_next = 0x0, tqe_prev = 0x0},
  b_vflags = 0, b_freelist = {tqe_next = 0xfffffe0bafe182b8,
    tqe_prev = 0xffffffff80eaa460}, b_qindex = 0, b_flags = 1677721636,
  b_xflags = 0 '\0', b_lock = {lock_object = {
      lo_name = 0xffffffff80a829a7 "bufwait", lo_flags = 108199936,
      lo_data = 0, lo_witness = 0x0}, lk_lock = 18446744073709551600,
    lk_exslpfail = 0, lk_timo = 0, lk_pri = 96},
  b_bufsize = 131072, b_runningbufspace = 131072,
  b_kvabase = 0xfffffe0c1c2ec000 "", b_kvaalloc = 0x0, b_kvasize = 131072,
  b_lblkno = 86728, b_vp = 0xfffff80b7e1acb10, b_dirtyoff = 0,
  b_dirtyend = 131072, b_rcred = 0x0, b_wcred = 0x0,
  b_saveaddr = 0xfffffe0c1c2ec000, b_pager = {pg_reqpage = 0},
  b_cluster = {cluster_head = {tqh_first = 0xfffffe0bb1bec410,
      tqh_last = 0xfffffe0bb1bebe20}, cluster_entry = {
      tqe_next = 0xfffffe0bb1bec410, tqe_prev = 0xfffffe0bb1bebe20}},
  b_pages = {0xfffff80c097f97c0, 0xfffff80c097f9828, 0xfffff80c097f9890,
    0xfffff80c097f98f8, 0xfffff80c097f9960, 0xfffff80c097f99c8,
    0xfffff80c097f9a30, 0xfffff80c097f9a98, 0xfffff80c097f9b00,
    0xfffff80c097f9b68, 0xfffff80c097f9bd0, 0xfffff80c097f9c38,
    0xfffff80c097f9ca0, 0xfffff80c097f9d08, 0xfffff80c097f9d70,
    0xfffff80c097f9dd8, 0xfffff80c097f9e40, 0xfffff80c097f9ea8,
    0xfffff80c097f9f10, 0xfffff80c097f9f78, 0xfffff80c097f9fe0,
    0xfffff80c097fa048, 0xfffff80c097fa0b0, 0xfffff80c097fa118,
    0xfffff80c11547980, 0xfffff80c115479e8, 0xfffff80c11547a50,
    0xfffff80c11547ab8, 0xfffff80c11547b20, 0xfffff80c11547b88,
    0xfffff80c11547bf0, 0xfffff80c11547c58}, b_npages = 32,
  b_dep = {lh_first = 0x0}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0,
  b_fsprivate3 = 0x0, b_pin_count = 0}

Therefore the g_journal_switcher has all of its cluster pbufs in use and waits
forever for another one. As a result, the worker thread eventually panics with
"Journal overflow".

In cluster_wbuild() I can't see a check for "cluster_pbuf_freecnt > 0" to
avoid the hang on "wswbuf0". I wonder why this seems to be a problem only with
gjournal, since other components in the kernel also use VFS_SYNC.
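
To make the situation concrete, here is a toy user-space model of a counted
buffer pool. It is only a sketch under stated assumptions: the names
(pbuf_pool, pool_try_get, pool_put, build_clusters) are hypothetical and this
is not kernel code. It shows one consumer taking all 128 buffers, after which
a blocking request could only sleep, while a request guarded by a
"freecnt > 0" check fails immediately:

```c
#include <stdbool.h>

#define POOL_SIZE 128   /* boot-time cluster_pbuf_freecnt from the table */

/* Hypothetical counted pool; not the kernel's data structure. */
struct pbuf_pool {
    int freecnt;        /* buffers still available */
    int in_use;         /* buffers handed out and not yet returned */
};

static void
pool_init(struct pbuf_pool *p)
{
    p->freecnt = POOL_SIZE;
    p->in_use = 0;
}

/* Non-blocking get: fails instead of sleeping when the pool is empty. */
static bool
pool_try_get(struct pbuf_pool *p)
{
    if (p->freecnt <= 0)
        return (false);
    p->freecnt--;
    p->in_use++;
    return (true);
}

/* Return one buffer to the pool, as an I/O completion would. */
static void
pool_put(struct pbuf_pool *p)
{
    p->in_use--;
    p->freecnt++;
}

/*
 * One thread starting cluster writes without any completion running in
 * between. Returns how many writes it could start before the pool ran
 * dry; a blocking get at that point would sleep with nobody to wake it.
 */
static int
build_clusters(struct pbuf_pool *p, int wanted)
{
    int started = 0;

    while (started < wanted && pool_try_get(p))
        started++;
    return (started);
}
```

In this model a caller asking for more than 128 clusters gets exactly 128 and
then sees pool_try_get() fail until something calls pool_put(). Whether a
comparable fallback is practical inside cluster_wbuild() is a separate
question that this sketch does not answer.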

I would like to know if this problem can be fixed.
