Date:      Wed, 22 Jul 2009 19:17:41 +0200
From:      Peter Schuller <peter.schuller@infidyne.com>
To:        freebsd-current@freebsd.org
Subject:   vm_page_remove() crash on sys_exit() (possibly ZFS related)
Message-ID:  <20090722171741.GB17684@hyperion.scode.org>

Hello,

so I finally got my crash dump. I'll include some more history further
down. First off:

   http://distfiles.scode.org/mlref/crashdump_20090722/core.txt.0
   http://distfiles.scode.org/mlref/crashdump_20090722/backtrace.txt

Inline version of backtrace appears below[1] (after background).

So this is a general protection fault in vm_page_remove called
indirectly from sys_exit(). Worth noting is that at least once (the
previous crash, without a dump) I got a "logic" panic rather than a
memory error; I'm pretty sure the panic message was related to page
*inserts*. Grepping the source indicates:

  vm_page.c:              panic("vm_page_insert: page already inserted");
  vm_page.c:                      panic("vm_page_insert: offset already allocated");

However, I could not say for sure whether one of these was indeed the
exact panic I got; I have no crash dump from that occasion, nor was I
able to see a stack trace at the time.

Some further background and speculation:

This system is root-on-ZFS where I have been tracking CURRENT for
several months. I updated every month or so in part to test
improvements to ZFS; specifically the fixes that have gone in for
deadlock/hang issues.

My "test case" is to run bulk building of all my ports (the port list
is a semi-typical desktop; about 700 or so packages in total). It
would very often hang (before) or crash (now) at least once during
such a build; the firefox build in particular was extremely
over-represented, at least now that I see the crash symptom.

Going back to my tracking of current, at some point, I think roughly a
couple of months ago by now, I stopped experiencing deadlocks/hangs
(or at least have not seen one yet), but instead began seeing
panics. No longer seeing hangs was expected because the reason I
updated that particular time, if I recall correctly, was specifically
that I believed that all the work-in-progress ZFS fixes had gone
in. However I am not 100% sure of the timing.

Since then I've updated a couple of times more, most recently to
BETA1, but am still seeing this crash.

Wannabe speculation based on insufficient understanding of the VM
system:

vm_page_remove() requires, according to comments, that the object and
page must be locked. The actual crash in this case happens when
checking m->oflags:

        if (m->oflags & VPO_BUSY) {
                m->oflags &= ~VPO_BUSY;
                vm_page_flash(m);
        }

The "m->oflags & VPO_BUSY" evaluation is the culprit, if line numbers
can be trusted.
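
One way to sanity-check that line number: on amd64, a general
protection fault on a memory access usually means the address was
non-canonical, whereas a bad but canonical address normally produces a
page fault instead. Below is a minimal sketch of that check, assuming
48-bit virtual addresses; is_canonical() and count_noncanonical() are
hypothetical helpers written for this illustration, not kernel
functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: 48-bit virtual addresses (4-level paging), so an address
 * is canonical only when bits 63..47 are all zeros or all ones.
 */
static bool
is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the upper 17 bits */

	return (top == 0 || top == 0x1ffff);
}

/* Count the non-canonical entries in a list of pointer values. */
static size_t
count_noncanonical(const uint64_t *addrs, size_t n)
{
	size_t i, bad = 0;

	for (i = 0; i < n; i++)
		if (!is_canonical(addrs[i]))
			bad++;
	return (bad);
}
```

Fed the values from the backtrace, both m (0xffffff00bebe7f90) and
object (0xffffff0066392948) pass this check, so if the trap really is a
protection fault, the offending address may be some pointer other than
m itself.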

If I recall correctly, at least one of the deadlock/hang fixes for ZFS
did involve a change to locking, so I'm thinking the introduction of
the crashing may in fact be related to the ZFS fix itself. However now
that I think about it perhaps the only locking changes were vnode ones
rather than vm objects/pages? Also, interestingly, the read of m->object
just before it succeeds, as does the lock assert on the object.

Is it possible the vm page was NOT locked even though m->object was
locked?

[1] Inline backtrace:

#0  doadump () at pcpu.h:223
#1  0xffffffff801d248c in db_fncall (dummy1=Variable "dummy1" is not available.
) at /usr/src/sys/ddb/db_command.c:548
#2  0xffffffff801d27c1 in db_command (last_cmdp=0xffffffff80b667a0, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#3  0xffffffff801d2a10 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0xffffffff801d49a9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff805b5f25 in kdb_trap (type=9, code=0, tf=0xffffff805b9608d0) at /usr/src/sys/kern/subr_kdb.c:534
#6  0xffffffff80812efd in trap_fatal (frame=0xffffff805b9608d0, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:847
#7  0xffffffff80813a1d in trap (frame=0xffffff805b9608d0) at /usr/src/sys/amd64/amd64/trap.c:639
#8  0xffffffff807f9793 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
#9  0xffffffff807d941f in vm_page_remove (m=0xffffff00bebe7f90) at /usr/src/sys/vm/vm_page.c:730
#10 0xffffffff807d957d in vm_page_free_toq (m=0xffffff00bebe7f90) at /usr/src/sys/vm/vm_page.c:1394
#11 0xffffffff807d7c6b in vm_object_terminate (object=0xffffff0066392948) at /usr/src/sys/vm/vm_object.c:694
#12 0xffffffff807d821c in vm_object_deallocate (object=0xffffff0066392948) at /usr/src/sys/vm/vm_object.c:592
#13 0xffffffff807cfad0 in _vm_map_unlock (map=0xffffff0004811310, file=Variable "file" is not available.
) at /usr/src/sys/vm/vm_map.c:480
#14 0xffffffff807cff8f in vm_map_remove (map=0xffffff0004811310, start=Variable "start" is not available.
) at /usr/src/sys/vm/vm_map.c:2765
#15 0xffffffff807d2e44 in vmspace_exit (td=0xffffff004eb78ab0) at /usr/src/sys/vm/vm_map.c:329
#16 0xffffffff8055a33e in exit1 (td=0xffffff004eb78ab0, rv=0) at /usr/src/sys/kern/kern_exit.c:299
#17 0xffffffff8055b43e in sys_exit (td=Variable "td" is not available.
) at /usr/src/sys/kern/kern_exit.c:110
#18 0xffffffff80813546 in syscall (frame=0xffffff805b960c90) at /usr/src/sys/amd64/amd64/trap.c:984
#19 0xffffffff807f9a20 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:364
#20 0x000000000047f63c in ?? ()
Previous frame inner to this frame (corrupt stack?)



-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org

