Date:      Sat, 26 Nov 2005 18:22:45 -0500
From:      Kris Kennaway <kris@obsecurity.org>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        amd64@freeBSD.org
Subject:   Re: spin lock smp rendezvous held by 0xffffff01250a7980 for > 5 seconds
Message-ID:  <20051126232244.GA83432@xor.obsecurity.org>
In-Reply-To: <20051124232616.GA32023@xor.obsecurity.org>
References:  <20051124232616.GA32023@xor.obsecurity.org>

On Thu, Nov 24, 2005 at 06:26:16PM -0500, Kris Kennaway wrote:
> I got this on a quad amd64 machine running 6.0-STABLE.  At the time it
> was running 21 simultaneous tar extractions onto a sync-mounted md.
>
> panic() at panic+0x1e6
> _mtx_lock_spin() at _mtx_lock_spin+0xad
> pmap_invalidate_range() at pmap_invalidate_range+0xb3
> pmap_qremove() at pmap_qremove+0x53
> vfs_vmio_release() at vfs_vmio_release+0x1e0
> getnewbuf() at getnewbuf+0x368
> getblk() at getblk+0x3d9
> ffs_balloc_ufs1() at ffs_balloc_ufs1+0x662
> ffs_write() at ffs_write+0x31b
> VOP_WRITE_APV() at VOP_WRITE_APV+0xed
> vn_write() at vn_write+0x228
> dofilewrite() at dofilewrite+0x90
> kern_writev() at kern_writev+0x54
> write() at write+0x4b
>
> Unfortunately I can't dump on this machine (and no debugging is
> currently enabled), but I can try to reproduce it.

I tried for 24 hours with witness enabled but couldn't reproduce it.  The
same panic happened in the same way when witness was disabled, although
the failure mode was a bit different:


Fatal double fault
cpuid = 3; apic id = 03
panic: double fault
cpuid = 3
KDB: enter: panic
[...]
_mtx_lock_spin() at _mtx_lock_spin+0x6b
getit() at getit+0x6f
DELAY() at DELAY+0x44
_mtx_lock_spin() at _mtx_lock_spin+0x6b
getit() at getit+0x6f
DELAY() at DELAY+0x44
_mtx_lock_spin() at _mtx_lock_spin+0x6b
getit() at getit+0x6f
DELAY() at DELAY+0x44
_mtx_lock_spin() at _mtx_lock_spin+0x6b
getit() at getit+0x6f
DELAY() at DELAY+0x44
_mtx_lock_spin() at _mtx_lock_spin+0x6b
pmap_invalidate_range() at pmap_invalidate_range+0xb3
pmap_qremove() at pmap_qremove+0x53
vfs_vmio_release() at vfs_vmio_release+0x1e0
getnewbuf() at getnewbuf+0x368
getblk() at getblk+0x3d9
ffs_balloc_ufs1() at ffs_balloc_ufs1+0x662
ffs_write() at ffs_write+0x31b
VOP_WRITE_APV() at VOP_WRITE_APV+0xed
vn_write() at vn_write+0x228
dofilewrite() at dofilewrite+0x90
kern_writev() at kern_writev+0x54
write() at write+0x4b
syscall() at syscall+0x404
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (4, FreeBSD ELF64, write), rip = 0x80070ea6c, rsp = 0x7fffffffe6a8, rbp = 0x52ae00 ---

i.e. the first _mtx_lock_spin() tried to acquire the ipi lock and
spun, which called DELAY and getit, which tried to acquire the clock
lock:

        mtx_lock_spin(&clock_lock);

which *also* spun, and called DELAY...and at that point things went to
hell and it recursed until it blew out the stack.
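
To make the cycle concrete, here is a rough user-space model of the call
chain in the trace above.  It is only a sketch, not the kernel code: every
sim_* name and SIM_STACK_LIMIT are made up for illustration, the "locks"
are plain flags meaning "some other CPU holds this for the whole run", and
a nesting limit stands in for the kernel stack running out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the finite kernel stack: reaching this
 * nesting depth is treated as the double fault. */
#define SIM_STACK_LIMIT 16

struct sim_mtx {
    bool contended;   /* true: another CPU holds it for the whole run */
};

static struct sim_mtx sim_ipi_lock;
static struct sim_mtx sim_clock_lock;
static int sim_depth;      /* current nesting of sim_mtx_lock_spin() */
static int sim_max_depth;  /* deepest nesting observed */

static bool sim_mtx_lock_spin(struct sim_mtx *m);

/* Models getit(): the timer is read under the clock lock. */
static bool sim_getit(void)
{
    return sim_mtx_lock_spin(&sim_clock_lock);
}

/* Models DELAY(): it needs the timer, so it calls getit(). */
static bool sim_delay(void)
{
    return sim_getit();
}

/* Models one backoff round of a contended spin lock attempt.
 * Returns false if the simulated stack overflowed. */
static bool sim_mtx_lock_spin(struct sim_mtx *m)
{
    bool ok = true;

    sim_depth++;
    if (sim_depth > sim_max_depth)
        sim_max_depth = sim_depth;

    if (sim_depth >= SIM_STACK_LIMIT)
        ok = false;            /* stack exhausted */
    else if (m->contended)
        ok = sim_delay();      /* spin -> DELAY -> getit -> lock ... */
    /* uncontended: the lock is taken at once, nothing more to do */

    sim_depth--;
    return ok;
}

/* Tries the IPI lock once while it is contended.  Returns the deepest
 * nesting reached, or -1 if the simulated stack overflowed. */
int sim_run(bool clock_lock_contended)
{
    sim_ipi_lock.contended = true;
    sim_clock_lock.contended = clock_lock_contended;
    sim_depth = 0;
    sim_max_depth = 0;
    return sim_mtx_lock_spin(&sim_ipi_lock) ? sim_max_depth : -1;
}
```

In this model sim_run(false) stays at a nesting depth of 2 (the IPI lock
attempt plus one clock lock acquire inside DELAY), while sim_run(true)
never bottoms out and hits the limit, which is the shape of the repeated
_mtx_lock_spin/getit/DELAY frames in the trace.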

I guess the next step is to try INVARIANTS alone in case that catches
something.

Kris



