Date: Sat, 26 Nov 2005 18:22:45 -0500
From: Kris Kennaway
To: Kris Kennaway
Cc: amd64@freeBSD.org
Subject: Re: spin lock smp rendezvous held by 0xffffff01250a7980 for > 5 seconds
Message-ID: <20051126232244.GA83432@xor.obsecurity.org>
In-Reply-To: <20051124232616.GA32023@xor.obsecurity.org>
References: <20051124232616.GA32023@xor.obsecurity.org>
List-Id: Porting FreeBSD to the AMD64 platform

On Thu, Nov 24, 2005 at 06:26:16PM -0500, Kris Kennaway wrote:
> I got this on a quad amd64 machine running 6.0-STABLE.  At the time it
> was running 21 simultaneous tar extractions onto a sync-mounted md.
>
> panic() at panic+0x1e6
> _mtx_lock_spin() at _mtx_lock_spin+0xad
> pmap_invalidate_range() at pmap_invalidate_range+0xb3
> pmap_qremove() at pmap_qremove+0x53
> vfs_vmio_release() at vfs_vmio_release+0x1e0
> getnewbuf() at getnewbuf+0x368
> getblk() at getblk+0x3d9
> ffs_balloc_ufs1() at ffs_balloc_ufs1+0x662
> ffs_write() at ffs_write+0x31b
> VOP_WRITE_APV() at VOP_WRITE_APV+0xed
> vn_write() at vn_write+0x228
> dofilewrite() at dofilewrite+0x90
> kern_writev() at kern_writev+0x54
> write() at write+0x4b
>
> Unfortunately I can't dump on this machine (and no debugging is
> currently enabled), but I can try to reproduce it.

I tried for 24 hours with witness enabled but couldn't reproduce.  The
same panic happened in the same way when witness was disabled, although
the failure mode was a bit different:

Fatal double fault
cpuid = 3; apic id = 03
panic: double fault
cpuid = 3
KDB: enter: panic
[...]
mtx_lock_spin() at _mtx_lock_spin+0x6b
getit() at getit+0x6f
DELAY() at DELAY+0x44
_mtx_lock_spin() at _mtx_lock_spin+0x6b
getit() at getit+0x6f
DELAY() at DELAY+0x44
_mtx_lock_spin() at _mtx_lock_spin+0x6b
getit() at getit+0x6f
DELAY() at DELAY+0x44
_mtx_lock_spin() at _mtx_lock_spin+0x6b
getit() at getit+0x6f
DELAY() at DELAY+0x44
_mtx_lock_spin() at _mtx_lock_spin+0x6b
pmap_invalidate_range() at pmap_invalidate_range+0xb3
pmap_qremove() at pmap_qremove+0x53
vfs_vmio_release() at vfs_vmio_release+0x1e0
getnewbuf() at getnewbuf+0x368
getblk() at getblk+0x3d9
ffs_balloc_ufs1() at ffs_balloc_ufs1+0x662
ffs_write() at ffs_write+0x31b
VOP_WRITE_APV() at VOP_WRITE_APV+0xed
vn_write() at vn_write+0x228
dofilewrite() at dofilewrite+0x90
kern_writev() at kern_writev+0x54
write() at write+0x4b
syscall() at syscall+0x404
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (4, FreeBSD ELF64, write), rip = 0x80070ea6c, rsp = 0x7fffffffe6a8, rbp = 0x52ae00 ---

i.e.
the first _mtx_lock_spin() tried to acquire the ipi lock and spun,
which called DELAY and getit, which tried to acquire the clock lock:

	mtx_lock_spin(&clock_lock);

which *also* spun, and called DELAY...and at that point things went to
hell and it recursed until it blew out the stack.

I guess the next step is to try INVARIANTS alone in case that catches
something.

Kris
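
For anyone following the trace, here is a rough user-space sketch of the recursion described above. It is only a toy model, not the kernel code: every toy_* name, the held_elsewhere flag and TOY_MAX_DEPTH are made up for illustration, and it assumes the contended path of the spin lock backs off through DELAY, which takes the clock lock through getit, as the backtrace suggests. A fixed depth limit stands in for the finite kernel stack.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical stand-ins for the
 * functions in the backtrace, not the FreeBSD implementations. */

#define TOY_MAX_DEPTH 64        /* stand-in for the finite kernel stack */

struct toy_mtx {
    const char *name;
    bool held_elsewhere;        /* true: another CPU holds it and keeps it */
};

static struct toy_mtx toy_ipi_lock = { "ipi", true };
static struct toy_mtx toy_clock_lock = { "clock", false };

static int toy_depth;           /* current nesting of toy_mtx_lock_spin() */
static int toy_max_depth;       /* deepest nesting seen in this run */

static bool toy_mtx_lock_spin(struct toy_mtx *m);

/* getit() reads the timer, and takes the clock lock to do so. */
static bool toy_getit(void)
{
    return toy_mtx_lock_spin(&toy_clock_lock);
}

/* DELAY() polls the timer through getit(). */
static bool toy_delay(void)
{
    return toy_getit();
}

/* Returns false if the modelled stack is exhausted. A contended lock does
 * one backoff round through toy_delay(); afterwards the model simply
 * pretends the owner released the lock. */
static bool toy_mtx_lock_spin(struct toy_mtx *m)
{
    bool ok = true;

    toy_depth++;
    if (toy_depth > toy_max_depth)
        toy_max_depth = toy_depth;

    if (toy_depth >= TOY_MAX_DEPTH)
        ok = false;             /* "double fault": out of stack */
    else if (m->held_elsewhere)
        ok = toy_delay();       /* contended: back off via DELAY */

    assert(toy_depth > 0);
    toy_depth--;
    return ok;
}

/* Try to take the ipi lock and report the deepest nesting reached. */
int toy_run(bool clock_contended, bool *overflowed)
{
    toy_clock_lock.held_elsewhere = clock_contended;
    toy_depth = 0;
    toy_max_depth = 0;
    *overflowed = !toy_mtx_lock_spin(&toy_ipi_lock);
    return toy_max_depth;
}
```

With the clock lock free, toy_run(false, &ov) nests only two levels deep (the ipi lock, then the clock lock inside getit). With the clock lock also contended, toy_run(true, &ov) keeps re-entering through DELAY until it reaches TOY_MAX_DEPTH, which is the pattern of repeated _mtx_lock_spin/getit/DELAY frames in the trace.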