Date: Sat, 26 Nov 2005 22:20:12 -0500
From: Kris Kennaway <kris@obsecurity.org>
To: Kris Kennaway <kris@obsecurity.org>
Cc: amd64@freeBSD.org
Subject: smp_tlb_shootdown loop (Re: spin lock smp rendezvous held by 0xffffff01250a7980 for > 5 seconds)
Message-ID: <20051127032012.GA86016@xor.obsecurity.org>
In-Reply-To: <20051126232244.GA83432@xor.obsecurity.org>
References: <20051124232616.GA32023@xor.obsecurity.org> <20051126232244.GA83432@xor.obsecurity.org>
On Sat, Nov 26, 2005 at 06:22:45PM -0500, Kris Kennaway wrote:
> On Thu, Nov 24, 2005 at 06:26:16PM -0500, Kris Kennaway wrote:
> > I got this on a quad amd64 machine running 6.0-STABLE.  At the time it
> > was running 21 simultaneous tar extractions onto a sync-mounted md.
> >
> > panic() at panic+0x1e6
> > _mtx_lock_spin() at _mtx_lock_spin+0xad
> > pmap_invalidate_range() at pmap_invalidate_range+0xb3
> > pmap_qremove() at pmap_qremove+0x53
> > vfs_vmio_release() at vfs_vmio_release+0x1e0
> > getnewbuf() at getnewbuf+0x368
> > getblk() at getblk+0x3d9
> > ffs_balloc_ufs1() at ffs_balloc_ufs1+0x662
> > ffs_write() at ffs_write+0x31b
> > VOP_WRITE_APV() at VOP_WRITE_APV+0xed
> > vn_write() at vn_write+0x228
> > dofilewrite() at dofilewrite+0x90
> > kern_writev() at kern_writev+0x54
> > write() at write+0x4b

Another CPU is here:

smp_tlb_shootdown() at smp_tlb_shootdown+0x40
smp_invlpg_range() at smp_invlpg_range+0x1e
pmap_invalidate_range() at pmap_invalidate_range+0xf9
pmap_qenter() at pmap_qenter+0x64
allocbuf() at allocbuf+0x9a0
getblk() at getblk+0x52d
ffs_balloc_ufs1() at ffs_balloc_ufs1+0x662
ffs_write() at ffs_write+0x31b
VOP_WRITE_APV() at VOP_WRITE_APV+0xed
vn_write() at vn_write+0x228
dofilewrite() at dofilewrite+0x90
kern_writev() at kern_writev+0x54
write() at write+0x4b
syscall() at syscall+0x404
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (4, FreeBSD ELF64, write), rip = 0x80070ea6c, rsp = 0x7fffffffe6a8, rbp = 0x52a800 ---

It is looping:

smp_tlb_shootdown+0x40: repe nop
smp_tlb_shootdown+0x42: movl    0x21c4f8,%eax
smp_tlb_shootdown+0x48: cmpl    %ebx,%eax
smp_tlb_shootdown+0x4a: jb      smp_tlb_shootdown+0x40

smp_tlb_shootdown(u_int vector, vm_offset_t addr1, vm_offset_t addr2)
{
        u_int ncpu;

        ncpu = mp_ncpus - 1;    /* does not shootdown self */
        if (ncpu < 1)
                return;         /* no other cpus */
        mtx_assert(&smp_ipi_mtx, MA_OWNED);
        smp_tlb_addr1 = addr1;
        smp_tlb_addr2 = addr2;
        atomic_store_rel_int(&smp_tlb_wait, 0);
        ipi_all_but_self(vector);
        while (smp_tlb_wait < ncpu)
                ia32_pause();
}

which seems to be the while loop at the end.

db> x/x smp_tlb_wait
smp_tlb_wait:   1
db> x mp_ncpus
mp_ncpus:       4

So it looks like it's stuck waiting for the tlb shootdown on the other
processors.  However, the 3 other CPUs are all in the same place:

> _mtx_lock_spin() at _mtx_lock_spin+0x6b
> getit() at getit+0x6f
> DELAY() at DELAY+0x44
> _mtx_lock_spin() at _mtx_lock_spin+0x6b
> pmap_invalidate_range() at pmap_invalidate_range+0xb3
> pmap_qremove() at pmap_qremove+0x53
> vfs_vmio_release() at vfs_vmio_release+0x1e0
> getnewbuf() at getnewbuf+0x368
> getblk() at getblk+0x3d9
> ffs_balloc_ufs1() at ffs_balloc_ufs1+0x662
> ffs_write() at ffs_write+0x31b
> VOP_WRITE_APV() at VOP_WRITE_APV+0xed
> vn_write() at vn_write+0x228
> dofilewrite() at dofilewrite+0x90
> kern_writev() at kern_writev+0x54
> write() at write+0x4b
> syscall() at syscall+0x404
> Xfast_syscall() at Xfast_syscall+0xa8
> --- syscall (4, FreeBSD ELF64, write), rip = 0x80070ea6c, rsp = 0x7fffffffe6a8, rbp = 0x52ae00 ---
>
> i.e. the first _mtx_lock_spin() tried to acquire the ipi lock and
> spun, which called DELAY and getit, which tried to acquire the clock
> lock:
>
>         mtx_lock_spin(&clock_lock);
>
> which *also* spun, and called DELAY...and at that point things went to
> hell and it recursed until it blew out the stack.

So why aren't they processing the IPI?  Was the IPI lost somehow?

Kris