From owner-freebsd-sparc64@FreeBSD.ORG Thu Feb 23 23:07:36 2006 Return-Path: X-Original-To: sparc64@FreeBSD.org Delivered-To: freebsd-sparc64@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 19AB316A420; Thu, 23 Feb 2006 23:07:36 +0000 (GMT) (envelope-from kris@obsecurity.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id C688143D49; Thu, 23 Feb 2006 23:07:35 +0000 (GMT) (envelope-from kris@obsecurity.org) Received: from obsecurity.dyndns.org (elvis.mu.org [192.203.228.196]) by elvis.mu.org (Postfix) with ESMTP id 694B01A3C1F; Thu, 23 Feb 2006 15:07:35 -0800 (PST) Received: by obsecurity.dyndns.org (Postfix, from userid 1000) id 5A89E51271; Thu, 23 Feb 2006 18:07:34 -0500 (EST) Date: Thu, 23 Feb 2006 18:07:34 -0500 From: Kris Kennaway To: Kris Kennaway Message-ID: <20060223230734.GA93088@xor.obsecurity.org> References: <20060223204716.GA90985@xor.obsecurity.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ZGiS0Q5IWpPtfppv" Content-Disposition: inline In-Reply-To: <20060223204716.GA90985@xor.obsecurity.org> User-Agent: Mutt/1.4.2.1i Cc: jhb@FreeBSD.org, sparc64@FreeBSD.org Subject: Re: "sched_lock held too long" panic + trace X-BeenThere: freebsd-sparc64@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Porting FreeBSD to the Sparc List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Feb 2006 23:07:36 -0000 --ZGiS0Q5IWpPtfppv Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Feb 23, 2006 at 03:47:16PM -0500, Kris Kennaway wrote: > One of my e4500s has started panicking regularly under load because > sched_lock was held for > 5 seconds. Since on sparc64 it always > deadlocks after this panic instead of entering DDB, I wasn't able to > track down the cause. Instead, I changed the panic to first > DELAY(1000000*PCPU_GET(cpuid)) (so that different CPUs don't overlap > the printfs) and then kdb_backtrace(). >=20 > Doing so I obtained the following trace (still a bit corrupted, but > hopefully more useful). > KDB: stack backtrace: > hardclock_cpu() at hardclock_cpu+0x6c > tick_hardclock() at tick_hardclock+0xc4 > -- interrupt level=3D0xe pil=3D0 %o7=3D0xc0190a98 -- > _mtx_lock_spin() at _mtx_lock_spin+0xf4 > tlb_page_demap() at tlb_page_demap+0xa0 > pmap_zero_page_idle() at pmap_zero_page_idle+0xdc > vm_page_zero_idle() at vm_page_zero_idle+0x108 > vm_pagezero() at vm_pagezero+0x4c > fork_exit() at fork_exit+0x94 > fork_trampoline() at fork_trampoline+0x8 Witness seems to have caught this: panic: blockable sleep lock (sleep mutex) system map @ vm/vm_map.c:2995 db> wh Tracing pid 1267 tid 100248 td 0xfffff800612a0540 panic() at panic+0x164 witness_checkorder() at witness_checkorder+0xc8 _mtx_lock_flags() at _mtx_lock_flags+0x80 _vm_map_lock_read() at _vm_map_lock_read+0x3c vm_map_lookup() at vm_map_lookup+0x1c vm_fault() at vm_fault+0x68 trap_pfault() at trap_pfault+0x1a8 trap() at trap+0x2b0 -- fast data access mmu miss tar=3D0xe819c000 %o7=3D0xc031d204 -- cpu_ipi_selected() at cpu_ipi_selected+0x2c tlb_page_demap() at tlb_page_demap+0x74 pmap_copy_page() at pmap_copy_page+0x39c vm_fault() at vm_fault+0xe5c trap_pfault() at trap_pfault+0x134 trap() at trap+0xa0 -- data access protection tar=3D0x4065c524 sfar=3D0x4065d314 sfsr=3D0x80000= 5 %o7=3D0x40350c94 -- db> show alllocks Process 1273 (creat) thread 0xfffff802274dad20 (100604) exclusive sleep mutex vm object (standard object) r =3D 0 (0xfffff80223c256= a8) locked @ vm/vm_fault.c:686 exclusive sx user map r =3D 0 (0xfffff802284e2280) locked @ vm/vm_map.c:2995 Process 1272 (thr1) thread 0xfffff80228ab1500 (100353) exclusive sleep mutex vm object (standard object) r =3D 0 (0xfffff802276599= 00) locked @ vm/vm_fault.c:686 exclusive sx user map r =3D 0 (0xfffff80227cbccd0) locked @ vm/vm_map.c:2995 Process 1271 (swap) thread 0xfffff80226ec3260 (100625) exclusive sleep mutex vm object (standard object) r =3D 0 (0xfffff80228772e= d8) locked @ vm/vm_fault.c:686 exclusive sx user map r =3D 0 (0xfffff8005ee15b40) locked @ vm/vm_map.c:2995 Process 1270 (creat) thread 0xfffff80050078fc0 (100605) exclusive sleep mutex vm object (standard object) r =3D 0 (0xfffff8021faa39= 00) locked @ vm/vm_fault.c:789 exclusive sx user map r =3D 0 (0xfffff802284e2070) locked @ vm/vm_map.c:2995 Process 1267 (thr1) thread 0xfffff800612a0540 (100248) exclusive sleep mutex pmap r =3D 0 (0xc0513458) locked @ sparc64/sparc64/pm= ap.c:1649 exclusive sleep mutex vm object (standard object) r =3D 0 (0xfffff802279ae0= c8) locked @ vm/vm_fault.c:686 exclusive sx user map r =3D 0 (0xfffff80061410ee0) locked @ vm/vm_map.c:2995 exclusive spin mutex ipi r =3D 0 (0xc05133c0) locked @ ./machine/smp.h:174 Process 972 (thr1) thread 0xfffff800612a1a40 (100240) exclusive sleep mutex vm object (standard object) r =3D 0 (0xfffff802276599= c8) locked @ vm/vm_fault.c:686 exclusive sx user map r =3D 0 (0xfffff8006158e6a0) locked @ vm/vm_map.c:2995 Process 948 (rw) thread 0xfffff80226a02fc0 (100900) exclusive sleep mutex vm page queue mutex r =3D 0 (0xc04e4f60) locked @ spa= rc64/sparc64/pmap.c:937 exclusive sx user map r =3D 0 (0xfffff8022273c8b0) locked @ vm/vm_map.c:2487 Process 919 (creat) thread 0xfffff8004fef0000 (100350) exclusive sx user map r =3D 0 (0xfffff80228064070) locked @ vm/vm_map.c:2995 Process 906 (swap) thread 0xfffff80228aa02a0 (100308) exclusive sx user map r =3D 0 (0xfffff80050848ee0) locked @ vm/vm_map.c:2487 Process 35 (syncer) thread 0xfffff802be5a4a80 (100044) exclusive sleep mutex Giant r =3D 0 (0xc044b038) locked @ kern/vfs_subr.c:1= 607 db> Kris --ZGiS0Q5IWpPtfppv Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (FreeBSD) iD8DBQFD/kA1Wry0BWjoQKURAjagAKDJG12vYN5LQUpoBuc3PF4A6c7awgCdEZsF nm5+9LIY+xVyK+aH9wY+Wxw= =4ErH -----END PGP SIGNATURE----- --ZGiS0Q5IWpPtfppv--