From: Kip Macy
Reply-To: kmacy@fsmware.com
To: John Baldwin
Cc: Kris Kennaway, freebsd-sparc64@freebsd.org, sparc64@freebsd.org
Date: Fri, 24 Feb 2006 08:10:44 -0800
Subject: Re: "sched_lock held too long" panic + trace
In-Reply-To: <200602240743.22979.jhb@freebsd.org>
References: <20060223204716.GA90985@xor.obsecurity.org> <20060223230734.GA93088@xor.obsecurity.org> <200602240743.22979.jhb@freebsd.org>
List-Id: Porting FreeBSD to the Sparc

No, you're right. The easiest fix is to statically allocate the pcpu pages.

-Kip

On 2/24/06, John Baldwin wrote:
> On Thursday 23 February 2006 06:07 pm, Kris Kennaway wrote:
> > On Thu, Feb 23, 2006 at 03:47:16PM -0500, Kris Kennaway wrote:
> > > One of my e4500s has started panicking regularly under load because
> > > sched_lock was held for > 5 seconds. Since on sparc64 it always
> > > deadlocks after this panic instead of entering DDB, I wasn't able to
> > > track down the cause. Instead, I changed the panic to first
> > > DELAY(1000000*PCPU_GET(cpuid)) (so that different CPUs don't overlap
> > > the printfs) and then kdb_backtrace().
> > >
> > > Doing so I obtained the following trace (still a bit corrupted, but
> > > hopefully more useful).
> > >
> > > KDB: stack backtrace:
> > > hardclock_cpu() at hardclock_cpu+0x6c
> > > tick_hardclock() at tick_hardclock+0xc4
> > > -- interrupt level=0xe pil=0 %o7=0xc0190a98 --
> > > _mtx_lock_spin() at _mtx_lock_spin+0xf4
> > > tlb_page_demap() at tlb_page_demap+0xa0
> > > pmap_zero_page_idle() at pmap_zero_page_idle+0xdc
> > > vm_page_zero_idle() at vm_page_zero_idle+0x108
> > > vm_pagezero() at vm_pagezero+0x4c
> > > fork_exit() at fork_exit+0x94
> > > fork_trampoline() at fork_trampoline+0x8
> >
> > Witness seems to have caught this:
> >
> > panic: blockable sleep lock (sleep mutex) system map @ vm/vm_map.c:2995
> > db> wh
> > Tracing pid 1267 tid 100248 td 0xfffff800612a0540
> > panic() at panic+0x164
> > witness_checkorder() at witness_checkorder+0xc8
> > _mtx_lock_flags() at _mtx_lock_flags+0x80
> > _vm_map_lock_read() at _vm_map_lock_read+0x3c
> > vm_map_lookup() at vm_map_lookup+0x1c
> > vm_fault() at vm_fault+0x68
> > trap_pfault() at trap_pfault+0x1a8
> > trap() at trap+0x2b0
> > -- fast data access mmu miss tar=0xe819c000 %o7=0xc031d204 --
> > cpu_ipi_selected() at cpu_ipi_selected+0x2c
> > tlb_page_demap() at tlb_page_demap+0x74
> > pmap_copy_page() at pmap_copy_page+0x39c
> > vm_fault() at vm_fault+0xe5c
> > trap_pfault() at trap_pfault+0x134
> > trap() at trap+0xa0
> > -- data access protection tar=0x4065c524 sfar=0x4065d314 sfsr=0x800005 %o7=0x40350c94 --
>
> This is just a bug. You shouldn't get a pagefault in cpu_ipi_selected().
>
> --
> John Baldwin <>< http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve" = http://www.FreeBSD.org