From: Ulrich Spörlein
Date: Thu, 13 Aug 2009 15:29:07 +0200
To: Alan Cox
Cc: current@freebsd.org, Kip Macy
Subject: Re: panic: vm_page_free_toq: freeing mapped page
Message-ID: <20090813132907.GA1591@roadrunner.spoerlein.net>
In-Reply-To: <4A82DFBF.5020101@cs.rice.edu>
List-Id: Discussions about the use of FreeBSD-current

On Wed, 12.08.2009 at 10:29:03 -0500, Alan Cox wrote:
> Ulrich Spörlein wrote:
> > On Mon, 13.07.2009 at 13:29:56 -0500, Alan Cox wrote:
> >
> >> Ulrich Spörlein wrote:
> >>
> >>> On Mon, 13.07.2009 at 19:15:03 +0200, Ulrich Spörlein wrote:
> >>>
> >>>> On Sun, 12.07.2009 at 14:22:23 -0700, Kip Macy wrote:
> >>>>
> >>>>> On Sun, Jul 12, 2009 at 1:31 PM, Ulrich Spörlein wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> 8.0 BETA1 @ r195622 will panic reliably when running the clang static
> >>>>>> analyzer on a buildworld with something like the following panic:
> >>>>>>
> >>>>>> panic: vm_page_free_toq: freeing mapped page 0xffffff00c9715b30
> >>>>>> cpuid = 1
> >>>>>> KDB: stack backtrace:
> >>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> >>>>>> panic() at panic+0x182
> >>>>>> vm_page_free_toq() at vm_page_free_toq+0x1f6
> >>>>>> vm_object_terminate() at vm_object_terminate+0xb7
> >>>>>> vm_object_deallocate() at vm_object_deallocate+0x17a
> >>>>>> _vm_map_unlock() at _vm_map_unlock+0x70
> >>>>>> vm_map_remove() at vm_map_remove+0x6f
> >>>>>> vmspace_free() at vmspace_free+0x56
> >>>>>> vmspace_exec() at vmspace_exec+0x56
> >>>>>> exec_new_vmspace() at exec_new_vmspace+0x133
> >>>>>> exec_elf32_imgact() at exec_elf32_imgact+0x2ee
> >>>>>> kern_execve() at kern_execve+0x3b2
> >>>>>> execve() at execve+0x3d
> >>>>>> syscall() at syscall+0x1af
> >>>>>> Xfast_syscall() at Xfast_syscall+0xe1
> >>>>>> --- syscall (59, FreeBSD ELF64, execve), rip = 0x800c20d0c, rsp = 0x7fffffffd6f8, rbp = 0x7fffffffdbf0 ---
> >>>>>>
> >>>>> Can you try the following change:
> >>>>>
> >>>>> http://svn.freebsd.org/viewvc/base/user/kmacy/releng_7_2_fcs/sys/vm/vm_object.c?r1=192842&r2=195297
> >>>>>
> >>>> Applied this to HEAD by hand and ran with it, it died 20-30 minutes into
> >>>> the scan-build run. So no luck there. Next up is a test using the
> >>>> GENERIC kernel.
> >>>>
> >>> No improvement with a GENERIC kernel. Next up will be to run this with
> >>> clean sysctl, loader.conf, etc. Then I'll try disabling SMP.
> >>>
> >>> Does the backtrace above point to any specific subsystem? I'm using UFS,
> >>> ZFS and GELI on this machine and could try a few combinations...
> >>>
> >> The interesting thing about the backtrace is that it shows a 32-bit i386
> >> executable being started on a 64-bit amd64 machine. I've seen this
> >> backtrace once before, and you'll find it in the PR database. In that
> >> case, the problem "went away" after the known-to-be-broken
> >> ZERO_COPY_SOCKETS option was removed from the reporter's kernel
> >> configuration. However, I don't see that as the culprit here.
> >>
> >
> > Hi Alan, first the bad news
> >
> > I ran this test with a GENERIC kernel, SMP disabled, hw.physmem set to 2
> > GB in single user mode, so no other processes or daemons running,
> > nothing special in loader.conf except for ZFS and GELI. It reliably
> > panics, so nothing new here.
> >
> > Now the good news, you may be able to crash your own amd64 box in 3
> > minutes by doing:
> >
> > mkdir /tmp/foo && cd /tmp/foo
> > fetch -o- https://www.spoerlein.net/pub/llvm-clang.tar.gz | tar xf -
> > while :; do for d in bin sbin usr.bin usr.sbin; do $PWD/scan-build -o /dev/null -k make -C /usr/src/$d clean obj depend all; done; done
> >
> > Please note that scan-build/ccc-analyzer won't actually do anything, as
> > they cannot create output in /dev/null. So this is just running the
> > perl-script and forking make/sh/awk/ccc-analyzer like mad. It does not
> > survive 3 minutes on my Core2 Duo 3.3 GHz.
> >
>
> Hi Ulrich,
>
> I finally got a chance to try this workload. I'm afraid that I can't
> reproduce the assertion failure on my amd64 test machine. I left the
> test running overnight, and it was still going strong this morning.
>
> I am using neither ZFS nor GELI. Is it possible for you to repeat this
> test without ZFS and/or GELI?
>
> I would also be curious if anyone else reading this message can
> reproduce the assertion failure with the above test.

Now isn't this great :/

I haven't tracked the bug for the last couple of weeks, but the system
was updated to recent HEAD and got its ports rebuilt (several times).

I don't know which change "fixed" it, but I think it was the perl
rebuild (I had some trouble with perl5.10 on 8.0 at first). Besides, the
process doing the fork in the backtrace was always the perl binary,
IIRC.

So right now I'm no longer able to reproduce it myself ...

Regards,
Uli
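
Alan's remark about exec_elf32_imgact and the suspicion about the perl binary could both be checked by looking at the ELF class of the executable in question. The following is only a rough sketch, assuming a POSIX sh with od available; the helper name elf_class and the perl path in the comment are hypothetical and do not come from this thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the ELF class of a file.
# It reads the ELF identification bytes: offsets 0-3 hold the magic
# 0x7f 'E' 'L' 'F', and offset 4 (EI_CLASS) is 1 for 32-bit, 2 for 64-bit.
elf_class() {
    # Magic bytes in octal: 177 105 114 106
    magic=$(od -An -to1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "177105114106" ]; then
        echo "not-elf"
        return 1
    fi
    # EI_CLASS is the single byte at offset 4
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example call (the path is an assumption, adjust to the local install):
#   elf_class /usr/local/bin/perl
```

On an amd64 box a natively built perl would be expected to report 64; a result of 32 would only indicate that the binary is a 32-bit executable, not that it is the cause of the panic.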