Date:      Wed, 12 Aug 2009 10:29:03 -0500
From:      Alan Cox <alc@cs.rice.edu>
To:        current@freebsd.org, Ulrich Spörlein <uqs@spoerlein.net>
Cc:        Alan Cox <alc@cs.rice.edu>, Kip Macy <kmacy@freebsd.org>
Subject:   Re: panic: vm_page_free_toq: freeing mapped page
Message-ID:  <4A82DFBF.5020101@cs.rice.edu>
In-Reply-To: <20090714105245.GR2145@acme.spoerlein.net>
References:  <20090713181650.GB76464@acme.spoerlein.net> <4A5B7D24.60100@cs.rice.edu> <20090714105245.GR2145@acme.spoerlein.net>

Ulrich Spörlein wrote:
> On Mon, 13.07.2009 at 13:29:56 -0500, Alan Cox wrote:
>   
>> Ulrich Spörlein wrote:
>>     
>>> On Mon, 13.07.2009 at 19:15:03 +0200, Ulrich Spörlein wrote:
>>>   
>>>       
>>>> On Sun, 12.07.2009 at 14:22:23 -0700, Kip Macy wrote:
>>>>     
>>>>         
>>>>> On Sun, Jul 12, 2009 at 1:31 PM, Ulrich Spörlein <uqs@spoerlein.net> wrote:
>>>>>       
>>>>>           
>>>>>> Hi,
>>>>>>
>>>>>> 8.0 BETA1 @ r195622 will panic reliably when running the clang static
>>>>>> analyzer on a buildworld with something like the following panic:
>>>>>>
>>>>>> panic: vm_page_free_toq: freeing mapped page 0xffffff00c9715b30
>>>>>> cpuid = 1
>>>>>> KDB: stack backtrace:
>>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
>>>>>> panic() at panic+0x182
>>>>>> vm_page_free_toq() at vm_page_free_toq+0x1f6
>>>>>> vm_object_terminate() at vm_object_terminate+0xb7
>>>>>> vm_object_deallocate() at vm_object_deallocate+0x17a
>>>>>> _vm_map_unlock() at _vm_map_unlock+0x70
>>>>>> vm_map_remove() at vm_map_remove+0x6f
>>>>>> vmspace_free() at vmspace_free+0x56
>>>>>> vmspace_exec() at vmspace_exec+0x56
>>>>>> exec_new_vmspace() at exec_new_vmspace+0x133
>>>>>> exec_elf32_imgact() at exec_elf32_imgact+0x2ee
>>>>>> kern_execve() at kern_execve+0x3b2
>>>>>> execve() at execve+0x3d
>>>>>> syscall() at syscall+0x1af
>>>>>> Xfast_syscall() at Xfast_syscall+0xe1
>>>>>> --- syscall (59, FreeBSD ELF64, execve), rip = 0x800c20d0c, rsp = 0x7fffffffd6f8, rbp = 0x7fffffffdbf0 ---
>>>>>>         
>>>>>>             
>>>>> Can you try the following change:
>>>>>
>>>>> http://svn.freebsd.org/viewvc/base/user/kmacy/releng_7_2_fcs/sys/vm/vm_object.c?r1=192842&r2=195297
>>>>>       
>>>>>           
>>>> Applied this to HEAD by hand and ran with it; it died 20-30 minutes into
>>>> the scan-build run. So no luck there. Next up is a test using the
>>>> GENERIC kernel.
>>>>         
>>> No improvement with a GENERIC kernel. Next up will be to run this with
>>> clean sysctl, loader.conf, etc. Then I'll try disabling SMP.
>>>
>>> Does the backtrace above point to any specific subsystem? I'm using UFS,
>>> ZFS and GELI on this machine and could try a few combinations...
>>>       
>> The interesting thing about the backtrace is that it shows a 32-bit i386 
>> executable being started on a 64-bit amd64 machine.  I've seen this 
>> backtrace once before, and you'll find it in the PR database.  In that 
>> case, the problem "went away" after the known-to-be-broken 
>> ZERO_COPY_SOCKETS option was removed from the reporter's kernel 
>> configuration.  However, I don't see that as the culprit here.
>>     
>
> Hi Alan, first the bad news
>
> I ran this test with a GENERIC kernel, SMP disabled, hw.physmem set to 2
> GB in single user mode, so no other processes or daemons running,
> nothing special in loader.conf except for ZFS and GELI. It reliably
> panics, so nothing new here.
>
> Now the good news, you may be able to crash your own amd64 box in 3
> minutes by doing:
>
> mkdir /tmp/foo && cd /tmp/foo
> fetch -o- https://www.spoerlein.net/pub/llvm-clang.tar.gz | tar xf -
> while :; do
>   for d in bin sbin usr.bin usr.sbin; do
>     $PWD/scan-build -o /dev/null -k make -C /usr/src/$d clean obj depend all
>   done
> done
>
> Please note that scan-build/ccc-analyzer won't actually do anything, as
> they cannot create output in /dev/null. So this is just running the
> perl-script and forking make/sh/awk/ccc-analyzer like mad. It does not
> survive 3 minutes on my Core2 Duo 3.3 GHz.
>   

Hi Ulrich,

I finally got a chance to try this workload.  I'm afraid that I can't 
reproduce the assertion failure on my amd64 test machine.  I left the 
test running overnight, and it was still going strong this morning.

I am using neither ZFS nor GELI.  Is it possible for you to repeat this 
test without ZFS and/or GELI?

I would also be curious if anyone else reading this message can 
reproduce the assertion failure with the above test.

Regards,
Alan




