Date:      Tue, 25 Nov 2008 19:44:37 +0300
From:      Anton Yuzhaninov <citrin@citrin.ru>
To:        freebsd-stable@FreeBSD.org
Subject:   Re: RELENG_7 panic under load: vm_page_unwire: invalid wire count: 0
Message-ID:  <492C2B75.8090007@citrin.ru>
In-Reply-To: <492BE78B.6070202@citrin.ru>
References:  <492B2F46.9000709@citrin.ru> <492B3B01.9040105@citrin.ru> <492BE78B.6070202@citrin.ru>

On 25.11.2008 14:54, Anton Yuzhaninov wrote:
> On 25.11.2008 02:38, Anton Yuzhaninov wrote:
>> On 25.11.2008 01:48, Anton Yuzhaninov wrote:
>>> A box running fresh RELENG_7 panics under heavy network load (more than
>>> 50k connections).
>>>
>>> The panic seems to be sendfile(2) related: with sendfile disabled in
>>> nginx, I can't reproduce the problem.
>>>
>>> The backtrace in all cases looks like this:
>>>
>>> # kgdb kernel /spool/crash/vmcore.1
>>> GNU gdb 6.1.1 [FreeBSD]
>>> Copyright 2004 Free Software Foundation, Inc.
>>> GDB is free software, covered by the GNU General Public License, and 
>>> you are
>>> welcome to change it and/or distribute copies of it under certain 
>>> conditions.
>>> Type "show copying" to see the conditions.
>>> There is absolutely no warranty for GDB.  Type "show warranty" for 
>>> details.
>>> This GDB was configured as "amd64-marcel-freebsd"...
>>>
>>> Unread portion of the kernel message buffer:
>>> panic: vm_page_unwire: invalid wire count: 0
>>> cpuid = 0
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
>>> panic() at panic+0x182
>>> vm_page_unwire() at vm_page_unwire+0x84
>>> sf_buf_mext() at sf_buf_mext+0x3c
>>> mb_free_ext() at mb_free_ext+0x99
>>> sbdrop_internal() at sbdrop_internal+0x1e8
>>> tcp_do_segment() at tcp_do_segment+0x1512
>>> tcp_input() at tcp_input+0x7f7
>>> ip_input() at ip_input+0xa8
>>> ether_demux() at ether_demux+0x1b4
>>> ether_input() at ether_input+0x1bb
>>> bge_intr() at bge_intr+0x3ca
>>> ithread_loop() at ithread_loop+0x180
>>> fork_exit() at fork_exit+0x11f
>>> fork_trampoline() at fork_trampoline+0xe
>>> --- trap 0, rip = 0, rsp = 0xffffffffea28fd30, rbp = 0 ---
>>> Uptime: 36m47s
>>> Physical memory: 4087 MB
>>> Dumping 708 MB: 693 677 661 645 629 613 597 581 565 549 533 517 501 
>>> 485 469 453 437 421 405 389 373 357 341 325 309 293 277 261 245 229 
>>> 213 197 181 165 149 133 117 101 85 69 53 37 21 5
>>>
>>> #0  doadump () at pcpu.h:195
>>> 195             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
>>> (kgdb) bt
>>> #0  doadump () at pcpu.h:195
>>> #1  0xffffffff8031adf8 in boot (howto=260) at 
>>> /usr/src/sys/kern/kern_shutdown.c:418
>>> #2  0xffffffff8031b25c in panic (fmt=Variable "fmt" is not available.
>>> ) at /usr/src/sys/kern/kern_shutdown.c:574
>>> #3  0xffffffff8044a084 in vm_page_unwire (m=Variable "m" is not 
>>> available.
>>> ) at /usr/src/sys/vm/vm_page.c:1410
>>> #4  0xffffffff80379a4c in sf_buf_mext (addr=Variable "addr" is not 
>>> available.
>>> ) at /usr/src/sys/kern/uipc_syscalls.c:1720
>>> #5  0xffffffff8036e9c9 in mb_free_ext (m=0xffffff0081f93d00) at 
>>> /usr/src/sys/kern/uipc_mbuf.c:257
>>
>> Maybe it is a wire_count integer overflow?
>>
>> The type of wire_count is u_short...
>>
> 
> Yes, it is clearly a wire_count integer overflow.
> 
> On a kernel with INVARIANTS, the panic string is:
> 
> vm_page_wire: wire_count overflow m=0xffffff00d7d270c8
> 
> (kgdb) bt
> #0  doadump () at pcpu.h:195
> #1  0xffffffff8030b806 in boot (howto=260) at 
> /usr/src/sys/kern/kern_shutdown.c:418
> #2  0xffffffff8030bc6c in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:574
> #3  0xffffffff8042e289 in vm_page_wire (m=Variable "m" is not available.
> ) at /usr/src/sys/vm/vm_page.c:1358
> #4  0xffffffff8042f62e in vm_page_grab (object=0xffffff0003cfa340, 
> pindex=7, allocflags=672)
>     at /usr/src/sys/vm/vm_page.c:1695
> #5  0xffffffff8036a9ca in kern_sendfile (td=0xffffff0003870a50, 
> uap=0xffffffffec6e1bf0, hdr_uio=0x1000,
>     trl_uio=0xffffff001ea7e6c0, compat=Variable "compat" is not available.
> ) at /usr/src/sys/kern/uipc_syscalls.c:2050
> #6  0xffffffff8036b0f6 in sendfile (td=0xffffff0003870a50, 
> uap=0xffffffffec6e1bf0) at /usr/src/sys/kern/uipc_syscalls.c:1775
> #7  0xffffffff804593ec in syscall (frame=0xffffffffec6e1c80) at 
> /usr/src/sys/amd64/amd64/trap.c:907
> 
> I see 2 ways to fix this bug:
> 
> 1. Change the wire_count type to u_int. This is bad, because vm_page will
> use more memory.
> 
> I have tested with u_int wire_count: the panic did not recur.
>

It seems to be a good solution after all.

Due to alignment, struct vm_page has the same size with either type, so
the memory concern does not apply.

On unpatched kernel (u_short wire_count):

(kgdb) print sizeof(struct vm_page)
$1 = 120

On patched kernel (u_int wire_count):

(kgdb) print sizeof(struct vm_page)
$1 = 120

The architecture is amd64 on both boxes.

-- 
  Anton Yuzhaninov


