Date:      Tue, 25 Nov 2008 14:54:51 +0300
From:      Anton Yuzhaninov <citrin@citrin.ru>
To:        freebsd-stable@FreeBSD.org
Subject:   Re: RELENG_7 panic under load: vm_page_unwire: invalid wire count: 0
Message-ID:  <492BE78B.6070202@citrin.ru>
In-Reply-To: <492B3B01.9040105@citrin.ru>
References:  <492B2F46.9000709@citrin.ru> <492B3B01.9040105@citrin.ru>

On 25.11.2008 02:38, Anton Yuzhaninov wrote:
> On 25.11.2008 01:48, Anton Yuzhaninov wrote:
>> A box running fresh RELENG_7 panics under heavy network load (more than 50k
>> connections).
>>
>> This panic seems to be sendfile(2) related, because when sendfile is
>> disabled in nginx, I can't reproduce the problem.
>>
>> The backtrace in all cases looks like this:
>>
>> # kgdb kernel /spool/crash/vmcore.1
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and 
>> you are
>> welcome to change it and/or distribute copies of it under certain 
>> conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for 
>> details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>>
>> Unread portion of the kernel message buffer:
>> panic: vm_page_unwire: invalid wire count: 0
>> cpuid = 0
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
>> panic() at panic+0x182
>> vm_page_unwire() at vm_page_unwire+0x84
>> sf_buf_mext() at sf_buf_mext+0x3c
>> mb_free_ext() at mb_free_ext+0x99
>> sbdrop_internal() at sbdrop_internal+0x1e8
>> tcp_do_segment() at tcp_do_segment+0x1512
>> tcp_input() at tcp_input+0x7f7
>> ip_input() at ip_input+0xa8
>> ether_demux() at ether_demux+0x1b4
>> ether_input() at ether_input+0x1bb
>> bge_intr() at bge_intr+0x3ca
>> ithread_loop() at ithread_loop+0x180
>> fork_exit() at fork_exit+0x11f
>> fork_trampoline() at fork_trampoline+0xe
>> --- trap 0, rip = 0, rsp = 0xffffffffea28fd30, rbp = 0 ---
>> Uptime: 36m47s
>> Physical memory: 4087 MB
>> Dumping 708 MB: 693 677 661 645 629 613 597 581 565 549 533 517 501 
>> 485 469 453 437 421 405 389 373 357 341 325 309 293 277 261 245 229 
>> 213 197 181 165 149 133 117 101 85 69 53 37 21 5
>>
>> #0  doadump () at pcpu.h:195
>> 195             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
>> (kgdb) bt
>> #0  doadump () at pcpu.h:195
>> #1  0xffffffff8031adf8 in boot (howto=260) at 
>> /usr/src/sys/kern/kern_shutdown.c:418
>> #2  0xffffffff8031b25c in panic (fmt=Variable "fmt" is not available.
>> ) at /usr/src/sys/kern/kern_shutdown.c:574
>> #3  0xffffffff8044a084 in vm_page_unwire (m=Variable "m" is not 
>> available.
>> ) at /usr/src/sys/vm/vm_page.c:1410
>> #4  0xffffffff80379a4c in sf_buf_mext (addr=Variable "addr" is not 
>> available.
>> ) at /usr/src/sys/kern/uipc_syscalls.c:1720
>> #5  0xffffffff8036e9c9 in mb_free_ext (m=0xffffff0081f93d00) at 
>> /usr/src/sys/kern/uipc_mbuf.c:257
> 
> Maybe it is a wire_count integer overflow?
> 
> The wire_count type is u_short...
> 

Yes, it is clearly a wire_count integer overflow.

On a kernel built with INVARIANTS, the panic string is:

vm_page_wire: wire_count overflow m=0xffffff00d7d270c8

(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xffffffff8030b806 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xffffffff8030bc6c in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xffffffff8042e289 in vm_page_wire (m=Variable "m" is not available.
) at /usr/src/sys/vm/vm_page.c:1358
#4  0xffffffff8042f62e in vm_page_grab (object=0xffffff0003cfa340, pindex=7, allocflags=672)
     at /usr/src/sys/vm/vm_page.c:1695
#5  0xffffffff8036a9ca in kern_sendfile (td=0xffffff0003870a50, uap=0xffffffffec6e1bf0, hdr_uio=0x1000,
     trl_uio=0xffffff001ea7e6c0, compat=Variable "compat" is not available.
) at /usr/src/sys/kern/uipc_syscalls.c:2050
#6  0xffffffff8036b0f6 in sendfile (td=0xffffff0003870a50, uap=0xffffffffec6e1bf0) at /usr/src/sys/kern/uipc_syscalls.c:1775
#7  0xffffffff804593ec in syscall (frame=0xffffffffec6e1c80) at /usr/src/sys/amd64/amd64/trap.c:907
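
The wraparound can be modelled in user space. The following is only a sketch, not kernel code: struct page_model, model_wire(), model_unwire() and model_wire_n() are hypothetical names, and it assumes that each in-flight sendfile reference wires the page once and unwires it once when the mbuf is freed, as the two backtraces suggest.

```c
#include <limits.h>

/* Hypothetical user-space stand-in for a page; not the kernel's struct vm_page. */
struct page_model {
	unsigned short wire_count;	/* models the u_short counter discussed above */
};

/* Wire once with no overflow check, as on a kernel without INVARIANTS. */
static void
model_wire(struct page_model *m)
{
	m->wire_count++;	/* one step past USHRT_MAX wraps to 0 */
}

/* Unwire once; returns -1 at the point where the kernel would panic. */
static int
model_unwire(struct page_model *m)
{
	if (m->wire_count == 0)
		return (-1);	/* "vm_page_unwire: invalid wire count: 0" */
	m->wire_count--;
	return (0);
}

/* Take n references on one page and report the resulting counter. */
static unsigned short
model_wire_n(struct page_model *m, unsigned long n)
{
	while (n-- > 0)
		model_wire(m);
	return (m->wire_count);
}
```

With USHRT_MAX + 1 references on one page (65536 where u_short is 16 bits), model_wire_n() leaves the counter at 0, and the next model_unwire() fails, although tens of thousands of references are still outstanding.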

I see two ways to fix this bug:

1. Change the wire_count type to u_int. This is bad, because each vm_page will use more memory.

I have tested with a u_int wire_count, and the panic did not recur.

2. Check for wire_count overflow and return an error to the sendfile(2) caller, but this is not easy to implement...
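
The counter side of the second option might look roughly like the sketch below. It is a user-space illustration only: struct page_sketch and sketch_wire_checked() are hypothetical names, and EBUSY is an arbitrary choice of error code. The hard part, unwinding cleanly in the sendfile path when the wire is refused, is not shown.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical user-space stand-in for a page; not the kernel's struct vm_page. */
struct page_sketch {
	unsigned short wire_count;	/* models the u_short counter */
};

/*
 * Checked wire: refuse the reference instead of letting the counter wrap.
 * Returns 0 on success or an errno value (EBUSY here, chosen only for
 * illustration) when the counter is already at its maximum.
 */
static int
sketch_wire_checked(struct page_sketch *m)
{
	if (m->wire_count == USHRT_MAX)
		return (EBUSY);	/* caller would have to back out and report it */
	m->wire_count++;
	return (0);
}
```

A caller would have to treat a non-zero return as a failure for that page and release whatever it had already set up, which is where the difficulty lies.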

-- 
  Anton Yuzhaninov


