Date: Tue, 25 Nov 2008 14:54:51 +0300
From: Anton Yuzhaninov <citrin@citrin.ru>
To: freebsd-stable@FreeBSD.org
Subject: Re: RELENG_7 panic under load: vm_page_unwire: invalid wire count: 0
Message-ID: <492BE78B.6070202@citrin.ru>
In-Reply-To: <492B3B01.9040105@citrin.ru>
References: <492B2F46.9000709@citrin.ru> <492B3B01.9040105@citrin.ru>

On 25.11.2008 02:38, Anton Yuzhaninov wrote:
> On 25.11.2008 01:48, Anton Yuzhaninov wrote:
>> A box with fresh RELENG_7 panics under heavy network load (more than 50k
>> connections).
>>
>> This panic seems to be sendfile(2) related, because when sendfile is
>> disabled in nginx, I can't reproduce the problem.
>>
>> The backtrace in all cases looks like this:
>>
>> # kgdb kernel /spool/crash/vmcore.1
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and
>> you are
>> welcome to change it and/or distribute copies of it under certain
>> conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for
>> details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>>
>> Unread portion of the kernel message buffer:
>> panic: vm_page_unwire: invalid wire count: 0
>> cpuid = 0
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
>> panic() at panic+0x182
>> vm_page_unwire() at vm_page_unwire+0x84
>> sf_buf_mext() at sf_buf_mext+0x3c
>> mb_free_ext() at mb_free_ext+0x99
>> sbdrop_internal() at sbdrop_internal+0x1e8
>> tcp_do_segment() at tcp_do_segment+0x1512
>> tcp_input() at tcp_input+0x7f7
>> ip_input() at ip_input+0xa8
>> ether_demux() at ether_demux+0x1b4
>> ether_input() at ether_input+0x1bb
>> bge_intr() at bge_intr+0x3ca
>> ithread_loop() at ithread_loop+0x180
>> fork_exit() at fork_exit+0x11f
>> fork_trampoline() at fork_trampoline+0xe
>> --- trap 0, rip = 0, rsp = 0xffffffffea28fd30, rbp = 0 ---
>> Uptime: 36m47s
>> Physical memory: 4087 MB
>> Dumping 708 MB: 693 677 661 645 629 613 597 581 565 549 533 517 501
>> 485 469 453 437 421 405 389 373 357 341 325 309 293 277 261 245 229
>> 213 197 181 165 149 133 117 101 85 69 53 37 21 5
>>
>> #0 doadump () at pcpu.h:195
>> 195 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
>> (kgdb) bt
>> #0 doadump () at pcpu.h:195
>> #1 0xffffffff8031adf8 in boot (howto=260) at
>> /usr/src/sys/kern/kern_shutdown.c:418
>> #2 0xffffffff8031b25c in panic (fmt=Variable "fmt" is not available.
>> ) at /usr/src/sys/kern/kern_shutdown.c:574
>> #3 0xffffffff8044a084 in vm_page_unwire (m=Variable "m" is not
>> available.
>> ) at /usr/src/sys/vm/vm_page.c:1410
>> #4 0xffffffff80379a4c in sf_buf_mext (addr=Variable "addr" is not
>> available.
>> ) at /usr/src/sys/kern/uipc_syscalls.c:1720
>> #5 0xffffffff8036e9c9 in mb_free_ext (m=0xffffff0081f93d00) at
>> /usr/src/sys/kern/uipc_mbuf.c:257
>
> Maybe it is a wire_count integer overflow?
>
> The wire_count type is u_short...
>

Yes, it is clearly a wire_count integer overflow.

On a kernel with INVARIANTS the panic string is:
vm_page_wire: wire_count overflow m=0xffffff00d7d270c8

(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0xffffffff8030b806 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2 0xffffffff8030bc6c in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3 0xffffffff8042e289 in vm_page_wire (m=Variable "m" is not available.
) at /usr/src/sys/vm/vm_page.c:1358
#4 0xffffffff8042f62e in vm_page_grab (object=0xffffff0003cfa340, pindex=7, allocflags=672) at /usr/src/sys/vm/vm_page.c:1695
#5 0xffffffff8036a9ca in kern_sendfile (td=0xffffff0003870a50, uap=0xffffffffec6e1bf0, hdr_uio=0x1000, trl_uio=0xffffff001ea7e6c0, compat=Variable "compat" is not available.
) at /usr/src/sys/kern/uipc_syscalls.c:2050
#6 0xffffffff8036b0f6 in sendfile (td=0xffffff0003870a50, uap=0xffffffffec6e1bf0) at /usr/src/sys/kern/uipc_syscalls.c:1775
#7 0xffffffff804593ec in syscall (frame=0xffffffffec6e1c80) at /usr/src/sys/amd64/amd64/trap.c:907

I see 2 ways to fix this bug:

1. Change the wire_count type to u_int. This is bad, because vm_page will
   use more memory. I have tested with a u_int wire_count and the panic
   did not recur.

2. Check for wire_count overflow and return an error to the sendfile(2)
   caller, but this is not easy to implement...

-- 
Anton Yuzhaninov