Date: Tue, 25 Nov 2008 19:44:37 +0300
From: Anton Yuzhaninov <citrin@citrin.ru>
To: freebsd-stable@FreeBSD.org
Subject: Re: RELENG_7 panic under load: vm_page_unwire: invalid wire count: 0
Message-ID: <492C2B75.8090007@citrin.ru>
In-Reply-To: <492BE78B.6070202@citrin.ru>
References: <492B2F46.9000709@citrin.ru> <492B3B01.9040105@citrin.ru> <492BE78B.6070202@citrin.ru>
On 25.11.2008 14:54, Anton Yuzhaninov wrote:
> On 25.11.2008 02:38, Anton Yuzhaninov wrote:
>> On 25.11.2008 01:48, Anton Yuzhaninov wrote:
>>> A box with a fresh RELENG_7 panics under heavy network load (more
>>> than 50k connections).
>>>
>>> These panics seem to be sendfile(2) related, because when sendfile
>>> is disabled in nginx, I can't reproduce the problem.
>>>
>>> The backtrace in all cases looks like this:
>>>
>>> # kgdb kernel /spool/crash/vmcore.1
>>> GNU gdb 6.1.1 [FreeBSD]
>>> Copyright 2004 Free Software Foundation, Inc.
>>> GDB is free software, covered by the GNU General Public License, and
>>> you are
>>> welcome to change it and/or distribute copies of it under certain
>>> conditions.
>>> Type "show copying" to see the conditions.
>>> There is absolutely no warranty for GDB. Type "show warranty" for
>>> details.
>>> This GDB was configured as "amd64-marcel-freebsd"...
>>>
>>> Unread portion of the kernel message buffer:
>>> panic: vm_page_unwire: invalid wire count: 0
>>> cpuid = 0
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
>>> panic() at panic+0x182
>>> vm_page_unwire() at vm_page_unwire+0x84
>>> sf_buf_mext() at sf_buf_mext+0x3c
>>> mb_free_ext() at mb_free_ext+0x99
>>> sbdrop_internal() at sbdrop_internal+0x1e8
>>> tcp_do_segment() at tcp_do_segment+0x1512
>>> tcp_input() at tcp_input+0x7f7
>>> ip_input() at ip_input+0xa8
>>> ether_demux() at ether_demux+0x1b4
>>> ether_input() at ether_input+0x1bb
>>> bge_intr() at bge_intr+0x3ca
>>> ithread_loop() at ithread_loop+0x180
>>> fork_exit() at fork_exit+0x11f
>>> fork_trampoline() at fork_trampoline+0xe
>>> --- trap 0, rip = 0, rsp = 0xffffffffea28fd30, rbp = 0 ---
>>> Uptime: 36m47s
>>> Physical memory: 4087 MB
>>> Dumping 708 MB: 693 677 661 645 629 613 597 581 565 549 533 517 501
>>> 485 469 453 437 421 405 389 373 357 341 325 309 293 277 261 245 229
>>> 213 197 181 165 149 133 117 101 85 69 53 37 21 5
>>>
>>> #0 doadump () at pcpu.h:195
>>> 195 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
>>> (kgdb) bt
>>> #0 doadump () at pcpu.h:195
>>> #1 0xffffffff8031adf8 in boot (howto=260) at
>>> /usr/src/sys/kern/kern_shutdown.c:418
>>> #2 0xffffffff8031b25c in panic (fmt=Variable "fmt" is not available.
>>> ) at /usr/src/sys/kern/kern_shutdown.c:574
>>> #3 0xffffffff8044a084 in vm_page_unwire (m=Variable "m" is not
>>> available.
>>> ) at /usr/src/sys/vm/vm_page.c:1410
>>> #4 0xffffffff80379a4c in sf_buf_mext (addr=Variable "addr" is not
>>> available.
>>> ) at /usr/src/sys/kern/uipc_syscalls.c:1720
>>> #5 0xffffffff8036e9c9 in mb_free_ext (m=0xffffff0081f93d00) at
>>> /usr/src/sys/kern/uipc_mbuf.c:257
>>
>> Maybe it is a wire_count integer overflow?
>>
>> The wire_count type is u_short...
>>
>
> Yes, it is clearly a wire_count integer overflow.
>
> On a kernel with INVARIANTS the panic string is:
>
> vm_page_wire: wire_count overflow m=0xffffff00d7d270c8
>
> (kgdb) bt
> #0 doadump () at pcpu.h:195
> #1 0xffffffff8030b806 in boot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:418
> #2 0xffffffff8030bc6c in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:574
> #3 0xffffffff8042e289 in vm_page_wire (m=Variable "m" is not available.
> ) at /usr/src/sys/vm/vm_page.c:1358
> #4 0xffffffff8042f62e in vm_page_grab (object=0xffffff0003cfa340,
> pindex=7, allocflags=672)
> at /usr/src/sys/vm/vm_page.c:1695
> #5 0xffffffff8036a9ca in kern_sendfile (td=0xffffff0003870a50,
> uap=0xffffffffec6e1bf0, hdr_uio=0x1000,
> trl_uio=0xffffff001ea7e6c0, compat=Variable "compat" is not available.
> ) at /usr/src/sys/kern/uipc_syscalls.c:2050
> #6 0xffffffff8036b0f6 in sendfile (td=0xffffff0003870a50,
> uap=0xffffffffec6e1bf0) at /usr/src/sys/kern/uipc_syscalls.c:1775
> #7 0xffffffff804593ec in syscall (frame=0xffffffffec6e1c80) at
> /usr/src/sys/amd64/amd64/trap.c:907
>
> I see 2 ways to fix this bug:
>
> 1. Change the wire_count type to u_int - it is bad, because vm_page
> will eat more memory.
>
> I have tested with u_int wire_count - the panic does not recur.
> It seems to be a good solution.

Due to alignment, vm_page has the same size with either type.

On the unpatched kernel (u_short wire_count):
(kgdb) print sizeof(struct vm_page)
$1 = 120

On the patched kernel (u_int wire_count):
(kgdb) print sizeof(struct vm_page)
$1 = 120

The arch is amd64 on both boxes.

-- 
Anton Yuzhaninov
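
The two observations in this thread (a 16-bit wire_count wrapping to 0, and a wider counter not growing the structure) can be illustrated in user-space C. This is only a minimal sketch under stated assumptions, not kernel code: struct page_u16 and struct page_u32 are hypothetical stand-ins that do not reproduce the real struct vm_page layout, and wire_n_times_u16(), wire_n_times_u32() and print_report() are names invented for the illustration. It also assumes an ABI where pointers and 32-bit integers have their usual natural alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-ins, NOT the real struct vm_page.
 * The counter is the last member, so the 16-bit variant ends with
 * tail padding that the 32-bit variant simply fills.
 */
struct page_u16 {
    void     *object;
    uint32_t  flags;
    uint16_t  wire_count;   /* 2 bytes, then 2 bytes of tail padding */
};

struct page_u32 {
    void     *object;
    uint32_t  flags;
    uint32_t  wire_count;   /* 4 bytes, occupies the former padding */
};

/* Increment a 16-bit counter n times; it wraps modulo 65536. */
static uint16_t wire_n_times_u16(unsigned long n)
{
    uint16_t count = 0;
    for (unsigned long i = 0; i < n; i++)
        count++;
    return count;
}

/* Same loop with a 32-bit counter; no wrap for n below 2^32. */
static uint32_t wire_n_times_u32(unsigned long n)
{
    uint32_t count = 0;
    for (unsigned long i = 0; i < n; i++)
        count++;
    return count;
}

/* Print both counters after 65536 increments and both struct sizes. */
void print_report(void)
{
    /* Holds on common ABIs; an assumption of this sketch. */
    assert(sizeof(struct page_u16) == sizeof(struct page_u32));

    printf("16-bit counter after 65536 wires: %lu\n",
        (unsigned long)wire_n_times_u16(65536UL));
    printf("32-bit counter after 65536 wires: %lu\n",
        (unsigned long)wire_n_times_u32(65536UL));
    printf("sizeof: %zu (16-bit) vs %zu (32-bit)\n",
        sizeof(struct page_u16), sizeof(struct page_u32));
}
```

Calling print_report() from a main() shows the 16-bit counter back at 0 after 65536 increments. A later unwire would then see a wire count of 0, consistent with the first panic string above. The size comparison only shows that padding can absorb the wider field; whether it does depends on where the field sits in the structure, which is why the kgdb sizeof check on the real kernel is the authoritative one.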