Date: Fri, 31 Jul 2009 05:40:49 +0400
From: Kamigishi Rei <spambox@haruhiism.net>
To: FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: [follow-up] FreeBSD/amd64 r195146 to r195848, fatal trap 12 under network load
Message-ID: <4A724BA1.7050303@haruhiism.net>
In-Reply-To: <4A6F0A35.7050809@haruhiism.net>
References: <4A6F0A35.7050809@haruhiism.net>

Kamigishi Rei wrote:
> Revisions mentioned are those which were tested by me; r195849+ has
> the corruption padded somewhere else so it might produce a panic with
> a different set of options. For reference, my test kernel uses a
> GENERIC config from May 09 snapshot without WITNESS and with
> IPFIREWALL, IPFIREWALL_DEFAULT_TO_ACCEPT and DEVICE_POLLING enabled.

r195981 (latest checkout) traps with the *GENERIC* kernel (with WITNESS enabled). Same backtrace, same cause, and again UP systems are not affected.

My diagnostics patch from the previous message apparently pads the corruption somewhere, so I can't use it to check lo_witness or other fields of nws_mtx at the time when mtx_lock gets corrupted.

The trap can be triggered with "ping -f -s 65507 localhost", iperf (just "iperf -c localhost" works for me), or by generating some high-speed network throughput (even a mysql query over localhost will do, as we have a race here). Running ping will mostly trigger the trap inside swi_net(); iperf triggers it inside netisr_queue_internal().

I would be grateful if someone could give me some information on how to debug this further. Currently, I suspect that there's something wrong with the handling of modspace (an incorrect dereference somewhere, or something like that).

Crash info:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x4c89d38
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8056ffca
stack pointer           = 0x28:0xffffff800003eae0
frame pointer           = 0x28:0xffffff800003eb10
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi1: netisr 0)
Physical memory: 998 MB
Dumping 1137 MB: 1122 1106 1090 1074 1058 1042 1026 1010 994 978 962 946 930 914 898 882 866 850 834 818 802 786 770 754 738 722 706 690 674 658 642 626 610 594 578 562 546 530 514 498 482 466 450 434 418 402 386 370 354 338 322 306 290 274 258 242 226 210 194 178 162 146 130 114 98 82 66 50 34 18 2

Reading symbols from /boot/kernel/ahci.ko...Reading symbols from /boot/kernel/ahci.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ahci.ko
#0  doadump () at pcpu.h:223
223     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump () at pcpu.h:223
#1  0xffffffff801d8a9c in db_fncall (dummy1=Variable "dummy1" is not available.
) at /usr/src/sys/ddb/db_command.c:548
#2  0xffffffff801d8dd1 in db_command (last_cmdp=0xffffffff80be2720, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#3  0xffffffff801d9020 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0xffffffff801daff9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff805adf65 in kdb_trap (type=12, code=0, tf=0xffffff800003ea30) at /usr/src/sys/kern/subr_kdb.c:534
#6  0xffffffff8085e7bd in trap_fatal (frame=0xffffff800003ea30, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:847
#7  0xffffffff8085eb2d in trap_pfault (frame=0xffffff800003ea30, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:768
#8  0xffffffff8085f523 in trap (frame=0xffffff800003ea30) at /usr/src/sys/amd64/amd64/trap.c:494
#9  0xffffffff80844fe3 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224
#10 0xffffffff8056ffca in _mtx_lock_sleep (m=0xffffffff81006824, tid=18446742974233875344, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_mutex.c:369
#11 0xffffffff805701b1 in _mtx_lock_flags (m=0xffffffff81006824, opts=0, file=0xffffffff8096c255 "/usr/src/sys/net/netisr.c", line=723) at /usr/src/sys/kern/kern_mutex.c:203
#12 0xffffffff8063411c in swi_net (arg=Variable "arg" is not available.
) at /usr/src/sys/net/netisr.c:723

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x45b4288
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8056ffca
stack pointer           = 0x28:0xffffff800003eae0
frame pointer           = 0x28:0xffffff800003eb10
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi1: netisr 0)
Physical memory: 998 MB
Dumping 1233 MB: 1218 1202 1186 1170 1154 1138 1122 1106 1090 1074 1058 1042 1026 1010 994 978 962 946 930 914 898 882 866 850 834 818 802 786 770 754 738 722 706 690 674 658 642 626 610 594 578 562 546 530 514 498 482 466 450 434 418 402 386 370 354 338 322 306 290 274 258 242 226 210 194 178 162 146 130 114 98 82 66 50 34 18 2

Reading symbols from /boot/kernel/ahci.ko...Reading symbols from /boot/kernel/ahci.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ahci.ko
#0  doadump () at pcpu.h:223
223     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump () at pcpu.h:223
#1  0xffffffff801d8a9c in db_fncall (dummy1=Variable "dummy1" is not available.
) at /usr/src/sys/ddb/db_command.c:548
#2  0xffffffff801d8dd1 in db_command (last_cmdp=0xffffffff80be2720, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#3  0xffffffff801d9020 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0xffffffff801daff9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff805adf65 in kdb_trap (type=12, code=0, tf=0xffffff800003ea30) at /usr/src/sys/kern/subr_kdb.c:534
#6  0xffffffff8085e7bd in trap_fatal (frame=0xffffff800003ea30, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:847
#7  0xffffffff8085eb2d in trap_pfault (frame=0xffffff800003ea30, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:768
#8  0xffffffff8085f523 in trap (frame=0xffffff800003ea30) at /usr/src/sys/amd64/amd64/trap.c:494
#9  0xffffffff80844fe3 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224
#10 0xffffffff8056ffca in _mtx_lock_sleep (m=0xffffffff81006824, tid=18446742974233875344, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_mutex.c:369
#11 0xffffffff805701b1 in _mtx_lock_flags (m=0xffffffff81006824, opts=0, file=0xffffffff8096c255 "/usr/src/sys/net/netisr.c", line=753) at /usr/src/sys/kern/kern_mutex.c:203
#12 0xffffffff80633fc2 in swi_net (arg=Variable "arg" is not available.
) at /usr/src/sys/net/netisr.c:753

These two are from ping -f. And this one is from iperf:

(kgdb) #0  doadump () at pcpu.h:223
#1  0xffffffff801d8a9c in db_fncall (dummy1=Variable "dummy1" is not available.
) at /usr/src/sys/ddb/db_command.c:548
#2  0xffffffff801d8dd1 in db_command (last_cmdp=0xffffffff80be2720, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#3  0xffffffff801d9020 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0xffffffff801daff9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff805adf65 in kdb_trap (type=12, code=0, tf=0xffffff80238764d0) at /usr/src/sys/kern/subr_kdb.c:534
#6  0xffffffff8085e7bd in trap_fatal (frame=0xffffff80238764d0, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:847
#7  0xffffffff8085f48c in trap (frame=0xffffff80238764d0) at /usr/src/sys/amd64/amd64/trap.c:345
#8  0xffffffff80844fe3 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224
#9  0xffffffff8056ffca in _mtx_lock_sleep (m=0xffffffff81006824, tid=18446742974277752608, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_mutex.c:369
#10 0xffffffff805701b1 in _mtx_lock_flags (m=0xffffffff81006824, opts=0, file=0xffffffff8096c255 "/usr/src/sys/net/netisr.c", line=830) at /usr/src/sys/kern/kern_mutex.c:203
#11 0xffffffff806344a5 in netisr_queue_internal (proto=1, m=0xffffff0004fa6400, cpuid=Variable "cpuid" is not available.
) at /usr/src/sys/net/netisr.c:830
#12 0xffffffff80634589 in netisr_queue_src (proto=1, source=Variable "source" is not available.
) at /usr/src/sys/net/netisr.c:860

-- 
Kamigishi Rei
KREI-RIPE