Date: Fri, 29 Feb 2008 11:37:28 +0100 From: Kris Kennaway <kris@FreeBSD.org> To: matthew <matthew@matthew.sk>, freebsd-net@freebsd.org, freebsd-ipfw@FreeBSD.org Subject: Lock order reversals with dummynet (Re: FreeBSD 7.0 Beta, RC, RELEASE (amd64) freezes with dummynet enabled) Message-ID: <47C7E068.5040208@FreeBSD.org> In-Reply-To: <47C7BB11.9020604@matthew.sk> References: <47C5753A.8010806@matthew.sk> <47C57D01.8070007@FreeBSD.org> <47C7BB11.9020604@matthew.sk>
next in thread | previous in thread | raw e-mail | index | archive | help
Adding back the mailing list so others can help. matthew wrote: > Kris Kennaway wrote: > >> matthew wrote: >>> I have posted before that i have a stability issue with the 7.0 branch >>> on my servers. Tested on BETA2,BETA4,RC1,RC2,RELEASE >>> >>> The original thread and my post with details is at: >>> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-12/msg00674.html >>> >>> >>> I was waiting for the 7.0-RELEASE, updated the whole servers, and >>> enabled dummynet again, but it always freezes after some minutes, 100% >>> reproducible. >>> >>> I tested it also on a HP 140 G3 1U server, where 6.3 has absolutely no >>> problems, but the 7.0 branch keeps freezing. >>> >>> Again, if it helps to solve this bug, i can rebuild the kernel with >>> debug symbols a take some screenshots :) >> >> Please follow the steps at >> >> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html >> >> >> Kris >> _______________________________________________ > I added to the kernel these options: > makeoptions DEBUG=-g # Build kernel with gdb(1) debug > symbols > options INVARIANTS > options INVARIANT_SUPPORT > options WITNESS > options DEBUG_LOCKS > options DEBUG_VFS_LOCKS > options DIAGNOSTIC > options KDB > options DDB > options GDB > options SOCKBUF_DEBUG > > But still, the server freezes withoutany debug message, And you can't break to DDB? It is expected you will have to when you encounter a deadlock. > panic whatever, > but at leasts it dumps the crash kernel to /var/crash, but the debug > looks like this: Dumping is an activity the occurs after panicking, so if you have a dump then you must have a panic. > root@hanka:/usr/src/sys/amd64/compile/HANKA-debug# kgdb kernel.debug > /var/crash/vmcore.0 > [GDB will not be able to debug user-mode threads: > /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"] > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you > are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd". > Cannot access memory at address 0x57 > (kgdb) backtrace > #0 0x0000000000000000 in ?? () > Cannot access memory at address 0x0 > (kgdb) > I have some debug messages at boot time: This usually means you are not running the dump against the kernel.debug corresponding exactly to the kernel that crashed. Probably the dump you found was old. > Feb 28 17:01:29 hanka lock order reversal: > Feb 28 17:01:29 hanka 1st 0xffffffff80b0e1e8 PFil hook read/write mutex > (PFil hook read/write mutex) @ net/pfil.c:73 > Feb 28 17:01:29 hanka 2nd 0xffffffff80b0f8e8 udp (udp) @ > /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:2970 > Feb 28 17:01:29 hanka KDB: stack backtrace: > Feb 28 17:01:29 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > Feb 28 17:01:29 hanka witness_checkorder() at witness_checkorder+0x605 > Feb 28 17:01:29 hanka _mtx_lock_flags() at _mtx_lock_flags+0x75 > Feb 28 17:01:29 hanka pf_socket_lookup() at pf_socket_lookup+0x241 > Feb 28 17:01:29 hanka pf_test_udp() at pf_test_udp+0x890 > Feb 28 17:01:29 hanka pf_test() at pf_test+0xeb2 > Feb 28 17:01:29 hanka pf_check_in() at pf_check_in+0x2b > Feb 28 17:01:29 hanka pfil_run_hooks() at pfil_run_hooks+0xbc > Feb 28 17:01:29 hanka ip_input() at ip_input+0x2c4 > Feb 28 17:01:29 hanka ether_demux() at ether_demux+0x1d9 > Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d > Feb 28 17:01:29 hanka ether_demux() at ether_demux+0xe9 > Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d > Feb 28 17:01:29 hanka em_rxeof() at em_rxeof+0x1ca > Feb 28 17:01:29 hanka em_poll() at em_poll+0x6f > Feb 28 17:01:29 hanka netisr_poll() at netisr_poll+0x87 > Feb 28 17:01:29 hanka swi_net() at swi_net+0xea > Feb 28 17:01:29 hanka ithread_loop() at ithread_loop+0xda > Feb 28 17:01:29 hanka fork_exit() at fork_exit+0x12a > Feb 28 17:01:29 hanka fork_trampoline() at fork_trampoline+0xe > Feb 28 17:01:29 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, rbp > = 0 --- > Feb 28 17:01:29 hanka lock order reversal: > Feb 28 17:01:29 hanka 1st 0xffffff0001237430 em0 (EM Core Lock) @ > dev/em/if_em.c:1416 > Feb 28 17:01:29 hanka 2nd 0xffffffff80b0f8e8 udp (udp) @ > netinet/udp_usrreq.c:385 > Feb 28 17:01:29 hanka KDB: stack backtrace: > Feb 28 17:01:29 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > Feb 28 17:01:29 hanka witness_checkorder() at witness_checkorder+0x605 > Feb 28 17:01:29 hanka _mtx_lock_flags() at _mtx_lock_flags+0x75 > Feb 28 17:01:29 hanka udp_input() at udp_input+0x1a4 > Feb 28 17:01:29 hanka ip_input() at ip_input+0xc0 > Feb 28 17:01:29 hanka ether_demux() at ether_demux+0x1d9 > Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d > Feb 28 17:01:29 hanka ether_demux() at ether_demux+0xe9 > Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d > Feb 28 17:01:29 hanka em_rxeof() at em_rxeof+0x1ca > Feb 28 17:01:29 hanka em_poll() at em_poll+0x6f > Feb 28 17:01:29 hanka netisr_poll() at netisr_poll+0x87 > Feb 28 17:01:29 hanka swi_net() at swi_net+0xea > Feb 28 17:01:29 hanka ithread_loop() at ithread_loop+0xda > Feb 28 17:01:29 hanka fork_exit() at fork_exit+0x12a > Feb 28 17:01:29 hanka fork_trampoline() at fork_trampoline+0xe > Feb 28 17:01:29 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, rbp > = 0 --- > Feb 28 17:01:29 hanka xinetd[1356]: xinetd Version 2.3.14 started with > libwrap loadavg options compiled in. > Feb 28 17:01:29 hanka xinetd[1356]: Started working: 1 available service > Feb 28 17:01:33 hanka lock order reversal: > Feb 28 17:01:33 hanka 1st 0xffffff00042f5b10 inp (tcpinp) @ > netinet/tcp_usrreq.c:781 > Feb 28 17:01:33 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write mutex > (PFil hook read/write mutex) @ net/pfil.c:73 > Feb 28 17:01:33 hanka KDB: stack backtrace: > Feb 28 17:01:33 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > Feb 28 17:01:33 hanka witness_checkorder() at witness_checkorder+0x605 > Feb 28 17:01:33 hanka _rw_rlock() at _rw_rlock+0x5b > Feb 28 17:01:33 hanka pfil_run_hooks() at pfil_run_hooks+0x44 > Feb 28 17:01:33 hanka ip_output() at ip_output+0x38e > Feb 28 17:01:33 hanka tcp_output() at tcp_output+0xacd > Feb 28 17:01:33 hanka tcp_usr_send() at tcp_usr_send+0x272 > Feb 28 17:01:33 hanka sosend_generic() at sosend_generic+0x347 > Feb 28 17:01:33 hanka soo_write() at soo_write+0x38 > Feb 28 17:01:33 hanka dofilewrite() at dofilewrite+0x85 > Feb 28 17:01:33 hanka kern_writev() at > Feb 28 17:01:33 hanka kern_writev+0x4c > Feb 28 17:01:33 hanka write() at write+0x54 > Feb 28 17:01:33 hanka syscall() at syscall+0x1f6 > Feb 28 17:01:33 hanka Xfast_syscall() at Xfast_syscall+0xab > Feb 28 17:01:33 hanka --- syscall (4, FreeBSD ELF64, write), rip = > 0x800e1fd3c, rsp = 0x7fffffffea88, rbp = 0x16 --- > Feb 28 17:03:54 hanka lock order reversal: > Feb 28 17:03:54 hanka 1st 0xffffffff80b0f308 tcp (tcp) @ > netinet/tcp_input.c:400 > Feb 28 17:03:54 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write mutex > (PFil hook read/write mutex) @ net/pfil.c:73 > Feb 28 17:03:54 hanka KDB: stack backtrace: > Feb 28 17:03:54 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > Feb 28 17:03:54 hanka witness_checkorder() at witness_checkorder+0x605 > Feb 28 17:03:54 hanka _rw_rlock() at _rw_rlock+0x5b > Feb 28 17:03:54 hanka pfil_run_hooks() at pfil_run_hooks+0x44 > Feb 28 17:03:54 hanka ip_output() at ip_output+0x38e > Feb 28 17:03:54 hanka tcp_respond() at tcp_respond+0x2d5 > Feb 28 17:03:54 hanka tcp_dropwithreset() at tcp_dropwithreset+0x131 > Feb 28 17:03:54 hanka tcp_input() at tcp_input+0x6d0 > Feb 28 17:03:54 hanka ip_input() at ip_input+0xc0 > Feb 28 17:03:54 hanka ether_demux() at ether_demux+0x1d9 > Feb 28 17:03:54 hanka ether_input() at ether_input+0x19d > Feb 28 17:03:54 hanka em_rxeof() at em_rxeof+0x1ca > Feb 28 17:03:54 hanka em_poll() at em_poll+0x6f > Feb 28 17:03:54 hanka netisr_poll() at netisr_poll+0x87 > Feb 28 17:03:54 hanka swi_net() at swi_net+0xea > Feb 28 17:03:54 hanka ithread_loop() at ithread_loop+0xda > Feb 28 17:03:54 hanka fork_exit() at fork_exit+0x12a > Feb 28 17:03:54 hanka fork_trampoline() at fork_trampoline+0xe > Feb 28 17:03:54 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, rbp > = 0 --- > Feb 28 17:23:36 hanka lock order reversal: > Feb 28 17:23:36 hanka 1st 0xffffffff810f9ef0 tcp_sc_head (tcp_sc_head) @ > netinet/tcp_syncache.c:477 > Feb 28 17:23:36 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write mutex > (PFil hook read/write mutex) @ net/pfil.c:73 > Feb 28 17:23:36 hanka KDB: stack backtrace: > Feb 28 17:23:36 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > Feb 28 17:23:36 hanka witness_checkorder() at witness_checkorder+0x605 > Feb 28 17:23:36 hanka _rw_rlock() at _rw_rlock+0x5b > Feb 28 17:23:36 hanka pfil_run_hooks() at pfil_run_hooks+0x44 > Feb 28 17:23:36 hanka ip_output() at ip_output+0x38e > Feb 28 17:23:36 hanka syncache_respond() at syncache_respond+0x32a > Feb 28 17:23:36 hanka syncache_add() at syncache_add+0x206 > Feb 28 17:23:36 hanka tcp_input() at tcp_input+0xa77 > Feb 28 17:23:36 hanka ip_input() at ip_input+0xc0 > Feb 28 17:23:36 hanka ether_demux() at ether_demux+0x1d9 > Feb 28 17:23:36 hanka ether_input() at ether_input+0x19d > Feb 28 17:23:36 hanka em_rxeof() at em_rxeof+0x1ca > Feb 28 17:23:36 hanka em_poll() at em_poll+0x6f > Feb 28 17:23:36 hanka netisr_poll() at netisr_poll+0x87 > Feb 28 17:23:36 hanka swi_net() at swi_net+0xea > Feb 28 17:23:36 hanka ithread_loop() at ithread_loop+0xda > Feb 28 17:23:36 hanka fork_exit() at fork_exit+0x12a > Feb 28 17:23:36 hanka fork_trampoline() at fork_trampoline+0xe > Feb 28 17:23:36 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, rbp > = 0 --- These look useful and should point to the problem. It is also probably related to something in your script: > I tried it with pf disabled, only running ipfw without dummynet, > everything is ok. > I tried it witout pf disabled, running ipfw with dummynet queue for > small traffic, everything is ok. > I tried it with/without pf disabled, running ipfw with dummynet queue > for hight traffic >100Mbit/s, the box freezes, everytime, after enabling > rules with the dummynet queue, after some second or few minutes, maybe > depending on the high traffic. > > The shaper script: > # SHAPER > IPFW_PIPE_DOWNLOAD=1 > IPFW_PIPE_UPLOAD=11 > IPFW_QUEUE_DOWNLOAD=1 > IPFW_QUEUE_UPLOAD=11 > > DOWNLOAD_ROOT="450Mbit/s" > UPLOAD_ROOT="450Mbit/s" > SHAPER_BUCKETS="1024" > > IPFW_SHAPER_ROOT=1000 > > $CMD pipe $IPFW_PIPE_DOWNLOAD config bw $DOWNLOAD_ROOT buckets > $SHAPER_BUCKETS > $CMD pipe $IPFW_PIPE_UPLOAD config bw $UPLOAD_ROOT buckets $SHAPER_BUCKETS > > $CMD queue $IPFW_QUEUE_DOWNLOAD config pipe $IPFW_PIPE_DOWNLOAD buckets > $SHAPER_BUCKETS mask dst-ip 0xFFFFFFFF > $CMD queue $IPFW_QUEUE_UPLOAD config pipe $IPFW_PIPE_UPLOAD buckets > $SHAPER_BUCKETS mask src-ip 0xFFFFFFFF > > $CMD delete $IPFW_SHAPER_ROOT > $CMD $IPFW_SHAPER_ROOT add queue $IPFW_QUEUE_DOWNLOAD ip from me to any > out // Share download > $CMD $IPFW_SHAPER_ROOT add queue $IPFW_QUEUE_UPLOAD ip from any to me in > // Share upload Thanks for the information. Kris
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?47C7E068.5040208>