Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 29 Feb 2008 11:37:28 +0100
From:      Kris Kennaway <kris@FreeBSD.org>
To:        matthew <matthew@matthew.sk>, freebsd-net@freebsd.org,  freebsd-ipfw@FreeBSD.org
Subject:   Lock order reversals with dummynet (Re: FreeBSD 7.0 Beta, RC, RELEASE (amd64) freezes with dummynet enabled)
Message-ID:  <47C7E068.5040208@FreeBSD.org>
In-Reply-To: <47C7BB11.9020604@matthew.sk>
References:  <47C5753A.8010806@matthew.sk> <47C57D01.8070007@FreeBSD.org> <47C7BB11.9020604@matthew.sk>

next in thread | previous in thread | raw e-mail | index | archive | help
Adding back the mailing list so others can help.

matthew wrote:
> Kris Kennaway wrote:
> 
>> matthew wrote:
>>> I have posted before that i have a stability issue with the 7.0 branch
>>> on my servers. Tested on BETA2,BETA4,RC1,RC2,RELEASE
>>>
>>> The original thread and my post with details is at:
>>> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-12/msg00674.html 
>>>
>>>
>>> I was waiting for the 7.0-RELEASE, updated the whole servers, and
>>> enabled dummynet again, but it always freezes after some minutes, 100%
>>> reproducible.
>>>
>>> I tested it also on a HP 140 G3 1U server, where 6.3 has absolutely no
>>> problems, but the 7.0 branch keeps freezing.
>>>
>>> Again, if it helps to solve this bug, i can rebuild the kernel with
>>> debug symbols a take some screenshots :)
>>
>> Please follow the steps at
>>
>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html 
>>
>>
>> Kris
>> _______________________________________________
> I added to the kernel these options:
> makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug 
> symbols
> options         INVARIANTS
> options         INVARIANT_SUPPORT
> options         WITNESS
> options         DEBUG_LOCKS
> options         DEBUG_VFS_LOCKS
> options         DIAGNOSTIC
> options         KDB
> options         DDB
> options         GDB
> options         SOCKBUF_DEBUG
> 
> But still, the server freezes withoutany debug message,

And you can't break to DDB?  It is expected you will have to when you 
encounter a deadlock.

 > panic whatever,
> but at leasts it dumps the crash kernel to /var/crash, but the debug 
> looks like this:

Dumping is an activity the occurs after panicking, so if you have a dump 
then you must have a panic.

> root@hanka:/usr/src/sys/amd64/compile/HANKA-debug# kgdb kernel.debug 
> /var/crash/vmcore.0
> [GDB will not be able to debug user-mode threads: 
> /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you 
> are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd".
> Cannot access memory at address 0x57
> (kgdb) backtrace
> #0  0x0000000000000000 in ?? ()
> Cannot access memory at address 0x0
> (kgdb)                                   
> I have some debug messages at boot time:

This usually means you are not running the dump against the kernel.debug 
corresponding exactly to the kernel that crashed.  Probably the dump you 
found was old.

> Feb 28 17:01:29 hanka lock order reversal:
> Feb 28 17:01:29 hanka 1st 0xffffffff80b0e1e8 PFil hook read/write mutex 
> (PFil hook read/write mutex) @ net/pfil.c:73
> Feb 28 17:01:29 hanka 2nd 0xffffffff80b0f8e8 udp (udp) @ 
> /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:2970
> Feb 28 17:01:29 hanka KDB: stack backtrace:
> Feb 28 17:01:29 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Feb 28 17:01:29 hanka witness_checkorder() at witness_checkorder+0x605
> Feb 28 17:01:29 hanka _mtx_lock_flags() at _mtx_lock_flags+0x75
> Feb 28 17:01:29 hanka pf_socket_lookup() at pf_socket_lookup+0x241
> Feb 28 17:01:29 hanka pf_test_udp() at pf_test_udp+0x890
> Feb 28 17:01:29 hanka pf_test() at pf_test+0xeb2
> Feb 28 17:01:29 hanka pf_check_in() at pf_check_in+0x2b
> Feb 28 17:01:29 hanka pfil_run_hooks() at pfil_run_hooks+0xbc
> Feb 28 17:01:29 hanka ip_input() at ip_input+0x2c4
> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0x1d9
> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0xe9
> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
> Feb 28 17:01:29 hanka em_rxeof() at em_rxeof+0x1ca
> Feb 28 17:01:29 hanka em_poll() at em_poll+0x6f
> Feb 28 17:01:29 hanka netisr_poll() at netisr_poll+0x87
> Feb 28 17:01:29 hanka swi_net() at swi_net+0xea
> Feb 28 17:01:29 hanka ithread_loop() at ithread_loop+0xda
> Feb 28 17:01:29 hanka fork_exit() at fork_exit+0x12a
> Feb 28 17:01:29 hanka fork_trampoline() at fork_trampoline+0xe
> Feb 28 17:01:29 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, rbp 
> = 0 ---


> Feb 28 17:01:29 hanka lock order reversal:
> Feb 28 17:01:29 hanka 1st 0xffffff0001237430 em0 (EM Core Lock) @ 
> dev/em/if_em.c:1416
> Feb 28 17:01:29 hanka 2nd 0xffffffff80b0f8e8 udp (udp) @ 
> netinet/udp_usrreq.c:385
> Feb 28 17:01:29 hanka KDB: stack backtrace:
> Feb 28 17:01:29 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Feb 28 17:01:29 hanka witness_checkorder() at witness_checkorder+0x605
> Feb 28 17:01:29 hanka _mtx_lock_flags() at _mtx_lock_flags+0x75
> Feb 28 17:01:29 hanka udp_input() at udp_input+0x1a4
> Feb 28 17:01:29 hanka ip_input() at ip_input+0xc0
> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0x1d9
> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0xe9
> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
> Feb 28 17:01:29 hanka em_rxeof() at em_rxeof+0x1ca
> Feb 28 17:01:29 hanka em_poll() at em_poll+0x6f
> Feb 28 17:01:29 hanka netisr_poll() at netisr_poll+0x87
> Feb 28 17:01:29 hanka swi_net() at swi_net+0xea
> Feb 28 17:01:29 hanka ithread_loop() at ithread_loop+0xda
> Feb 28 17:01:29 hanka fork_exit() at fork_exit+0x12a
> Feb 28 17:01:29 hanka fork_trampoline() at fork_trampoline+0xe
> Feb 28 17:01:29 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, rbp 
> = 0 ---


> Feb 28 17:01:29 hanka xinetd[1356]: xinetd Version 2.3.14 started with 
> libwrap loadavg options compiled in.
> Feb 28 17:01:29 hanka xinetd[1356]: Started working: 1 available service
> Feb 28 17:01:33 hanka lock order reversal:
> Feb 28 17:01:33 hanka 1st 0xffffff00042f5b10 inp (tcpinp) @ 
> netinet/tcp_usrreq.c:781
> Feb 28 17:01:33 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write mutex 
> (PFil hook read/write mutex) @ net/pfil.c:73
> Feb 28 17:01:33 hanka KDB: stack backtrace:
> Feb 28 17:01:33 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Feb 28 17:01:33 hanka witness_checkorder() at witness_checkorder+0x605
> Feb 28 17:01:33 hanka _rw_rlock() at _rw_rlock+0x5b
> Feb 28 17:01:33 hanka pfil_run_hooks() at pfil_run_hooks+0x44
> Feb 28 17:01:33 hanka ip_output() at ip_output+0x38e
> Feb 28 17:01:33 hanka tcp_output() at tcp_output+0xacd
> Feb 28 17:01:33 hanka tcp_usr_send() at tcp_usr_send+0x272
> Feb 28 17:01:33 hanka sosend_generic() at sosend_generic+0x347
> Feb 28 17:01:33 hanka soo_write() at soo_write+0x38
> Feb 28 17:01:33 hanka dofilewrite() at dofilewrite+0x85
> Feb 28 17:01:33 hanka kern_writev() at
> Feb 28 17:01:33 hanka kern_writev+0x4c
> Feb 28 17:01:33 hanka write() at write+0x54
> Feb 28 17:01:33 hanka syscall() at syscall+0x1f6
> Feb 28 17:01:33 hanka Xfast_syscall() at Xfast_syscall+0xab
> Feb 28 17:01:33 hanka --- syscall (4, FreeBSD ELF64, write), rip = 
> 0x800e1fd3c, rsp = 0x7fffffffea88, rbp = 0x16 ---


> Feb 28 17:03:54 hanka lock order reversal:
> Feb 28 17:03:54 hanka 1st 0xffffffff80b0f308 tcp (tcp) @ 
> netinet/tcp_input.c:400
> Feb 28 17:03:54 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write mutex 
> (PFil hook read/write mutex) @ net/pfil.c:73
> Feb 28 17:03:54 hanka KDB: stack backtrace:
> Feb 28 17:03:54 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Feb 28 17:03:54 hanka witness_checkorder() at witness_checkorder+0x605
> Feb 28 17:03:54 hanka _rw_rlock() at _rw_rlock+0x5b
> Feb 28 17:03:54 hanka pfil_run_hooks() at pfil_run_hooks+0x44
> Feb 28 17:03:54 hanka ip_output() at ip_output+0x38e
> Feb 28 17:03:54 hanka tcp_respond() at tcp_respond+0x2d5
> Feb 28 17:03:54 hanka tcp_dropwithreset() at tcp_dropwithreset+0x131
> Feb 28 17:03:54 hanka tcp_input() at tcp_input+0x6d0
> Feb 28 17:03:54 hanka ip_input() at ip_input+0xc0
> Feb 28 17:03:54 hanka ether_demux() at ether_demux+0x1d9
> Feb 28 17:03:54 hanka ether_input() at ether_input+0x19d
> Feb 28 17:03:54 hanka em_rxeof() at em_rxeof+0x1ca
> Feb 28 17:03:54 hanka em_poll() at em_poll+0x6f
> Feb 28 17:03:54 hanka netisr_poll() at netisr_poll+0x87
> Feb 28 17:03:54 hanka swi_net() at swi_net+0xea
> Feb 28 17:03:54 hanka ithread_loop() at ithread_loop+0xda
> Feb 28 17:03:54 hanka fork_exit() at fork_exit+0x12a
> Feb 28 17:03:54 hanka fork_trampoline() at fork_trampoline+0xe
> Feb 28 17:03:54 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, rbp 
> = 0 ---


> Feb 28 17:23:36 hanka lock order reversal:
> Feb 28 17:23:36 hanka 1st 0xffffffff810f9ef0 tcp_sc_head (tcp_sc_head) @ 
> netinet/tcp_syncache.c:477
> Feb 28 17:23:36 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write mutex 
> (PFil hook read/write mutex) @ net/pfil.c:73
> Feb 28 17:23:36 hanka KDB: stack backtrace:
> Feb 28 17:23:36 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Feb 28 17:23:36 hanka witness_checkorder() at witness_checkorder+0x605
> Feb 28 17:23:36 hanka _rw_rlock() at _rw_rlock+0x5b
> Feb 28 17:23:36 hanka pfil_run_hooks() at pfil_run_hooks+0x44
> Feb 28 17:23:36 hanka ip_output() at ip_output+0x38e
> Feb 28 17:23:36 hanka syncache_respond() at syncache_respond+0x32a
> Feb 28 17:23:36 hanka syncache_add() at syncache_add+0x206
> Feb 28 17:23:36 hanka tcp_input() at tcp_input+0xa77
> Feb 28 17:23:36 hanka ip_input() at ip_input+0xc0
> Feb 28 17:23:36 hanka ether_demux() at ether_demux+0x1d9
> Feb 28 17:23:36 hanka ether_input() at ether_input+0x19d
> Feb 28 17:23:36 hanka em_rxeof() at em_rxeof+0x1ca
> Feb 28 17:23:36 hanka em_poll() at em_poll+0x6f
> Feb 28 17:23:36 hanka netisr_poll() at netisr_poll+0x87
> Feb 28 17:23:36 hanka swi_net() at swi_net+0xea
> Feb 28 17:23:36 hanka ithread_loop() at ithread_loop+0xda
> Feb 28 17:23:36 hanka fork_exit() at fork_exit+0x12a
> Feb 28 17:23:36 hanka fork_trampoline() at fork_trampoline+0xe
> Feb 28 17:23:36 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, rbp 
> = 0 ---

These look useful and should point to the problem.  It is also probably 
related to something in your script:

> I tried it with pf disabled, only running ipfw without dummynet, 
> everything is ok.
> I tried it witout pf disabled, running ipfw with dummynet queue for 
> small traffic, everything is ok.
> I tried it with/without pf disabled, running ipfw with dummynet queue 
> for hight traffic >100Mbit/s, the box freezes, everytime, after enabling 
> rules with the dummynet queue, after some second or few minutes, maybe 
> depending on the high traffic.
> 
> The shaper script:
> # SHAPER
> IPFW_PIPE_DOWNLOAD=1
> IPFW_PIPE_UPLOAD=11
> IPFW_QUEUE_DOWNLOAD=1
> IPFW_QUEUE_UPLOAD=11
> 
> DOWNLOAD_ROOT="450Mbit/s"
> UPLOAD_ROOT="450Mbit/s"
> SHAPER_BUCKETS="1024"
> 
> IPFW_SHAPER_ROOT=1000
> 
> $CMD pipe $IPFW_PIPE_DOWNLOAD config bw $DOWNLOAD_ROOT buckets 
> $SHAPER_BUCKETS
> $CMD pipe $IPFW_PIPE_UPLOAD config bw $UPLOAD_ROOT buckets $SHAPER_BUCKETS
> 
> $CMD queue $IPFW_QUEUE_DOWNLOAD config pipe $IPFW_PIPE_DOWNLOAD buckets 
> $SHAPER_BUCKETS mask dst-ip 0xFFFFFFFF
> $CMD queue $IPFW_QUEUE_UPLOAD config pipe $IPFW_PIPE_UPLOAD buckets 
> $SHAPER_BUCKETS mask src-ip 0xFFFFFFFF
> 
> $CMD delete $IPFW_SHAPER_ROOT
> $CMD $IPFW_SHAPER_ROOT add queue $IPFW_QUEUE_DOWNLOAD ip from me to any 
> out  // Share download
> $CMD $IPFW_SHAPER_ROOT add queue $IPFW_QUEUE_UPLOAD ip from any to me in 
> // Share upload

Thanks for the information.

Kris




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?47C7E068.5040208>