Date: Fri, 13 Mar 2009 16:01:58 +1100
From: Nick Withers <nick@nickwithers.com>
To: freebsd-stable@freebsd.org
Subject: NICs locking up, "*tcp_sc_h"
Message-ID: <1236920519.1490.30.camel@localhost>
Hello all,

I recently installed my first amd64 system (currently running RELENG_7 from 2009-03-11) to replace an aged ppc box, and have been having dramas with the network locking up.

Breaking into the debugger manually and ps-ing shows the network card's thread (e.g., "[irq20: fxp0+]") in state "LL" in "*tcp_sc_h". It seems the process(es) trying to access the card at the time is / are in state "L" in "*tcp".

I thought this may have been something-or-other in the fxp driver, so I installed an rl card, and sadly ran into the issue again.

The console appears unresponsive, but I can get into the debugger. As soon as I do, input I'd already sent seems to "go through". For example, if I hit "Enter" a couple o' times nothing happens, but when I <Ctrl>+<Alt>+<Esc> into the debugger a few login prompts pop up before the debugger output.

A "where" on the fxp / rl process (thread?) gives (transcribed from the console):

____
Tracing PID 31 tid 100030 td 0xffffff00012016e0
sched_switch() at sched_switch+0xf1
mi_switch() at mi_switch+0x18f
turnstile_wait() at turnstile_wait+0x1cf
_mtx_lock_sleep() at _mtx_lock_sleep+0x76
syncache_lookup() at syncache_lookup+0x176
syncache_expand() at syncache_expand+0x38
tcp_input() at tcp_input+0xa7d
ip_input() at ip_input+0xa8
ether_demux() at ether_demux+0x1b9
ether_input() at ether_input+0x1bb
fxp_intr() at fxp_intr+0x233
ithread_loop() at ithread_loop+0x17f
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
____

A "where" on a process stuck in "*tcp", in this case "[swi4: clock]", gave something similar:

____
sched_switch() at sched_switch+0xf1
mi_switch() at mi_switch+0x18f
turnstile_wait() at turnstile_wait+0x1cf
_rw_rlock() at _rw_rlock+0x8c
ipfw_chk() at ipfw_chk+0x3ab2
ipfw_check_out() at ipfw_check_out+0xb1
pfil_run_hooks() at pfil_run_hooks+0x9c
ip_output() at ip_output+0x367
syncache_respond() at syncache_respond+0x2fd
syncache_timer() at syncache_timer+0x15a
(...)
____

In this particular case, the fxp0 card is in a lagg with rl0, but this problem can be triggered with either card on its own. The scheduler is SCHED_ULE.

I'm afraid I'm not too sure how to give more useful information than this. It's a custom kernel, too. Do I need to supply information on what code actually exists at the relevant addresses? (I'm not at all clued in on how to do this... Sorry!) Should I chuck WITNESS, INVARIANTS et al. in?

I *think* every time this has been triggered there's been a "python2.5" process in the "*tcp" state. This machine runs net-p2p/deluge and generally has at least 100 TCP connections on the go at any given time.

Can anyone give me a clue as to what I might do to track this down? I'd appreciate any pointers.

-- 
Nick Withers
email: nick@nickwithers.com
Web: http://www.nickwithers.com
Mobile: +61 414 397 446