From: Robert Watson
Date: Mon, 5 Jan 2009 13:18:27 +0000 (GMT)
To: Terry Kennedy
Cc: Peter Jeremy, freebsd-stable@freebsd.org
Subject: Re: rdump stuck in sbwait state (RELENG_7)
In-Reply-To: <01N3VGDZ7EOM0008L3@tmk.com>
References: <01N3OFGBCXMS000125@tmk.com> <01N3OYSUCHAE000125@tmk.com> <01N3VGDZ7EOM0008L3@tmk.com>

On Sat, 3 Jan 2009, Terry Kennedy wrote:

>> Sorry, I can't think of any - by the time you see it hung, whatever went
>> wrong has already happened. You might glean some insight from the TCP
>> socket state (on the FreeBSD side, use 'netstat -A' to print the PCB
>> address and gdb to dump the contents but I'm not sure how to get this data
>> out of OpenVMS). The '-C' and '-W' options to tcpdump will help.
>
> Ok, I found some time to reproduce this while capturing a trace with
> tcpdump.
>
> Here's the relevant output from netstat / kgdb:

I may have missed this earlier in the thread, but I don't see a kernel stack
trace of the stuck thread/process. Could you grab one using procstat -k, DDB,
or KGDB? I'd like to confirm that the 'sbwait' really reflects waiting to
send, rather than waiting to receive, which (for better or worse) uses the
same wmesg. procstat -k may be the simplest of the above to do if your system
is reasonably recent.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> (0:31) test4:~terry# netstat -A
> Active Internet connections
> Tcpcb    Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
> c73eeae0 tcp4       0      0  test4.892          server.shell       ESTABLISHED
> [snip]
>
> (0:32) test4:~terry# kgdb
> GNU gdb 6.1.1 [FreeBSD]
> [snip]
> #0  0x00000000 in ?? ()
> (kgdb) print * (struct tcpcb *) 0xc73eeae0
> $1 = {t_segq = {lh_first = 0x0}, t_segqlen = 0, t_dupacks = 0,
>   t_timers = 0xc73eec24, t_inpcb = 0xc7387708, t_state = 4, t_flags = 484,
>   snd_una = 292841209, snd_max = 292841209, snd_nxt = 292841209,
>   snd_up = 292780017, snd_wl1 = 3606352422, snd_wl2 = 292841209,
>   iss = 3955646224, irs = 3606284909, rcv_nxt = 3606352422,
>   rcv_adv = 3606415910, rcv_wnd = 63488, rcv_up = 3606352422, snd_wnd = 65535,
>   snd_cwnd = 65535, snd_bwnd = 1073725440, snd_ssthresh = 1073725440,
>   snd_bandwidth = 0, snd_recover = 3955646224, t_maxopd = 1460,
>   t_rcvtime = 11273919, t_starttime = 11024967, t_rtttime = 0,
>   t_rtseq = 292839154, t_bw_rtttime = 11024966, t_bw_rtseq = 3955646224,
>   t_rxtcur = 230, t_maxseg = 1448, t_srtt = 145, t_rttvar = 34,
>   t_rxtshift = 0, t_rttmin = 30, t_rttbest = 67, t_rttupdated = 232101,
>   max_sndwnd = 65535, t_softerror = 0, t_oobflags = 0 '\0', t_iobc = 0 '\0',
>   snd_scale = 0 '\0', rcv_scale = 3 '\003', request_r_scale = 3 '\003',
>   ts_recent = 1207233, ts_recent_age = 11273919, ts_offset = 0,
>   last_ack_sent = 3606352422, snd_cwnd_prev = 0, snd_ssthresh_prev = 0,
>   snd_recover_prev = 0, t_badrxtwin = 0,
>   snd_limited = 0 '\0',
>   snd_numholes = 0, snd_holes = {tqh_first = 0x0, tqh_last = 0xc73eebb8},
>   snd_fack = 0, rcv_numsacks = 0, sackblks = {{start = 0, end = 0}, {
>       start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {
>       start = 0, end = 0}, {start = 0, end = 0}}, sack_newdata = 0,
>   sackhint = {nexthole = 0x0, sack_bytes_rexmit = 0}, t_rttlow = 1,
>   rfbuf_ts = 0, rfbuf_cnt = 0, t_pspare = {0x0, 0x0, 0x0}, t_tu = 0x0,
>   t_toe = 0x0}
> (kgdb) q
>
> Rather than pasting the decoded tcpdump output here, the raw capture
> file is at http://www.tmk.com/transient/rdump30.gz (it is only 76KB
> compressed, 270KB uncompressed). It looks to me like the remote host
> (the VMS box) has correctly ack'd all outstanding data from the FreeBSD
> host, but that the FreeBSD host is just sitting there for some reason.
>
> As before, I have this sitting in the wedged state so if anyone needs
> more data, I can either collect it or give you access to the system.
>
> Terry Kennedy             http://www.tmk.com
> terry@tmk.com             New York, NY USA
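
The observation that the VMS side has acknowledged all outstanding data can be cross-checked against the tcpcb dump quoted above with a little sequence-number arithmetic. What follows is only a minimal sketch in Python, assuming the values are transcribed correctly from the kgdb output; the dictionary and the seq_diff helper are hypothetical names used purely for illustration and are not part of any FreeBSD tool.

```python
# Field values copied from the kgdb dump of struct tcpcb quoted in this message.
tcpcb = {
    "snd_una": 292841209,   # oldest unacknowledged sequence number
    "snd_max": 292841209,   # highest sequence number sent so far
    "snd_nxt": 292841209,   # next sequence number to send
    "rcv_nxt": 3606352422,  # next sequence number expected from the peer
    "rcv_adv": 3606415910,  # right edge of the window advertised to the peer
    "rcv_wnd": 63488,       # receive window size
}

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def seq_diff(a, b):
    """Hypothetical helper: distance from b forward to a, modulo 2**32."""
    return (a - b) % SEQ_MOD


# Bytes sent by the FreeBSD host that the peer has not yet acknowledged.
unacked = seq_diff(tcpcb["snd_max"], tcpcb["snd_una"])

# Window still advertised to the peer beyond what has been received.
advertised = seq_diff(tcpcb["rcv_adv"], tcpcb["rcv_nxt"])

print("unacknowledged bytes in flight:", unacked)
print("receive window still advertised:", advertised)
```

With these values the first difference is 0 (snd_una equals snd_max, so nothing is awaiting acknowledgement) and the second is 63488, which matches rcv_wnd. This only checks the TCP-level counters. It does not show which socket buffer the thread is sleeping on, since sending and receiving share the same 'sbwait' wmesg, so the kernel stack trace is still needed for that.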