From: Daniel Lang
To: Don Lewis
cc: ps@FreeBSD.org
cc: rwatson@FreeBSD.org
cc: current@FreeBSD.org
Date: Sun, 11 Jul 2004 12:05:53 +0200
Subject: Re: panic: m_copym, length > size of mbuf chain
Message-ID: <20040711100553.GA64553@atrbg11.informatik.tu-muenchen.de>
In-Reply-To: <200407102324.i6ANOlEs015698@gw.catspoiler.org>

Hi Don,

referring to your first answer: the 'len' parameter in the
tcp_output.c frame is 1460, and the offset is 737. The sum (2197)
is obviously greater than 975, the value of so->so_snd.sb_cc, so
the assertion Robert suggested would have been triggered.

I am now considering enabling SOCKBUF_DEBUG. However, with SACK
disabled, the machine is still up and running.

Don Lewis wrote on Sat, Jul 10, 2004 at 04:24:47PM -0700:
[..]
> > (2) Try adding some assertions just before the copy to m_copy() in
> > tcp_output().  I'd suggest something like the following:
>
> I'm very suspicious of the SACK code.  In the non-SACK case, len gets
> set here:
>
>         if (!sack_rxmit)
>                 len = ((long)ulmin(so->so_snd.sb_cc, sendwin) - off);
>
> but when the system panics len+off > sb_cc.

Yes.

> It would be interesting to look at *tp and *p in the tcp_output stack
> frame.
>
> If I had to guess, I'd say that either tp->snd_recover-tp->snd_una or
> p->end-tp->snd_una is greater than so->so_snd.sb_cc.

(kgdb) p *tp
$4 = {t_segq = {lh_first = 0x0}, t_segqlen = 0, t_dupacks = 16,
  unused = 0x0, tt_rexmt = 0xc3f50148, tt_persist = 0xc3f50160,
  tt_keep = 0xc3f50178, tt_2msl = 0xc3f50190, tt_delack = 0xc3f501a8,
  t_inpcb = 0xc4d592d0, t_state = 5, t_flags = 1049092, t_force = 0,
  snd_una = 2644477935, snd_max = 2644478910, snd_nxt = 2644478910,
  snd_up = 2644477935, snd_wl1 = 465530853, snd_wl2 = 2644477935,
  iss = 2644477934, irs = 465530852, rcv_nxt = 465530854,
  rcv_adv = 465596389, rcv_wnd = 65700, rcv_up = 465530854,
  snd_wnd = 17520, snd_cwnd = 26280, snd_bwnd = 1073725440,
  snd_ssthresh = 2920, snd_bandwidth = 3498991,
  snd_recover = 2644478412, t_maxopd = 1460, t_rcvtime = 333934,
  t_starttime = 330457, t_rtttime = 333904, t_rtseq = 2644478412,
  t_bw_rtttime = 330457, t_bw_rtseq = 0, t_rxtcur = 145,
  t_maxseg = 1460, t_srtt = 717, t_rttvar = 72, t_rxtshift = 0,
  t_rttmin = 3, t_rttbest = 749, t_rttupdated = 0, max_sndwnd = 17520,
  t_softerror = 0, t_oobflags = 0 '\0', t_iobc = 0 '\0',
  snd_scale = 0 '\0', rcv_scale = 0 '\0', request_r_scale = 0 '\0',
  requested_s_scale = 0 '\0', ts_recent = 0, ts_recent_age = 0,
  last_ack_sent = 465530854, cc_send = 0, cc_recv = 0,
  snd_cwnd_prev = 0, snd_ssthresh_prev = 0, snd_recover_prev = 0,
  t_badrxtwin = 0, snd_limited = 0 '\0', rcv_second = 0, rcv_pps = 0,
  rcv_byps = 0, sack_enable = 1, snd_numholes = 4,
  snd_holes = 0xc4280be0, rcv_laststart = 465530854,
  rcv_lastend = 465530854, rcv_lastsack = 2644478693, rcv_numsacks = 0,
  sackblks = {{start = 0, end = 0}, {start = 0, end = 0},
    {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0},
    {start = 0, end = 0}}}

So snd_recover - snd_una = 2644478412 - 2644477935 = 477,
which is less than so->so_snd.sb_cc = 975.

(kgdb) p *p
$6 = {start = 2644478672, end = 2644478686, rxmit = 2644478672,
  next = 0x0}

p->end - snd_una = 2644478686 - 2644477935 = 751, again less.

Hmmm, I looked through tcp_output.c for occurrences of 'len' and
stumbled across this code:

[..]
        /*
         * NOTE! on localhost connections an 'ack' from the remote
         * end may occur synchronously with the output and cause
         * us to flush a buffer queued with moretocome.  XXX
         *
         * note: the len + off check is almost certainly unnecessary.
         */
        if (!(tp->t_flags & TF_MORETOCOME) &&   /* normal case */
            (idle || (tp->t_flags & TF_NODELAY)) &&
            len + off >= so->so_snd.sb_cc &&
            (tp->t_flags & TF_NOPUSH) == 0) {
                goto send;
[..]

So there is actually a check here, but it does not seem to be a big
problem, since the length is adjusted later if

        len + optlen + ipoptlen > tp->t_maxopd

And, as confirmed above, len is 1460 throughout this frame....

Best regards,
 Daniel
-- 
IRCnet: Mr-Spock
 - My name is Pentium of Borg, division is futile, you will be approximated. -
Daniel Lang * dl@leo.org * +49 89 289 18532 * http://www.leo.org/~dl/
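
For reference, the arithmetic in this message can be replayed outside the kernel. The following is only a minimal sketch in plain C, not code from tcp_output.c: the helper names seq_delta(), range_fits() and max_copy_len() are hypothetical and exist purely for illustration. It shows the modulo-2^32 sequence-number differences computed from the kgdb output and the bounds condition that a copy of 'len' bytes at offset 'off' would have to satisfy against sb_cc.

```c
#include <assert.h>   /* for the checks mentioned in the note below */
#include <stdint.h>

/* Distance between two TCP sequence numbers, modulo 2^32. */
uint32_t seq_delta(uint32_t end, uint32_t start)
{
    return (uint32_t)(end - start);
}

/*
 * Nonzero when the byte range [off, off + len) lies entirely inside a
 * send buffer holding sb_cc bytes. Written to avoid overflow in off + len.
 */
int range_fits(long off, long len, unsigned long sb_cc)
{
    if (off < 0 || len < 0)
        return 0;
    if ((unsigned long)off > sb_cc)
        return 0;
    return (unsigned long)len <= sb_cc - (unsigned long)off;
}

/*
 * Largest length that could be copied starting at off without running
 * past sb_cc; 0 when off is negative or already at or past the end.
 */
long max_copy_len(long off, unsigned long sb_cc)
{
    if (off < 0 || (unsigned long)off >= sb_cc)
        return 0;
    return (long)(sb_cc - (unsigned long)off);
}
```

With the values from the dump, seq_delta(2644478412u, 2644477935u) gives 477 and seq_delta(2644478686u, 2644477935u) gives 751, both below sb_cc = 975, while range_fits(737, 1460, 975) is false: at offset 737 only 238 bytes remain in the buffer.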