From: "Jin Guojun [DSD]"
Date: Wed, 09 Apr 2003 16:12:04 -0700
To: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: tcp_output starving -- is due to mbuf get delay?
Message-ID: <3E94A8C4.3A196E42@lbl.gov>
References: <3E94A22D.174321F0@lbl.gov>

Some details were left out: the machine is a 2 GHz Intel P4 with 1 GB of
memory, so the delay comes from neither the CPU nor a lack of memory.

        -Jin

"Jin Guojun [DSD]" wrote:

> When testing a GigE path that has a 67 ms RTT, the maximum TCP throughput
> is limited to 250 Mb/s. Tracing the problem, I found that tcp_output() is
> starving even though snd_wnd and snd_cwnd are fully open.
> The snd_cc is never filled beyond 4.05 MB even though snd_hiwat is 10 MB
> and snd_sbmax is 8 MB. That is, sosend() never stops at sbwait(). So the
> only place that can slow things down is the mbuf allocation in sosend().
> The attached trace file shows that each MGET and MCLGET takes significant
> time: around 8 us during slow start, gradually increasing after that to a
> range of 18 to 648 us.
> Each packet Tx on GigE takes 12 us. If the average mbuf allocation takes
> 18 us, the performance will be reduced to 40%; in fact it is down to 25%,
> which means a higher average delay.
>
> I have changed NMBCLUSTER from 2446 to 6566 to 10240, and nothing improved.
>
> Can anyone tell what factors would cause MGET / MCLGET to wait?
> Is there any way to make MGET / MCLGET not wait?
>
>         -Jin
>
> ----------- system info -------------
>
> kern.ipc.maxsockbuf: 10485760
> net.inet.tcp.sendspace: 8388608
> kern.ipc.nmbclusters: 10240
> kern.ipc.mbuf_wait: 32
> kern.ipc.mbtypes: 2606 322 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> kern.ipc.nmbufs: 40960
>
> -------------- code trace and explanation ----------
>
> sosend()
> {
>     ...
>     if (space < resid + clen &&
>         (atomic || space < so->so_snd.sb_lowat || space < clen)) {
>         if (so->so_state & SS_NBIO)
>             snderr(EWOULDBLOCK);
>         sbunlock(&so->so_snd);
>         error = sbwait(&so->so_snd);  /***** never comes down to here ****/
>         splx(s);
>         if (error)
>             goto out;
>         goto restart;
>     }
>     splx(s);
>     mp = &top;
>     space -= clen;
>     do {
>         if (uio == NULL) {
>             /*
>              * Data is prepackaged in "top".
>              */
>             resid = 0;
>             if (flags & MSG_EOR)
>                 top->m_flags |= M_EOR;
>         } else do {
>             if (top == 0) {
>                 microtime(&t1);
>                 MGETHDR(m, M_WAIT, MT_DATA);
>                 if (m == NULL) {
>                     error = ENOBUFS;
>                     goto release;
>                 }
>                 mlen = MHLEN;
>                 m->m_pkthdr.len = 0;
>                 m->m_pkthdr.rcvif = (struct ifnet *)0;
>             } else {
>                 MGET(m, M_WAIT, MT_DATA);
>                 if (m == NULL) {
>                     error = ENOBUFS;
>                     goto release;
>                 }
>                 mlen = MLEN;
>             }
>             if (resid >= MINCLSIZE) {
>                 MCLGET(m, M_WAIT);
>                 if ((m->m_flags & M_EXT) == 0)
>                     goto nopages;
>                 mlen = MCLBYTES;
>                 len = min(min(mlen, resid), space);
>             } else {
> nopages:
>                 len = min(min(mlen, resid), space);
>                 /*
>                  * For datagram protocols, leave room
>                  * for protocol headers in first mbuf.
>                  */
>                 if (atomic && top == 0 && len < mlen)
>                     MH_ALIGN(m, len);
>             }
>             microtime(&t2);
>             td = time_diff(&t2, &t1);
>             if ((td > 5 && (++tcnt & 31) == 0) || td > 50)
>                 log( ... "td %d %d\n", td, tcnt);
>
>             ...
>
> } /* end of sosend */
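
For reference, the arithmetic behind the 40% and 25% figures in the quoted
message can be sketched as below. This is only a rough model, not a
measurement: it assumes a 1500-byte frame (the message gives the 12 us Tx
time but not the frame size) and that the allocation delay is fully
serialized with transmission, with no overlap. The function names are
hypothetical helpers made up for this sketch, not kernel interfaces.

```c
#include <assert.h>

/* Time in microseconds to put one frame of frame_bytes on a link
 * running at rate_bps bits per second. */
double packet_tx_us(double frame_bytes, double rate_bps)
{
    assert(frame_bytes > 0.0 && rate_bps > 0.0);
    return frame_bytes * 8.0 / rate_bps * 1e6;
}

/* Fraction of line rate that remains if every packet pays stall_us of
 * extra delay before its tx_us of transmission (assumed serialized). */
double serialized_efficiency(double tx_us, double stall_us)
{
    assert(tx_us > 0.0 && stall_us >= 0.0);
    return tx_us / (tx_us + stall_us);
}

/* Inverse of the above: the per-packet stall that would explain an
 * observed efficiency under the same serialized model. */
double implied_stall_us(double tx_us, double efficiency)
{
    assert(tx_us > 0.0 && efficiency > 0.0 && efficiency <= 1.0);
    return tx_us / efficiency - tx_us;
}

/* Bandwidth-delay product in bytes: the amount of data that must be
 * in flight to keep a path of rate_bps and rtt_s seconds full. */
double bdp_bytes(double rate_bps, double rtt_s)
{
    assert(rate_bps > 0.0 && rtt_s > 0.0);
    return rate_bps * rtt_s / 8.0;
}
```

With these, packet_tx_us(1500, 1e9) gives about 12 us, and
serialized_efficiency(12, 18) gives 0.40, matching the 40% estimate. Under
the same model, implied_stall_us(12, 0.25) gives 36 us as the average
per-packet delay that would account for the observed 25%. bdp_bytes(1e9,
0.067) is about 8.4 MB, which is the order of the 8 MB sendspace setting.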