From: "Jin Guojun [DSD]" <j_guojun@lbl.gov>
Date: Wed, 09 Apr 2003 15:43:57 -0700
To: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org
Subject: tcp_output starving -- is due to mbuf get delay?

When testing a GigE path with a 67 ms RTT, the maximum TCP throughput is
limited to 250 Mb/s. By tracing the problem, I found that tcp_output() is
starving while snd_wnd and snd_cwnd are fully open. snd_cc never fills
beyond 4.05 MB, even though sb_hiwat is 8 MB and sb_mbmax is 10 MB (see
the trace below). That is, sosend never stops at sbwait.
So the only place that can slow things down is the mbuf allocation in
sosend(). The attached trace file shows that each MGET and MCLGET takes
significant time -- around 8 us during slow start, gradually increasing
after that to a range of 18 to 648 us. Each packet Tx on GigE takes 12 us.
If the average mbuf allocation takes 18 us, performance should drop to 40%
of line rate; in fact it drops to 25%, which means the average delay is
even higher. I have changed NMBCLUSTERS from 2446 to 6566 to 10240, and
nothing improved.

Can anyone tell me what factors would cause MGET / MCLGET to wait? Is
there any way to make MGET / MCLGET not wait?

	-Jin

----------- system info -------------
kern.ipc.maxsockbuf: 10485760
net.inet.tcp.sendspace: 8388608
kern.ipc.nmbclusters: 10240
kern.ipc.mbuf_wait: 32
kern.ipc.mbtypes: 2606 322 0 0 0 0 0 0 0 0 0 0 0 0 0 0
kern.ipc.nmbufs: 40960

-------------- code trace and explanation ----------

sosend()
{
	...
	if (space < resid + clen &&
	    (atomic || space < so->so_snd.sb_lowat || space < clen)) {
		if (so->so_state & SS_NBIO)
			snderr(EWOULDBLOCK);
		sbunlock(&so->so_snd);
		error = sbwait(&so->so_snd);
		/***** never comes down to here ****/
		splx(s);
		if (error)
			goto out;
		goto restart;
	}
	splx(s);
	mp = &top;
	space -= clen;
	do {
		if (uio == NULL) {
			/*
			 * Data is prepackaged in "top".
			 */
			resid = 0;
			if (flags & MSG_EOR)
				top->m_flags |= M_EOR;
		} else do {
			if (top == 0) {
				microtime(&t1);
				MGETHDR(m, M_WAIT, MT_DATA);
				if (m == NULL) {
					error = ENOBUFS;
					goto release;
				}
				mlen = MHLEN;
				m->m_pkthdr.len = 0;
				m->m_pkthdr.rcvif = (struct ifnet *)0;
			} else {
				MGET(m, M_WAIT, MT_DATA);
				if (m == NULL) {
					error = ENOBUFS;
					goto release;
				}
				mlen = MLEN;
			}
			if (resid >= MINCLSIZE) {
				MCLGET(m, M_WAIT);
				if ((m->m_flags & M_EXT) == 0)
					goto nopages;
				mlen = MCLBYTES;
				len = min(min(mlen, resid), space);
			} else {
nopages:
				len = min(min(mlen, resid), space);
				/*
				 * For datagram protocols, leave room
				 * for protocol headers in first mbuf.
				 */
				if (atomic && top == 0 && len < mlen)
					MH_ALIGN(m, len);
			}
			microtime(&t2);
			td = time_diff(&t2, &t1);
			if ((td > 5 && (++tcnt & 31) == 0) || td > 50)
				log( ... "td %d %d\n", td, tcnt);
	...
} /* end of sosend */

[Attachment: sosend.trace]

/kernel: l-timer 1049825316.1469006 1049825316.787656 68135 seq# 1608354634
/kernel: lion init rtt 68135 hiwat 8388608 mbmax 10485760
/kernel: init6 maxbss 8516875 (1460) co_tick 1 nbs 6 NIC 1000000
# delay count
/kernel: sosend: td 6 32
/kernel: sosend: td 7 64
/kernel: sosend: td 6 96
/kernel: sosend: td 9 128
/kernel: sosend: td 6 160
/kernel: sosend: td 6 192
/kernel: sosend: td 7 224
/kernel: sosend: td 6 256
/kernel: sosend: td 6 288
/kernel: sosend: td 7 320
/kernel: sosend: td 6 352
/kernel: sosend: td 7 384
/kernel: sosend: td 8 416
/kernel: sosend: td 6 448
/kernel: sosend: td 6 480
/kernel: sosend: td 6 512
/kernel: sosend: td 6 544
/kernel: sosend: td 6 576
/kernel: sosend: td 6 608
/kernel: sosend: td 6 640
/kernel: sosend: td 6 672
/kernel: sosend: td 6 704
/kernel: sosend: td 7 736
/kernel: sosend: td 8 768
/kernel: sosend: td 7 800
/kernel: sosend: td 7 832
/kernel: sosend: td 7 864
/kernel: sosend: td 8 896
/kernel: sosend: td 8 928
/kernel: sosend: td 7 960
/kernel: tcp_lion SO_SNDBUF 8517136		// end of slow start
/kernel: #(MBS 25550625) swnd 202496 cwnd 6744744 mbcnt 4133376 sp 4843184
/kernel: sosend: td 58 965
/kernel: sosend: td 370 966
/kernel: sosend: td 57 970
/kernel: sosend: td 77 972
/kernel: sosend: td 52 974
/kernel: sosend: td 55 975
/kernel: sosend: td 58 976
/kernel: sosend: td 53 982
/kernel: sosend: td 25 992
/kernel: sosend: td 55 1017
/kernel: sosend: td 19 1024
/kernel: sosend: td 27 1056
/kernel: sosend: td 26 1088
/kernel: sosend: td 43 1120
/kernel: sosend: td 30 1152
/kernel: sosend: td 17 1184
/kernel: sosend: td 25 1216
/kernel: sosend: td 25 1248
/kernel: sosend: td 29 1280
/kernel: sosend: td 18 1312
/kernel: sosend: td 28 1344
/kernel: sosend: td 26 1376
/kernel: sosend: td 26 1408
/kernel: sosend: td 27 1440
/kernel: sosend: td 32 1472
/kernel: sosend: td 27 1504
/kernel: sosend: td 19 1536
/kernel: sosend: td 56 1538
/kernel: sosend: td 21 1568
/kernel: sosend: td 648 1578
/kernel: sosend: td 27 1600
/kernel: sosend: td 27 1632
/kernel: sosend: td 29 1664
/kernel: sosend: td 25 1696
/kernel: sosend: td 70 1717
/kernel: sosend: td 28 1728
/kernel: sosend: td 53 1746
/kernel: sosend: td 51 1750
/kernel: sosend: td 84 1751
/kernel: sosend: td 63 1760
/kernel: sosend: td 293 1766
/kernel: sosend: td 166 1768
/kernel: sosend: td 127 1770
/kernel: sosend: td 76 1771
/kernel: sosend: td 78 1773
/kernel: sosend: td 79 1774
/kernel: sosend: td 308 1776
/kernel: sosend: td 78 1777
/kernel: sosend: td 80 1778
/kernel: sosend: td 79 1779
/kernel: sosend: td 150 1781
/kernel: sosend: td 107 1782
/kernel: sosend: td 106 1784
/kernel: sosend: td 102 1785
/kernel: sosend: td 18 1792
/kernel: sosend: td 97 1793
/kernel: sosend: td 113 1794
/kernel: sosend: td 108 1795
/kernel: sosend: td 100 1796
/kernel: sosend: td 188 1799
/kernel: sosend: td 25 1824
/kernel: sosend: td 26 1856
/kernel: sosend: td 26 1888
/kernel: sosend: td 53 1897
/kernel: sosend: td 28 1920
/kernel: FIN len 96 nxt 1691626219 max 1691626219 seq 1691626219

total counted delay = 39596; per-packet (1448 bytes) delay is .7132564500 us