From: Pyun YongHyeon
Date: Mon, 23 Aug 2010 10:52:20 -0700
To: Andre Oppermann
Message-ID: <20100823175220.GB1116@michelle.cdnetworks.com>
References:
  <20100822222746.GC6013@michelle.cdnetworks.com> <4C724AD9.5020000@freebsd.org>
In-Reply-To: <4C724AD9.5020000@freebsd.org>
Cc: adrian.chadd@gmail.com, freebsd-net@freebsd.org
Subject: Re: 8.0-RELEASE-p3: 4k jumbo mbuf cluster exhaustion
Reply-To: pyunyh@gmail.com

On Mon, Aug 23, 2010 at 12:18:01PM +0200, Andre Oppermann wrote:
> On 23.08.2010 11:26, Adrian Chadd wrote:
> >On 23 August 2010 06:27, Pyun YongHyeon wrote:
> >
> >>I recall there was a SIOCSIFCAP ioctl handling bug in bce(4) on 8.0, so
> >>it might also disable IFCAP_TSO4/IFCAP_TXCSUM/IFCAP_RXCSUM when you
> >>disabled RX checksum offloading. But I can't explain how checksum
> >>offloading could be related to the growth of 4k jumbo buffers.
> >
> >Neither can I!
> >
> >I'm trying to come up with a reproduction method that doesn't involve
> >"put box on the internet, push clients through it, wait."
>
> Network drivers use 2k sized mbuf clusters on receive. So the problem
> doesn't seem to be RX related.
>

bce(4) is special in this regard. The controller uses jumbo clusters on RX
when jumbo frames are used. If header splitting is used, the driver uses
normal mbuf clusters instead.

> The function that is called on a socket write is sosend_generic(), which
> makes use of m_getm2(). This function allocates mbuf chains with the
> tightest packing it can achieve. It will make use of 4k (page size) mbufs
> as much as it can. This is where they come from.
>
> It seems the 4k clusters do not get freed back to the pool after they've
> been sent by the NIC and dropped from the socket buffer after the ACK has
> arrived.
> The leak must occur in one of these two places. The socket
> buffer is unlikely as it would affect not just you but everyone else too.
> Thus the mbuf freeing after DMA/tx in the bce(4) driver is the prime
> suspect.
>

I know bce(4) has a couple of bugs in its TX path (wrong DMA tag, missing
bus_dmamap_sync(9) calls, etc.), but that code path is the same with and
without TX checksum offloading. This is one of the reasons why I still do
not understand what's really happening here.

TX checksum offloading may add per-frame processing time: the controller
has to fill its internal FIFO to compute the checksum before transmitting
the frame to the wire, and that can change the timing of the TX path.
This timing change might trigger the TX path bug. It's just a vague guess,
though.

> --
> Andre