From: Pyun YongHyeon
Date: Mon, 23 Aug 2010 12:51:12 -0700
To: Andre Oppermann
Cc: adrian.chadd@gmail.com, freebsd-net@freebsd.org
Subject: Re: 8.0-RELEASE-p3: 4k jumbo mbuf cluster exhaustion

On Mon, Aug 23, 2010 at 09:45:20PM +0200, Andre Oppermann wrote:
> On 23.08.2010 21:16, Pyun YongHyeon wrote:
> >On Mon, Aug 23, 2010 at 09:04:02PM +0200, Andre Oppermann wrote:
> >>On 23.08.2010 19:52, Pyun YongHyeon wrote:
> >>>On Mon, Aug 23, 2010 at 12:18:01PM +0200, Andre Oppermann wrote:
> >>>>The function that is called on a socket write is sosend_generic(), which
> >>>>makes use of m_getm2(). This function allocates mbuf chains with the
> >>>>tightest packing it can achieve. It will use 4k (page size) mbufs
> >>>>as much as it can. This is where they come from.
> >>>>
> >>>>It seems the 4k clusters do not get freed back to the pool after they've
> >>>>been sent by the NIC and dropped from the socket buffer after the ACK
> >>>>has arrived. The leak must occur in one of these two places. The socket
> >>>>buffer is unlikely, as a leak there would affect not just you but
> >>>>everyone else too. Thus the mbuf freeing after DMA/tx in the bce(4)
> >>>>driver is the prime suspect.
> >>>>
> >>>
> >>>I know bce(4) has a couple of bugs in the TX path (wrong DMA tag, lack of
> >>>bus_dmamap_sync(9) calls, etc.) but this is the same code path with or
> >>>without TX checksum offloading.
> >>>This is one of the reasons why I still do not
> >>>understand what's really happening here. TX checksum offloading may
> >>>add frame processing time, since the internal FIFO has to be filled to
> >>>compute the checksum before the frame is transmitted to the wire, so it
> >>>can change the timing of the TX path. This timing change might trigger
> >>>the TX path bug. It's just vague guessing though.
> >>
> >>Had a chat with Claudio@OpenBSD and he said that the bce(4) DMA engine
> >>can only access the first 1GB of physical RAM and has to use bounce
> >>buffers all the time. Maybe this is related.
> >>
> >
> >Really? I don't remember seeing such a DMA address space limitation
> >in the data sheet. And I don't think Broadcom made such a horrible
> >thing for controllers targeted at servers. The only limitation I
> >know of is that the BCM5708 cannot handle DMA addresses wider than
> >40 bits, so bce(4) limits the DMA address space when creating its
> >DMA tags.
>
> Oops... OpenBSD bce(4) != FreeBSD bce(4). The former is for BCM440x
> chips, the latter for BCM57xx.
>

Ok, OpenBSD has bnx(4) for Broadcom NetXtreme II controllers.