Date:      Mon, 23 Aug 2010 12:51:12 -0700
From:      Pyun YongHyeon <pyunyh@gmail.com>
To:        Andre Oppermann <andre@freebsd.org>
Cc:        adrian.chadd@gmail.com, freebsd-net@freebsd.org
Subject:   Re: 8.0-RELEASE-p3: 4k jumbo mbuf cluster exhaustion
Message-ID:  <20100823195112.GG1116@michelle.cdnetworks.com>
In-Reply-To: <4C72CFD0.2000005@freebsd.org>
References:  <AANLkTikrbCFHz-CnuYcgH2JzpeH5hob0Aa2y5dwn3Hvv@mail.gmail.com> <AANLkTikYMU=wML_z=HDnkUF1PGYMVa1q-QWTrkxD%2B7EP@mail.gmail.com> <20100822222746.GC6013@michelle.cdnetworks.com> <AANLkTi=t%2BnG8isp1nf2aBec%2BFwomApNt0NBPO8LqZ%2B=9@mail.gmail.com> <4C724AD9.5020000@freebsd.org> <20100823175220.GB1116@michelle.cdnetworks.com> <4C72C622.2070302@freebsd.org> <20100823191634.GE1116@michelle.cdnetworks.com> <4C72CFD0.2000005@freebsd.org>

On Mon, Aug 23, 2010 at 09:45:20PM +0200, Andre Oppermann wrote:
> On 23.08.2010 21:16, Pyun YongHyeon wrote:
> >On Mon, Aug 23, 2010 at 09:04:02PM +0200, Andre Oppermann wrote:
> >>On 23.08.2010 19:52, Pyun YongHyeon wrote:
> >>>On Mon, Aug 23, 2010 at 12:18:01PM +0200, Andre Oppermann wrote:
> >>>>The function that is called on a socket write is sosend_generic() which
> >>>>makes use of m_getm2().  This function allocates mbuf chains with the
> >>>>tightest packing it can achieve.  It will make use of 4k (page size) mbufs
> >>>>as much as it can.  This is where they come from.
> >>>>
> >>>>It seems the 4k clusters do not get freed back to the pool after they've
> >>>>been sent by the NIC and dropped from the socket buffer after the ACK has
> >>>>arrived.  The leak must occur in one of these two places.  The socket
> >>>>buffer is unlikely as it would affect not just you but everyone else too.
> >>>>Thus the mbuf freeing after DMA/tx in the bce(4) driver is the prime
> >>>>suspect.
> >>>>
> >>>
> >>>I know bce(4) has a couple of bugs in the TX path (wrong DMA tag, lack
> >>>of bus_dmamap_sync(9), etc.) but this is the same code path with or
> >>>without TX checksum offloading. This is one of the reasons why I still
> >>>do not understand what's really happening here. TX checksum offloading
> >>>may add frame processing time, since the internal FIFO has to be filled
> >>>to compute the checksum before the frame is transmitted on the wire, so
> >>>it can change the timing of the TX path. This timing change might
> >>>trigger the TX path bug. It's just a vague guess, though.
> >>
> >>Had a chat with Claudio@OpenBSD and he said that the bce(4) DMA engine
> >>can only access the first 1GB of physical RAM and has to use bounce
> >>buffers all the time.  Maybe this is related.
> >>
> >
> >Really? I don't remember seeing such a DMA address space limitation
> >in the data sheet. And I don't think Broadcom would do such a horrible
> >thing to controllers targeted at servers. The only limitation I
> >know of is that the BCM5708 is not able to handle DMA addresses greater
> >than 40 bits, so bce(4) limits the DMA address space in DMA tag creation.
> 
> Oops... OpenBSD bce(4) != FreeBSD bce(4).  The former is for BCM440x
> chips, the latter for BCM57xx.
> 

Ok, OpenBSD has bnx(4) for Broadcom NetXtreme II controllers.
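
To make the two DMA limits discussed above concrete, here is a minimal
userland sketch in C. It is not driver code: the macro and function names
are hypothetical, and it only models the arithmetic of a 1GB (30-bit)
limit, as reported for the BCM440x, versus the 40-bit limit described for
the BCM5708. A buffer that extends past the limit is the case where
bounce buffers would be needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not taken from any driver. */
#define DMA_MAXADDR_BITS(n)	((UINT64_C(1) << (n)) - 1)
#define DMA_MAXADDR_1GB		DMA_MAXADDR_BITS(30)	/* 1GB limit */
#define DMA_MAXADDR_40BIT	DMA_MAXADDR_BITS(40)	/* 40-bit limit */

static_assert(DMA_MAXADDR_40BIT == UINT64_C(0xFFFFFFFFFF), "40-bit limit");

/* True when the whole buffer [paddr, paddr + len) lies at or below maxaddr. */
static bool
dma_reachable(uint64_t paddr, uint64_t len, uint64_t maxaddr)
{
	if (paddr > maxaddr)
		return (false);
	if (len == 0)
		return (true);
	/* Written this way so that paddr + len cannot overflow. */
	return (len - 1 <= maxaddr - paddr);
}

/* True when the buffer would have to be copied through a bounce buffer. */
static bool
needs_bounce(uint64_t paddr, uint64_t len, uint64_t maxaddr)
{
	return (!dma_reachable(paddr, len, maxaddr));
}
```

With these definitions, a 4k cluster at physical address 2GB needs
bouncing under the 1GB limit but not under the 40-bit limit, which is why
a 40-bit restriction rarely matters on typical server memory sizes.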
