From owner-svn-src-head@FreeBSD.ORG Tue Mar 12 15:31:09 2013 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 99C855BA for ; Tue, 12 Mar 2013 15:31:09 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id E21A731A for ; Tue, 12 Mar 2013 15:31:08 +0000 (UTC) Received: (qmail 10962 invoked from network); 12 Mar 2013 16:43:41 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 12 Mar 2013 16:43:41 -0000 Message-ID: <513F4A39.8040107@freebsd.org> Date: Tue, 12 Mar 2013 16:31:05 +0100 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130107 Thunderbird/17.0.2 MIME-Version: 1.0 To: Gleb Smirnoff Subject: Re: svn commit: r248196 - head/sys/nfs References: <201303121219.r2CCJN5Z069789@svn.freebsd.org> <513F3A54.3090702@freebsd.org> <20130312150053.GI48089@FreeBSD.org> In-Reply-To: <20130312150053.GI48089@FreeBSD.org> Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Mar 2013 15:31:09 -0000 On 12.03.2013 16:00, Gleb Smirnoff wrote: > Andre, > > On Tue, Mar 12, 2013 at 03:23:16PM +0100, Andre Oppermann wrote: > A> On 12.03.2013 13:19, Gleb Smirnoff wrote: > A> > Author: glebius > A> > Date: Tue Mar 12 12:19:23 2013 > A> > New Revision: 248196 > A> > URL: http://svnweb.freebsd.org/changeset/base/248196 > A> > > A> > Log: > A> > Use m_get2() to get mbuf of appropriate size. > A> > A> The problem with m_get2() is that it will attempt to use jumbo mbufs > A> larger than PAGE_SIZE if the request size is sufficiently large. > A> > A> In light of the recent issues with jumbo > PAGE_SIZE allocations on > A> loaded systems I don't think it is appropriate to extend this practice > A> further. > A> > A> Anything that needs more than PAGE_SIZE (where PAGE_SIZE == 4K) mbuf > A> space should allocate an mbuf chain with m_getm2(). That is what we > A> have mbuf chains for. > A> > A> Sufficient availability of mbufs > PAGE_SIZE cannot be guaranteed after > A> some uptime and/or a loaded system due to memory fragmentation. Allocation > A> can fail non-deterministically for mbufs > PAGE_SIZE long before smaller > A> mbufs become unavailable due to overall (kernel) memory exhaustion. > A> > A> Mbufs > PAGE_SIZE are not only contiguous in kernel address space but > A> also in real memory address space. They are intended jumbo frames on > A> poor NIC's without scatter-capable DMA engines. > A> > A> When you put the m_get2() function proposal for review and comments I > A> already highlighted these issues and put forward serious concerns about > A> this. > A> > A> Please change m_get2() to limit itself to mbuf allocations <= PAGE_SIZE. > > I already got similar patch in my queue. We have a lot of code that could > benefit from using both m_get2() and m_getm2() that are limited to not > use jumbos at all. Indeed. That's why I applaud your work in converting them. > If you are concerned about using jumbos that are > PAGE_SIZE, then I can > extend API in my patch. ... done. > > Patch attached. > > The NFS code itself guarantees that it won't request > than MCLBYTES, > so using bare m_get2() here is safe. I can add flag there later for > clarity. Using PAGE_SIZE clusters is perfectly fine and no flag to prevent that is necessary. In fact we're doing it for years on socket writes without complaints (through m_getm2()). However I think that m_get2() should never ever even try to attempt to allocate mbuf clusters larger than PAGE_SIZE. Not even with flags. All mbufs > PAGE_SIZE should be exclusively and only ever be used by drivers for NIC's with "challenged" DMA engines. Possibly only available through a dedicated API to prevent all other uses of it. -- Andre