From: Andre Oppermann <andre@freebsd.org>
Date: Tue, 12 Mar 2013 17:33:04 +0100
To: Gleb Smirnoff
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r248196 - head/sys/nfs

On 12.03.2013 16:50, Gleb Smirnoff wrote:
> On Tue, Mar 12, 2013 at 04:31:05PM +0100, Andre Oppermann wrote:
> A> > If you are concerned about using jumbos that are > PAGE_SIZE, then I can
> A> > extend the API in my patch. ... done.
> A> >
> A> > Patch attached.
> A> >
> A> > The NFS code itself guarantees that it won't request more than MCLBYTES,
> A> > so using bare m_get2() here is safe. I can add a flag there later for
> A> > clarity.
> A>
> A> Using PAGE_SIZE clusters is perfectly fine and no flag to prevent that
> A> is necessary. In fact we've been doing it for years on socket writes
> A> without complaints (through m_getm2()).
>
> mbuf usage isn't limited to sockets. There is code that right now utilizes
> only mbufs and standard clusters, netipsec for example.

Yes, I understand that.

> I'd like to remove a lot of hand-made mbuf allocation in different places
> in the kernel, and this can be done with an M_NOJUMBO flag. I don't have
> time to dig deeper into large chunks of code trying to understand whether
> it is possible to convert them to using PAGE_SIZE clusters or not; I just
> want to reduce the amount of pasted hand allocation.

Reducing the amount of hand allocation is very good.

> We have a very common case where we allocate either an mbuf or an
> mbuf + cluster, depending on size. Everywhere this is done by hand, but it
> can be substituted with m_get2(len, ..., M_NOJUMBO);

I guess what I'm trying to say is that not wanting jumbos > PAGE_SIZE is
normal and shouldn't have to be specified all the time. This makes the API
look like this:

  m_get2(len, ..., 0);  /* without flags I get at most MJUMPAGESIZE */

If someone really, really, really knows what he is doing, he can say he
wants jumbos > PAGE_SIZE returned with M_JUMBOOK or such. However, IMHO
even that shouldn't be offered, and m_getm2() should be used for a chain
instead.
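To make the comparison concrete, a rough sketch of both patterns follows.
This is a sketch only: M_NOJUMBO is the flag proposed in Gleb's patch and
does not exist in the tree; m_gethdr(), m_getcl() and m_get2() are the
existing allocators.

  /* Hand-rolled: pick mbuf vs. mbuf + standard cluster by size. */
  struct mbuf *m;

  if (len > MHLEN)
          m = m_getcl(M_WAITOK, MT_DATA, M_PKTHDR); /* MCLBYTES cluster */
  else
          m = m_gethdr(M_WAITOK, MT_DATA);          /* fits in the mbuf */

  /* Replacement: one call.  The (proposed, not-in-tree) M_NOJUMBO flag
   * would guarantee that no jumbo cluster is ever handed back. */
  m = m_get2(len, M_WAITOK, MT_DATA, M_PKTHDR | M_NOJUMBO);

Under Andre's preferred default, the plain m_get2(len, ..., 0) call would
already be capped at MJUMPAGESIZE, making a flag unnecessary for the
common case.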
> A> However I think that m_get2() should never even attempt to allocate
> A> mbuf clusters larger than PAGE_SIZE. Not even with flags.
> A>
> A> All mbufs > PAGE_SIZE should exclusively and only ever be used by
> A> drivers for NICs with "challenged" DMA engines. Possibly only
> A> available through a dedicated API to prevent all other uses of it.
>
> Have you done any benchmarking that proves that scatter-gather at the
> busdma level is any worse than chaining at the mbuf level?

The problem is different. With our current jumbo mbufs > PAGE_SIZE there
isn't any scatter-gather at the busdma level because they are contiguous
at the physical *and* KVA level. Allocating such jumbo mbufs shifts the
burden of mbuf chains to the VM and pmap layer, which have to come up
with such contiguous stretches of physical memory. This fails quickly
once the machine has seen some activity and memory fragmentation, as
we've observed in recent days even with 96GB of RAM available. It gets
worse the more load the machine has. Which is exactly what we *don't*
want.

> Dealing with an mbuf that is contiguous in virtual memory is handy for
> protocols that look through the entire payload, pfsync for example. I
> guess NFS may also benefit from that.

Of course it is handy. However, that carries other tradeoffs, some
significant, in other parts of the system. And for incoming packets it
depends on the MTU size anyway. For NFS, as far as I've read through the
code today, the control messages tend to be rather small. The vast bulk
of the data is transported between the mbufs and the VFS/filesystem.

> P.S. Ok about the patch?

No. m_getm2() doesn't need the flag at all; PAGE_SIZE mbufs are always
good. Calling m_get2() without a flag should return at most a PAGE_SIZE
mbuf. And the (ab)use of the M_PROTO1|2 flags is icky and may conflict
with protocol-specific uses.

--
Andre
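For reference, the busdma-level scatter-gather Gleb asks about is what a
driver gets automatically when it maps an mbuf chain: each piece of the
chain (at most PAGE_SIZE with such clusters) becomes one or more DMA
segments, so physical contiguity across the whole packet is never
required. A minimal transmit-side sketch using the existing
bus_dmamap_load_mbuf_sg(9) and m_collapse(9) interfaces; the tag, map,
function name and segment limit are hypothetical driver state, not part
of the patch under discussion:

  #include <sys/param.h>
  #include <sys/systm.h>
  #include <sys/bus.h>
  #include <sys/mbuf.h>
  #include <machine/bus.h>

  #define XX_MAX_TX_SEGS  32      /* hypothetical NIC S/G segment limit */

  /*
   * Sketch: map an mbuf chain for transmit DMA.  Each mbuf contributes
   * one or more scatter/gather segments, so a chain of PAGE_SIZE
   * clusters needs no physically contiguous jumbo buffer.
   */
  static int
  xx_map_tx_mbuf(bus_dma_tag_t tag, bus_dmamap_t map, struct mbuf **mp)
  {
          bus_dma_segment_t segs[XX_MAX_TX_SEGS];
          int error, nsegs;

          error = bus_dmamap_load_mbuf_sg(tag, map, *mp, segs, &nsegs,
              BUS_DMA_NOWAIT);
          if (error == EFBIG) {
                  /* More segments than the NIC handles: compact chain. */
                  struct mbuf *m = m_collapse(*mp, M_NOWAIT,
                      XX_MAX_TX_SEGS);
                  if (m == NULL)
                          return (ENOBUFS);
                  *mp = m;
                  error = bus_dmamap_load_mbuf_sg(tag, map, *mp, segs,
                      &nsegs, BUS_DMA_NOWAIT);
          }
          return (error);
  }

Whether walking such a chain is slower than one contiguous buffer is
exactly the benchmarking question raised above; the point of the sketch
is only that the DMA engine, not the VM and pmap layer, absorbs the
fragmentation.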