Date: Tue, 28 Jan 2014 19:06:49 -0500
In-Reply-To: <372707859.17587309.1390923341323.JavaMail.root@uoguelph.ca>
References: <20140128021450.GY13704@funkthat.com> <372707859.17587309.1390923341323.JavaMail.root@uoguelph.ca>
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
From: J David
To: Rick Macklem
Cc: freebsd-net@freebsd.org

On Tue, Jan 28, 2014 at 10:35 AM, Rick Macklem wrote:
> Since messages are sent quickly and then mbufs released, except for
> the DRC in the server, I think avoiding large allocations for server
> replies that may be cached is the case to try and avoid. Fortunately
> the large replies will be for read and readdir and these don't need
> to be cached by the DRC. As such, a patch that uses 4K clusters in
> the server for read, readdir and 4K clusters for write requests in
> the client, should be appropriate, I think?

m_getm2 appears to consistently produce "right-sized" results. The
relevant code is:

    while (len > 0) {
        if (len > MCLBYTES)
            mb = m_getjcl(how, type, (flags & M_PKTHDR),
                MJUMPAGESIZE);
        else if (len >= MINCLSIZE)
            mb = m_getcl(how, type, (flags & M_PKTHDR));
        else if (flags & M_PKTHDR)
            mb = m_gethdr(how, type);
        else
            mb = m_get(how, type);
        /* ... */
    }

So it allocates the shortest possible chain and uses the best-fit
cluster for the last (or only) block in the chain.

It's probably the use of this function in m_uiotombuf or somewhere
very similar that prevents tools like iperf from encountering this
same issue.

Getting this same logic into the NFS code seems like it would be a
good thing, in terms of reducing code duplication, increasing
performance, and leveraging a well-tested code path.

It may raise portability concerns, but it does seem likely that other
OSes to which the NFS code could potentially be ported have similar
mechanisms these days. Possibly it would be worthwhile to examine
whether the NFS code could choose a slightly different point of
abstraction.
Or, if that's undesirable, it seems reasonable to ask whoever does
such a port to cross that bridge when they come to it, since that
would be the person most likely to be intimately familiar with the
relevant details of both OSes.

Also, looking at GAWollman's patch, an mbuf+cluster allocator that
kicks back a prewired iovec seems really handy. Is that something
that would be useful elsewhere in the kernel, or is NFS a special
case because it just moves data around rather than crossing awkward
boundaries the way device drivers and anything user-mode-facing do?

Thanks!
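
P.S. To make the sizing behavior concrete, here is a small userland
sketch of the selection logic in the m_getm2 loop quoted earlier. It
is not kernel code and allocates nothing: the sk_ names are made up
for illustration, and the constants are assumed values (256-byte
mbufs, 2K clusters, 4K pages) that really depend on architecture and
release. It only models which buffer capacity each pass through the
loop would pick.

```c
#include <stddef.h>

/*
 * Assumed, illustrative sizes. The real MLEN/MHLEN/MINCLSIZE/MCLBYTES/
 * MJUMPAGESIZE come from the kernel headers and vary by platform.
 */
enum {
    SK_MLEN         = 224,          /* data bytes in a plain mbuf (assumed) */
    SK_MHLEN        = 168,          /* data bytes in a pkthdr mbuf (assumed) */
    SK_MINCLSIZE    = SK_MHLEN + 1, /* smallest length worth a cluster */
    SK_MCLBYTES     = 2048,         /* standard cluster */
    SK_MJUMPAGESIZE = 4096          /* page-sized jumbo cluster */
};

/* Capacity the branch ladder would choose for `len` remaining bytes. */
static size_t
sk_pick_capacity(size_t len, int pkthdr)
{
    if (len > SK_MCLBYTES)
        return SK_MJUMPAGESIZE;     /* models the m_getjcl() branch */
    if (len >= SK_MINCLSIZE)
        return SK_MCLBYTES;         /* models the m_getcl() branch */
    return pkthdr ? SK_MHLEN : SK_MLEN; /* m_gethdr() or m_get() */
}

/*
 * Walk the loop for a request of `len` bytes. Stores up to `max`
 * capacities in `caps` and returns the number of buffers in the chain.
 */
static size_t
sk_plan_chain(size_t len, int pkthdr, size_t *caps, size_t max)
{
    size_t n = 0;

    while (len > 0) {
        size_t cap = sk_pick_capacity(len, pkthdr);

        if (n < max)
            caps[n] = cap;
        n++;
        len -= (cap < len) ? cap : len; /* clamp instead of going negative */
        pkthdr = 0;                     /* header only on the first buffer */
    }
    return n;
}
```

Under those assumed sizes, a 5000-byte request comes out as one
4096-byte buffer followed by one 2048-byte cluster, and a 65536-byte
request as sixteen page-sized buffers.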