Date: Thu, 21 Sep 2006 14:26:50 +0200 From: Andre Oppermann <andre@freebsd.org> To: Gleb Smirnoff <glebius@FreeBSD.org>, Andre Oppermann <andre@freebsd.org>, freebsd-current@freebsd.org, freebsd-net@freebsd.org, alc@freebsd.org, tegge@freebsd.org, gallatin@cs.duke.edu Subject: Re: Much improved sendfile(2) kernel implementation Message-ID: <4512850A.5000107@freebsd.org> In-Reply-To: <20060921114431.GF27667@FreeBSD.org> References: <4511B9B1.2000903@freebsd.org> <20060921114431.GF27667@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Gleb Smirnoff wrote: > Andre, > > On Wed, Sep 20, 2006 at 11:59:13PM +0200, Andre Oppermann wrote: > A> I have rewritten kern_sendfile() to work in two loops, the inner which turns > A> as many pages into mbufs as it can up to the free send socket buffer space. > A> The outer loop then drops the whole mbuf chain into the send socket buffer, > A> calls tcp_output() on it and then waits until 50% of the socket buffer are > A> free again to repeat the cycle. This way tcp_output() gets the full amount > A> of data to work with and can issue up to 64K sends for TSO to chop up in the > A> network adapter without using any CPU cycles. Thus it gets very efficient > A> especially with the readahead the VM and I/O system do. > A> > A> The patch is available here: > A> http://people.freebsd.org/~andre/sendfile-20060920.diff > > I see that mbuf allocations are done in different way depending on the SS_NBIO > flag on socket. This isn't true for the current code, so your patch isn't > only opmization, but is also changing the semantics of the syscall. This should > be avoided in single change/commit. IMO this goes hand in hand. > The non-blocking IO definition is related to the blocking on the socket buffer. > It isn't related to blocking on memory, so this change is incorrect: > > - m_header = m_uiotombuf(hdr_uio, M_DONTWAIT, 0, 0); > - if (m_header == NULL) > + m = m_uiotombuf(hdr_uio, (nbio ? M_NOWAIT : M_WAITOK), > + 0, 0); > + if (m == NULL) { > + error = nbio ? EAGAIN : ENOBUFS; > > There should be unconditional M_NOWAIT. Oops, the M_DONTWAIT in the current > code is incorrect. It is present since rev. 1.171. If the m_uiotombuf() fails > the current code returns from syscall without error! Before rev. 1.171, there > wasn't m_uiotombuf(), the mbuf header was allocated below, with correct wait > argument. > > The wait argument for m_uiotombuf() should be changed to M_WAITOK, but in > a separate commit. > > And this one: > > + m0 = m_get((nbio ? M_NOWAIT : M_WAITOK), MT_DATA); > + if (m0 == NULL) { > + error = (nbio ? EAGAIN : ENOBUFS); > + sf_buf_mext((void *)sf_buf_kva(sf), sf); > + break; > + } > > This one should be M_WAITOK always. It is M_TRYWAIT (equal to M_WAITOK) in > the current code. The reason why I changed the mbuf allocations with SS_NBIO is the rationale of sendfile() and the performance evaluation that was done by alc@ students. sendfile() has two flags which control its blocking behavior. Non blocking socket (SS_NBIO) and SF_NODISKIO. The latter is necessary because file reads or writes are normally not considered to be blocking. The most optimal sendfile() is usage is with a single process doing accept(), parsing and then sendfile that should never ever block on anything. This way the main process then can use kqueue for all the socket stuff and it can transfer all sends that require disk I/O to a child process or thread to provide a context for the read. Meanwhile the main process is free to accept further connections and to continue serving existing connections. Having sendfile() block in mbuf allocation for the header, on sfbufs or anything else is not desirable and must be avoided. I know I'm extending the traditional definition of SS_NBIO a bit but it's fully in line with the semantics and desired operational behavior of sendfile(). The paper by alc@'s students clearly identifies this as the main property of a sendfile implementation besides its zero copy nature. > Is there RELENG_6 version of your patch. If there is one, we can test it > quite extensively. This patch does not fully apply to RELENG_6 as there are some small changes and a bit locking differences. It shouldn't be too hard to adapt it. You won't get TSO speedups though and we can't MFC TSO because it changes the ABI+API for network drivers. -- Andre
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4512850A.5000107>