From owner-freebsd-arch Sun Mar 9 5:51: 8 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0D02337B401 for ; Sun, 9 Mar 2003 05:51:04 -0800 (PST) Received: from perrin.int.nxad.com (internal.ext.nxad.com [69.1.70.251]) by mx1.FreeBSD.org (Postfix) with ESMTP id 460A843FAF for ; Sun, 9 Mar 2003 05:51:03 -0800 (PST) (envelope-from sean@perrin.int.nxad.com) Received: by perrin.int.nxad.com (Postfix, from userid 1001) id D31FD21064; Sun, 9 Mar 2003 05:50:37 -0800 (PST) Date: Sun, 9 Mar 2003 05:50:37 -0800 From: Sean Chittenden To: "Alan L. Cox" Cc: arch@freebsd.org Subject: Re: Should sendfile() to return EAGAIN? [patch] Message-ID: <20030309135037.GK79234@perrin.int.nxad.com> References: <3E64FEA0.CCA21C7@imimic.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="D85EWoRcDXv1uhH1" Content-Disposition: inline In-Reply-To: <3E64FEA0.CCA21C7@imimic.com> User-Agent: Mutt/1.4i X-PGP-Key: finger seanc@FreeBSD.org X-PGP-Fingerprint: 3849 3760 1AFE 7B17 11A0 83A6 DD99 E31F BC84 B341 X-Web-Homepage: http://sean.chittenden.org/ Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG --D85EWoRcDXv1uhH1 Content-Type: multipart/mixed; boundary="Sh9dYexoRflRb0jn" Content-Disposition: inline --Sh9dYexoRflRb0jn Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable > Given that the main purpose of the sf_buf is simply to provide an > in-kernel virtual address for the page, one sf_buf per page should > suffice. Sf_bufs are already reference counted. So, the principle > change would be to add a directory data structure that could answer > the question "Does this page already have an allocated sf_buf?" Is this kind of a structure already in use someplace in the tree? I poked around for a bit and couldn't find anything that'd suggest that it's already been done. Do you know if/where I could poach code that'd provide this functionality or would this be fresh bits that'd that'd have to hit the tree? Was there any form of consensus regarding whether or not sf_buf_alloc() should be called in the non-blocking case? I think that there should be another call, sf_buf_alloc_nb() that is used that doesn't block the call when there aren't any sf_buf's available and is used in the non-blocking case. I can't imagine Greenman meant to use msleep(..., 0) for the non-blocking case. The attached patch corrects this. The patch updates the case of sendfile() when there aren't any sf_buf's available. Instead of calling msleep() and blocking the caller on a socket that has been marked non-blocking, return instantly with EAGAIN. This doesn't provide a mechanism for identifying that there aren't any sf_buf's available. At some point a read only sysctl should be added that lets an administrator know how many sf_buf's are free (max number already exists in -CURRENT so it should be trivial for an admin to figure out how many are in use), but that will come at a later date (ENOSLEEP). Returning control to the program should dramatically improve the responsiveness of a non-blocking application that uses sendfile(), hopefully sf_buf's will be freed up for use making it possible to deal with the load. Currently when this limit is hit, it kills the concurrency of the server that non-blocking IO affords and the throughput of a system drops from __Mbps down to near 0Mbps. With this patch, at the very least the server should be able to continue to send data at __Mbps. There is a race in this code, however. There is a race with this code in that if a non-blocking socket that has had sendfile() called on it where there wasn't an sf_buf available, and it is set back to being a blocking socket, sf_buf_alloc_want will never reach zero and as a result, wakeup_one(&sf_freelist) will be called every time in sf_buf_free(). I'm not sure how best to fix this if it should be, or even what the behavior of wakeup_one(&sf_freelist) will be if it is called when there isn't anything msleep()'ing on sf_freelist. Another something that came to mind was to alert the server admin that he/she's out of sf_buf's the same way that the kernel does when it runs out of nmbclusters. Lastly, I couldn't find any bits to suggest how data coherency is maintained. What's the mechanism in place that guarantee's this? -sc Patch can also be found at: http://people.freebsd.org/~seanc/patches/#sendfile_no_block --=20 Sean Chittenden --Sh9dYexoRflRb0jn Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=patch Content-Transfer-Encoding: quoted-printable Index: uipc_syscalls.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v retrieving revision 1.142 diff -u -r1.142 uipc_syscalls.c --- uipc_syscalls.c 6 Mar 2003 04:48:19 -0000 1.142 +++ uipc_syscalls.c 9 Mar 2003 13:14:43 -0000 @@ -1661,6 +1661,27 @@ return (sf); } =20 +/* + * Get an sf_buf from the freelist. Will return NULL if none are + * available. + */ +struct sf_buf * +sf_buf_alloc_nb() +{ + struct sf_buf *sf; + + mtx_lock(&sf_freelist.sf_lock); + sf_buf_alloc_want++; + sf =3D SLIST_FIRST(&sf_freelist.sf_head); + if (sf !=3D NULL) { + SLIST_REMOVE_HEAD(&sf_freelist.sf_head, free_list); + sf_buf_alloc_want--; + } + + mtx_unlock(&sf_freelist.sf_lock); + return (sf); +} + #define dtosf(x) (&sf_bufs[((uintptr_t)(x) - (uintptr_t)sf_base) >> PAGE_S= HIFT]) =20 /* @@ -1929,17 +1950,22 @@ vm_page_unlock_queues(); =20 /* - * Get a sendfile buf. We usually wait as long as necessary, - * but this wait can be interrupted. + * Get a sendfile buf. */ - if ((sf =3D sf_buf_alloc()) =3D=3D NULL) { + if (so->so_state & SS_NBIO) { + if ((sf =3D sf_buf_alloc_nb()) =3D=3D NULL) + error =3D EAGAIN; + } else { + if ((sf =3D sf_buf_alloc()) =3D=3D NULL) + error =3D EINTR; + } + if (sf =3D=3D NULL) { vm_page_lock_queues(); vm_page_unwire(pg, 0); if (pg->wire_count =3D=3D 0 && pg->object =3D=3D NULL) vm_page_free(pg); vm_page_unlock_queues(); sbunlock(&so->so_snd); - error =3D EINTR; goto done; } =20 --Sh9dYexoRflRb0jn-- --D85EWoRcDXv1uhH1 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Comment: Sean Chittenden iD8DBQE+a0at3ZnjH7yEs0ERAmRSAJ9wXd+g+rPqHQ/dL9K1Ds0ku2aMqACfSgfv xaRjZ5R5gJNFkPNdnrK1tl0= =ngMe -----END PGP SIGNATURE----- --D85EWoRcDXv1uhH1-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message