From owner-freebsd-arch@FreeBSD.ORG Sun Jul 23 06:19:40 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BE57E16A4E2; Sun, 23 Jul 2006 06:19:40 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from fw.zoral.com.ua (fw.zoral.com.ua [213.186.206.134]) by mx1.FreeBSD.org (Postfix) with ESMTP id C5E8643D46; Sun, 23 Jul 2006 06:19:39 +0000 (GMT) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by fw.zoral.com.ua (8.13.4/8.13.4) with ESMTP id k6N6JY7l087733 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 23 Jul 2006 09:19:34 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.13.6/8.13.6) with ESMTP id k6N6JY5i050761; Sun, 23 Jul 2006 09:19:34 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.13.6/8.13.6/Submit) id k6N6JXD8050760; Sun, 23 Jul 2006 09:19:33 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sun, 23 Jul 2006 09:19:33 +0300 From: Kostik Belousov To: Peter Jeremy Message-ID: <20060723061933.GC1217@deviant.kiev.zoral.com.ua> References: <20060721104044.GB728@turion.vk2pj.dyndns.org> <20060722154606.N54846@fledge.watson.org> <20060722151631.GB1217@deviant.kiev.zoral.com.ua> <20060722235528.GI728@turion.vk2pj.dyndns.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="48TaNjbzBVislYPb" Content-Disposition: inline In-Reply-To: <20060722235528.GI728@turion.vk2pj.dyndns.org> User-Agent: Mutt/1.4.2.2i X-Virus-Scanned: ClamAV version 0.88.2, clamav-milter version 0.88.2 
on fw.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=0.4 required=5.0 tests=ALL_TRUSTED, DNS_FROM_RFC_ABUSE,SPF_NEUTRAL autolearn=no version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on fw.zoral.com.ua Cc: Robert Watson , freebsd-arch@freebsd.org Subject: Re: mlock(2) for ordinary users X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 Jul 2006 06:19:40 -0000 --48TaNjbzBVislYPb Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Jul 23, 2006 at 09:55:28AM +1000, Peter Jeremy wrote: > On Sat, 2006-Jul-22 18:16:31 +0300, Kostik Belousov wrote: > >On Sat, Jul 22, 2006 at 03:52:37PM +0100, Robert Watson wrote: > >As a consequence, allowing mlock() for non-root users actually allows such > >a user to allocate value-of(RLIMIT_MEMLOCK) * value-of(RLIMIT_NPROC). > > This is no different to the other per-process resource limits. On a > stock FreeBSD system, I can reach the system-wide FD limit with two > user processes. I can't see that having several processes each > locking RLIMIT_MEMLOCK pages provides any real benefit to the user > so this is really just another DoS vector. > > >In fact, I had to come up with answers to these questions when I > >implemented the per-user swap limits. > > I didn't realise this existed. How do you control per-user swap? I > can't find any reference to this in either login.conf or setrlimit(2). This is not in the tree. See http://people.freebsd.org/~kib/overcommit/index.html I would be more than happy if this stuff becomes useful for at least one purpose. 
--48TaNjbzBVislYPb Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.4 (FreeBSD) iD8DBQFEwxT1C3+MBN1Mb4gRApKNAKD1X2SNtY3Z5Piyom70Na8r3crFlACeKaI8 eQGOb4Gr+bj417hGHbn0lSo= =PWd1 -----END PGP SIGNATURE----- --48TaNjbzBVislYPb-- From owner-freebsd-arch@FreeBSD.ORG Sun Jul 23 18:58:00 2006 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2AA9516A4DA for ; Sun, 23 Jul 2006 18:58:00 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.FreeBSD.org (Postfix) with ESMTP id 659AF43D7B for ; Sun, 23 Jul 2006 18:57:57 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 292D846B04 for ; Sun, 23 Jul 2006 14:57:56 -0400 (EDT) Date: Sun, 23 Jul 2006 19:57:56 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: arch@FreeBSD.org Message-ID: <20060723171734.K35186@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Subject: sosend/soreceive consistency improvements X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 Jul 2006 18:58:00 -0000 As part of cleanups, locking, and optimization work, I've been looking at the socket send and receive paths. In the past, work was done to allow the uio/mbuf chain send and receive paths (sosend, soreceive) to be pluggable for a protocol, so that the protocol could provide substitute implementations. This is generally not used at present, although I recently changed UDP to use an optimized datagram send routine. 
This pluggability is made possible by virtue of each protocol providing its own pru_sosend() and pru_soreceive() methods in the protocol switch.

There's another side to the pluggability, however -- the socket consumers in the kernel, of which there are quite a few -- obviously the socket system calls, but also netgraph, distributed file systems, etc. Some of these consumers have been modified to call so->so_proto->pr_usrreqs->pru_soreceive and ...->pru_sosend, but it turns out many haven't. New references to sosend() and soreceive() periodically get encoded into consumers -- presumably because they are easy to spell, and in fact are generally functionally identical. But not always! It turns out that the NFS code isn't using the optimized UDP send path via sosend_dgram(), because it's calling sosend() directly.

Rather than continue in this "in between state", in which the uio/mbuf chain sosend and soreceive are reached via the protocol switch in each occurrence, I propose a change: sosend() and soreceive() will now be the formal APIs for sending and receiving on sockets within the kernel, as is the case with many other so*() functions, and they will perform the protocol switch dereference. The existing functions are renamed to sosend_generic() and soreceive_generic(), and in most cases are never referenced by protocols since our protocol domain registration already uses sosend() and soreceive() as the defaults today. The new code strikes me as quite a bit more readable, and likely easier for socket consumers to use.

Any thoughts and/or objections? 
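[Illustrative sketch, not part of the original message.] The dispatch arrangement proposed above can be modelled in a few lines of user-space C. This is only a rough sketch under stated assumptions, not kernel code: the struct layouts are heavily simplified, the argument list is trimmed to a single length, and the last_path field and proto_set_defaults() helper are hypothetical names introduced purely so the chosen path can be observed. The assert() stands in for the temporary KASSERT in the patch.

```c
/*
 * User-space model of the proposed sosend() dispatch; NOT FreeBSD kernel
 * code.  Struct layouts and argument lists are simplified assumptions.
 */
#include <assert.h>
#include <stddef.h>

struct socket;

/* Hypothetical marker recording which implementation handled a send. */
enum send_path { PATH_NONE, PATH_GENERIC, PATH_DGRAM };

/* Simplified stand-in for struct pr_usrreqs: only the send hook. */
struct pr_usrreqs {
	int (*pru_sosend)(struct socket *so, size_t len);
};

/* Simplified stand-in for struct protosw. */
struct protosw {
	struct pr_usrreqs *pr_usrreqs;
};

struct socket {
	struct protosw *so_proto;
	enum send_path last_path;	/* hypothetical, for illustration */
};

/* Generic implementation (the routine formerly named sosend()). */
int
sosend_generic(struct socket *so, size_t len)
{
	(void)len;
	so->last_path = PATH_GENERIC;
	return (0);
}

/* Protocol-specific replacement, in the spirit of UDP's sosend_dgram(). */
int
sosend_dgram(struct socket *so, size_t len)
{
	(void)len;
	so->last_path = PATH_DGRAM;
	return (0);
}

/* Formal consumer API: dereference the protocol switch and dispatch. */
int
sosend(struct socket *so, size_t len)
{
	/* Mirrors the patch's KASSERT: a protocol must not install sosend(). */
	assert(so->so_proto->pr_usrreqs->pru_sosend != sosend);
	return (so->so_proto->pr_usrreqs->pru_sosend(so, len));
}

/* Model of the DEFAULT() fallback: unset hooks get the generic routine. */
void
proto_set_defaults(struct pr_usrreqs *pu)
{
	if (pu->pru_sosend == NULL)
		pu->pru_sosend = sosend_generic;
}
```

In this model a consumer only ever spells sosend(so, ...); a socket whose protocol installed sosend_dgram() reaches the optimized routine, and any protocol that left the hook unset falls through to sosend_generic().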
Robert N M Watson Computer Laboratory University of Cambridge --- //depot/vendor/freebsd/src/sys/kern/sys_socket.c 2005/04/16 18:50:30 +++ //depot/user/rwatson/socleanup/src/sys/kern/sys_socket.c 2006/07/23 15:32:54 @@ -88,7 +88,7 @@ return (error); } #endif - error = so->so_proto->pr_usrreqs->pru_soreceive(so, 0, uio, 0, 0, 0); + error = soreceive(so, 0, uio, 0, 0, 0); NET_UNLOCK_GIANT(); return (error); } @@ -115,8 +115,7 @@ return (error); } #endif - error = so->so_proto->pr_usrreqs->pru_sosend(so, 0, uio, 0, 0, 0, - uio->uio_td); + error = sosend(so, 0, uio, 0, 0, 0, uio->uio_td); if (error == EPIPE && (so->so_options & SO_NOSIGPIPE) == 0) { PROC_LOCK(uio->uio_td->td_proc); psignal(uio->uio_td->td_proc, SIGPIPE); --- //depot/vendor/freebsd/src/sys/kern/uipc_domain.c 2006/07/11 23:21:53 +++ //depot/user/rwatson/socleanup/src/sys/kern/uipc_domain.c 2006/07/23 15:52:33 @@ -119,8 +119,8 @@ DEFAULT(pu->pru_rcvd, pru_rcvd_notsupp); DEFAULT(pu->pru_rcvoob, pru_rcvoob_notsupp); DEFAULT(pu->pru_sense, pru_sense_null); - DEFAULT(pu->pru_sosend, sosend); - DEFAULT(pu->pru_soreceive, soreceive); + DEFAULT(pu->pru_sosend, sosend_generic); + DEFAULT(pu->pru_soreceive, soreceive_generic); DEFAULT(pu->pru_sopoll, sopoll); #undef DEFAULT if (pr->pr_init) --- //depot/vendor/freebsd/src/sys/kern/uipc_socket.c 2006/07/21 17:16:23 +++ //depot/user/rwatson/socleanup/src/sys/kern/uipc_socket.c 2006/07/23 15:32:54 @@ -1087,7 +1087,7 @@ */ #define snderr(errno) { error = (errno); goto release; } int -sosend(so, addr, uio, top, control, flags, td) +sosend_generic(so, addr, uio, top, control, flags, td) struct socket *so; struct sockaddr *addr; struct uio *uio; @@ -1249,6 +1249,25 @@ } #undef snderr +int +sosend(so, addr, uio, top, control, flags, td) + struct socket *so; + struct sockaddr *addr; + struct uio *uio; + struct mbuf *top; + struct mbuf *control; + int flags; + struct thread *td; +{ + + /* XXXRW: Temporary debugging. 
*/ + KASSERT(so->so_proto->pr_usrreqs->pru_sosend != sosend, + ("sosend: protocol calls sosend")); + + return (so->so_proto->pr_usrreqs->pru_sosend(so, addr, uio, top, + control, flags, td)); +} + /* * The part of soreceive() that implements reading non-inline out-of-band * data from a socket. For more complete comments, see soreceive(), from @@ -1354,7 +1373,7 @@ * only for the count in uio_resid. */ int -soreceive(so, psa, uio, mp0, controlp, flagsp) +soreceive_generic(so, psa, uio, mp0, controlp, flagsp) struct socket *so; struct sockaddr **psa; struct uio *uio; @@ -1794,6 +1813,24 @@ } int +soreceive(so, psa, uio, mp0, controlp, flagsp) + struct socket *so; + struct sockaddr **psa; + struct uio *uio; + struct mbuf **mp0; + struct mbuf **controlp; + int *flagsp; +{ + + /* XXXRW: Temporary debugging. */ + KASSERT(so->so_proto->pr_usrreqs->pru_soreceive != soreceive, + ("soreceive: protocol calls soreceive")); + + return (so->so_proto->pr_usrreqs->pru_soreceive(so, psa, uio, mp0, + controlp, flagsp)); +} + +int soshutdown(so, how) struct socket *so; int how; --- //depot/vendor/freebsd/src/sys/kern/uipc_syscalls.c 2006/07/19 18:31:24 +++ //depot/user/rwatson/socleanup/src/sys/kern/uipc_syscalls.c 2006/07/23 15:32:54 @@ -803,8 +803,7 @@ ktruio = cloneuio(&auio); #endif len = auio.uio_resid; - error = so->so_proto->pr_usrreqs->pru_sosend(so, mp->msg_name, &auio, - 0, control, flags, td); + error = sosend(so, mp->msg_name, &auio, 0, control, flags, td); if (error) { if (auio.uio_resid != len && (error == ERESTART || error == EINTR || error == EWOULDBLOCK)) @@ -1020,8 +1019,7 @@ ktruio = cloneuio(&auio); #endif len = auio.uio_resid; - error = so->so_proto->pr_usrreqs->pru_soreceive(so, &fromsa, &auio, - (struct mbuf **)0, + error = soreceive(so, &fromsa, &auio, (struct mbuf **)0, (mp->msg_control || controlp) ? 
&control : (struct mbuf **)0, &mp->msg_flags); if (error) { --- //depot/vendor/freebsd/src/sys/kern/uipc_usrreq.c 2006/07/23 12:02:30 +++ //depot/user/rwatson/socleanup/src/sys/kern/uipc_usrreq.c 2006/07/23 16:05:31 @@ -768,8 +768,8 @@ .pru_sense = uipc_sense, .pru_shutdown = uipc_shutdown, .pru_sockaddr = uipc_sockaddr, - .pru_sosend = sosend, - .pru_soreceive = soreceive, + .pru_sosend = sosend_generic, + .pru_soreceive = soreceive_generic, .pru_sopoll = sopoll, .pru_close = uipc_close, }; --- //depot/vendor/freebsd/src/sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c 2006/07/21 17:11:33 +++ //depot/user/rwatson/socleanup/src/sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c 2006/07/23 15:32:54 @@ -1558,8 +1558,8 @@ flags = MSG_DONTWAIT; m = NULL; - error = (*s->l2so->so_proto->pr_usrreqs->pru_soreceive)(s->l2so, - NULL, &uio, &m, (struct mbuf **) NULL, &flags); + error = soreceive(s->l2so, NULL, &uio, &m, + (struct mbuf **) NULL, &flags); if (error != 0) { if (error == EWOULDBLOCK) return (0); /* XXX can happen? */ @@ -1610,9 +1610,8 @@ return (0); /* we are done */ /* Call send function on the L2CAP socket */ - error = (*s->l2so->so_proto->pr_usrreqs->pru_sosend) - (s->l2so, NULL, NULL, m, NULL, 0, - curthread /* XXX */); + error = sosend(s->l2so, NULL, NULL, m, NULL, 0, + curthread /* XXX */); if (error != 0) { NG_BTSOCKET_RFCOMM_ERR( "%s: Could not send data to L2CAP socket, error=%d\n", __func__, error); --- //depot/vendor/freebsd/src/sys/netgraph/ng_ksocket.c 2006/02/21 13:08:33 +++ //depot/user/rwatson/socleanup/src/sys/netgraph/ng_ksocket.c 2006/07/23 15:32:54 @@ -920,7 +920,7 @@ sa = &stag->sa; /* Send packet */ - error = (*so->so_proto->pr_usrreqs->pru_sosend)(so, sa, 0, m, 0, 0, td); + error = sosend(so, sa, 0, m, 0, 0, td); return (error); } @@ -1101,9 +1101,8 @@ struct mbuf *n; /* Try to get next packet from socket */ - if ((error = (*so->so_proto->pr_usrreqs->pru_soreceive) - (so, (so->so_state & SS_ISCONNECTED) ? 
NULL : &sa, - &auio, &m, (struct mbuf **)0, &flags)) != 0) + if ((error = soreceive(so, (so->so_state & SS_ISCONNECTED) ? + NULL : &sa, &auio, &m, (struct mbuf **)0, &flags)) != 0) break; /* See if we got anything */ --- //depot/vendor/freebsd/src/sys/netncp/ncp_sock.c 2005/01/07 01:52:23 +++ //depot/user/rwatson/socleanup/src/sys/netncp/ncp_sock.c 2006/07/23 15:32:54 @@ -139,10 +139,9 @@ auio.uio_td = td; flags = MSG_DONTWAIT; -/* error = so->so_proto->pr_usrreqs->pru_soreceive(so, 0, &auio, - (struct mbuf **)0, (struct mbuf **)0, &flags);*/ - error = so->so_proto->pr_usrreqs->pru_soreceive(so, 0, &auio, - mp, (struct mbuf **)0, &flags); +/* error = soreceive(so, 0, &auio, (struct mbuf **)0, (struct mbuf **)0, + &flags);*/ + error = soreceive(so, 0, &auio, mp, (struct mbuf **)0, &flags); *rlen = len - auio.uio_resid; /* if (!error) { *rlen=iov.iov_len; @@ -168,7 +167,7 @@ for (;;) { m = m_copym(top, 0, M_COPYALL, M_TRYWAIT); /* NCPDDEBUG(m);*/ - error = so->so_proto->pr_usrreqs->pru_sosend(so, to, 0, m, 0, flags, td); + error = sosend(so, to, 0, m, 0, flags, td); if (error == 0 || error == EINTR || error == ENETDOWN) break; if (rqp->rexmit == 0) break; @@ -443,8 +442,8 @@ auio.uio_resid = len = 1000000; auio.uio_td = curthread; flags = MSG_DONTWAIT; - error = so->so_proto->pr_usrreqs->pru_soreceive(so, - (struct sockaddr**)&sa, &auio, &m, (struct mbuf**)0, &flags); + error = soreceive(so, (struct sockaddr**)&sa, &auio, &m, + (struct mbuf**)0, &flags); if (error) break; len -= auio.uio_resid; NCPSDEBUG("got watch dog %d\n",len); @@ -452,7 +451,7 @@ buf = mtod(m, char*); if (buf[1] != '?') break; buf[1] = 'Y'; - error = so->so_proto->pr_usrreqs->pru_sosend(so, (struct sockaddr*)sa, 0, m, 0, 0, curthread); + error = sosend(so, (struct sockaddr*)sa, 0, m, 0, 0, curthread); NCPSDEBUG("send watch dog %d\n",error); break; } --- //depot/vendor/freebsd/src/sys/netsmb/smb_trantcp.c 2005/01/07 01:52:23 +++ //depot/user/rwatson/socleanup/src/sys/netsmb/smb_trantcp.c 
2006/07/23 15:32:54 @@ -75,8 +75,7 @@ SYSCTL_INT(_net_smb, OID_AUTO, tcpsndbuf, CTLFLAG_RW, &smb_tcpsndbuf, 0, ""); SYSCTL_INT(_net_smb, OID_AUTO, tcprcvbuf, CTLFLAG_RW, &smb_tcprcvbuf, 0, ""); -#define nb_sosend(so,m,flags,td) (so)->so_proto->pr_usrreqs->pru_sosend( \ - so, NULL, 0, m, 0, flags, td) +#define nb_sosend(so,m,flags,td) sosend(so, NULL, 0, m, 0, flags, td) static int nbssn_recv(struct nbpcb *nbp, struct mbuf **mpp, int *lenp, u_int8_t *rpcodep, struct thread *td); @@ -377,8 +376,7 @@ auio.uio_offset = 0; auio.uio_resid = sizeof(len); auio.uio_td = td; - error = so->so_proto->pr_usrreqs->pru_soreceive - (so, (struct sockaddr **)NULL, &auio, + error = soreceive(so, (struct sockaddr **)NULL, &auio, (struct mbuf **)NULL, (struct mbuf **)NULL, &flags); if (error) return error; @@ -461,8 +459,7 @@ */ do { rcvflg = MSG_WAITALL; - error = so->so_proto->pr_usrreqs->pru_soreceive - (so, (struct sockaddr **)NULL, + error = soreceive(so, (struct sockaddr **)NULL, &auio, &tm, (struct mbuf **)NULL, &rcvflg); } while (error == EWOULDBLOCK || error == EINTR || error == ERESTART); --- //depot/vendor/freebsd/src/sys/nfsclient/nfs_socket.c 2006/07/08 15:41:11 +++ //depot/user/rwatson/socleanup/src/sys/nfsclient/nfs_socket.c 2006/07/23 15:32:54 @@ -606,8 +606,7 @@ else flags = 0; - error = so->so_proto->pr_usrreqs->pru_sosend(so, sendnam, 0, top, 0, - flags, curthread /*XXX*/); + error = sosend(so, sendnam, 0, top, 0, flags, curthread /*XXX*/); if (error == ENOBUFS && so->so_type == SOCK_DGRAM) { error = 0; mtx_lock(&rep->r_mtx); @@ -946,9 +945,8 @@ auio.uio_iovcnt = 0; mp = NULL; rcvflg = (MSG_DONTWAIT | MSG_SOCALLBCK); - error = so->so_proto->pr_usrreqs->pru_soreceive - (so, (struct sockaddr **)0, - &auio, &mp, (struct mbuf **)0, &rcvflg); + error = soreceive(so, (struct sockaddr **)0, &auio, + &mp, (struct mbuf **)0, &rcvflg); /* * We've already tested that the socket is readable. 
2 cases * here, we either read 0 bytes (client closed connection), @@ -1016,9 +1014,8 @@ auio.uio_iovcnt = 0; mp = NULL; rcvflg = (MSG_DONTWAIT | MSG_SOCALLBCK); - error = so->so_proto->pr_usrreqs->pru_soreceive - (so, (struct sockaddr **)0, - &auio, &mp, (struct mbuf **)0, &rcvflg); + error = soreceive(so, (struct sockaddr **)0, &auio, + &mp, (struct mbuf **)0, &rcvflg); if (error || auio.uio_resid > 0) { if (error && error != ECONNRESET) { log(LOG_ERR, @@ -1058,9 +1055,7 @@ auio.uio_resid = 1000000000; do { mp = control = NULL; - error = so->so_proto->pr_usrreqs->pru_soreceive(so, - NULL, &auio, &mp, - &control, &rcvflag); + error = soreceive(so, NULL, &auio, &mp, &control, &rcvflag); if (control) m_freem(control); if (mp) --- //depot/vendor/freebsd/src/sys/nfsserver/nfs_srvsock.c 2006/04/06 23:35:17 +++ //depot/user/rwatson/socleanup/src/sys/nfsserver/nfs_srvsock.c 2006/07/23 15:32:54 @@ -466,8 +466,7 @@ auio.uio_resid = 1000000000; flags = MSG_DONTWAIT; NFSD_UNLOCK(); - error = so->so_proto->pr_usrreqs->pru_soreceive - (so, &nam, &auio, &mp, NULL, &flags); + error = soreceive(so, &nam, &auio, &mp, NULL, &flags); NFSD_LOCK(); if (error || mp == NULL) { if (error == EWOULDBLOCK) @@ -503,8 +502,7 @@ auio.uio_resid = 1000000000; flags = MSG_DONTWAIT; NFSD_UNLOCK(); - error = so->so_proto->pr_usrreqs->pru_soreceive - (so, &nam, &auio, &mp, NULL, &flags); + error = soreceive(so, &nam, &auio, &mp, NULL, &flags); if (mp) { struct nfsrv_rec *rec; rec = malloc(sizeof(struct nfsrv_rec), @@ -785,8 +783,7 @@ else flags = 0; - error = so->so_proto->pr_usrreqs->pru_sosend(so, sendnam, 0, top, 0, - flags, curthread/*XXX*/); + error = sosend(so, sendnam, 0, top, 0, flags, curthread/*XXX*/); if (error == ENOBUFS && so->so_type == SOCK_DGRAM) error = 0; --- //depot/vendor/freebsd/src/sys/sys/protosw.h 2006/07/14 10:06:50 +++ //depot/user/rwatson/socleanup/src/sys/sys/protosw.h 2006/07/23 15:32:54 @@ -228,15 +228,6 @@ int (*pru_sense)(struct socket *so, struct stat *sb); int 
(*pru_shutdown)(struct socket *so); int (*pru_sockaddr)(struct socket *so, struct sockaddr **nam); - - /* - * These four added later, so they are out of order. They are used - * for shortcutting (fast path input/output) in some protocols. - * XXX - that's a lie, they are not implemented yet - * Rather than calling sosend() etc. directly, calls are made - * through these entry points. For protocols which still use - * the generic code, these just point to those routines. - */ int (*pru_sosend)(struct socket *so, struct sockaddr *addr, struct uio *uio, struct mbuf *top, struct mbuf *control, int flags, struct thread *td); --- //depot/vendor/freebsd/src/sys/sys/socketvar.h 2006/06/17 22:50:57 +++ //depot/user/rwatson/socleanup/src/sys/sys/socketvar.h 2006/07/23 15:32:54 @@ -532,6 +532,9 @@ struct thread *td); int soreceive(struct socket *so, struct sockaddr **paddr, struct uio *uio, struct mbuf **mp0, struct mbuf **controlp, int *flagsp); +int soreceive_generic(struct socket *so, struct sockaddr **paddr, + struct uio *uio, struct mbuf **mp0, struct mbuf **controlp, + int *flagsp); int soreserve(struct socket *so, u_long sndcc, u_long rcvcc); void sorflush(struct socket *so); int sosend(struct socket *so, struct sockaddr *addr, struct uio *uio, @@ -540,6 +543,9 @@ int sosend_dgram(struct socket *so, struct sockaddr *addr, struct uio *uio, struct mbuf *top, struct mbuf *control, int flags, struct thread *td); +int sosend_generic(struct socket *so, struct sockaddr *addr, + struct uio *uio, struct mbuf *top, struct mbuf *control, + int flags, struct thread *td); int sosetopt(struct socket *so, struct sockopt *sopt); int soshutdown(struct socket *so, int how); void sotoxsocket(struct socket *so, struct xsocket *xso); From owner-freebsd-arch@FreeBSD.ORG Mon Jul 24 00:24:37 2006 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 
E84A816A4DA; Mon, 24 Jul 2006 00:24:37 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from mrout1-b.corp.dcn.yahoo.com (mrout1-b.corp.dcn.yahoo.com [216.109.112.27]) by mx1.FreeBSD.org (Postfix) with ESMTP id 557B243D49; Mon, 24 Jul 2006 00:24:37 +0000 (GMT) (envelope-from gnn@neville-neil.com) Received: from minion.local.neville-neil.com (proxy7.corp.yahoo.com [216.145.48.98]) by mrout1-b.corp.dcn.yahoo.com (8.13.6/8.13.6/y.out) with ESMTP id k6O0OUSj019926; Sun, 23 Jul 2006 17:24:31 -0700 (PDT) Date: Mon, 24 Jul 2006 09:24:22 +0900 Message-ID: From: gnn@freebsd.org To: Robert Watson In-Reply-To: <20060723171734.K35186@fledge.watson.org> References: <20060723171734.K35186@fledge.watson.org> User-Agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (=?ISO-8859-4?Q?Shij=F2?=) APEL/10.6 Emacs/22.0.50 (i386-apple-darwin8.6.1) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Cc: arch@freebsd.org Subject: Re: sosend/soreceive consistency improvements X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Jul 2006 00:24:38 -0000 At Sun, 23 Jul 2006 19:57:56 +0100 (BST), rwatson wrote: > Rather than continue in this "in between state", in which the > uio/mbuf chain sosend and soreceive are reached via the protocol > switch in each occurrence, I propose a change: sosend() and > soreceive() will now be the formal APIs for sending and receiveing > on sockets within the kernel, as is the case with many other so*() > functions, and they will perform the protocol switch dereference. 
> The existing functions are renamed to sosend_generic() and > soreceive_generic(), and in most cases are never referenced by > protocols since our protocol domain registration already uses > sosend() and soreceive() as the defaults today. The new code > strikes me as quite a bit more readable, and likely easier for > socket consumers to use. > > Any thoughts and/or objections? > Makes sense to me. Can we document these? That is, is there a man page in section 9 we could add these to? Later, George From owner-freebsd-arch@FreeBSD.ORG Mon Jul 24 00:27:59 2006 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7694516A4DF; Mon, 24 Jul 2006 00:27:59 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.FreeBSD.org (Postfix) with ESMTP id F413743D55; Mon, 24 Jul 2006 00:27:58 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 75D9046BC2; Sun, 23 Jul 2006 20:27:58 -0400 (EDT) Date: Mon, 24 Jul 2006 01:27:58 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: gnn@freebsd.org In-Reply-To: Message-ID: <20060724012707.A44945@fledge.watson.org> References: <20060723171734.K35186@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: sosend/soreceive consistency improvements X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Jul 2006 00:27:59 -0000 On Mon, 24 Jul 2006, gnn@freebsd.org wrote: > At Sun, 23 Jul 2006 19:57:56 +0100 (BST), > rwatson wrote: > >> Rather than continue in this 
"in between state", in which the uio/mbuf >> chain sosend and soreceive are reached via the protocol switch in each >> occurrence, I propose a change: sosend() and soreceive() will now be the >> formal APIs for sending and receiveing on sockets within the kernel, as is >> the case with many other so*() functions, and they will perform the >> protocol switch dereference. The existing functions are renamed to >> sosend_generic() and soreceive_generic(), and in most cases are never >> referenced by protocols since our protocol domain registration already uses >> sosend() and soreceive() as the defaults today. The new code strikes me as >> quite a bit more readable, and likely easier for socket consumers to use. >> >> Any thoughts and/or objections? > > Makes sense to me. Can we document these? That is, is there a man page in > section 9 we could add these to? I have plans to add a socket(9) man page, but because I'm still tearing things up, I've deferred doing that. I've started increasing the number of notes in uipc_socket.c in order to document some of the things that will eventually be in socket(9). 
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Mon Jul 24 00:40:10 2006 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9ED1B16A4DF; Mon, 24 Jul 2006 00:40:10 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from mrout1.yahoo.com (mrout1.yahoo.com [216.145.54.171]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3B9F443D45; Mon, 24 Jul 2006 00:40:10 +0000 (GMT) (envelope-from gnn@neville-neil.com) Received: from minion.local.neville-neil.com (proxy7.corp.yahoo.com [216.145.48.98]) by mrout1.yahoo.com (8.13.6/8.13.6/y.out) with ESMTP id k6O0dnXk007712; Sun, 23 Jul 2006 17:39:49 -0700 (PDT) Date: Mon, 24 Jul 2006 09:39:41 +0900 Message-ID: From: gnn@freebsd.org To: Robert Watson In-Reply-To: <20060724012707.A44945@fledge.watson.org> References: <20060723171734.K35186@fledge.watson.org> <20060724012707.A44945@fledge.watson.org> User-Agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (=?ISO-8859-4?Q?Shij=F2?=) APEL/10.6 Emacs/22.0.50 (i386-apple-darwin8.6.1) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Cc: arch@freebsd.org Subject: Re: sosend/soreceive consistency improvements X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Jul 2006 00:40:10 -0000 At Mon, 24 Jul 2006 01:27:58 +0100 (BST), rwatson wrote: > I have plans to add a socket(9) man page, but because I'm still > tearing things up, I've deferred doing that. I've started > increasing the number of notes in uipc_socket.c in order to document > some of the things that will eventually be in socket(9). Works for me. 
Later, George From owner-freebsd-arch@FreeBSD.ORG Mon Jul 24 17:32:15 2006 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D469B16A4DF; Mon, 24 Jul 2006 17:32:15 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from sccmmhc91.asp.att.net (sccmmhc91.asp.att.net [204.127.203.211]) by mx1.FreeBSD.org (Postfix) with ESMTP id 228B343D46; Mon, 24 Jul 2006 17:32:14 +0000 (GMT) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net ([12.207.12.9]) by sccmmhc91.asp.att.net (sccmmhc91) with ESMTP id <20060724173209m910087fine>; Mon, 24 Jul 2006 17:32:09 +0000 Received: from lor.one-eyed-alien.net (localhost [127.0.0.1]) by lor.one-eyed-alien.net (8.13.6/8.13.6) with ESMTP id k6OHW6jG093142; Mon, 24 Jul 2006 12:32:06 -0500 (CDT) (envelope-from brooks@lor.one-eyed-alien.net) Received: (from brooks@localhost) by lor.one-eyed-alien.net (8.13.6/8.13.6/Submit) id k6OHW64Q093141; Mon, 24 Jul 2006 12:32:06 -0500 (CDT) (envelope-from brooks) Date: Mon, 24 Jul 2006 12:32:05 -0500 From: Brooks Davis To: Robert Watson Message-ID: <20060724173205.GD91329@lor.one-eyed-alien.net> References: <20060723171734.K35186@fledge.watson.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="C1iGAkRnbeBonpVg" Content-Disposition: inline In-Reply-To: <20060723171734.K35186@fledge.watson.org> User-Agent: Mutt/1.5.11 Cc: arch@freebsd.org Subject: Re: sosend/soreceive consistency improvements X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Jul 2006 17:32:16 -0000 --C1iGAkRnbeBonpVg Content-Type: text/plain; charset=us-ascii Content-Disposition: inline 
Content-Transfer-Encoding: quoted-printable On Sun, Jul 23, 2006 at 07:57:56PM +0100, Robert Watson wrote: > > As part of cleanups, locking, and optimization work, I've been looking at > the socket send and receive paths. > > In the past, work was done to allow the uio/mbuf chain send and receive > paths (sosend, soreceive) to be pluggable for a protocol, so that the > protocol could provide substitute implementations. This is not, generally, > currently used, although I recently changed UDP to use an optimized > datagram send routine. This pluggability is made possible by virtue of each > protocol providing its own pru_sosend() and pru_soreceive() methods in the > protocol switch. > > There's another side to the pluggability, however -- the socket consumers > in the kernel, of which there are quite a few -- obviously the socket > system calls, but also netgraph, distributed file systems, etc. Some of > these consumers have been modified to call > so->so_proto->pr_usrreqs->pru_soreceive and ...->pru_sosend, but it turns > out many haven't. New references to sosend() and soreceive() periodically > get encoded into consumers -- presumably because they are easy to spell, > and in fact are generally functionally identical. But not always! It > turns out that the NFS code isn't using the optimized UDP send path via > sosend_dgram(), because it's calling sosend() directly. > > Rather than continue in this "in between state", in which the uio/mbuf > chain sosend and soreceive are reached via the protocol switch in each > occurrence, I propose a change: sosend() and soreceive() will now be the > formal APIs for sending and receiving on sockets within the kernel, as is > the case with many other so*() functions, and they will perform the > protocol switch dereference. 
The existing functions are renamed to > sosend_generic() and soreceive_generic(), and in most cases are never > referenced by protocols since our protocol domain registration already uses > sosend() and soreceive() as the defaults today. The new code strikes me as > quite a bit more readable, and likely easier for socket consumers to use. > > Any thoughts and/or objections? Makes sense to me. Is there a measurable performance impact? I wouldn't really expect much if any, but it's probably worth a check. The function is a fairly obvious target for inlining if there is any. -- Brooks --C1iGAkRnbeBonpVg Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (FreeBSD) iD8DBQFExQQVXY6L6fI4GtQRAjGAAKCjDjqRR/cWKa4nQGd4afgyLUlIKgCfUp1k SyoZWWIxUgs4/Afn+mVlllg= =lV/a -----END PGP SIGNATURE----- --C1iGAkRnbeBonpVg-- From owner-freebsd-arch@FreeBSD.ORG Tue Jul 25 10:14:10 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 59EA916A4DF for ; Tue, 25 Jul 2006 10:14:10 +0000 (UTC) (envelope-from jdom-commits-bounces@jdom.org) Received: from servlets.kattare.com (servlets.com [65.212.180.182]) by mx1.FreeBSD.org (Postfix) with ESMTP id 09D9E43D49 for ; Tue, 25 Jul 2006 10:14:09 +0000 (GMT) (envelope-from jdom-commits-bounces@jdom.org) Received: from servlets.kattare.com (localhost [127.0.0.1]) by servlets.kattare.com (8.12.10/8.12.11) with ESMTP id k6PA9mED019349 for ; Tue, 25 Jul 2006 03:09:48 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit From: jdom-commits-bounces@jdom.org To: freebsd-arch@freebsd.org Message-ID: Date: Tue, 25 Jul 2006 03:09:47 -0700 Precedence: bulk X-BeenThere: jdom-commits@jdom.org X-Mailman-Version: 2.1.5 X-List-Administrivia: yes Sender: jdom-commits-bounces@jdom.org 
Errors-To: jdom-commits-bounces@jdom.org Subject: Your message to jdom-commits awaits moderator approval X-BeenThere: freebsd-arch@freebsd.org List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Jul 2006 10:14:10 -0000 Your mail to 'jdom-commits' with the subject Returned mail: see transcript for details Is being held until the list moderator can review it for approval. The reason it is being held: Post by non-member to a members-only list Either the message will get posted to the list, or you will receive notification of the moderator's decision. If you would like to cancel this posting, please visit the following URL: http://www.jdom.org/mailman/confirm/jdom-commits/4efeea91c0f3c92fa4294bd262a6a89d173455e3 From owner-freebsd-arch@FreeBSD.ORG Tue Jul 25 15:13:14 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 674BE16A51C for ; Tue, 25 Jul 2006 15:13:14 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from wx-out-0102.google.com (wx-out-0102.google.com [66.249.82.199]) by mx1.FreeBSD.org (Postfix) with ESMTP id AEA6343D45 for ; Tue, 25 Jul 2006 15:13:13 +0000 (GMT) (envelope-from asmrookie@gmail.com) Received: by wx-out-0102.google.com with SMTP id i31so976848wxd for ; Tue, 25 Jul 2006 08:13:13 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:sender:to:subject:cc:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth; b=P2EVHfsFDdJWzxFj9Zec/KTAwcbyaqtlAIzVPyTxvF4S5lSFBnRxm8gE0ltxne5zGMMQ272qRVSX8+58VGbxPO96jGKbQSpmwjA+71cD0y/8Qnj238gG+c1LxfrzWsnnyzAxdD5xFPiV5dJa2KrgWizWJuLlqGeu6/Ey7RyEdBw= Received: by 10.70.44.5 with SMTP id r5mr820589wxr; Tue, 25 Jul 2006 08:13:12 -0700 (PDT) Received: by 
10.70.11.18 with HTTP; Tue, 25 Jul 2006 08:13:12 -0700 (PDT) Message-ID: <3bbf2fe10607250813w8ff9e34pc505bf290e71758@mail.gmail.com> Date: Tue, 25 Jul 2006 17:13:12 +0200 From: "Attilio Rao" Sender: asmrookie@gmail.com To: freebsd-arch@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Google-Sender-Auth: cc91b2f89e4d364c Cc: Subject: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Jul 2006 15:13:14 -0000 Hi, Intel documentation points out that having a 128-byte aligned synchronizing primitive (which fits in a cache line) will minimize the traffic on the cache bus, so this patch implements an alignment for i386 on turnstiles. Any comments or feedback? Attilio PS: Using __aligned in MI code is usually bad practice, but please note that the !__i386__ case is not affected (as you can see in the patch) -- Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-arch@FreeBSD.ORG Tue Jul 25 15:14:51 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0033016A4E0 for ; Tue, 25 Jul 2006 15:14:50 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.173]) by mx1.FreeBSD.org (Postfix) with ESMTP id A4A4743D67 for ; Tue, 25 Jul 2006 15:14:49 +0000 (GMT) (envelope-from asmrookie@gmail.com) Received: by ug-out-1314.google.com with SMTP id m2so2884769uge for ; Tue, 25 Jul 2006 08:14:48 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=oXxVFjd/DX0rB28+5fSudbjlX6CpcOPBKzSgnIwhfBdMz1CTTJm0IySTxtY1KkvN8BCfYCG1Xyx/ozAhyCuDUGgD7NPAH3JMYMiaTy0UH4zkdBTrs6lWaU17spMgf6EtI0YcNnRVcqB4og1Gwss72wiUXS0ap5L1HSQlcAYRw3c= Received: by 10.82.123.16 with SMTP id v16mr45427buc; Tue, 25 Jul 2006 08:14:48 -0700 (PDT) Received: by 10.70.11.18 with HTTP; Tue, 25 Jul 2006 08:14:47 -0700 (PDT) Message-ID: <3bbf2fe10607250814m1a476f09p2d962dedc0c99be1@mail.gmail.com> Date: Tue, 25 Jul 2006 17:14:47 +0200 From: "Attilio Rao" Sender: asmrookie@gmail.com To: freebsd-arch@freebsd.org In-Reply-To: <3bbf2fe10607250813w8ff9e34pc505bf290e71758@mail.gmail.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_7543_7531642.1153840487712" References: <3bbf2fe10607250813w8ff9e34pc505bf290e71758@mail.gmail.com> X-Google-Sender-Auth: dda0f5246fbad58e Cc: Subject: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: 
Tue, 25 Jul 2006 15:14:51 -0000 ------=_Part_7543_7531642.1153840487712 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline 2006/7/25, Attilio Rao : > Hi, > Intel documentation points out that having a 128-byte aligned > synchronizing primitive (which fits in a cache line) will minimize the > traffic on the cache bus, so this patch implements an alignment for i386 > on turnstiles. > > Any comments or feedback? Oh, sorry, I forgot the diff. Attilio -- Peace can only be achieved by understanding - A. Einstein ------=_Part_7543_7531642.1153840487712 Content-Type: application/octet-stream; name=subr_turnstile.diff Content-Transfer-Encoding: 7bit X-Attachment-Id: f_eq2w67e1 Content-Disposition: attachment; filename="subr_turnstile.diff"
--- subr_turnstile.c	Wed Jul 26 01:10:33 2006
+++ patch/subr_turnstile.c	Wed Jul 26 01:14:21 2006
@@ -81,6 +81,18 @@
 #endif
 
 /*
+ * For the i386 processors family, having a 128-bytes aligned turnstile
+ * (which exactly fits in a cacheline) would minimize cache/memory
+ * traffic for turnstile moves in SMP environment. Having a
+ * lowest-aligned byte structure will assure to not affect other archs.
+ */
+#if defined(__i386__) && defined(SMP)
+#define	TURNSTILE_ALIGN	0x80
+#else
+#define	TURNSTILE_ALIGN	0x01
+#endif
+
+/*
  * Constants for the hash table of turnstile chains.  TC_SHIFT is a magic
  * number chosen because the sleep queue's use the same value for the
  * shift.  Basically, we ignore the lower 8 bits of the address.
@@ -120,7 +132,7 @@
 	LIST_HEAD(, turnstile) ts_free;		/* (c) Free turnstiles. */
 	struct lock_object *ts_lockobj;		/* (c) Lock we reference. */
 	struct thread *ts_owner;		/* (c + q) Who owns the lock. */
-};
+} __aligned(TURNSTILE_ALIGN);
 
 struct turnstile_chain {
 	LIST_HEAD(, turnstile) tc_turnstiles;	/* List of turnstiles. */
------=_Part_7543_7531642.1153840487712-- From owner-freebsd-arch@FreeBSD.ORG Tue Jul 25 16:34:06 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 381C816A4DA; Tue, 25 Jul 2006 16:34:05 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (66-23-211-162.clients.speedfactory.net [66.23.211.162]) by mx1.FreeBSD.org (Postfix) with ESMTP id B25F743D73; Tue, 25 Jul 2006 16:34:04 +0000 (GMT) (envelope-from jhb@freebsd.org) Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1]) (authenticated bits=0) by server.baldwin.cx (8.13.4/8.13.4) with ESMTP id k6PGY0UE036542; Tue, 25 Jul 2006 12:34:03 -0400 (EDT) (envelope-from jhb@freebsd.org) From: John Baldwin To: "Attilio Rao" Date: Tue, 25 Jul 2006 12:32:50 -0400 User-Agent: KMail/1.9.1 References: <3bbf2fe10607250813w8ff9e34pc505bf290e71758@mail.gmail.com> <3bbf2fe10607250814m1a476f09p2d962dedc0c99be1@mail.gmail.com> In-Reply-To: <3bbf2fe10607250814m1a476f09p2d962dedc0c99be1@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200607251232.51230.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]); Tue, 25 Jul 2006 12:34:04 -0400 (EDT) X-Virus-Scanned: ClamAV 0.87.1/1618/Mon Jul 24 21:12:40 2006 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.0
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on server.baldwin.cx Cc: freebsd-arch@freebsd.org Subject: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Jul 2006 16:34:06 -0000 On Tuesday 25 July 2006 11:14, Attilio Rao wrote: > 2006/7/25, Attilio Rao : > > Hi, > > Intel documentation points out that having a 128-byte aligned > > synchronizing primitive (which fits in a cache line) will minimize the > > traffic on the cache bus, so this patch implements an alignment for i386 > > on turnstiles. > > > > Any comments or feedback? > > Oh, sorry, I forgot the diff. > > Attilio I think a better approach would be to stick turnstiles (and sleepqueues) in a UMA zone and specify cache-line alignment for the zone. However, turnstiles aren't really synchronization primitives in that you don't spin on a variable inside the structure, and I think it's the spinning and avoiding bouncing cache lines around that Intel's documentation is really about. In that case, the things you want aligned are things like mutexes, rwlocks, etc.
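
To illustrate the alignment point above, here is a minimal userland sketch, not kernel code. It assumes the 128-byte figure quoted in this thread and a GCC/Clang-style aligned attribute (the kernel's __aligned() macro wraps the same attribute). The names DEMO_SYNC_ALIGN, struct demo_lock, demo_same_block() and demo_locks are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed figure taken from the thread, not queried from the CPU. */
#define DEMO_SYNC_ALIGN 128

/* Hypothetical stand-in for a mutex-like object that CPUs spin on. */
struct demo_lock {
	volatile uintptr_t dl_owner;	/* word a contending CPU would poll */
} __attribute__((aligned(DEMO_SYNC_ALIGN)));

/* Alignment also pads the size, so array elements never share a block. */
static_assert(sizeof(struct demo_lock) == DEMO_SYNC_ALIGN, "padded to one block");
static_assert(_Alignof(struct demo_lock) == DEMO_SYNC_ALIGN, "block aligned");

/* Return 1 if both addresses fall in the same DEMO_SYNC_ALIGN-byte block. */
static int
demo_same_block(const void *a, const void *b)
{
	return ((uintptr_t)a / DEMO_SYNC_ALIGN == (uintptr_t)b / DEMO_SYNC_ALIGN);
}

/* Two adjacent locks, as they might sit in an array or an allocator slab. */
static struct demo_lock demo_locks[2];
```

With this layout, demo_same_block(&demo_locks[0], &demo_locks[1]) is 0, so a CPU polling one lock does not pull in the block holding the other. The cost is that each lock occupies a full 128 bytes, which is why it matters which structures are actually spun on.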
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue Jul 25 17:04:35 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9E1F616A4DF for ; Tue, 25 Jul 2006 17:04:35 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from wx-out-0102.google.com (wx-out-0102.google.com [66.249.82.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8872A43D45 for ; Tue, 25 Jul 2006 17:04:34 +0000 (GMT) (envelope-from asmrookie@gmail.com) Received: by wx-out-0102.google.com with SMTP id i31so992751wxd for ; Tue, 25 Jul 2006 10:04:33 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=JxDGMjP2DlFRNxY9wT8MXU/f8G6GbjHJAVPrkwIvUZ8ndalMjklo3sbf9GTk0PzTifmJ0R/62fQhNuPVJ1f1wGBnkDCu332ex50tiZG75Vq60jReRjYB06RYtXDsq+xhrOK3uCoFS95KgOiJOVmbMua++isngLGeQohRihtazbk= Received: by 10.70.84.16 with SMTP id h16mr6805220wxb; Tue, 25 Jul 2006 10:04:33 -0700 (PDT) Received: by 10.70.11.18 with HTTP; Tue, 25 Jul 2006 10:04:33 -0700 (PDT) Message-ID: <3bbf2fe10607251004wf94e238xb5ea7a31c973817f@mail.gmail.com> Date: Tue, 25 Jul 2006 19:04:33 +0200 From: "Attilio Rao" Sender: asmrookie@gmail.com To: "John Baldwin" In-Reply-To: <200607251232.51230.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <3bbf2fe10607250813w8ff9e34pc505bf290e71758@mail.gmail.com> <3bbf2fe10607250814m1a476f09p2d962dedc0c99be1@mail.gmail.com> <200607251232.51230.jhb@freebsd.org> X-Google-Sender-Auth: 125ef96a6c39eae2 Cc: freebsd-arch@freebsd.org Subject: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs X-BeenThere: freebsd-arch@freebsd.org 
X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Jul 2006 17:04:35 -0000 2006/7/25, John Baldwin : > On Tuesday 25 July 2006 11:14, Attilio Rao wrote: > > 2006/7/25, Attilio Rao : > > > Hi, > > > Intel documentation points out that having a 128-byte aligned > > > synchronizing primitive (which fits in a cache line) will minimize the > > > traffic on the cache bus, so this patch implements an alignment for i386 > > > on turnstiles. > > > > > > Any comments or feedback? > > > > Oh, sorry, I forgot the diff. > > > > Attilio > > I think a better approach would be to stick turnstiles (and sleepqueues) in a > UMA zone and specify cache-line alignment for the zone. However, turnstiles > aren't really synchronization primitives in that you don't spin on a variable > inside the structure, and I think it's the spinning and avoiding bouncing > cache lines around that Intel's documentation is really about. In that case, > the things you want aligned are things like mutexes, rwlocks, etc. Well, I think the documentation refers in particular to the latter issue you mentioned. Spinning is not really related to cache bus issues (it is more, in particular, about datapath latency). From this point of view, turnstiles (like sleepqueues) are passed around CPUs more than a mutex/rwlock (or a cv), I guess, so I was thinking that it's better to optimize the turnstile than the real synchronizing primitive itself. Attilio -- Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-arch@FreeBSD.ORG Wed Jul 26 18:27:03 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E975E16A4E5 for ; Wed, 26 Jul 2006 18:27:03 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from wx-out-0102.google.com (wx-out-0102.google.com [66.249.82.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id D3D0043D55 for ; Wed, 26 Jul 2006 18:27:02 +0000 (GMT) (envelope-from asmrookie@gmail.com) Received: by wx-out-0102.google.com with SMTP id s18so1142348wxc for ; Wed, 26 Jul 2006 11:27:01 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=H1+zjsNuBWJvyuLliL1I+6ClPbNgCwlhptBibOB6d+5mELKeIZ094fsBM+bKmOQTtGGDfMqNg/1tpjSiVMDQIEBLS0cYHixB8qDXof5RkQVj8Rr9EN6sL0WcDjB30abQbVHvVuyZmPI3yYyxzUNH2qwnsrSYATtxbPd2zOw8ue0= Received: by 10.70.115.17 with SMTP id n17mr6990443wxc; Wed, 26 Jul 2006 11:27:01 -0700 (PDT) Received: by 10.70.11.18 with HTTP; Wed, 26 Jul 2006 11:27:01 -0700 (PDT) Message-ID: <3bbf2fe10607261127p3f01a6c3w80027754f7d4e594@mail.gmail.com> Date: Wed, 26 Jul 2006 20:27:01 +0200 From: "Attilio Rao" Sender: asmrookie@gmail.com To: "John Baldwin" In-Reply-To: <3bbf2fe10607251004wf94e238xb5ea7a31c973817f@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <3bbf2fe10607250813w8ff9e34pc505bf290e71758@mail.gmail.com> <3bbf2fe10607250814m1a476f09p2d962dedc0c99be1@mail.gmail.com> <200607251232.51230.jhb@freebsd.org> <3bbf2fe10607251004wf94e238xb5ea7a31c973817f@mail.gmail.com> X-Google-Sender-Auth: d3e34ec819ecb3ac Cc: freebsd-arch@freebsd.org Subject: Re: [PATCH] Mantaining turnstile 
aligned to 128 bytes in i386 CPUs X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Jul 2006 18:27:04 -0000 2006/7/25, Attilio Rao : > 2006/7/25, John Baldwin : > > On Tuesday 25 July 2006 11:14, Attilio Rao wrote: > > > 2006/7/25, Attilio Rao : > > > > Hi, > > > > Intel documentation points out that having a 128-byte aligned > > > > synchronizing primitive (which fits in a cache line) will minimize the > > > > traffic on the cache bus, so this patch implements an alignment for i386 > > > > on turnstiles. > > > > > > > > Any comments or feedback? > > > > > > Oh, sorry, I forgot the diff. > > > > > > Attilio > > > > I think a better approach would be to stick turnstiles (and sleepqueues) in a > > UMA zone and specify cache-line alignment for the zone. However, turnstiles > > aren't really synchronization primitives in that you don't spin on a variable > > inside the structure, and I think it's the spinning and avoiding bouncing > > cache lines around that Intel's documentation is really about. In that case, > > the things you want aligned are things like mutexes, rwlocks, etc. > > Well, I think the documentation refers in particular to the latter issue > you mentioned. > Spinning is not really related to cache bus issues (it is more, in > particular, about datapath latency). > From this point of view, turnstiles (like sleepqueues) are passed around > CPUs more than a mutex/rwlock (or a cv), I guess, so I was thinking > that it's better to optimize the turnstile than the real synchronizing > primitive itself. This is a patch which lets turnstiles/sleepqueues use a UMA zone.
I've tried it in my 6.1R branch and it works quite well, so this HEAD version might be all right (I've not tried it yet, so please test): http://users.gufi.org/~rookie/works/patches/uma_sync.diff It, obviously, sets the default alignment for i386 to 128 bytes. Any comments, feedback, or ideas are welcome. Attilio PS: I know that I could simplify the *_alloc() and *_free() routines by implementing init/fini, but it is simpler and more optimized to keep things as they are. -- Peace can only be achieved by understanding - A. Einstein From owner-freebsd-arch@FreeBSD.ORG Fri Jul 28 17:05:14 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0DB5B16A505 for ; Fri, 28 Jul 2006 17:05:14 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from wx-out-0102.google.com (wx-out-0102.google.com [66.249.82.194]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4653843D7E for ; Fri, 28 Jul 2006 17:04:23 +0000 (GMT) (envelope-from asmrookie@gmail.com) Received: by wx-out-0102.google.com with SMTP id t5so300878wxc for ; Fri, 28 Jul 2006 10:04:22 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=ApHCH6f9+0sWsQhWEGddvzoyDONUstLlshtbvyk7VT6Pd4XAzcY9dfQ3h9YNKdMYm1QfqQSWNslXkLpP1rsLM7nYrtuaoPpMwMSMMyexvRZiMEQ8lZw0mpAO6aiftjPK9x4jwemk87gVteDYj7VaHA6E5Q2++aqmoro9eiJ8MnY= Received: by 10.70.38.9 with SMTP id l9mr2439224wxl; Fri, 28 Jul 2006 10:04:22 -0700 (PDT) Received: by 10.70.11.18 with HTTP; Fri, 28 Jul 2006 10:04:22 -0700 (PDT) Message-ID: <3bbf2fe10607281004o6727e976h19ee7e054876f914@mail.gmail.com> Date: Fri, 28 Jul 2006 19:04:22 +0200 From: "Attilio Rao" Sender: asmrookie@gmail.com To: "John Baldwin" In-Reply-To:
<3bbf2fe10607261127p3f01a6c3w80027754f7d4e594@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <3bbf2fe10607250813w8ff9e34pc505bf290e71758@mail.gmail.com> <3bbf2fe10607250814m1a476f09p2d962dedc0c99be1@mail.gmail.com> <200607251232.51230.jhb@freebsd.org> <3bbf2fe10607251004wf94e238xb5ea7a31c973817f@mail.gmail.com> <3bbf2fe10607261127p3f01a6c3w80027754f7d4e594@mail.gmail.com> X-Google-Sender-Auth: fdef080e21fdb949 Cc: freebsd-arch@freebsd.org Subject: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Jul 2006 17:05:14 -0000 2006/7/26, Attilio Rao : > 2006/7/25, Attilio Rao : > > 2006/7/25, John Baldwin : > > > On Tuesday 25 July 2006 11:14, Attilio Rao wrote: > > > > 2006/7/25, Attilio Rao : > > > > > Hi, > > > > > Intel documentation points out that having a 128-byte aligned > > > > > synchronizing primitive (which fits in a cache line) will minimize the > > > > > traffic on the cache bus, so this patch implements an alignment for i386 > > > > > on turnstiles. > > > > > > > > > > Any comments or feedback? > > > > > > > > Oh, sorry, I forgot the diff. > > > > > > > > Attilio > > > > > > I think a better approach would be to stick turnstiles (and sleepqueues) in a > > > UMA zone and specify cache-line alignment for the zone. However, turnstiles > > > aren't really synchronization primitives in that you don't spin on a variable > > > inside the structure, and I think it's the spinning and avoiding bouncing > > > cache lines around that Intel's documentation is really about. In that case, > > > the things you want aligned are things like mutexes, rwlocks, etc.
> > > > Well, I think the documentation refers in particular to the latter issue > > you mentioned. > > Spinning is not really related to cache bus issues (it is more, in > > particular, about datapath latency). > > From this point of view, turnstiles (like sleepqueues) are passed around > > CPUs more than a mutex/rwlock (or a cv), I guess, so I was thinking > > that it's better to optimize the turnstile than the real synchronizing > > primitive itself. > > This is a patch which lets turnstiles/sleepqueues use a UMA zone. > > I've tried it in my 6.1R branch and it works quite well, so this HEAD > version might be all right (I've not tried it yet, so please test): > http://users.gufi.org/~rookie/works/patches/uma_sync.diff > > It, obviously, sets the default alignment for i386 to 128 bytes. > Any comments, feedback, or ideas are welcome. > > Attilio > > PS: I know that I could simplify the *_alloc() and *_free() routines > by implementing init/fini, but it is simpler and more optimized to keep > things as they are. After some thought, I think it's better to use the init/fini methods (since they hide the sizeof(struct turnstile) behind the size parameter). Feedback and comments are welcome: http://users.gufi.org/~rookie/works/patches/uma_sync_init.diff Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
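
The init/fini idea in the last message can be sketched in userland as follows. This is only a model under stated assumptions: it is not the UMA API and not the contents of uma_sync_init.diff, and the real UMA callback signatures may differ. Every name here (struct demo_zone, demo_zone_alloc(), demo_zone_free(), demo_item_init(), demo_item_fini()) is hypothetical. The point it tries to show is that callbacks which receive the item size never need to name the item's struct type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback types; real UMA signatures may differ. */
typedef int  (*demo_init_t)(void *mem, size_t size);
typedef void (*demo_fini_t)(void *mem, size_t size);

/* Hypothetical one-slot "zone": caches a single freed item for reuse. */
struct demo_zone {
	size_t		dz_size;	/* item size, fixed at creation */
	size_t		dz_align;	/* power-of-two alignment */
	demo_init_t	dz_init;	/* runs once per fresh backing item */
	demo_fini_t	dz_fini;	/* runs when an item leaves the zone */
	void		*dz_cached;	/* initialized item awaiting reuse */
	int		dz_ninit;	/* how many times dz_init ran */
};

static void *
demo_zone_alloc(struct demo_zone *z)
{
	void *mem = z->dz_cached;

	if (mem != NULL) {		/* reuse: item is still initialized */
		z->dz_cached = NULL;
		return (mem);
	}
	/* aligned_alloc() wants size to be a multiple of the alignment. */
	size_t rounded = (z->dz_size + z->dz_align - 1) & ~(z->dz_align - 1);
	mem = aligned_alloc(z->dz_align, rounded);
	if (mem == NULL)
		return (NULL);
	if (z->dz_init(mem, z->dz_size) != 0) {
		free(mem);
		return (NULL);
	}
	z->dz_ninit++;
	return (mem);
}

static void
demo_zone_free(struct demo_zone *z, void *mem)
{
	if (z->dz_cached == NULL) {	/* keep it initialized for next alloc */
		z->dz_cached = mem;
		return;
	}
	z->dz_fini(mem, z->dz_size);	/* cache full: tear down and release */
	free(mem);
}

/* Size-agnostic callbacks: they never name the item's struct type. */
static int
demo_item_init(void *mem, size_t size)
{
	memset(mem, 0, size);
	return (0);
}

static void
demo_item_fini(void *mem, size_t size)
{
	memset(mem, 0xa5, size);	/* poison; stands in for real teardown */
}
```

In this model an item that is freed and then allocated again comes back without a second init call, which is the usual argument for putting one-time setup in init/fini rather than in the alloc/free wrappers.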