From owner-freebsd-arch@FreeBSD.ORG Mon Jun 3 10:06:24 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5D7C880C for ; Mon, 3 Jun 2013 10:06:24 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10]) by mx1.freebsd.org (Postfix) with ESMTP id 852A51152 for ; Mon, 3 Jun 2013 10:06:19 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.6/8.14.6) with ESMTP id r53A6Ia3084200 for ; Mon, 3 Jun 2013 14:06:18 +0400 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.6/8.14.6/Submit) id r53A6IMC084199 for arch@FreeBSD.org; Mon, 3 Jun 2013 14:06:18 +0400 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Mon, 3 Jun 2013 14:06:18 +0400 From: Gleb Smirnoff To: arch@FreeBSD.org Subject: aio_mlock(2) system call Message-ID: <20130603100618.GH67170@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="KR/qxknboQ7+Tpez" Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Jun 2013 10:06:24 -0000 --KR/qxknboQ7+Tpez Content-Type: text/plain; charset=koi8-r Content-Disposition: inline Hello! This patch brings a new system call - aio_mlock(2). The idea is quite clear from its name: it performs mlock(2), which can take a long time if pages aren't resident, under aio(4) control. The patch is quite simple, and non-destructive. Here it is for your review. If no one objects, I'd like to add it to FreeBSD 10. -- Totus tuus, Glebius. 
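As an illustration of the intended use, here is a minimal userland sketch based only on the semantics described above and in the manual page in the patch: zero the control block, set aio_buf and aio_nbytes, enqueue the request with aio_mlock(), then wait with aio_suspend() and collect the result with aio_error() and aio_return(). The helper name lock_buffer_async() is hypothetical, and the non-FreeBSD branch returning ENOSYS is an assumption added only so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#ifdef __FreeBSD__
#include <aio.h>
#endif

/*
 * Hypothetical helper: wire buf[0..len) with aio_mlock(2) and block until
 * the request completes.  Returns 0 on success or an errno value.
 */
int
lock_buffer_async(void *buf, size_t len)
{
#ifdef __FreeBSD__
	struct aiocb cb;	/* on the stack: safe only because we wait below */
	const struct aiocb *list[1] = { &cb };
	int err;

	memset(&cb, 0, sizeof(cb));	/* no bogus context for the kernel */
	cb.aio_buf = buf;
	cb.aio_nbytes = len;

	if (aio_mlock(&cb) == -1)
		return (errno);		/* not enqueued, e.g. EAGAIN */

	/* Sleep until the request is no longer in progress. */
	while ((err = aio_error(&cb)) == EINPROGRESS)
		(void)aio_suspend(list, 1, NULL);
	if (err == -1)
		return (errno);
	(void)aio_return(&cb);		/* reap the finished request */
	return (err);			/* 0 or an mlock(2) error */
#else
	/* Assumption for this sketch: report the call as unsupported. */
	(void)buf;
	(void)len;
	return (ENOSYS);
#endif
}
```

Because the helper waits for completion before returning, a stack-allocated struct aiocb is acceptable here; a caller that returns while the request is still queued would need the control block and buffer to outlive the call, as the RESTRICTIONS section of the manual page notes.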
--KR/qxknboQ7+Tpez Content-Type: text/x-diff; charset=koi8-r Content-Disposition: attachment; filename="aio_mlock.diff" Index: lib/libc/sys/Makefile.inc =================================================================== --- lib/libc/sys/Makefile.inc (revision 251294) +++ lib/libc/sys/Makefile.inc (working copy) @@ -85,6 +85,7 @@ MAN+= abort2.2 \ adjtime.2 \ aio_cancel.2 \ aio_error.2 \ + aio_mlock.2 \ aio_read.2 \ aio_return.2 \ aio_suspend.2 \ Index: lib/libc/sys/Symbol.map =================================================================== --- lib/libc/sys/Symbol.map (revision 251294) +++ lib/libc/sys/Symbol.map (working copy) @@ -378,6 +378,7 @@ FBSD_1.2 { }; FBSD_1.3 { + aio_mlock; accept4; bindat; cap_fcntls_get; Index: lib/libc/sys/aio_mlock.2 =================================================================== --- lib/libc/sys/aio_mlock.2 (revision 0) +++ lib/libc/sys/aio_mlock.2 (working copy) @@ -0,0 +1,133 @@ +.\" Copyright (c) 2013 Gleb Smirnoff +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd June 3, 2013 +.Dt AIO_MLOCK 2 +.Os +.Sh NAME +.Nm aio_mlock +.Nd asynchronous +.Xr mlock 2 +operation +.Sh LIBRARY +.Lb libc +.Sh SYNOPSIS +.In aio.h +.Ft int +.Fn aio_mlock "struct aiocb *iocb" +.Sh DESCRIPTION +The +.Fn aio_mlock +system call allows the calling process to lock into memory the +physical pages associated with the virtual address range starting at +.Fa iocb->aio_buf +for +.Fa iocb->aio_nbytes +bytes. +The call returns immediately after the locking request has +been enqueued; the operation may or may not have completed at the time +the call returns. +.Pp +The +.Fa iocb +pointer may be subsequently used as an argument to +.Fn aio_return +and +.Fn aio_error +in order to determine return or error status for the enqueued operation +while it is in progress. +.Pp +If the request could not be enqueued (generally due to +.Xr aio 4 +limits), +then the call returns without having enqueued the request. +.Sh RESTRICTIONS +The Asynchronous I/O Control Block structure pointed to by +.Fa iocb +and the buffer that the +.Fa iocb->aio_buf +member of that structure references must remain valid until the +operation has completed. +For this reason, use of auto (stack) variables +for these objects is discouraged. +.Pp +The asynchronous I/O control buffer +.Fa iocb +should be zeroed before the +.Fn aio_mlock +call to avoid passing bogus context information to the kernel. 
+.Pp +Modifications of the Asynchronous I/O Control Block structure or the +buffer contents after the request has been enqueued, but before the +request has completed, are not allowed. +.Sh RETURN VALUES +.Rv -std aio_mlock +.Sh ERRORS +The +.Fn aio_mlock +system call will fail if: +.Bl -tag -width Er +.It Bq Er EAGAIN +The request was not queued because of system resource limitations. +.It Bq Er ENOSYS +The +.Fn aio_mlock +system call is not supported. +.El +.Pp +If the request is successfully enqueued, but subsequently cancelled +or an error occurs, the value returned by the +.Fn aio_return +system call is per the +.Xr mlock 2 +system call, and the value returned by the +.Fn aio_error +system call is one of the error returns from the +.Xr mlock 2 +system call, or +.Er ECANCELED +if the request was explicitly cancelled via a call to +.Fn aio_cancel . +.Sh SEE ALSO +.Xr aio_cancel 2 , +.Xr aio_error 2 , +.Xr aio_return 2 , +.Xr mlock 2 , +.Xr aio 4 +.Sh PORTABILITY +The +.Fn aio_mlock +system call is a +.Fx +extension, and should not be used in portable code. +.Sh HISTORY +The +.Fn aio_mlock +system call first appeared in +.Fx 10.0 . +.Sh AUTHORS +The system call was introduced by +.An Gleb Smirnoff Aq glebius@FreeBSD.org . 
Property changes on: lib/libc/sys/aio_mlock.2 ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Index: sys/compat/freebsd32/syscalls.master =================================================================== --- sys/compat/freebsd32/syscalls.master (revision 251294) +++ sys/compat/freebsd32/syscalls.master (working copy) @@ -476,7 +476,8 @@ 257 AUE_NULL NOSTD { int freebsd32_lio_listio(int mode, \ struct aiocb32 * const *acb_list, \ int nent, struct sigevent *sig); } -258 AUE_NULL UNIMPL nosys +258 AUE_NULL NOSTD { int freebsd32_aio_mlock( \ + struct aiocb32 *aiocbp); } 259 AUE_NULL UNIMPL nosys 260 AUE_NULL UNIMPL nosys 261 AUE_NULL UNIMPL nosys Index: sys/kern/syscalls.master =================================================================== --- sys/kern/syscalls.master (revision 251294) +++ sys/kern/syscalls.master (working copy) @@ -480,7 +480,7 @@ 257 AUE_NULL NOSTD { int lio_listio(int mode, \ struct aiocb * const *acb_list, \ int nent, struct sigevent *sig); } -258 AUE_NULL UNIMPL nosys +258 AUE_NULL NOSTD { int aio_mlock(struct aiocb *aiocbp); } 259 AUE_NULL UNIMPL nosys 260 AUE_NULL UNIMPL nosys 261 AUE_NULL UNIMPL nosys Index: sys/kern/vfs_aio.c =================================================================== --- sys/kern/vfs_aio.c (revision 251294) +++ sys/kern/vfs_aio.c (working copy) @@ -339,6 +339,8 @@ void aio_init_aioinfo(struct proc *p); static int aio_onceonly(void); static int aio_free_entry(struct aiocblist *aiocbe); static void aio_process(struct aiocblist *aiocbe); +static void aio_process_sync(struct aiocblist *aiocbe); +static void aio_process_mlock(struct aiocblist *aiocbe); static int aio_newproc(int *); int aio_aqueue(struct thread *td, struct aiocb *job, struct aioliojob *lio, int type, 
struct aiocb_ops *ops); @@ -425,6 +427,7 @@ static struct syscall_helper_data aio_syscalls[] = SYSCALL_INIT_HELPER(aio_cancel), SYSCALL_INIT_HELPER(aio_error), SYSCALL_INIT_HELPER(aio_fsync), + SYSCALL_INIT_HELPER(aio_mlock), SYSCALL_INIT_HELPER(aio_read), SYSCALL_INIT_HELPER(aio_return), SYSCALL_INIT_HELPER(aio_suspend), @@ -452,6 +455,7 @@ static struct syscall_helper_data aio32_syscalls[] SYSCALL32_INIT_HELPER(freebsd32_aio_cancel), SYSCALL32_INIT_HELPER(freebsd32_aio_error), SYSCALL32_INIT_HELPER(freebsd32_aio_fsync), + SYSCALL32_INIT_HELPER(freebsd32_aio_mlock), SYSCALL32_INIT_HELPER(freebsd32_aio_read), SYSCALL32_INIT_HELPER(freebsd32_aio_write), SYSCALL32_INIT_HELPER(freebsd32_aio_waitcomplete), @@ -701,7 +705,8 @@ aio_free_entry(struct aiocblist *aiocbe) * at open time, but this is already true of file descriptors in * a multithreaded process. */ - fdrop(aiocbe->fd_file, curthread); + if (aiocbe->fd_file) + fdrop(aiocbe->fd_file, curthread); crfree(aiocbe->cred); uma_zfree(aiocb_zone, aiocbe); AIO_LOCK(ki); @@ -855,10 +860,10 @@ drop: } /* - * The AIO processing activity. This is the code that does the I/O request for - * the non-physio version of the operations. The normal vn operations are used, - * and this code should work in all instances for every type of file, including - * pipes, sockets, fifos, and regular files. + * The AIO processing activity for LIO_READ/LIO_WRITE. This is the code that + * does the I/O request for the non-physio version of the operations. The + * normal vn operations are used, and this code should work in all instances + * for every type of file, including pipes, sockets, fifos, and regular files. * * XXX I don't think it works well for socket, pipe, and fifo. 
*/ @@ -883,17 +888,6 @@ aio_process(struct aiocblist *aiocbe) cb = &aiocbe->uaiocb; fp = aiocbe->fd_file; - if (cb->aio_lio_opcode == LIO_SYNC) { - error = 0; - cnt = 0; - if (fp->f_vnode != NULL) - error = aio_fsync_vnode(td, fp->f_vnode); - cb->_aiocb_private.error = error; - cb->_aiocb_private.status = 0; - td->td_ucred = td_savedcred; - return; - } - aiov.iov_base = (void *)(uintptr_t)cb->aio_buf; aiov.iov_len = cb->aio_nbytes; @@ -954,6 +948,35 @@ aio_process(struct aiocblist *aiocbe) } static void +aio_process_sync(struct aiocblist *aiocbe) +{ + struct thread *td = curthread; + struct ucred *td_savedcred = td->td_ucred; + struct aiocb *cb = &aiocbe->uaiocb; + struct file *fp = aiocbe->fd_file; + int error = 0; + + td->td_ucred = aiocbe->cred; + if (fp->f_vnode != NULL) + error = aio_fsync_vnode(td, fp->f_vnode); + cb->_aiocb_private.error = error; + cb->_aiocb_private.status = 0; + td->td_ucred = td_savedcred; +} + +static void +aio_process_mlock(struct aiocblist *aiocbe) +{ + struct aiocb *cb = &aiocbe->uaiocb; + int error; + + error = vm_mlock(aiocbe->userproc, aiocbe->cred, + (void *)(uintptr_t)cb->aio_buf, cb->aio_nbytes); + cb->_aiocb_private.error = error; + cb->_aiocb_private.status = 0; +} + +static void aio_bio_done_notify(struct proc *userp, struct aiocblist *aiocbe, int type) { struct aioliojob *lj; @@ -1121,7 +1144,18 @@ aio_daemon(void *_id) ki = userp->p_aioinfo; /* Do the I/O function. */ - aio_process(aiocbe); + switch(aiocbe->uaiocb.aio_lio_opcode) { + case LIO_READ: + case LIO_WRITE: + aio_process(aiocbe); + break; + case LIO_SYNC: + aio_process_sync(aiocbe); + break; + case LIO_MLOCK: + aio_process_mlock(aiocbe); + break; + } mtx_lock(&aio_job_mtx); /* Decrement the active job count. 
*/ @@ -1261,7 +1295,7 @@ aio_qphysio(struct proc *p, struct aiocblist *aioc cb = &aiocbe->uaiocb; fp = aiocbe->fd_file; - if (fp->f_type != DTYPE_VNODE) + if (fp == NULL || fp->f_type != DTYPE_VNODE) return (-1); vp = fp->f_vnode; @@ -1613,6 +1647,9 @@ aio_aqueue(struct thread *td, struct aiocb *job, s case LIO_SYNC: error = fget(td, fd, CAP_FSYNC, &fp); break; + case LIO_MLOCK: + fp = NULL; + break; case LIO_NOP: error = fget(td, fd, CAP_NONE, &fp); break; @@ -1670,7 +1707,8 @@ aio_aqueue(struct thread *td, struct aiocb *job, s error = kqfd_register(kqfd, &kev, td, 1); aqueue_fail: if (error) { - fdrop(fp, td); + if (fp) + fdrop(fp, td); uma_zfree(aiocb_zone, aiocbe); ops->store_error(job, error); goto done; @@ -1687,7 +1725,7 @@ no_kqueue: if (opcode == LIO_SYNC) goto queueit; - if (fp->f_type == DTYPE_SOCKET) { + if (fp && fp->f_type == DTYPE_SOCKET) { /* * Alternate queueing for socket ops: Reach down into the * descriptor to get the socket data. Then check to see if the @@ -2165,6 +2203,13 @@ sys_aio_write(struct thread *td, struct aio_write_ return (aio_aqueue(td, uap->aiocbp, NULL, LIO_WRITE, &aiocb_ops)); } +int +sys_aio_mlock(struct thread *td, struct aio_mlock_args *uap) +{ + + return (aio_aqueue(td, uap->aiocbp, NULL, LIO_MLOCK, &aiocb_ops)); +} + static int kern_lio_listio(struct thread *td, int mode, struct aiocb * const *uacb_list, struct aiocb **acb_list, int nent, struct sigevent *sig, @@ -2907,6 +2952,14 @@ freebsd32_aio_write(struct thread *td, struct free } int +freebsd32_aio_mlock(struct thread *td, struct freebsd32_aio_mlock_args *uap) +{ + + return (aio_aqueue(td, (struct aiocb *)uap->aiocbp, NULL, LIO_MLOCK, + &aiocb32_ops)); +} + +int freebsd32_aio_waitcomplete(struct thread *td, struct freebsd32_aio_waitcomplete_args *uap) { Index: sys/sys/aio.h =================================================================== --- sys/sys/aio.h (revision 251294) +++ sys/sys/aio.h (working copy) @@ -38,6 +38,7 @@ #ifdef _KERNEL #define LIO_SYNC 0x3 #endif 
+#define LIO_MLOCK 0x4 /* * LIO modes @@ -124,6 +125,11 @@ int aio_cancel(int, struct aiocb *); */ int aio_suspend(const struct aiocb * const[], int, const struct timespec *); +/* + * Asynchronous mlock + */ +int aio_mlock(struct aiocb *); + #ifdef __BSD_VISIBLE int aio_waitcomplete(struct aiocb **, struct timespec *); #endif Index: sys/vm/vm_extern.h =================================================================== --- sys/vm/vm_extern.h (revision 251294) +++ sys/vm/vm_extern.h (working copy) @@ -90,5 +90,6 @@ struct sf_buf *vm_imgact_map_page(vm_object_t obje void vm_imgact_unmap_page(struct sf_buf *sf); void vm_thread_dispose(struct thread *td); int vm_thread_new(struct thread *td, int pages); +int vm_mlock(struct proc *, struct ucred *, const void *, size_t); #endif /* _KERNEL */ #endif /* !_VM_EXTERN_H_ */ Index: sys/vm/vm_mmap.c =================================================================== --- sys/vm/vm_mmap.c (revision 251294) +++ sys/vm/vm_mmap.c (working copy) @@ -1036,18 +1036,24 @@ sys_mlock(td, uap) struct thread *td; struct mlock_args *uap; { - struct proc *proc; + + return (vm_mlock(td->td_proc, td->td_ucred, uap->addr, uap->len)); +} + +int +vm_mlock(struct proc *proc, struct ucred *cred, const void *addr0, size_t len) +{ vm_offset_t addr, end, last, start; vm_size_t npages, size; vm_map_t map; unsigned long nsize; int error; - error = priv_check(td, PRIV_VM_MLOCK); + error = priv_check_cred(cred, PRIV_VM_MLOCK, 0); if (error) return (error); - addr = (vm_offset_t)uap->addr; - size = uap->len; + addr = (vm_offset_t)addr0; + size = len; last = addr + size; start = trunc_page(addr); end = round_page(last); @@ -1056,7 +1062,6 @@ sys_mlock(td, uap) npages = atop(end - start); if (npages > vm_page_max_wired) return (ENOMEM); - proc = td->td_proc; map = &proc->p_vmspace->vm_map; PROC_LOCK(proc); nsize = ptoa(npages + pmap_wired_count(map->pmap)); --KR/qxknboQ7+Tpez-- From owner-freebsd-arch@FreeBSD.ORG Mon Jun 3 13:17:10 2013 Return-Path: 
Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5DA3052C for ; Mon, 3 Jun 2013 13:17:10 +0000 (UTC) (envelope-from oppermann@networx.ch) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id F3D3E1C9B for ; Mon, 3 Jun 2013 13:17:09 +0000 (UTC) Received: (qmail 90250 invoked from network); 3 Jun 2013 14:14:38 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 3 Jun 2013 14:14:38 -0000 Message-ID: <51AC9748.5070908@networx.ch> Date: Mon, 03 Jun 2013 15:16:56 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 MIME-Version: 1.0 To: Gleb Smirnoff Subject: Re: aio_mlock(2) system call References: <20130603100618.GH67170@FreeBSD.org> In-Reply-To: <20130603100618.GH67170@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Jun 2013 13:17:10 -0000 On 03.06.2013 12:06, Gleb Smirnoff wrote: > Hello! > > This patch brings a new system call - aio_mlock(2). The idea is > quite clear from its name: it performs mlock(2), which can take > a long time if pages aren't resident, under aio(4) control. > > The patch is quite simple, and non-desctructive. Here it is > for your review. I didn't immediately see something about permissions to prevent normal users from easily exhausting all kernel memory. Since this is likely to be only used on dedicated servers it may be sufficient to have a global sysctl allowing its use for non-root users. 
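For context on the resource question: in the patch, sys_mlock() becomes a thin wrapper around a new vm_mlock(proc, cred, addr, len), and the aio daemon calls vm_mlock() with the requesting process and its saved credentials, so both paths go through the same privilege check and wiring limits. The page arithmetic those limits apply to can be modelled in userland roughly as below. This is a sketch: the 4 KB page size, the sk_* names and parameters, and the wrap-around and RLIMIT_MEMLOCK checks (which fall outside the diff context shown) are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KB pages; the kernel uses the machine's PAGE_SIZE. */
#define SK_PAGE_SIZE		4096UL
#define SK_PAGE_MASK		(SK_PAGE_SIZE - 1)
#define sk_trunc_page(x)	((x) & ~SK_PAGE_MASK)
#define sk_round_page(x)	(((x) + SK_PAGE_MASK) & ~SK_PAGE_MASK)

/*
 * Hypothetical model of the checks in vm_mlock(): how many pages a request
 * would wire and whether it fits.  max_wired stands in for
 * vm_page_max_wired, already_wired for pmap_wired_count(), and
 * memlock_limit (bytes) for the RLIMIT_MEMLOCK soft limit.
 */
int
sk_mlock_check(uintptr_t addr, size_t len, unsigned long max_wired,
    unsigned long already_wired, unsigned long memlock_limit,
    unsigned long *npagesp)
{
	uintptr_t last = addr + len;
	uintptr_t start = sk_trunc_page(addr);
	uintptr_t end = sk_round_page(last);
	unsigned long npages;

	if (last < addr || end < addr)	/* range wrapped around */
		return (EINVAL);
	npages = (end - start) / SK_PAGE_SIZE;	/* atop(end - start) */
	if (npages > max_wired)
		return (ENOMEM);
	/* ptoa(npages + already wired) must stay within the limit. */
	if ((npages + already_wired) * SK_PAGE_SIZE > memlock_limit)
		return (ENOMEM);
	*npagesp = npages;
	return (0);
}
```

Note that a request touching even one byte of a page wires the whole page, so a small buffer straddling a page boundary counts as two pages.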
-- Andre From owner-freebsd-arch@FreeBSD.ORG Mon Jun 3 13:23:21 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BA2D363B for ; Mon, 3 Jun 2013 13:23:21 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10]) by mx1.freebsd.org (Postfix) with ESMTP id 49C1D1CDA for ; Mon, 3 Jun 2013 13:23:20 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.6/8.14.6) with ESMTP id r53DNKHi085481; Mon, 3 Jun 2013 17:23:20 +0400 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.6/8.14.6/Submit) id r53DNKGX085480; Mon, 3 Jun 2013 17:23:20 +0400 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Mon, 3 Jun 2013 17:23:20 +0400 From: Gleb Smirnoff To: Andre Oppermann Subject: Re: aio_mlock(2) system call Message-ID: <20130603132320.GP67170@glebius.int.ru> References: <20130603100618.GH67170@FreeBSD.org> <51AC9748.5070908@networx.ch> MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline In-Reply-To: <51AC9748.5070908@networx.ch> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Jun 2013 13:23:21 -0000 On Mon, Jun 03, 2013 at 03:16:56PM +0200, Andre Oppermann wrote: A> > This patch brings a new system call - aio_mlock(2). The idea is A> > quite clear from its name: it performs mlock(2), which can take A> > a long time if pages aren't resident, under aio(4) control. A> > A> > The patch is quite simple, and non-desctructive. Here it is A> > for your review. 
A> A> I didn't immediately see something about permissions to prevent normal A> users from easily exhausting all kernel memory. A> A> Since this is likely to be only used on dedicated servers it may be A> sufficient to have a global sysctl allowing its use for non-root users. The aio thread uses the credentials of the process that issued aio_mlock(), so in terms of security the semantics are equal to those of a direct mlock() syscall. -- Totus tuus, Glebius. From owner-freebsd-arch@FreeBSD.ORG Mon Jun 3 14:04:24 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 10259199; Mon, 3 Jun 2013 14:04:24 +0000 (UTC) (envelope-from edschouten@gmail.com) Received: from mail-ve0-x230.google.com (mail-ve0-x230.google.com [IPv6:2607:f8b0:400c:c01::230]) by mx1.freebsd.org (Postfix) with ESMTP id BA5861ED6; Mon, 3 Jun 2013 14:04:23 +0000 (UTC) Received: by mail-ve0-f176.google.com with SMTP id c13so2783512vea.7 for ; Mon, 03 Jun 2013 07:04:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type; bh=5mOuUcOvJhlE7fzQiapnilkp6TFn7nTM41OAxUuOws8=; b=S4u1MiTYxZLIYpMI9pEq+Jgnv0tsImRWnKaIG7avhfuJiYtqN434gO4X4HJ737gdlE nYSAMAXZa28xvfZhYPwssZoOxY8xnQaurLMJ5eEteOXGQKZdj0pOs00Vu6b4n5LFzRYw fNY7ww7/hXYiIXD8OA8T62vMQ9acNHZ+EkxbPg+b4Nz8pjVX16FARtb/glX4rPMClYu4 m7bZFQX8UhcshntZ+L9N/9cuAA4bbTTSOztLacr+COUDC3oJ8LMm5FAEUZ/PU2DKiH+F 0nTmgR2pgUu8ZjeoyHZCWEzBz/ukncgtLMIkbbrsb8zDIThl0k3UKJ6BvV/OzR9MZxPL ddeA== MIME-Version: 1.0 X-Received: by 10.52.183.170 with SMTP id en10mr14491274vdc.5.1370268263228; Mon, 03 Jun 2013 07:04:23 -0700 (PDT) Sender: edschouten@gmail.com Received: by 10.220.107.139 with HTTP; Mon, 3 Jun 2013 07:04:23 -0700 (PDT) Date: Mon, 3 Jun 2013 16:04:23 +0200 X-Google-Sender-Auth: Inb7_DbHLzfo8UoJdPCaDKalzAk Message-ID: Subject: Kernelspace
C11 atomics for MIPS From: Ed Schouten To: freebsd-mips@freebsd.org Content-Type: text/plain; charset=UTF-8 Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Jun 2013 14:04:24 -0000 Hi, As of r251230, it should be possible to use C11 atomics in kernelspace, by including ! Even when not using Clang (but GCC 4.2), it is possible to use quite a large portion of the API. A couple of limitations: - The memory order argument is simply ignored, making all the calls do a full memory barrier. - At least Clang allows you to do arithmetic on C11 atomics directly (e.g. "a += 5" == "atomic_fetch_add(&a, 5)"), which is of course not possible to mimic. - The atomic functions only work on 1,2,4,8-byte types, which is probably a good thing. Amazingly, it turns out that it works on most of the architectures, with the exception of ARM and MIPS. To make MIPS work, we need to implement some of the __sync_* functions that are described here: http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html Some time ago I already added some of these functions to our libcompiler-rt in userspace, to make atomics work there. Unfortunately, these functions were quite horribly implemented, as I tried to build them on top of , which is far from trivial/efficient. It is also restricted to 4 and 8-byte types. That's why I thought: why not spend some time learning MIPS assembly and write some decent implementations for these functions? The result: http://80386.nl/pub/mips-stdatomic.txt For now, please focus on sys/mips/mips/stdatomic.c. It implements all the __sync_* functions called by for 1, 2, 4 and 8 byte types. There is some testing code in there as well, which can be ignored.
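Since the MIPS ll/sc instructions operate only on 32-bit (and, with lld/scd, 64-bit) words, 1- and 2-byte operations are commonly emulated with a compare-and-swap loop on the aligned word that contains them. The sketch below shows that general technique for an 8-bit fetch-and-add. It is not taken from stdatomic.c, it uses the GCC/Clang __sync_val_compare_and_swap() builtin in place of a hand-written ll/sc loop, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit offset of the byte at p within its aligned 32-bit word. */
static unsigned
byte_shift(const uint8_t *p)
{
	const uint32_t probe = 1;
	unsigned idx = (unsigned)((uintptr_t)p & 3);
	uint8_t low;

	memcpy(&low, &probe, 1);	/* first byte in memory of the value 1 */
	/* Little-endian: byte 0 is least significant; big-endian: most. */
	return (low == 1 ? idx * 8 : (3 - idx) * 8);
}

/* Hypothetical 8-bit fetch-and-add built on a 32-bit CAS. */
uint8_t
fetch_and_add_1(uint8_t *p, uint8_t val)
{
	/* Assumption: *p lies inside a naturally aligned 32-bit object. */
	uint32_t *word = (uint32_t *)((uintptr_t)p & ~(uintptr_t)3);
	unsigned shift = byte_shift(p);
	uint32_t mask = (uint32_t)0xff << shift;
	uint32_t old = *word;

	for (;;) {
		uint8_t cur = (uint8_t)((old & mask) >> shift);
		/* Replace only our byte; neighbouring bytes keep old values. */
		uint32_t nv = (old & ~mask) |
		    ((uint32_t)(uint8_t)(cur + val) << shift);
		uint32_t seen = __sync_val_compare_and_swap(word, old, nv);

		if (seen == old)
			return (cur);	/* previous value of the byte */
		old = seen;		/* word changed underneath us: retry */
	}
}
```

The retry covers a concurrent change to any byte of the word, not just the target byte, which is the price of emulating narrow atomics with a wider one.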
This code disassembles to the following: http://80386.nl/pub/mips-stdatomic-disasm.txt As I don't own a MIPS system myself, I was thinking about tinkering a bit with qemu to see whether these functions work properly. My questions are: - Does anyone have any comments on the C code and/or the machine code generated? Are there some nifty tricks I can apply to make the machine code more efficient that I am unaware of? - Is there anyone interested in testing this code a bit more thoroughly on physical hardware? - Would anyone mind if I committed this to HEAD? Thanks, -- Ed Schouten From owner-freebsd-arch@FreeBSD.ORG Mon Jun 3 16:13:03 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1F1C7BE8; Mon, 3 Jun 2013 16:13:03 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 8C90317E4; Mon, 3 Jun 2013 16:13:02 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r53GCtlo070844; Mon, 3 Jun 2013 19:12:55 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r53GCtlo070844 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r53GCthv070843; Mon, 3 Jun 2013 19:12:55 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 3 Jun 2013 19:12:55 +0300 From: Konstantin Belousov To: Gleb Smirnoff Subject: Re: aio_mlock(2) system call Message-ID: <20130603161255.GM3047@kib.kiev.ua> References: <20130603100618.GH67170@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="jwY/bXrd6avG4scw" Content-Disposition: inline In-Reply-To: <20130603100618.GH67170@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Jun 2013 16:13:03 -0000 --jwY/bXrd6avG4scw Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Jun 03, 2013 at 02:06:18PM +0400, Gleb Smirnoff wrote: > Hello! > > This patch brings a new system call - aio_mlock(2). The idea is > quite clear from its name: it performs mlock(2), which can take > a long time if pages aren't resident, under aio(4) control. > > The patch is quite simple, and non-desctructive. Here it is > for your review. > > If no one objects, I'd like to add it to FreeBSD 10. I suggest renaming aio_process() to aio_process_rw(). Also, it might make sense to assert the aio_lio_opcode value on entry to the aio_process_*() functions. > +static void > +aio_process_mlock(struct aiocblist *aiocbe) > +{ > + struct aiocb *cb = &aiocbe->uaiocb; > + int error; > + > + error = vm_mlock(aiocbe->userproc, aiocbe->cred, > + (void *)(uintptr_t)cb->aio_buf, cb->aio_nbytes); This should probably be spelled __DEVOLATILE(). We traditionally do not reuse the gaps in the syscall table, but add new syscalls at the end. Did you test kqueue completion notifications with aio_mlock()?
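Both review points can be illustrated outside the kernel. The sketch below uses a local DEVOLATILE() macro modelled on __DEVOLATILE() from FreeBSD's <sys/cdefs.h> to strip the volatile qualifier from the buffer pointer (struct aiocb declares aio_buf volatile), and has the handler assert the opcode it expects on entry. The OP_* constants, struct job, and the default case returning -1 are hypothetical stand-ins, not the vfs_aio.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the __DEVOLATILE() idea: cast away volatile explicitly. */
#define DEVOLATILE(type, var)	((type)(uintptr_t)(volatile void *)(var))

/* Hypothetical stand-ins for the LIO_* opcodes and the aio job. */
enum { OP_READ = 1, OP_WRITE, OP_SYNC, OP_MLOCK };

struct job {
	int opcode;
	volatile void *buf;	/* like aio_buf in struct aiocb */
	size_t nbytes;
	void *locked;		/* where this sketch records the address */
};

static void
process_mlock(struct job *j)
{
	/* Each handler checks that it was dispatched the right opcode. */
	assert(j->opcode == OP_MLOCK);
	/* A kernel handler would pass this pointer on to vm_mlock(). */
	j->locked = DEVOLATILE(void *, j->buf);
}

int
dispatch(struct job *j)
{
	switch (j->opcode) {
	case OP_MLOCK:
		process_mlock(j);
		return (0);
	case OP_READ:
	case OP_WRITE:
	case OP_SYNC:
		return (0);	/* other handlers omitted from this sketch */
	default:
		return (-1);	/* unknown opcode: reject it */
	}
}
```

Casting through uintptr_t documents that dropping the qualifier is intentional, which a plain (void *) cast on a volatile pointer does not.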
--jwY/bXrd6avG4scw Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJRrMCHAAoJEJDCuSvBvK1BgWAQAIybpSeMlA5tMd8xEfWodRzN jmF62DI9uh+ThrTHQ7cB0FXIZtvCEbWFMbvHNKo9Z9lE53Pld9QnQ3Y8xe7ccZ7Q OUWbdQbwlQ+B+bzjuGwciYaASdSJSiUHhUb32prDd2p1umRc0fGeeihb8PBEO6jM TLzAA0tK5UrcJvS7DgfQan2JdJlrjxrtyItvHuxUlhS8gnESACrD2WtenC6SlHrv gEpMohQKrQjCK9FPd1/s2k4aj25bsIB2LTJVoXrbmabP9/Mh1daaXbLWTMXf6BgY wyxhbXMmiAsqPDo2anggWi8YwN742nuWawEZGm7LxhCcEvqraCpFQQk2g3KOHA9l mSzYRIxCNImTpDzLCqHsgvjNqjT36VNGc/fY8VB6GfSAO28OIp7vatnFF4qL9m+P H/cH6uNmp0PHqpztoOUxZ3ykT4N/hxYHSH+wvUvJ8bK0PHGOGti26EraNYnH3WyJ 8IbMocB97olBC81yhpJN7kBOon/78NzPvA9Ymz6YdQUcpTzSnT+GKSpHQ5jrMgqP qiwMuzvoJhmJR2d+c/zJYwDrXYN75+U7xQyp2ghUOkg2yKE4HF1+g2ZFcHbw29xN HHDk/IWC4KcVjnJM/hZ2r4O+jL2U6d7F74oBq9WZstyHy6Y6KluTKzDmTcuL2JZG hM99hqXEZu4iZsQdxJ9o =n7ol -----END PGP SIGNATURE----- --jwY/bXrd6avG4scw-- From owner-freebsd-arch@FreeBSD.ORG Mon Jun 3 17:53:19 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D272E6D4 for ; Mon, 3 Jun 2013 17:53:19 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from mail-vc0-f173.google.com (mail-vc0-f173.google.com [209.85.220.173]) by mx1.freebsd.org (Postfix) with ESMTP id 8EDC81CFC for ; Mon, 3 Jun 2013 17:53:19 +0000 (UTC) Received: by mail-vc0-f173.google.com with SMTP id ht11so204305vcb.18 for ; Mon, 03 Jun 2013 10:53:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=zZ9JiRMvGJMFntbqO8iJk2bKrHzgv6TJ/3Bb1fz2rI4=; b=NgodFm6DO5PMwW9/jcelrFZNd2+t+RsEuf1qtv9HxX9x9s1j+3jQ0VVN7kseQ+vzZU 8jUYFtqUuWtK9LhkRmv6FwVZ1YzhNYqLCB6DM5EQnW19kYBg976NXRmpEhrcGpDxaCA1 
JE1ayUmEwQfGsFd02GaJfUWZEstFMxNnmQ8w599Zd1sbHw5UdqzQkN2vXEJFhpzjlpWB h8+EpIX0nt1rzvLj+QGU6rfpRAVYKe2jcrC5B4R9pfIijV7zTq/cOATXxO/k/hJR2JCH Q2QSNUfYoUJT9ae0Ancu8VOKeMYpTNqcGP6ZRm8ct/zkDsjJaytwCnyCwy+LmQyEd0YQ lvBA== X-Received: by 10.52.30.14 with SMTP id o14mr13771998vdh.106.1370281993273; Mon, 03 Jun 2013 10:53:13 -0700 (PDT) Received: from monkey-bot.int.fusionio.com ([209.117.142.2]) by mx.google.com with ESMTPSA id s6sm37041931vdj.5.2013.06.03.10.53.11 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 03 Jun 2013 10:53:12 -0700 (PDT) Sender: Warner Losh Subject: Re: Kernelspace C11 atomics for MIPS Mime-Version: 1.0 (Apple Message framework v1085) Content-Type: text/plain; charset=us-ascii From: Warner Losh In-Reply-To: Date: Mon, 3 Jun 2013 11:53:09 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: References: To: Ed Schouten X-Mailer: Apple Mail (2.1085) X-Gm-Message-State: ALoCoQmlPepmLT2RF9vmnbL8X4SkI2hPrNpgRVuvOu8H7RU0B5u9CBrH6D/3MOJfgilam4DHVL0V Cc: freebsd-mips@freebsd.org, freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Jun 2013 17:53:19 -0000 On Jun 3, 2013, at 8:04 AM, Ed Schouten wrote: > Hi, > > As of r251230, it should be possible to use C11 atomics in > kernelspace, by including ! Even when not using Clang > (but GCC 4.2), it is possible to use quite a large portion of the API. > A couple of limitations: > > - The memory order argument is simply ignored, making all the calls do > a full memory barrier. > - At least Clang allows you to do arithmetic on C11 atomics directly > (e.g. "a += 5" == "atomic_fetch_add(&a, 5)"), which is of course not > possible to mimick. > - The atomic functions only work on 1,2,4,8-byte types, which is > probably a good thing.
> > Amazingly, it turns out that it most of the architectures, with the > exception of ARM and MIPS. To make MIPS work, we need to implement > some of the __sync_* functions that are described here: > > http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html > > Some time ago I already added some of these functions to our > libcompiler-rt in userspace, to make atomics work there. > Unfortunately, these functions were quite horribly implemented, as I > tried to build them on top of , which is far from > trivial/efficient. It is also restricted to 4 and 8-byte types. That's > why I thought: why not spend some time learning MIPS assembly and > write some decent implementations for these functions? > > The result: > > http://80386.nl/pub/mips-stdatomic.txt The number of necessary syncs varies by processor type. There are also newer synchronization instructions that make this as efficient as possible for all mips32r2 and mips64r2-based machines. Older Caviums, at least, and maybe newer ones, also have their own variants. What you have will mostly work for the processors we have to support. mips_sync could therefore be better. Doing it before AND after seems like overkill as well. Since sync is a fairly performance-killing assembler instruction, how would you feel about allowing optimizations? This is my biggest single concern about the patch, but it is also my current biggest concern about the MIPS atomic operators in general. > For now, please focus on sys/mips/mips/stdatomic.c. It implements all > the __sync_* functions called by for 1, 2, 4 and 8 byte > types. There is some testing code in there as well, which can be > ignored. This code disassembles to the following: > > http://80386.nl/pub/mips-stdatomic-disasm.txt > > As I don't own a MIPS system myself, I was thinking about tinkering a > bit with qemu to see whether these functions work properly.
My > questions are: >=20 > - Does anyone have any comments on the C code and/or the machine code > generated? Are there some nifty tricks I can apply to make the machine > code more efficient that I am unaware o? > - Is there anyone interested in testing this code a bit more > thoroughly on physical hardware? > - Would anyone mind if I committed this to HEAD? I have some cavium gear I can easily test on, and some other stuff I can = less-easily test on. It wouldn't be horrible to commit to head, but it would affect = performance in many places. Don't commit the kern/bla.c standard change to conf/files, it looks to = be bogus :) Warner From owner-freebsd-arch@FreeBSD.ORG Mon Jun 3 20:16:00 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2E47FFA; Mon, 3 Jun 2013 20:16:00 +0000 (UTC) (envelope-from edschouten@gmail.com) Received: from mail-ve0-x230.google.com (mail-ve0-x230.google.com [IPv6:2607:f8b0:400c:c01::230]) by mx1.freebsd.org (Postfix) with ESMTP id CFDC91368; Mon, 3 Jun 2013 20:15:59 +0000 (UTC) Received: by mail-ve0-f176.google.com with SMTP id c13so3133867vea.7 for ; Mon, 03 Jun 2013 13:15:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=epO2QtREq85Qirwj2jzCOL7EyD4dKSnfZsna/f1A2Y4=; b=YeX3NKgJ1qIhGHuQRp0KfESFzmdD2puMnlPCvs9E5M9yawAO+yH1kYeJ/ulEayyEdS m4LsNbphuwT+oJGwdyCOc7EQ01W+RJReGApnLRQ1p/nGnLGgPFu6MEoCV9wYLEH0Y5Rs NISPaMrZCYMtRlrBdnKaKO6xVAw9DDMeERxkDYYhihRJi6AaRghF/6lyzGKh7rc4FQZS zJYjQCGWBmuBBH7nGFMTi7YPXUK5skC+8rDwEshjwKvQX9kB42f99XI70WJJxVcpApD+ Me2ozlt6VGuSbw+U69dxdhPxM3jTT9fhVrMswYPRRYI6dAw0mhzWLw2oXniy4ubno8BG XwwA== MIME-Version: 1.0 X-Received: by 10.52.183.170 with SMTP id en10mr14965864vdc.5.1370290559285; Mon, 03 Jun 2013 13:15:59 -0700 
(PDT) Sender: edschouten@gmail.com Received: by 10.220.107.139 with HTTP; Mon, 3 Jun 2013 13:15:59 -0700 (PDT) In-Reply-To: References: Date: Mon, 3 Jun 2013 22:15:59 +0200 X-Google-Sender-Auth: FeVDz4OIqufmn0UG2b3qrapNfk0 Message-ID: Subject: Re: Kernelspace C11 atomics for MIPS From: Ed Schouten To: Warner Losh Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-mips@freebsd.org, freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Jun 2013 20:16:00 -0000 Hey Warner, After playing around a bit I managed to get qemu user mode emulation for mips64 working on my workstation. Apart from a stupid thinko (__sync_fetch_and_sub() did B - A instead of A - B), the code managed to survive the following test, which calls the stdatomic library functions with random parameters: http://80386.nl/pub/stdatomic-fuzzer.txt I'll see if I can push this tool into the tree after polishing it up a bit. Unfortunately it only tests the logical aspects of the routines -- not whether the routines are actually atomic. 2013/6/3 Warner Losh : > The number of necessary syncs varies by processor type. There's also newe= r synchronization instructions that make this as efficient as possible for = all mips32r2 and mips64r2-based machines. Older Caviums, at least and maybe= newer ones, also have their own variants. What you have will mostly work f= or the processors we have to support. mips_sync could therefore be better. = Doing it before AND after seems like overkill as well. Since sync is a fair= ly performance killing assembler instruction, how would you feel about allo= wing optimizations? > > This is my biggest single concern about the patch, but it also my current= biggest concern about the MIPS atomic operators in general. 
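Two points from the message above can be made concrete. __sync_fetch_and_sub() must store the current value minus the operand (A - B) and return the value it replaced; and because MIPS ll/sc operate only on 32- and 64-bit words, 1- and 2-byte variants are commonly emulated with a compare-and-swap loop on the aligned word that contains the target. The sketch below assumes that technique and is not necessarily how stdatomic.c is written; byte_fetch_and_sub() is a hypothetical name, the containing word is assumed to be accessible as a uint32_t, and the loop uses the GCC/Clang __sync_val_compare_and_swap() builtin documented on the gcc.gnu.org page linked earlier.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: atomic fetch-and-subtract on one byte, emulated
 * with a compare-and-swap loop on the aligned 32-bit word containing it.
 * Returns the value the byte held before the subtraction.
 */
static uint8_t
byte_fetch_and_sub(uint8_t *p, uint8_t val)
{
	uintptr_t addr = (uintptr_t)p;
	uint32_t *word = (uint32_t *)(addr & ~(uintptr_t)3);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	/* Big-endian: the byte at the lowest address is the top byte. */
	unsigned int shift = (3 - (unsigned int)(addr & 3)) * 8;
#else
	/* Otherwise assume little-endian. */
	unsigned int shift = (unsigned int)(addr & 3) * 8;
#endif
	uint32_t mask = (uint32_t)0xff << shift;
	uint32_t oldw, neww;
	uint8_t oldb;

	do {
		oldw = *(volatile uint32_t *)word;
		oldb = (uint8_t)((oldw & mask) >> shift);
		/* A - B: current value minus the operand, modulo 2^8. */
		neww = (oldw & ~mask) |
		    ((uint32_t)(uint8_t)(oldb - val) << shift);
		/* Retry if any byte of the word changed under us. */
	} while (__sync_val_compare_and_swap(word, oldw, neww) != oldw);

	return (oldb);
}
```

A fuzzer like the one described above would compare results such as these against a plain, non-atomic reference computation; like that fuzzer, this only checks the arithmetic, not atomicity under contention.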
I have to confess that's exactly the part I know very little about. The code I wrote is largely based on the code in . The mips_sync() function has been copied over almost literally. I think tuning this could be done separately.

Regarding calling mips_sync() before and after: I think we always have to call it before we perform the action (for example, if the atomic call is used to implement an unlock). But indeed, calling it afterwards makes little sense. We would only need a barrier at the compiler level -- not at the memory level. It wouldn't make sense to add this explicitly, because these are separate functions in a separate compilation unit anyway. Thoughts?

> I have some Cavium gear I can easily test on, and some other stuff I can less easily test on.

Awesome! At least testing it on Cavium would be nice.

I've updated the diff. Please refresh.

http://80386.nl/pub/mips-stdatomic.txt

One easy way to test this would be to link the fuzzer source file against the stdatomic.c file added by my patch and run that. The source file should now compile in both userspace and the kernel.
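The point about issuing the barrier before the operation when it implements an unlock corresponds to what C11 calls release ordering. Below is a minimal user-space sketch with C11 atomics; struct spinlock, spin_lock(), spin_unlock() and locked_increment() are hypothetical names, not FreeBSD's mutex(9) API. The lock side uses acquire ordering, which constrains accesses that come after the atomic operation; whether a given MIPS core needs a hardware sync there, or only a compiler barrier, depends on how weakly that core orders memory, which is the processor-specific question raised in Warner's reply.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock, for illustration only. */
struct spinlock {
	atomic_bool locked;
};

static void
spin_lock(struct spinlock *l)
{
	/* Acquire: later accesses cannot move above the exchange. */
	while (atomic_exchange_explicit(&l->locked, true,
	    memory_order_acquire))
		;	/* spin */
}

static void
spin_unlock(struct spinlock *l)
{
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
	/*
	 * Release: every access made while holding the lock must be
	 * visible before the lock is seen as free.  The ordering is
	 * required *before* the store, i.e. ahead of the atomic operation.
	 */
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

static int shared;

/* Increment a plain int under the lock and return the new value. */
static int
locked_increment(struct spinlock *l)
{
	int v;

	spin_lock(l);
	v = ++shared;
	spin_unlock(l);
	return (v);
}
```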
Thanks,
--
Ed Schouten

From owner-freebsd-arch@FreeBSD.ORG Mon Jun 3 21:29:16 2013
From: Gleb Smirnoff
To: Konstantin Belousov
Cc: arch@FreeBSD.org
Date: Tue, 4 Jun 2013 01:29:13 +0400
Subject: Re: aio_mlock(2) system call
Message-ID: <20130603212913.GU67170@glebius.int.ru>
In-Reply-To: <20130603161255.GM3047@kib.kiev.ua>

On Mon, Jun 03, 2013 at 07:12:55PM +0300, Konstantin Belousov wrote:
K> On Mon, Jun 03, 2013 at 02:06:18PM +0400, Gleb Smirnoff wrote:
K> > Hello!
K> >
K> > This patch brings a new system call - aio_mlock(2). The idea is
K> > quite clear from its name: it performs mlock(2), which can take
K> > a long time if pages aren't resident, under aio(4) control.
K> >
K> > The patch is quite simple, and non-destructive. Here it is
K> > for your review.
K> >
K> > If no one objects, I'd like to add it to FreeBSD 10.
K>
K> I suggest renaming aio_process() to aio_process_rw().
K> Also, it might make sense to assert the aio_lio_opcode value on entry
K> to the aio_process_*() functions.

Will do.

K> > +static void
K> > +aio_process_mlock(struct aiocblist *aiocbe)
K> > +{
K> > +	struct aiocb *cb = &aiocbe->uaiocb;
K> > +	int error;
K> > +
K> > +	error = vm_mlock(aiocbe->userproc, aiocbe->cred,
K> > +	    (void *)(uintptr_t)cb->aio_buf, cb->aio_nbytes);
K> This probably should be spelled __DEVOLATILE().
K>
K> We traditionally do not reuse the gaps in the syscall table, but add
K> new syscalls at the end.

Hmm. I did that because I wanted all the aio_* calls grouped together. Why not?

K> Did you test the kqueue completion notifications with aio_mlock()?

Sure. This is my main use case here.

--
Totus tuus, Glebius.
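Both review comments are easy to illustrate in user space. The sketch below is modelled on FreeBSD's __DEVOLATILE() macro from <sys/cdefs.h>, re-declared locally as DEVOLATILE so it compiles anywhere, and on the suggestion to assert aio_lio_opcode on entry. struct fake_aiocb, the OP_* opcodes, lock_range() and process_mlock() are hypothetical stand-ins, not the kernel's struct aiocb, opcode constants or vm_mlock().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local stand-in modelled on FreeBSD's __DEVOLATILE(): it names the
 * idiom the patch open-codes as (void *)(uintptr_t)..., stripping the
 * volatile qualifier by going through uintptr_t, which -Wcast-qual
 * does not flag.
 */
#define	DEVOLATILE(type, var)	((type)(uintptr_t)(volatile void *)(var))

enum { OP_READ = 1, OP_WRITE, OP_MLOCK };	/* hypothetical opcodes */

/* Hypothetical, cut-down analogue of struct aiocb. */
struct fake_aiocb {
	int		 aio_lio_opcode;
	volatile void	*aio_buf;
	size_t		 aio_nbytes;
};

/* Hypothetical consumer that, like vm_mlock(), takes a plain pointer. */
static size_t
lock_range(void *addr, size_t len)
{
	return (addr != NULL ? len : 0);
}

static size_t
process_mlock(struct fake_aiocb *cb)
{
	/* Catch a request routed to the wrong handler. */
	assert(cb->aio_lio_opcode == OP_MLOCK);
	return (lock_range(DEVOLATILE(void *, cb->aio_buf), cb->aio_nbytes));
}
```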
From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 02:45:50 2013
From: Adrian Chadd
To: Warner Losh
Cc: Ed Schouten, freebsd-mips@freebsd.org, freebsd-arch@freebsd.org
Date: Mon, 3 Jun 2013 19:45:48 -0700
Subject: Re: Kernelspace C11 atomics for MIPS

Speaking of this; any idea why the SYNC operators have 8 NOPs following them?

I noticed that when going through disassemblies of various mips24k .o files.

Adrian

On 3 June 2013 10:53, Warner Losh wrote:
> The number of necessary syncs varies by processor type. There are also newer synchronization instructions that make this as efficient as possible for all mips32r2 and mips64r2-based machines. Older Caviums at least, and maybe newer ones, also have their own variants. What you have will mostly work for the processors we have to support. mips_sync could therefore be better. Doing it before AND after seems like overkill as well. Since sync is a fairly performance-killing assembler instruction, how would you feel about allowing optimizations?

From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 03:55:26 2013
From: Juli Mallett
To: Adrian Chadd
Cc: Ed Schouten, freebsd-mips@FreeBSD.org, FreeBSD-arch
Date: Mon, 3 Jun 2013 20:55:04 -0700
Subject: Re: Kernelspace C11 atomics for MIPS

On Mon, Jun 3, 2013 at 7:45 PM, Adrian Chadd wrote:

> Speaking of this; any idea why the SYNC operators have 8 NOPs following
> them?
>
> I noticed that when going through disassemblies of various mips24k .o
> files.
>

To drain the pipeline on certain deficient (and mostly older) CPUs by way of
guesswork and a little vague magic. Most CPUs we support, I would guess, do
not need this, and it continues to exist solely for hysterical reasons.

I've certainly gotten rid of them and some other cargo cult synchronization
on Octeon for testing and had it survive under considerable load, and
occasionally with some slight speedups (for some more commonly-used or
slower things than Just a Bunch Of NOPs.)

The trouble is that proving they aren't necessary requires being rigorous
and careful in understanding documentation and errata, and FUD about their
possible necessity is somewhat-intimidating. It's not an easy kind of
corruption/unreliability/etc., to prove the lack of empirically.

Juli.
From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 03:57:52 2013
From: Adrian Chadd
To: Juli Mallett
Cc: Ed Schouten, freebsd-mips@FreeBSD.org, FreeBSD-arch
Date: Mon, 3 Jun 2013 20:57:45 -0700
Subject: Re: Kernelspace C11 atomics for MIPS

On 3 June 2013 20:55, Juli Mallett wrote:

> To drain the pipeline on certain deficient (and mostly older) CPUs by way of
> guesswork and a little vague magic. Most CPUs we support, I would guess, do
> not need this, and it continues to exist solely for hysterical reasons.

How can I turn it off for my compiles?

> I've certainly gotten rid of them and some other cargo cult synchronization
> on Octeon for testing and had it survive under considerable load, and
> occasionally with some slight speedups (for some more commonly-used or
> slower things than Just a Bunch Of NOPs.)

Right. Well, since it's happening on every inlined lock, it's a bit silly.

> The trouble is that proving they aren't necessary requires being rigorous
> and careful in understanding documentation and errata, and FUD about their
> possible necessity is somewhat-intimidating. It's not an easy kind of
> corruption/unreliability/etc., to prove the lack of empirically.

I've checked the disassembly from gcc-4.mumble on Linux; it doesn't include NOPs like this as far as I can tell.
Adrian

From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 04:05:25 2013
From: Juli Mallett
To: Adrian Chadd
Cc: Ed Schouten, freebsd-mips@FreeBSD.org, FreeBSD-arch
Date: Mon, 3 Jun 2013 21:04:57 -0700
Subject: Re: Kernelspace C11 atomics for MIPS

On Mon, Jun 3, 2013 at 8:57 PM, Adrian Chadd wrote:

> On 3 June 2013 20:55, Juli Mallett wrote:
>
> > To drain the pipeline on certain deficient (and mostly older) CPUs by way of
> > guesswork and a little vague magic. Most CPUs we support, I would guess, do
> > not need this, and it continues to exist solely for hysterical reasons.
>
> How can I turn it off for my compiles?

Edit the source. This is not, and this kind of thing must not be, a user-visible go-faster knob. I'm anticipating that someone might want to respond to "edit the source" by saying that users don't have to edit the source, without understanding the kind of change this is.

> > I've certainly gotten rid of them and some other cargo cult synchronization
> > on Octeon for testing and had it survive under considerable load, and
> > occasionally with some slight speedups (for some more commonly-used or
> > slower things than Just a Bunch Of NOPs.)
>
> Right. Well, since it's happening on every inlined lock, it's a bit silly.

Yes.

> > The trouble is that proving they aren't necessary requires being rigorous
> > and careful in understanding documentation and errata, and FUD about their
> > possible necessity is somewhat-intimidating. It's not an easy kind of
> > corruption/unreliability/etc., to prove the lack of empirically.
>
> I've checked the disassembly from gcc-4.mumble on Linux; it doesn't
> include NOPs like this as far as I can tell.

Neat.

You might also like to look at usage of 'sync' (and its variants, or the lack of use of its variants) and the possibility of using newer mips32/64 instructions to change whether interrupts are enabled, and a number of other things, at least for certain CPU types.
And excessive use of all kinds of memory barriers (including simple memory-clobber barriers) and and and. There are a lot of small changes that can be made that add up, but building confidence across the range of hardware we support is genuinely hard.

Juli.

From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 04:08:49 2013
From: Patrick Kelsey
To: Adrian Chadd
Cc: Juli Mallett, Ed Schouten, freebsd-mips@FreeBSD.org, FreeBSD-arch
Date: Tue, 4 Jun 2013 00:08:47 -0400
Subject: Re: Kernelspace C11 atomics for MIPS

On Mon, Jun 3, 2013 at 11:57 PM, Adrian Chadd wrote:
> On 3 June 2013 20:55, Juli Mallett wrote:
>
>> To drain the pipeline on certain deficient (and mostly older) CPUs by way of
>> guesswork and a little vague magic. Most CPUs we support, I would guess, do
>> not need this, and it continues to exist solely for hysterical reasons.
>
> How can I turn it off for my compiles?
>
>> I've certainly gotten rid of them and some other cargo cult synchronization
>> on Octeon for testing and had it survive under considerable load, and
>> occasionally with some slight speedups (for some more commonly-used or
>> slower things than Just a Bunch Of NOPs.)
>
> Right. Well, since it's happening on every inlined lock, it's a bit silly.
>
>> The trouble is that proving they aren't necessary requires being rigorous
>> and careful in understanding documentation and errata, and FUD about their
>> possible necessity is somewhat-intimidating. It's not an easy kind of
>> corruption/unreliability/etc., to prove the lack of empirically.
>
> I've checked the disassembly from gcc-4.mumble on Linux; it doesn't
> include NOPs like this as far as I can tell.
>

The sync + 8 nops is coming from the definition of mips_sync() in sys/mips/include/atomic.h.

I agree with Juli that it appears to be a manual pipeline-flush holdover from earlier days; I'm guessing there are 8 nops because the R4000/4400 had both the sync instruction and an 8-stage pipeline.
I'm further guessing this was an attempt at providing stronger ordering semantics than the sync instruction itself for the following mb()/wmb()/rmb() definitions that use it, as the sync instruction definition doesn't restrict execution of the before/after loads/stores with respect to the sync instruction itself.

From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 04:15:20 2013
From: Patrick Kelsey
To: Adrian Chadd
Cc: Juli Mallett, Ed Schouten, freebsd-mips@FreeBSD.org, FreeBSD-arch
Date: Tue, 4 Jun 2013 00:15:17 -0400
Subject: Re: Kernelspace C11 atomics for MIPS

On Tue, Jun 4, 2013 at 12:08 AM, Patrick Kelsey wrote:
> The sync + 8 nops is coming from the definition of mips_sync() in
> sys/mips/include/atomic.h.
>
> I agree with Juli that it appears to be a manual pipeline-flush
> holdover from earlier days; I'm guessing there are 8 nops because the
> R4000/4400 had both the sync instruction and an 8-stage pipeline. I'm
> further guessing this was an attempt at providing stronger ordering
> semantics than the sync instruction itself for the following
> mb()/wmb()/rmb() definitions that use it, as the sync instruction
> definition doesn't restrict execution of the before/after loads/stores
> with respect to the sync instruction itself.

Forgot to emphasize that this particular bit of old-school nop-counting is either pointless or a latent hazard: 8 does not cover the deepest MIPS pipeline around, and then there's superscalar issue to consider. So I think it's either unnecessary or insufficient.

So far, that's all criticism and no solution :/

From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 04:32:04 2013
Date: Mon, 3 Jun 2013 22:32:00 -0600
From: Warner Losh
To: Juli Mallett
Cc: Ed Schouten , Adrian Chadd , "freebsd-mips@FreeBSD.org" , FreeBSD-arch
Subject: Re: Kernelspace C11 atomics for MIPS

On Jun 3, 2013, at 10:04 PM, Juli Mallett wrote:

> On Mon, Jun 3, 2013 at 8:57 PM, Adrian Chadd wrote:
> On 3 June 2013 20:55, Juli Mallett wrote:
>
> > To drain the pipeline on certain deficient (and mostly older) CPUs by way of
> > guesswork and a little vague magic. Most CPUs we support, I would guess, do
> > not need this, and it continues to exist solely for hysterical reasons.
>
> How can I turn it off for my compiles?
>
> Edit the source. This is not, and this kind of thing must not be, a user-visible go-faster knob. I'm anticipating that someone might want to respond to "edit the source" by saying that users don't have to edit the source, without understanding the kind of change this is.

This isn't a user-visible knob. If you don't know what you are doing, then don't do it.

In the absence of errata, they aren't called out as being required.
From Linux, I could find them in the following contexts:

- One of the places where sync was used in au1000 (at the end of the DO_SLEEP macro)
- On Octeon, syncw; syncw is used to work around CN3xxx core bugs
- bmips_read_zscm_reg has a bunch of _ssnops after it, but it is unused (perhaps in retired CPUs)
- au1k_wait() has them

And that's it. I think they can safely go if Linux doesn't have them :)

> > I've certainly gotten rid of them and some other cargo cult synchronization
> > on Octeon for testing and had it survive under considerable load, and
> > occasionally with some slight speedups (for some more commonly-used or
> > slower things than Just a Bunch Of NOPs.)
>
> Right. Well, since it's happening on every inlined lock, it's a bit silly.
>
> Yes.

Yes.

> > The trouble is that proving they aren't necessary requires being rigorous
> > and careful in understanding documentation and errata, and FUD about their
> > possible necessity is somewhat-intimidating. It's not an easy kind of
> > corruption/unreliability/etc., to prove the lack of empirically.
>
> I've checked the disassembly from gcc-4.mumble on linux; it doesn't
> include NOPs like this as far as I can tell.
>
> Neat.
>
> You might also like to look at usage of 'sync' (and its variants, or the lack of use of its variants) and the possibility of using newer mips32/64 instructions to change whether interrupts are enabled, and a number of other things, at least for certain CPU types.

Yes, that would be awesome...

> And excessive use of all kinds of memory barriers (including simple memory-clobber barriers) and and and. There's a lot of small changes that can be made that add up, but building confidence across the range of hardware we support is genuinely-hard.

Yes, we need better read, write and general memory barriers in the tree. And MIPS' implementation needs to improve.
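For comparison with the hand-counted sync + nop sequence, here is a minimal sketch of read, write and full barriers built from C11 fences, plus a compiler-only memory-clobber barrier. The helper names (mb_sketch(), wmb_sketch(), rmb_sketch(), compiler_barrier(), publish(), consume()) are hypothetical and are not the macros in the FreeBSD tree. Mapping wmb to a release fence and rmb to an acquire fence is an assumption: those fences are somewhat stronger than pure store-store or load-load ordering. Check the disassembly to see whether a given compiler lowers the full fence to a single MIPS sync, rather than assuming it does.

```c
#include <stdatomic.h>

/* Compiler-only barrier (GCC/Clang syntax): the compiler may not move
 * memory accesses across it, but no CPU instruction is emitted. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical barrier helpers built on C11 fences; illustrative names,
 * not the mb()/wmb()/rmb() definitions in sys/mips/include/atomic.h. */
static inline void mb_sketch(void)  { atomic_thread_fence(memory_order_seq_cst); }
static inline void wmb_sketch(void) { atomic_thread_fence(memory_order_release); }
static inline void rmb_sketch(void) { atomic_thread_fence(memory_order_acquire); }

static int payload;          /* plain data being handed over */
static atomic_int ready;     /* flag, accessed only with relaxed atomics */

/* Producer: make the payload store visible before the flag store. */
static void publish(int value)
{
	payload = value;
	wmb_sketch();
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Consumer: returns 1 and copies the payload into *out once the flag is
 * set, otherwise returns 0 without touching *out. */
static int consume(int *out)
{
	if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
		return 0;
	rmb_sketch();
	*out = payload;
	return 1;
}
```

The flag is accessed only through relaxed atomics, so all ordering comes from the fences: the release fence in publish() pairs with the acquire fence in consume(). compiler_barrier() emits no instruction at all, so it cannot replace a hardware barrier by itself.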
I think they date to the very earliest days of the port (and predate the Juniper MIPS merge)... If you look at svn blame, it says:

178172 imp "nop\n\t"

for all of them. I have no clue where they came from. Looking at the svn log, it appears they came from the mips2 branch, which may or may not have been before or after the Juniper code base was merged in.

I think they can safely be relegated to the dustbin of history.

Warner

From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 04:42:42 2013
Date: Mon, 3 Jun 2013 22:42:39 -0600
From: Warner Losh
To: Patrick Kelsey
Cc: Juli Mallett , Ed Schouten , Adrian Chadd , "freebsd-mips@FreeBSD.org" , FreeBSD-arch
Subject: Re: Kernelspace C11 atomics for MIPS

On Jun 3, 2013, at 10:15 PM, Patrick Kelsey wrote:

> On Tue, Jun 4, 2013 at 12:08 AM, Patrick Kelsey wrote:
>> On Mon, Jun 3, 2013 at 11:57 PM, Adrian Chadd wrote:
>>> On 3 June 2013 20:55, Juli Mallett wrote:
>>>
>>>> To drain the pipeline on certain deficient (and mostly older) CPUs by way of
>>>> guesswork and a little vague magic. Most CPUs we support, I would guess, do
>>>> not need this, and it continues to exist solely for hysterical reasons.
>>>
>>> How can I turn it off for my compiles?
>>>
>>>> I've certainly gotten rid of them and some other cargo cult synchronization
>>>> on Octeon for testing and had it survive under considerable load, and
>>>> occasionally with some slight speedups (for some more commonly-used or
>>>> slower things than Just a Bunch Of NOPs.)
>>>
>>> Right. Well, since it's happening on every inlined lock, it's a bit silly.
>>>
>>>> The trouble is that proving they aren't necessary requires being rigorous
>>>> and careful in understanding documentation and errata, and FUD about their
>>>> possible necessity is somewhat-intimidating. It's not an easy kind of
>>>> corruption/unreliability/etc., to prove the lack of empirically.
>>>
>>> I've checked the disassembly from gcc-4.mumble on linux; it doesn't
>>> include NOPs like this as far as I can tell.
>>>
>>
>> The sync + 8 nops is coming from the definition of mips_sync() in
>> sys/mips/include/atomic.h.

They came from the old mips2 branch, which may have been from the Juniper code merge, or maybe not.

>> I agree with Juli that it appears to be a manual pipeline-flush
>> holdover from earlier days - I'm guessing there's 8 nops because the
>> R4000/4400 had both the sync instruction and an 8-stage pipeline. I'm
>> further guessing this was an attempt at providing stronger ordering
>> semantics than the sync instruction itself for the following
>> mb()/wmb()/rmb() definitions that use it, as the sync instruction
>> definition doesn't restrict execution of the before/after loads/stores
>> with respect to the sync instruction itself.
>
> Forgot to emphasize that this particular bit of old-school
> nop-counting is either pointless or a latent hazard - 8 does not cover
> the deepest MIPS pipeline around, then there's superscalar issue to
> consider - so I think it's either unnecessary or insufficient. So
> far, that's all criticism and no solution :/

Yes, there are new nops for these situations starting in the mips32r2 and mips64r2 ISAs.

I think that this originated in the mips-jnpr merge, but I can't find the old branches anywhere...
Warner

From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 04:43:21 2013
Date: Mon, 3 Jun 2013 22:43:19 -0600
From: Warner Losh
To: Adrian Chadd
Cc: Ed Schouten , freebsd-mips@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: Kernelspace C11 atomics for MIPS

On Jun 3, 2013, at 8:45 PM, Adrian Chadd wrote:

> Speaking of this; any idea why the SYNC operators have 8 NOPs following them?

Yes, that's the exact issue that I've had with them, but I have never had time to sort it out...

Warner

> I noticed that when going through disassemblies of various mips24k .o files.
>
> Adrian
>
> On 3 June 2013 10:53, Warner Losh wrote:
>>
>> On Jun 3, 2013, at 8:04 AM, Ed Schouten wrote:
>>
>>> Hi,
>>>
>>> As of r251230, it should be possible to use C11 atomics in
>>> kernelspace, by including ! Even when not using Clang
>>> (but GCC 4.2), it is possible to use quite a large portion of the API.
>>> A couple of limitations:
>>>
>>> - The memory order argument is simply ignored, making all the calls do
>>> a full memory barrier.
>>> - At least Clang allows you to do arithmetic on C11 atomics directly
>>> (e.g. "a += 5" == "atomic_fetch_add(&a, 5)"), which is of course not
>>> possible to mimic.
>>> - The atomic functions only work on 1, 2, 4 and 8-byte types, which is
>>> probably a good thing.
>>>
>>> Amazingly, it turns out that it works on most of the architectures, with the
>>> exception of ARM and MIPS. To make MIPS work, we need to implement
>>> some of the __sync_* functions that are described here:
>>>
>>> http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
>>>
>>> Some time ago I already added some of these functions to our
>>> libcompiler-rt in userspace, to make atomics work there.
>>> Unfortunately, these functions were quite horribly implemented, as I
>>> tried to build them on top of , which is far from
>>> trivial/efficient. It is also restricted to 4 and 8-byte types. That's
>>> why I thought: why not spend some time learning MIPS assembly and
>>> write some decent implementations for these functions?
>>>
>>> The result:
>>>
>>> http://80386.nl/pub/mips-stdatomic.txt
>>
>> The number of necessary syncs varies by processor type. There are also newer synchronization instructions that make this as efficient as possible for all mips32r2 and mips64r2-based machines. Older Caviums, at least, and maybe newer ones, also have their own variants. What you have will mostly work for the processors we have to support. mips_sync could therefore be better. Doing it before AND after seems like overkill as well. Since sync is a fairly performance-killing assembler instruction, how would you feel about allowing optimizations?
>>
>> This is my biggest single concern about the patch, but it is also my current biggest concern about the MIPS atomic operators in general.
>>
>>> For now, please focus on sys/mips/mips/stdatomic.c. It implements all
>>> the __sync_* functions called by for 1, 2, 4 and 8 byte
>>> types. There is some testing code in there as well, which can be
>>> ignored.
>>> This code disassembles to the following:
>>>
>>> http://80386.nl/pub/mips-stdatomic-disasm.txt
>>>
>>> As I don't own a MIPS system myself, I was thinking about tinkering a
>>> bit with qemu to see whether these functions work properly. My
>>> questions are:
>>>
>>> - Does anyone have any comments on the C code and/or the machine code
>>> generated? Are there some nifty tricks I can apply to make the machine
>>> code more efficient that I am unaware of?
>>> - Is there anyone interested in testing this code a bit more
>>> thoroughly on physical hardware?
>>> - Would anyone mind if I committed this to HEAD?
>>
>> I have some cavium gear I can easily test on, and some other stuff I can less-easily test on.
>>
>> It wouldn't be horrible to commit to head, but it would affect performance in many places.
>>
>> Don't commit the kern/bla.c standard change to conf/files, it looks to be bogus :)
>>
>> Warner
>>
>> _______________________________________________
>> freebsd-mips@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-mips
>> To unsubscribe, send any mail to "freebsd-mips-unsubscribe@freebsd.org"

From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 05:07:51 2013
Date: Mon, 3 Jun 2013 23:07:41 -0600
From: Warner Losh
To: Patrick Kelsey
Cc: Juli Mallett , Ed Schouten , Adrian Chadd , "freebsd-mips@FreeBSD.org" , FreeBSD-arch
Subject: Re: Kernelspace C11 atomics for MIPS
Content-Type: multipart/mixed; boundary=Apple-Mail-8--974245673

--Apple-Mail-8--974245673
Content-Type: text/plain; charset=us-ascii

Please find attached a simple patch that I'd like all MIPS users to try.
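The patch below turns the eight hard-coded nops into a per-platform string. The following sketch shows how adjacent string-literal concatenation composes the asm template from that macro. It only builds and inspects the template on the host and never executes MIPS code. The two-nop override is purely hypothetical, and so is the assumption that it would come from a platform header included before atomic.h. Because the macro is pasted directly in front of ".set reorder\n", any non-empty override appears to need its own trailing newline.

```c
#include <string.h>

/* Hypothetical platform override (illustration only): in real use this
 * would presumably come from a platform header included before atomic.h. */
#define __MIPS_PLATFORM_SYNC_NOPS "\tnop\n\tnop\n"

/* Same fallback as in the patch: no padding unless a platform asks for it. */
#ifndef __MIPS_PLATFORM_SYNC_NOPS
#define __MIPS_PLATFORM_SYNC_NOPS ""
#endif

/* The template mips_sync() would pass to the assembler, built exactly as
 * in the patch by concatenating adjacent string literals. */
static const char sync_template[] =
    ".set noreorder\n"
    "\tsync\n"
    __MIPS_PLATFORM_SYNC_NOPS
    ".set reorder\n";

/* Count occurrences of the "nop" mnemonic in an asm template.  Note that
 * "noreorder" does not contain "nop", so it is not miscounted. */
static int count_nops(const char *s)
{
	int n = 0;

	while ((s = strstr(s, "nop")) != NULL) {
		n++;
		s += 3;	/* continue scanning after this match */
	}
	return n;
}
```

With the default empty definition, the template reduces to a bare sync between the two .set directives. That is the behaviour the patch proposes for platforms that do not opt in.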
Warner

--Apple-Mail-8--974245673
Content-Disposition: attachment; filename=P
Content-Type: application/octet-stream; x-unix-mode=0664; name="P"

Index: atomic.h
===================================================================
--- atomic.h	(revision 250753)
+++ atomic.h	(working copy)
@@ -44,20 +44,16 @@
  * do not have atomic operations defined for them, but generally shouldn't
  * need atomic operations.
  */
+#ifndef __MIPS_PLATFORM_SYNC_NOPS
+#define __MIPS_PLATFORM_SYNC_NOPS ""
+#endif
 
 static __inline void
 mips_sync(void)
 {
-	__asm __volatile (".set noreorder\n\t"
-			"sync\n\t"
-			"nop\n\t"
-			"nop\n\t"
-			"nop\n\t"
-			"nop\n\t"
-			"nop\n\t"
-			"nop\n\t"
-			"nop\n\t"
-			"nop\n\t"
+	__asm __volatile (".set noreorder\n"
+			"\tsync\n"
+			__MIPS_PLATFORM_SYNC_NOPS
 			".set reorder\n"
 			: : : "memory");
 }

--Apple-Mail-8--974245673
Content-Type: text/plain; charset=us-ascii

On Jun 3, 2013, at 10:15 PM, Patrick Kelsey wrote:

> On Tue, Jun 4, 2013 at 12:08 AM, Patrick Kelsey wrote:
>> On Mon, Jun 3, 2013 at 11:57 PM, Adrian Chadd wrote:
>>> On 3 June 2013 20:55, Juli Mallett wrote:
>>>
>>>> To drain the pipeline on certain deficient (and mostly older) CPUs by way of
>>>> guesswork and a little vague magic. Most CPUs we support, I would guess, do
>>>> not need this, and it continues to exist solely for hysterical reasons.
>>>
>>> How can I turn it off for my compiles?
>>>
>>>> I've certainly gotten rid of them and some other cargo cult synchronization
>>>> on Octeon for testing and had it survive under considerable load, and
>>>> occasionally with some slight speedups (for some more commonly-used or
>>>> slower things than Just a Bunch Of NOPs.)
>>>
>>> Right. Well, since it's happening on every inlined lock, it's a bit silly.
>>>
>>>> The trouble is that proving they aren't necessary requires being rigorous
>>>> and careful in understanding documentation and errata, and FUD about their
>>>> possible necessity is somewhat-intimidating. It's not an easy kind of
>>>> corruption/unreliability/etc., to prove the lack of empirically.
>>>
>>> I've checked the disassembly from gcc-4.mumble on linux; it doesn't
>>> include NOPs like this as far as I can tell.
>>>
>>
>> The sync + 8 nops is coming from the definition of mips_sync() in
>> sys/mips/include/atomic.h.
>>
>> I agree with Juli that it appears to be a manual pipeline-flush
>> holdover from earlier days - I'm guessing there's 8 nops because the
>> R4000/4400 had both the sync instruction and an 8-stage pipeline. I'm
>> further guessing this was an attempt at providing stronger ordering
>> semantics than the sync instruction itself for the following
>> mb()/wmb()/rmb() definitions that use it, as the sync instruction
>> definition doesn't restrict execution of the before/after loads/stores
>> with respect to the sync instruction itself.
>
> Forgot to emphasize that this particular bit of old-school
> nop-counting is either pointless or a latent hazard - 8 does not cover
> the deepest MIPS pipeline around, then there's superscalar issue to
> consider - so I think it's either unnecessary or insufficient.
> So far, that's all criticism and no solution :/
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

--Apple-Mail-8--974245673--

From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 05:20:04 2013
Date: Tue, 4 Jun 2013 08:19:58 +0300
From: Konstantin Belousov
To: Gleb Smirnoff
Subject: Re: aio_mlock(2) system call
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="n7SYMHupzoBwC59z"
Cc: arch@FreeBSD.org

--n7SYMHupzoBwC59z
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Jun 04, 2013 at 01:29:13AM +0400, Gleb Smirnoff wrote:
> On Mon, Jun 03, 2013 at 07:12:55PM +0300, Konstantin Belousov wrote:
> K> We traditionally do not reuse the gaps in the syscall table, but add
> K> new syscalls at the end.
>
> Hmm. I did that because I wanted all aio_* to be grouped together. Why not?

The aio_* syscalls are already split between several number sequences.

I suspect that we try not to use the holes in the syscall table as
a small courtesy to third-party users. Also, there was probably an
attempt to keep the NetBSD/OpenBSD/FreeBSD syscall numbers out of conflict,
which obviously failed already.
--n7SYMHupzoBwC59z--

From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 05:23:36 2013
Date: Mon, 3 Jun 2013 23:23:33 -0600
From: Warner Losh
To: Konstantin Belousov
Cc: arch@FreeBSD.org, Gleb Smirnoff
Subject: Re: aio_mlock(2) system call

On Jun 3, 2013, at 11:19 PM, Konstantin Belousov wrote:

> On Tue, Jun 04, 2013 at 01:29:13AM +0400, Gleb Smirnoff wrote:
>> On Mon, Jun 03, 2013 at 07:12:55PM +0300, Konstantin Belousov wrote:
>> K> We traditionally do not reuse the gaps in the syscall table, but add
>> K> new syscalls at the end.
>>
>> Hmm. I did that because I wanted all aio_* to be grouped together. Why not?
>
> The aio_* syscalls are already split between several number sequences.
>
> I suspect that we try not to use the holes in the syscall table as
> a small courtesy to third-party users. Also, there was probably an
> attempt to keep the NetBSD/OpenBSD/FreeBSD syscall numbers out of conflict,
> which obviously failed already.

Originally it was done as a courtesy to the other BSDs so that we would be able to run each other's binaries. Sadly, even this small goal was never reached, but not due to system call numbering...

Warner

From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 07:08:57 2013
Date: Tue, 4 Jun 2013 00:08:56 -0700
From: Adrian Chadd
To: Warner Losh
Cc: Patrick Kelsey , Juli Mallett , FreeBSD-arch , "freebsd-mips@FreeBSD.org" , Ed Schouten
Subject: Re: Kernelspace C11 atomics for MIPS

I'm running this particular patch on my test AR71xx AP. Everything seems
to be ok so far.

Adrian

From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 08:19:25 2013
Date: Tue, 04 Jun 2013 10:19:20 +0200
From: Andre Oppermann
To: Ed Schouten
Cc: freebsd-mips@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: Kernelspace C11 atomics for MIPS
Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 08:19:25 -0000 On 03.06.2013 16:04, Ed Schouten wrote: > Hi, > > As of r251230, it should be possible to use C11 atomics in > kernelspace, by including ! Even when not using Clang > (but GCC 4.2), it is possible to use quite a large portion of the API. I'm a bit wary of *kernel* developers using C11-native atomics as opposed to our own atomic API. This could lead to a proliferation of home-grown, more or less correctly working, locks and variants thereof (mostly less correct). Atomics and locks are difficult enough to get right and reason about even with our rather good API and I scream in fear thinking about everyone(tm) doing their own "optimized" lock or even forgoing it because "it's atomic". I would even propose to go as far as disbarring the use of C11 atomics in the kernel other than inside the officially supported lock API. 
-- Andre From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 11:30:44 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B6EE5A0A for ; Tue, 4 Jun 2013 11:30:44 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10]) by mx1.freebsd.org (Postfix) with ESMTP id E82651BF0 for ; Tue, 4 Jun 2013 11:30:43 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.6/8.14.6) with ESMTP id r54BUaTX092633; Tue, 4 Jun 2013 15:30:36 +0400 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.6/8.14.6/Submit) id r54BUZhR092632; Tue, 4 Jun 2013 15:30:35 +0400 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Tue, 4 Jun 2013 15:30:35 +0400 From: Gleb Smirnoff To: Konstantin Belousov Subject: Re: aio_mlock(2) system call Message-ID: <20130604113035.GV67170@glebius.int.ru> References: <20130603100618.GH67170@FreeBSD.org> <20130603161255.GM3047@kib.kiev.ua> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="svZFHVx8/dhPCe52" Content-Disposition: inline In-Reply-To: <20130603161255.GM3047@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 11:30:44 -0000 --svZFHVx8/dhPCe52 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline Updated patch. -- Totus tuus, Glebius. 
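The aio_mlock(2) manual page in the patch below defines the calling convention: zero the aiocb, describe the range with aio_buf and aio_nbytes, then poll aio_error() and reap the request with aio_return(). The sketch below assumes exactly that interface as proposed and is not a committed API. The helper names prepare_mlock_request() and mlock_async_wait() are hypothetical. The aio_mlock() call is guarded by __FreeBSD__ because it is a FreeBSD extension that exists only once this patch is in (FreeBSD 10 or later). vm_mlock() in the patch checks PRIV_VM_MLOCK, so an unprivileged caller should expect aio_error() to report EPERM.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build an aio_mlock(2) request the way the proposed
 * man page asks for it.  Zero the whole control block first, then describe
 * the range with aio_buf and aio_nbytes.  No file descriptor is involved;
 * the patch queues LIO_MLOCK without calling fget().
 */
void
prepare_mlock_request(struct aiocb *iocb, void *buf, size_t len)
{
	assert(iocb != NULL && buf != NULL);
	memset(iocb, 0, sizeof(*iocb));
	iocb->aio_buf = buf;
	iocb->aio_nbytes = len;
}

#ifdef __FreeBSD__
/*
 * Hypothetical wrapper, FreeBSD 10+ only: queue the request, wait for it,
 * and return 0 or an errno value.  The iocb lives on the stack, which is
 * safe only because we do not return until the request has completed.
 */
int
mlock_async_wait(void *buf, size_t len)
{
	struct aiocb iocb;
	const struct aiocb *list[1] = { &iocb };
	int error;

	prepare_mlock_request(&iocb, buf, len);
	if (aio_mlock(&iocb) == -1)
		return (errno);		/* not queued, e.g. EAGAIN */
	while ((error = aio_error(&iocb)) == EINPROGRESS)
		(void)aio_suspend(list, 1, NULL);	/* may return early on EINTR */
	if (error == -1)
		return (errno);		/* aio_error() itself failed */
	(void)aio_return(&iocb);	/* reap the request exactly once */
	return (error);			/* 0, or an mlock(2) error such as EPERM */
}
#endif
```

The synchronous wrapper defeats the purpose of the call and is only there to show the full request lifecycle. A real caller would queue the request and keep working, checking aio_error() later.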
--svZFHVx8/dhPCe52 Content-Type: text/x-diff; charset=koi8-r Content-Disposition: attachment; filename="aio_mlock.diff" Index: lib/libc/sys/Makefile.inc =================================================================== --- lib/libc/sys/Makefile.inc (revision 251369) +++ lib/libc/sys/Makefile.inc (working copy) @@ -85,6 +85,7 @@ MAN+= abort2.2 \ adjtime.2 \ aio_cancel.2 \ aio_error.2 \ + aio_mlock.2 \ aio_read.2 \ aio_return.2 \ aio_suspend.2 \ Index: lib/libc/sys/Symbol.map =================================================================== --- lib/libc/sys/Symbol.map (revision 251369) +++ lib/libc/sys/Symbol.map (working copy) @@ -378,6 +378,7 @@ FBSD_1.2 { }; FBSD_1.3 { + aio_mlock; accept4; bindat; cap_fcntls_get; Index: lib/libc/sys/aio_mlock.2 =================================================================== --- lib/libc/sys/aio_mlock.2 (revision 0) +++ lib/libc/sys/aio_mlock.2 (working copy) @@ -0,0 +1,133 @@ +.\" Copyright (c) 2013 Gleb Smirnoff +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd June 3, 2013 +.Dt AIO_MLOCK 2 +.Os +.Sh NAME +.Nm aio_mlock +.Nd asynchronous +.Xr mlock 2 +operation +.Sh LIBRARY +.Lb libc +.Sh SYNOPSIS +.In aio.h +.Ft int +.Fn aio_mlock "struct aiocb *iocb" +.Sh DESCRIPTION +The +.Fn aio_mlock +system call allows the calling process to lock into memory the +physical pages associated with the virtual address range starting at +.Fa iocb->aio_buf +for +.Fa iocb->aio_nbytes +bytes. +The call returns immediately after the locking request has +been enqueued; the operation may or may not have completed at the time +the call returns. +.Pp +The +.Fa iocb +pointer may be subsequently used as an argument to +.Fn aio_return +and +.Fn aio_error +in order to determine return or error status for the enqueued operation +while it is in progress. +.Pp +If the request could not be enqueued (generally due to +.Xr aio 4 +limits), +then the call returns without having enqueued the request. +.Sh RESTRICTIONS +The Asynchronous I/O Control Block structure pointed to by +.Fa iocb +and the buffer that the +.Fa iocb->aio_buf +member of that structure references must remain valid until the +operation has completed. +For this reason, use of auto (stack) variables +for these objects is discouraged. +.Pp +The asynchronous I/O control buffer +.Fa iocb +should be zeroed before the +.Fn aio_mlock +call to avoid passing bogus context information to the kernel. 
+.Pp +Modifications of the Asynchronous I/O Control Block structure or the +buffer contents after the request has been enqueued, but before the +request has completed, are not allowed. +.Sh RETURN VALUES +.Rv -std aio_mlock +.Sh ERRORS +The +.Fn aio_mlock +system call will fail if: +.Bl -tag -width Er +.It Bq Er EAGAIN +The request was not queued because of system resource limitations. +.It Bq Er ENOSYS +The +.Fn aio_mlock +system call is not supported. +.El +.Pp +If the request is successfully enqueued, but subsequently cancelled +or an error occurs, the value returned by the +.Fn aio_return +system call is per the +.Xr mlock 2 +system call, and the value returned by the +.Fn aio_error +system call is one of the error returns from the +.Xr mlock 2 +system call, or +.Er ECANCELED +if the request was explicitly cancelled via a call to +.Fn aio_cancel . +.Sh SEE ALSO +.Xr aio_cancel 2 , +.Xr aio_error 2 , +.Xr aio_return 2 , +.Xr mlock 2 , +.Xr aio 4 +.Sh PORTABILITY +The +.Fn aio_mlock +system call is a +.Fx +extension, and shouldn't be used in portable code. +.Sh HISTORY +The +.Fn aio_mlock +system call first appeared in +.Fx 10.0 . +.Sh AUTHORS +The system call was introduced by +.An Gleb Smirnoff Aq glebius@FreeBSD.org .
Property changes on: lib/libc/sys/aio_mlock.2 ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Index: sys/compat/freebsd32/syscalls.master =================================================================== --- sys/compat/freebsd32/syscalls.master (revision 251369) +++ sys/compat/freebsd32/syscalls.master (working copy) @@ -1044,3 +1044,5 @@ __socklen_t * __restrict anamelen, \ int flags); } 542 AUE_PIPE NOPROTO { int pipe2(int *fildes, int flags); } +543 AUE_NULL NOSTD { int freebsd32_aio_mlock( \ + struct aiocb32 *aiocbp); } Index: sys/kern/syscalls.master =================================================================== --- sys/kern/syscalls.master (revision 251369) +++ sys/kern/syscalls.master (working copy) @@ -977,5 +977,6 @@ __socklen_t * __restrict anamelen, \ int flags); } 542 AUE_PIPE STD { int pipe2(int *fildes, int flags); } +543 AUE_NULL NOSTD { int aio_mlock(struct aiocb *aiocbp); } ; Please copy any additions and changes to the following compatability tables: ; sys/compat/freebsd32/syscalls.master Index: sys/kern/vfs_aio.c =================================================================== --- sys/kern/vfs_aio.c (revision 251369) +++ sys/kern/vfs_aio.c (working copy) @@ -338,7 +338,9 @@ static struct unrhdr *aiod_unr; void aio_init_aioinfo(struct proc *p); static int aio_onceonly(void); static int aio_free_entry(struct aiocblist *aiocbe); -static void aio_process(struct aiocblist *aiocbe); +static void aio_process_rw(struct aiocblist *aiocbe); +static void aio_process_sync(struct aiocblist *aiocbe); +static void aio_process_mlock(struct aiocblist *aiocbe); static int aio_newproc(int *); int aio_aqueue(struct thread *td, struct aiocb *job, struct aioliojob *lio, int type, struct aiocb_ops *ops); @@ 
-425,6 +427,7 @@ static struct syscall_helper_data aio_syscalls[] = SYSCALL_INIT_HELPER(aio_cancel), SYSCALL_INIT_HELPER(aio_error), SYSCALL_INIT_HELPER(aio_fsync), + SYSCALL_INIT_HELPER(aio_mlock), SYSCALL_INIT_HELPER(aio_read), SYSCALL_INIT_HELPER(aio_return), SYSCALL_INIT_HELPER(aio_suspend), @@ -452,6 +455,7 @@ static struct syscall_helper_data aio32_syscalls[] SYSCALL32_INIT_HELPER(freebsd32_aio_cancel), SYSCALL32_INIT_HELPER(freebsd32_aio_error), SYSCALL32_INIT_HELPER(freebsd32_aio_fsync), + SYSCALL32_INIT_HELPER(freebsd32_aio_mlock), SYSCALL32_INIT_HELPER(freebsd32_aio_read), SYSCALL32_INIT_HELPER(freebsd32_aio_write), SYSCALL32_INIT_HELPER(freebsd32_aio_waitcomplete), @@ -701,7 +705,8 @@ aio_free_entry(struct aiocblist *aiocbe) * at open time, but this is already true of file descriptors in * a multithreaded process. */ - fdrop(aiocbe->fd_file, curthread); + if (aiocbe->fd_file) + fdrop(aiocbe->fd_file, curthread); crfree(aiocbe->cred); uma_zfree(aiocb_zone, aiocbe); AIO_LOCK(ki); @@ -855,15 +860,15 @@ drop: } /* - * The AIO processing activity. This is the code that does the I/O request for - * the non-physio version of the operations. The normal vn operations are used, - * and this code should work in all instances for every type of file, including - * pipes, sockets, fifos, and regular files. + * The AIO processing activity for LIO_READ/LIO_WRITE. This is the code that + * does the I/O request for the non-physio version of the operations. The + * normal vn operations are used, and this code should work in all instances + * for every type of file, including pipes, sockets, fifos, and regular files. * * XXX I don't think it works well for socket, pipe, and fifo. 
*/ static void -aio_process(struct aiocblist *aiocbe) +aio_process_rw(struct aiocblist *aiocbe) { struct ucred *td_savedcred; struct thread *td; @@ -877,23 +882,16 @@ static void int oublock_st, oublock_end; int inblock_st, inblock_end; + KASSERT(aiocbe->uaiocb.aio_lio_opcode == LIO_READ || + aiocbe->uaiocb.aio_lio_opcode == LIO_WRITE, + ("%s: opcode %d", __func__, aiocbe->uaiocb.aio_lio_opcode)); + td = curthread; td_savedcred = td->td_ucred; td->td_ucred = aiocbe->cred; cb = &aiocbe->uaiocb; fp = aiocbe->fd_file; - if (cb->aio_lio_opcode == LIO_SYNC) { - error = 0; - cnt = 0; - if (fp->f_vnode != NULL) - error = aio_fsync_vnode(td, fp->f_vnode); - cb->_aiocb_private.error = error; - cb->_aiocb_private.status = 0; - td->td_ucred = td_savedcred; - return; - } - aiov.iov_base = (void *)(uintptr_t)cb->aio_buf; aiov.iov_len = cb->aio_nbytes; @@ -954,6 +952,41 @@ static void } static void +aio_process_sync(struct aiocblist *aiocbe) +{ + struct thread *td = curthread; + struct ucred *td_savedcred = td->td_ucred; + struct aiocb *cb = &aiocbe->uaiocb; + struct file *fp = aiocbe->fd_file; + int error = 0; + + KASSERT(aiocbe->uaiocb.aio_lio_opcode == LIO_SYNC, + ("%s: opcode %d", __func__, aiocbe->uaiocb.aio_lio_opcode)); + + td->td_ucred = aiocbe->cred; + if (fp->f_vnode != NULL) + error = aio_fsync_vnode(td, fp->f_vnode); + cb->_aiocb_private.error = error; + cb->_aiocb_private.status = 0; + td->td_ucred = td_savedcred; +} + +static void +aio_process_mlock(struct aiocblist *aiocbe) +{ + struct aiocb *cb = &aiocbe->uaiocb; + int error; + + KASSERT(aiocbe->uaiocb.aio_lio_opcode == LIO_MLOCK, + ("%s: opcode %d", __func__, aiocbe->uaiocb.aio_lio_opcode)); + + error = vm_mlock(aiocbe->userproc, aiocbe->cred, + __DEVOLATILE(void *, cb->aio_buf), cb->aio_nbytes); + cb->_aiocb_private.error = error; + cb->_aiocb_private.status = 0; +} + +static void aio_bio_done_notify(struct proc *userp, struct aiocblist *aiocbe, int type) { struct aioliojob *lj; @@ -1024,7 +1057,7 @@ 
notification_done: } /* - * The AIO daemon, most of the actual work is done in aio_process, + * The AIO daemon, most of the actual work is done in aio_process_*, * but the setup (and address space mgmt) is done in this routine. */ static void @@ -1121,7 +1154,18 @@ aio_daemon(void *_id) ki = userp->p_aioinfo; /* Do the I/O function. */ - aio_process(aiocbe); + switch(aiocbe->uaiocb.aio_lio_opcode) { + case LIO_READ: + case LIO_WRITE: + aio_process_rw(aiocbe); + break; + case LIO_SYNC: + aio_process_sync(aiocbe); + break; + case LIO_MLOCK: + aio_process_mlock(aiocbe); + break; + } mtx_lock(&aio_job_mtx); /* Decrement the active job count. */ @@ -1261,7 +1305,7 @@ aio_qphysio(struct proc *p, struct aiocblist *aioc cb = &aiocbe->uaiocb; fp = aiocbe->fd_file; - if (fp->f_type != DTYPE_VNODE) + if (fp == NULL || fp->f_type != DTYPE_VNODE) return (-1); vp = fp->f_vnode; @@ -1613,6 +1657,9 @@ aio_aqueue(struct thread *td, struct aiocb *job, s case LIO_SYNC: error = fget(td, fd, CAP_FSYNC, &fp); break; + case LIO_MLOCK: + fp = NULL; + break; case LIO_NOP: error = fget(td, fd, CAP_NONE, &fp); break; @@ -1670,7 +1717,8 @@ aio_aqueue(struct thread *td, struct aiocb *job, s error = kqfd_register(kqfd, &kev, td, 1); aqueue_fail: if (error) { - fdrop(fp, td); + if (fp) + fdrop(fp, td); uma_zfree(aiocb_zone, aiocbe); ops->store_error(job, error); goto done; @@ -1687,7 +1735,7 @@ no_kqueue: if (opcode == LIO_SYNC) goto queueit; - if (fp->f_type == DTYPE_SOCKET) { + if (fp && fp->f_type == DTYPE_SOCKET) { /* * Alternate queueing for socket ops: Reach down into the * descriptor to get the socket data. 
Then check to see if the @@ -2165,6 +2213,13 @@ sys_aio_write(struct thread *td, struct aio_write_ return (aio_aqueue(td, uap->aiocbp, NULL, LIO_WRITE, &aiocb_ops)); } +int +sys_aio_mlock(struct thread *td, struct aio_mlock_args *uap) +{ + + return (aio_aqueue(td, uap->aiocbp, NULL, LIO_MLOCK, &aiocb_ops)); +} + static int kern_lio_listio(struct thread *td, int mode, struct aiocb * const *uacb_list, struct aiocb **acb_list, int nent, struct sigevent *sig, @@ -2907,6 +2962,14 @@ freebsd32_aio_write(struct thread *td, struct free } int +freebsd32_aio_mlock(struct thread *td, struct freebsd32_aio_mlock_args *uap) +{ + + return (aio_aqueue(td, (struct aiocb *)uap->aiocbp, NULL, LIO_MLOCK, + &aiocb32_ops)); +} + +int freebsd32_aio_waitcomplete(struct thread *td, struct freebsd32_aio_waitcomplete_args *uap) { Index: sys/sys/aio.h =================================================================== --- sys/sys/aio.h (revision 251369) +++ sys/sys/aio.h (working copy) @@ -38,6 +38,7 @@ #ifdef _KERNEL #define LIO_SYNC 0x3 #endif +#define LIO_MLOCK 0x4 /* * LIO modes @@ -124,6 +125,11 @@ int aio_cancel(int, struct aiocb *); */ int aio_suspend(const struct aiocb * const[], int, const struct timespec *); +/* + * Asynchronous mlock + */ +int aio_mlock(struct aiocb *); + #ifdef __BSD_VISIBLE int aio_waitcomplete(struct aiocb **, struct timespec *); #endif Index: sys/vm/vm_extern.h =================================================================== --- sys/vm/vm_extern.h (revision 251369) +++ sys/vm/vm_extern.h (working copy) @@ -90,5 +90,6 @@ struct sf_buf *vm_imgact_map_page(vm_object_t obje void vm_imgact_unmap_page(struct sf_buf *sf); void vm_thread_dispose(struct thread *td); int vm_thread_new(struct thread *td, int pages); +int vm_mlock(struct proc *, struct ucred *, const void *, size_t); #endif /* _KERNEL */ #endif /* !_VM_EXTERN_H_ */ Index: sys/vm/vm_mmap.c =================================================================== --- sys/vm/vm_mmap.c (revision 251369) +++ 
sys/vm/vm_mmap.c (working copy) @@ -1036,18 +1036,24 @@ sys_mlock(td, uap) struct thread *td; struct mlock_args *uap; { - struct proc *proc; + + return (vm_mlock(td->td_proc, td->td_ucred, uap->addr, uap->len)); +} + +int +vm_mlock(struct proc *proc, struct ucred *cred, const void *addr0, size_t len) +{ vm_offset_t addr, end, last, start; vm_size_t npages, size; vm_map_t map; unsigned long nsize; int error; - error = priv_check(td, PRIV_VM_MLOCK); + error = priv_check_cred(cred, PRIV_VM_MLOCK, 0); if (error) return (error); - addr = (vm_offset_t)uap->addr; - size = uap->len; + addr = (vm_offset_t)addr0; + size = len; last = addr + size; start = trunc_page(addr); end = round_page(last); @@ -1056,7 +1062,6 @@ sys_mlock(td, uap) npages = atop(end - start); if (npages > vm_page_max_wired) return (ENOMEM); - proc = td->td_proc; map = &proc->p_vmspace->vm_map; PROC_LOCK(proc); nsize = ptoa(npages + pmap_wired_count(map->pmap)); --svZFHVx8/dhPCe52-- From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 12:53:03 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CC94DF18; Tue, 4 Jun 2013 12:53:03 +0000 (UTC) (envelope-from aduane@juniper.net) Received: from tx2outboundpool.messaging.microsoft.com (tx2ehsobe001.messaging.microsoft.com [65.55.88.11]) by mx1.freebsd.org (Postfix) with ESMTP id 7B4971013; Tue, 4 Jun 2013 12:53:03 +0000 (UTC) Received: from mail142-tx2-R.bigfish.com (10.9.14.226) by TX2EHSOBE015.bigfish.com (10.9.40.35) with Microsoft SMTP Server id 14.1.225.23; Tue, 4 Jun 2013 12:22:44 +0000 Received: from mail142-tx2 (localhost [127.0.0.1]) by mail142-tx2-R.bigfish.com (Postfix) with ESMTP id 99B2A240091; Tue, 4 Jun 2013 12:22:44 +0000 (UTC) X-Forefront-Antispam-Report: CIP:66.129.224.51; KIP:(null); UIP:(null); IPV:NLI; H:P-EMHUB02-HQ.jnpr.net; RD:none; EFVD:NLI X-SpamScore: -4 X-BigFish: 
PS-4(zz98dI9371I542I1432Izz1f42h1ee6h1de0h1fdah1202h1e76h1d1ah1d2ah1fc6hzz8275ch17326ah8275dhz31h2a8h683h839h944hd25hf0ah1220h1288h12a5h12a9h12bdh137ah13b6h1441h1504h1537h153bh15d0h162dh1631h1758h18e1h1946h19b5h19ceh1ad9h1b0ah1d07h1d0ch1d2eh1d3fh1de9h1dfeh1dffh1155h) Received-SPF: softfail (mail142-tx2: transitioning domain of juniper.net does not designate 66.129.224.51 as permitted sender) client-ip=66.129.224.51; envelope-from=aduane@juniper.net; helo=P-EMHUB02-HQ.jnpr.net ; -HQ.jnpr.net ; X-Forefront-Antispam-Report-Untrusted: CIP:157.56.244.213; KIP:(null); UIP:(null); (null); H:CH1PRD0510HT002.namprd05.prod.outlook.com; R:internal; EFV:INT Received: from mail142-tx2 (localhost.localdomain [127.0.0.1]) by mail142-tx2 (MessageSwitch) id 13703485634494_30393; Tue, 4 Jun 2013 12:22:43 +0000 (UTC) Received: from TX2EHSMHS008.bigfish.com (unknown [10.9.14.234]) by mail142-tx2.bigfish.com (Postfix) with ESMTP id E7B10440050; Tue, 4 Jun 2013 12:22:42 +0000 (UTC) Received: from P-EMHUB02-HQ.jnpr.net (66.129.224.51) by TX2EHSMHS008.bigfish.com (10.9.99.108) with Microsoft SMTP Server (TLS) id 14.1.225.23; Tue, 4 Jun 2013 12:22:40 +0000 Received: from P-CLDFE02-HQ.jnpr.net (172.24.192.60) by P-EMHUB02-HQ.jnpr.net (172.24.192.36) with Microsoft SMTP Server (TLS) id 8.3.213.0; Tue, 4 Jun 2013 05:22:39 -0700 Received: from o365mail.juniper.net (207.17.137.149) by o365mail.juniper.net (172.24.192.60) with Microsoft SMTP Server id 14.1.355.2; Tue, 4 Jun 2013 05:22:38 -0700 Received: from ch1outboundpool.messaging.microsoft.com (216.32.181.183) by o365mail.juniper.net (207.17.137.149) with Microsoft SMTP Server (TLS) id 14.1.355.2; Tue, 4 Jun 2013 05:25:48 -0700 Received: from mail57-ch1-R.bigfish.com (10.43.68.240) by CH1EHSOBE019.bigfish.com (10.43.70.76) with Microsoft SMTP Server id 14.1.225.23; Tue, 4 Jun 2013 12:22:38 +0000 Received: from mail57-ch1 (localhost [127.0.0.1]) by mail57-ch1-R.bigfish.com (Postfix) with ESMTP id 41C33E0161; Tue, 4 Jun 2013 12:22:38 +0000 
(UTC) Received: from mail57-ch1 (localhost.localdomain [127.0.0.1]) by mail57-ch1 (MessageSwitch) id 1370348556289141_8751; Tue, 4 Jun 2013 12:22:36 +0000 (UTC) Received: from CH1EHSMHS014.bigfish.com (snatpool1.int.messaging.microsoft.com [10.43.68.244]) by mail57-ch1.bigfish.com (Postfix) with ESMTP id 43FFE2C00F9; Tue, 4 Jun 2013 12:22:36 +0000 (UTC) Received: from CH1PRD0510HT002.namprd05.prod.outlook.com (157.56.244.213) by CH1EHSMHS014.bigfish.com (10.43.70.14) with Microsoft SMTP Server (TLS) id 14.1.225.23; Tue, 4 Jun 2013 12:22:34 +0000 Received: from CH1PRD0510MB392.namprd05.prod.outlook.com ([169.254.6.109]) by CH1PRD0510HT002.namprd05.prod.outlook.com ([10.255.150.37]) with mapi id 14.16.0311.000; Tue, 4 Jun 2013 12:22:34 +0000 From: Andrew Duane To: Juli Mallett , Adrian Chadd Subject: RE: Kernelspace C11 atomics for MIPS Thread-Topic: Kernelspace C11 atomics for MIPS Thread-Index: AQHOYINHGC9M8xF/6kiJf8fnnz6uaJkk2jUAgAATWwCAAIsx4A== Date: Tue, 4 Jun 2013 12:22:33 +0000 Message-ID: <477C1270D3E5484DA2303CEBE274C9E13210A1C0@CH1PRD0510MB392.namprd05.prod.outlook.com> References: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [66.129.232.2] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-FOPE-CONNECTOR: Id%0$Dn%*$RO%0$TLS%0$FQDN%$TlsDn% X-FOPE-CONNECTOR: Id%12219$Dn%FREEBSD.ORG$RO%2$TLS%5$FQDN%onpremiseedge-1018244.customer.frontbridge.com$TlsDn%o365mail.juniper.net X-FOPE-CONNECTOR: Id%12219$Dn%80386.NL$RO%2$TLS%5$FQDN%onpremiseedge-1018244.customer.frontbridge.com$TlsDn%o365mail.juniper.net X-OriginatorOrg: juniper.net Cc: Ed Schouten , "freebsd-mips@FreeBSD.org" , FreeBSD-arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 
12:53:03 -0000 > -----Original Message----- > From: owner-freebsd-mips@freebsd.org [mailto:owner-freebsd- > mips@freebsd.org] On Behalf Of Juli Mallett > Sent: Monday, June 03, 2013 11:55 PM > To: Adrian Chadd > Cc: Ed Schouten; freebsd-mips@FreeBSD.org; FreeBSD-arch > Subject: Re: Kernelspace C11 atomics for MIPS > > On Mon, Jun 3, 2013 at 7:45 PM, Adrian Chadd wrote: > > > Speaking of this; any idea why the SYNC operators have 8 NOPs following > > them? > > > > I noticed that when going through disassemblies of various mips24k .o > > files. > > > > To drain the pipeline on certain deficient (and mostly older) CPUs by way > of guesswork and a little vague magic. Most CPUs we support, I would > guess, do not need this, and it continues to exist solely for > hysterical reasons. > > I've certainly gotten rid of them and some other cargo cult synchronization > on Octeon for testing and had it survive under considerable load, and > occasionally with some slight speedups (for some more commonly-used or > slower things than Just a Bunch Of NOPs.) > > The trouble is that proving they aren't necessary requires being rigorous > and careful in understanding documentation and errata, and FUD about their > possible necessity is somewhat-intimidating. It's not an easy kind of > corruption/unreliability/etc., to prove the lack of empirically. The various CPU types are supposed to specify exactly how many NOPs are needed, what kind of barrier is needed and where, and which type of NOP is needed (Alpha had at least two). The barriers are designed to insure correct operation ordering across the memory architectures including write buffers, DMA hardware, L1/L2[/L3] caches and their connections to the cores. The CPU should specify exactly what is needed where, and why. It should never be "superstition". There is an exact hardware reason for every sync/barrier operation, and every NOP needed, just like the COP0 hazards.
Given that, Juli's last paragraph is right on point. The documentation can be dense and difficult to understand, since it's usually written by hardware engineers :-) And since getting it wrong can make for some really subtle, intermittent, and incredibly hard to diagnose problems, it's easier to err on the side of caution. It also happens that different CPUs included in a certain compile switch may have different requirements, so you have to use worst case. > > Juli. > _______________________________________________ > freebsd-mips@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-mips > To unsubscribe, send any mail to "freebsd-mips-unsubscribe@freebsd.org" > .................................... Andrew L. Duane Resident Architect - AT&T Technical Lead m +1 603.770.7088 o +1 408.933.6944 (2-6944) skype: andrewlduane aduane@juniper.net From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 13:12:37 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0963F4F1; Tue, 4 Jun 2013 13:12:37 +0000 (UTC) (envelope-from mdf356@gmail.com) Received: from mail-ob0-x22c.google.com (mail-ob0-x22c.google.com [IPv6:2607:f8b0:4003:c01::22c]) by mx1.freebsd.org (Postfix) with ESMTP id B2EB01113; Tue, 4 Jun 2013 13:12:36 +0000 (UTC) Received: by mail-ob0-f172.google.com with SMTP id wo10so310488obc.17 for ; Tue, 04 Jun 2013 06:12:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=BVokO6MxSP+o73X/DQF48jeYhdomuymc4EyA8LVGtL8=; b=c8D/2l+kO9thxT0NpY8hUGAEDgFrxYHGam+k8cyFRMLhjP5UJ9ITURXtSHzPeSlc/f /rN/vsmIg8wiwr7Ll5Yz/vQpDK1j40FDJ9zcjiGYpxQ0HvHhZAk300Gs8HUHV7ndFqBG cO1bfcv99K3UAfFZle80rLt1VX0pQ0KOVRme06xxuhiy6L6/+Jg29KWl7cuFFsI/p6CS
thHC5aybbKBY3PcUQliWm+I8FFasGpOUYcSt3+deN3CHPBjP1e/DXeA00FgBp23hNMij Sn7QxJzTAwtlscjeBcEQX2NXeCniJoKmcu1FWV+c+i+5xySwVXqRTPDDssemRhD0+++U /YQQ== MIME-Version: 1.0 X-Received: by 10.60.33.102 with SMTP id q6mr12334202oei.111.1370351556336; Tue, 04 Jun 2013 06:12:36 -0700 (PDT) Sender: mdf356@gmail.com Received: by 10.182.237.100 with HTTP; Tue, 4 Jun 2013 06:12:36 -0700 (PDT) In-Reply-To: <51ADA308.6040904@freebsd.org> References: <51ADA308.6040904@freebsd.org> Date: Tue, 4 Jun 2013 06:12:36 -0700 X-Google-Sender-Auth: UKda3lyyqPEvZiuB93DLOxsQkOM Message-ID: Subject: Re: Kernelspace C11 atomics for MIPS From: mdf@FreeBSD.org To: Andre Oppermann Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Ed Schouten , freebsd-mips@freebsd.org, FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 13:12:37 -0000 On Tue, Jun 4, 2013 at 1:19 AM, Andre Oppermann wrote: > On 03.06.2013 16:04, Ed Schouten wrote: > >> Hi, >> >> As of r251230, it should be possible to use C11 atomics in >> kernelspace, by including ! Even when not using Clang >> (but GCC 4.2), it is possible to use quite a large portion of the API. >> > > I'm a bit wary of *kernel* developers using C11-native atomics as opposed > to our own atomic API. This could lead to a proliferation of home-grown, > more or less correctly working, locks and variants thereof (mostly less > correct). > I don't understand this. > Atomics and locks are difficult enough to get right and reason about even > with our rather good API and I scream in fear thinking about everyone(tm) > doing their own "optimized" lock or even forgoing it because "it's atomic". > Why would we replace primitives that work? 
Meanwhile, the C11 atomics are at least as well documented as FreeBSD's, and they're standardized. Why should a future programmer, who understands C11 atomics, need to learn all-new names to work on our OS? > I would even propose to go as far as disbarring the use of C11 atomics in > the kernel other than inside the officially supported lock API. So compare and swap is hard to reason about? The C11 atomics should be no harder to reason about than our own -- their effects are documented. And we expect kernel programmers, of all people, to actually be careful about choosing their instructions. Personally, I find both the C11 atomics and FreeBSD's annoying, since "acquire" and "release" semantics are basically an x86-ism. PPC has no notion of this; it has sync and isync and lwsync instructions which are separate from the atomic set, but can be combined to create the same effect. Except the PPC manual is exceptionally explicit about what guarantees sync provides; it gives a mathematical ordering on loads/stores i, j and which effects can be seen when. "Acquire" and "Release" seem to be named because you kinda need one to acquire a lock and kinda need one to release it. But the effect of ordering loads or stores or both doesn't need to be dependent on the store/load, so putting the two together is just an x86 convenience (and an annoyance on at least PPC). Anyways, that aside, I see no reason to use a home-grown solution when the C standard finally provides one. It seems akin to preferring u_int64_t over uint64_t because we had it first and they changed the spelling on us.
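A minimal userspace sketch makes the compare-and-swap and acquire/release terms in this exchange concrete. It is a test-and-set spinlock written against C11 <stdatomic.h>. The struct spin type and the spin_* functions are hypothetical illustrations, not a FreeBSD API. In kernel code the same pair of operations is roughly what atomic_cmpset_acq_int() and atomic_store_rel_int() from atomic(9) express.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spinlock; illustration only, not a FreeBSD API. */
struct spin {
	atomic_int locked;	/* 0 = free, 1 = held */
};

void
spin_init(struct spin *s)
{
	atomic_init(&s->locked, 0);
}

bool
spin_trylock(struct spin *s)
{
	int expected = 0;

	/*
	 * Compare-and-swap 0 -> 1.  Acquire ordering on success keeps the
	 * critical section's loads and stores from being reordered before
	 * the lock is taken; a failed attempt enters nothing, so relaxed.
	 */
	return (atomic_compare_exchange_strong_explicit(&s->locked, &expected,
	    1, memory_order_acquire, memory_order_relaxed));
}

void
spin_lock(struct spin *s)
{
	while (!spin_trylock(s))
		;	/* busy-wait with no backoff: fine for a sketch only */
}

void
spin_unlock(struct spin *s)
{
	/* Debug check, in the spirit of mtx_assert(9): we must hold it. */
	assert(atomic_load_explicit(&s->locked, memory_order_relaxed) == 1);
	/*
	 * Release ordering: everything written inside the critical section
	 * is visible to the next thread whose acquire CAS succeeds.
	 */
	atomic_store_explicit(&s->locked, 0, memory_order_release);
}
```

How each memory order maps to instructions is left to the compiler. On x86 it is mostly implicit, while on PowerPC it becomes separate barrier instructions, which is the asymmetry described above.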
Thanks, matthew From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 15:09:21 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2C7A78A5 for ; Tue, 4 Jun 2013 15:09:21 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from mail-ie0-x22d.google.com (mail-ie0-x22d.google.com [IPv6:2607:f8b0:4001:c03::22d]) by mx1.freebsd.org (Postfix) with ESMTP id EE23E1825 for ; Tue, 4 Jun 2013 15:09:20 +0000 (UTC) Received: by mail-ie0-f173.google.com with SMTP id k13so688950iea.4 for ; Tue, 04 Jun 2013 08:09:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=mQjEh5F3H8ANLTXokocrCCP3P7W6+e2Rh72FQ4MymME=; b=aVQtrvDWUsRcOanK5JetmOo+Sxhma3vxumKNs7taDOp+dbueoVDhCEzY+MJeakOd2G msRH/Ou6Qe0UqhFmu9tAffPZ8BbsTE6eRuGRDeWNSoGfiTb/Chh398kVv6taNJ80ngI0 c35xbLBjCMLURpoU08m6o2i/S79DDXQf9Si2ng4Ze1Wf0A75RNDgzWZ8AiKgeBjD+mLY bNWVYB+3sqJWq5R3X7Colvz5EfRImEEpPZ10h6g39Gw6L0rdZUlpdNr6BTJ8pM6SdGyt k7xuMiu7XAxlOnd7P2HnFLUPdPzZNltl5UtKu2jrRA08MbRGDXqnh/Ta/9mlSaWJN8R8 RbTw== X-Received: by 10.50.18.17 with SMTP id s17mr1027482igd.80.1370358560526; Tue, 04 Jun 2013 08:09:20 -0700 (PDT) Received: from 53.imp.bsdimp.com (50-78-194-198-static.hfc.comcastbusiness.net. 
[50.78.194.198]) by mx.google.com with ESMTPSA id f6sm2261850igz.1.2013.06.04.08.09.18 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 04 Jun 2013 08:09:19 -0700 (PDT) Sender: Warner Losh Subject: Re: Kernelspace C11 atomics for MIPS Mime-Version: 1.0 (Apple Message framework v1085) Content-Type: text/plain; charset=us-ascii From: Warner Losh In-Reply-To: <477C1270D3E5484DA2303CEBE274C9E13210A1C0@CH1PRD0510MB392.namprd05.prod.outlook.com> Date: Tue, 4 Jun 2013 09:09:17 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: <2EE5CF35-1E46-44B8-83B3-6923FC6FA854@bsdimp.com> References: <477C1270D3E5484DA2303CEBE274C9E13210A1C0@CH1PRD0510MB392.namprd05.prod.outlook.com> To: Andrew Duane X-Mailer: Apple Mail (2.1085) X-Gm-Message-State: ALoCoQkZBQwj0BoelMl+KMUU9fl0C0wjeCnLPSNhHUcOuA5spnybpy3TjxOcLH843c4/W2pV2yM5 Cc: Juli Mallett , Ed Schouten , Adrian Chadd , "freebsd-mips@FreeBSD.org" , FreeBSD-arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 15:09:21 -0000 On Jun 4, 2013, at 6:22 AM, Andrew Duane wrote: > >> -----Original Message----- >> From: owner-freebsd-mips@freebsd.org [mailto:owner-freebsd- >> mips@freebsd.org] On Behalf Of Juli Mallett >> Sent: Monday, June 03, 2013 11:55 PM >> To: Adrian Chadd >> Cc: Ed Schouten; freebsd-mips@FreeBSD.org; FreeBSD-arch >> Subject: Re: Kernelspace C11 atomics for MIPS >> >> On Mon, Jun 3, 2013 at 7:45 PM, Adrian Chadd wrote: >> >>> Speaking of this; any idea why the SYNC operators have 8 NOPs following >>> them? >>> >>> I noticed that when going through disassemblies of various mips24k .o >>> files. >>> >> >> To drain the pipeline on certain deficient (and mostly older) CPUs by way >> of guesswork and a little vague magic.
Most CPUs we support, I would >> guess, do not need this, and it continues to exist solely for >> hysterical reasons. >> >> I've certainly gotten rid of them and some other cargo cult synchronization >> on Octeon for testing and had it survive under considerable load, and >> occasionally with some slight speedups (for some more commonly-used or >> slower things than Just a Bunch Of NOPs.) >> >> The trouble is that proving they aren't necessary requires being rigorous >> and careful in understanding documentation and errata, and FUD about their >> possible necessity is somewhat-intimidating. It's not an easy kind of >> corruption/unreliability/etc., to prove the lack of empirically. > > The various CPU types are supposed to specify exactly how many NOPs are needed, what kind of barrier is needed and where, and which type of NOP is needed (Alpha had at least two). The barriers are designed to ensure correct operation ordering across the memory architectures including write buffers, DMA hardware, L1/L2[/L3] caches and their connections to the cores. The CPU should specify exactly what is needed where, and why. It should never be "superstition". There is an exact hardware reason for every sync/barrier operation, and every NOP needed, just like the COP0 hazards. Except that none of the examples in the ISA manual have them, and there's no mention of them at all, unlike COP0 hazards. I know of only one case in the Linux tree where it is done (for the au1xxxx cores). There's another place where it is defined in a function, but that function is never called for older Broadcom MIPS. In NetBSD, extra NOPs are not inserted at all. They do do two syncs for the SB1250 PASS 1. None of the docs I've seen for latter-day MIPS CPUs document the need for NOPs. In fact, the only place I think that I've seen them was on one or two of the older R8000, R10000 and R12000 errata that stated they were needed there.
Since these were the first complicated multi-issue designs, it wasn't surprising that the NOPs were needed as workarounds. I can't find these errata with a Google search now, so I can't confirm this dim memory that I have. The reason the NOPs are there today likely is superstition. A cargo-cult workaround from the past whose days have come and gone, leaving nothing but an echo in the code. > Given that, Juli's last paragraph is right on point. The documentation can be dense and difficult to understand, since it's usually written by hardware engineers :-) And since getting it wrong can make for some really subtle, intermittent, and incredibly hard to diagnose problems, it's easier to err on the side of caution. It also happens that different CPUs included in a certain compile switch may have different requirements, so you have to use worst case. I'll agree about the dense documentation. But usually that's around all the crazy cache effects that one must understand to cope with the design that puts half of the cache management in software. I'm all for ditching them unless a specific reason for keeping them can be found. >> Juli. >> _______________________________________________ >> freebsd-mips@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-mips >> To unsubscribe, send any mail to "freebsd-mips-unsubscribe@freebsd.org" >> > > > .................................... > Andrew L.
Duane > Resident Architect - AT&T Technical Lead > m +1 603.770.7088 > o +1 408.933.6944 (2-6944) > skype: andrewlduane > aduane@juniper.net > > > > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 16:03:59 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id F28566C3; Tue, 4 Jun 2013 16:03:58 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) by mx1.freebsd.org (Postfix) with ESMTP id CFB2F1B17; Tue, 4 Jun 2013 16:03:58 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id A82C9B94F; Tue, 4 Jun 2013 12:03:57 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org Subject: Re: Kernelspace C11 atomics for MIPS Date: Tue, 4 Jun 2013 09:52:51 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: <51ADA308.6040904@freebsd.org> In-Reply-To: <51ADA308.6040904@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201306040952.51513.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 04 Jun 2013 12:03:57 -0400 (EDT) Cc: Ed Schouten , Andre Oppermann , freebsd-mips@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 16:03:59 -0000 On Tuesday, June 04, 2013 4:19:20 am Andre Oppermann wrote: > On 03.06.2013 
16:04, Ed Schouten
wrote: > > Hi, > > > > As of r251230, it should be possible to use C11 atomics in > > kernelspace, by including ! Even when not using Clang > > (but GCC 4.2), it is possible to use quite a large portion of the API. > > I'm a bit wary of *kernel* developers using C11-native atomics as opposed > to our own atomic API. This could lead to a proliferation of home-grown, > more or less correctly working, locks and variants thereof (mostly less > correct). I think this is not a big deal to worry about as developers have already been free to do this via and haven't gone super crazy. Replacing with is probably fine and should be a simple drop-in replacement for our lock implementations. -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 16:03:59 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A5D776C5; Tue, 4 Jun 2013 16:03:59 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) by mx1.freebsd.org (Postfix) with ESMTP id 818CC1B18; Tue, 4 Jun 2013 16:03:59 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id C88B1B953; Tue, 4 Jun 2013 12:03:58 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org Subject: Re: Kernelspace C11 atomics for MIPS Date: Tue, 4 Jun 2013 09:56:00 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: <51ADA308.6040904@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201306040956.01065.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 04 Jun 2013 12:03:58 -0400 (EDT) Cc: mdf@freebsd.org, Andre Oppermann , freebsd-mips@freebsd.org, Ed Schouten X-BeenThere: 
freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 16:03:59 -0000 On Tuesday, June 04, 2013 9:12:36 am mdf@freebsd.org wrote: > Personally, I find both the C11 atomics and FreeBSD's annoying, > since "acquire" and "release" semantics are basically an x86 ism. PPC has > no notion of this; it has sync and isync and lwsync instructions which are > separate from the atomic set, but can be combined to create the same > effect. Except the PPC manual is exceptionally explicit about what > guarantees sync provides; it gives a mathematical ordering on loads/stores > i, j and which effects can be seen when. "Acquire" and "Release" seem to > be named because you kinda need one to acquire a lock and kinda need one to > release it. But the effect of ordering loads or stores or both doesn't > need to be dependent on the store/load, so putting the two together is just > an x86 convenience (and an annoyance on at least PPC). Actually, it came from ia64 (at least for FreeBSD's), not x86. :) However, it is still useful to think about, and they are barriers with respect to the load/store of the lock cookie. The requirement that the "acquire" blocks any subsequent loads/stores in program order from occurring until after the operation on the lock cookie succeeds and that "release" prevents any loads/stores from moving past the operation on the lock cookie is not quite the same as a traditional read or write barrier. acquire and release only require a barrier in one direction and enforce ordering on both reads and writes.
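The one-directional ordering described above can be sketched in userland with C11 <stdatomic.h>. This is a minimal illustration only, not FreeBSD's mtx(9) or machine/atomic.h code; struct spinlock, spin_trylock(), spin_unlock() and the other names are hypothetical. The acquire compare-and-swap keeps later loads and stores from being hoisted above the update of the lock cookie, and the release store keeps earlier ones from sinking below it; neither stops outside accesses from moving into the critical section. The last function expresses the release side as a standalone fence followed by a relaxed store, closer to the separate-barrier style mdf describes for PowerPC; in C11 that pairing is at least as strong as a release store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock: cookie is 0 when free, 1 when held. */
struct spinlock {
	atomic_int cookie;
};

static void
spin_init(struct spinlock *l)
{
	atomic_init(&l->cookie, 0);
}

static bool
spin_trylock(struct spinlock *l)
{
	int expected = 0;

	/*
	 * Acquire on success: loads/stores after this call cannot be
	 * reordered before the update of the lock cookie.  A failed
	 * attempt needs no ordering, hence relaxed.
	 */
	return (atomic_compare_exchange_strong_explicit(&l->cookie,
	    &expected, 1, memory_order_acquire, memory_order_relaxed));
}

static void
spin_lock(struct spinlock *l)
{
	while (!spin_trylock(l))
		;	/* busy-wait */
}

static void
spin_unlock(struct spinlock *l)
{
	assert(atomic_load_explicit(&l->cookie, memory_order_relaxed) == 1);
	/*
	 * Release: loads/stores before this store cannot be reordered
	 * after it.
	 */
	atomic_store_explicit(&l->cookie, 0, memory_order_release);
}

/*
 * Same release ordering written as a standalone fence plus a relaxed
 * store, in the style of a separate barrier instruction.
 */
static void
spin_unlock_fence(struct spinlock *l)
{
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&l->cookie, 0, memory_order_relaxed);
}
```

Contending threads would bracket shared data with spin_lock() and spin_unlock(); a production lock would also spin on a relaxed load before retrying the compare-and-swap, to reduce cache-line traffic.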
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 16:46:57 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D73DCCB5; Tue, 4 Jun 2013 16:46:57 +0000 (UTC) (envelope-from mdf356@gmail.com) Received: from mail-oa0-x236.google.com (mail-oa0-x236.google.com [IPv6:2607:f8b0:4003:c02::236]) by mx1.freebsd.org (Postfix) with ESMTP id 7DD731DA2; Tue, 4 Jun 2013 16:46:57 +0000 (UTC) Received: by mail-oa0-f54.google.com with SMTP id o17so332216oag.27 for ; Tue, 04 Jun 2013 09:46:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=U321lJJP6z/A5sq5VfRFdvgFCaN/7LN+d3tIChXOn44=; b=H4T8tvGYva69xqc7cwInlVqQiZDqGTrTUY0L7df2COt5iEdFU2kcc//5leLt/pVmiS WQF9UNIqJLwtdYB6C1pgAN5o+CzBvec08NtgY+9X5Av1MkUpJL0SMQForPlAJcnHwFsY tofa+gGnTylaETpTjusWlX4J9D5HRvFcMwaRwq2jldD6JaK0RORx+7WYCiUMLRGJMwxF NPv+51mCe7AfGM5jmlI+Be+AwUi4fQv2i4plYxbdiusdYOYizZGfGAlmyVkt+w9Jw0Xn wxJ1fKJGnaWf3+HoxBkb9pyw1UhSg2aYU1pMCAo/tE/hT6NWm2Xmbs5q2PqV2sSavQyI MOrA== MIME-Version: 1.0 X-Received: by 10.60.79.231 with SMTP id m7mr12868216oex.105.1370364417114; Tue, 04 Jun 2013 09:46:57 -0700 (PDT) Sender: mdf356@gmail.com Received: by 10.182.237.100 with HTTP; Tue, 4 Jun 2013 09:46:57 -0700 (PDT) In-Reply-To: <201306040956.01065.jhb@freebsd.org> References: <51ADA308.6040904@freebsd.org> <201306040956.01065.jhb@freebsd.org> Date: Tue, 4 Jun 2013 09:46:57 -0700 X-Google-Sender-Auth: 4yIQmTIH3ETY_Ag5xxf_lyy1qnE Message-ID: Subject: Re: Kernelspace C11 atomics for MIPS From: mdf@FreeBSD.org To: John Baldwin Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Ed Schouten , Andre Oppermann , freebsd-mips@freebsd.org, FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: 
list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 16:46:57 -0000 On Tue, Jun 4, 2013 at 6:56 AM, John Baldwin wrote: > On Tuesday, June 04, 2013 9:12:36 am mdf@freebsd.org wrote: > > Personally, I find both the C11 atomics and FreeBSD's > annoying, > > since "acquire" and "release" semantics are basically an x86 ism. PPC > has > > no notion of this; it has sync and isync and lwsync instructions which > are > > separate from the atomic set, but can be combined to create the same > > effect. Except the PPC manual is exceptionally explicit about what > > guarantees sync provides; it gives a mathematical ordering on > loads/stores > > i, j and which effects can be seen when. "Acquire" and "Release" seem to > > be named because you kinda need one to acquire a lock and kinda need one > to > > release it. But the effect of ordering loads or stores or both doesn't > > need to be dependent on the store/load, so putting the two together is > just > > an x86 convenience (and an annoyance on at least PPC). > > Actually, it came from ia64 (at least for FreeBSD's), not x86. :) However, > it is still useful to think about, and they are barriers with respect to > the > load/store of the lock cookie. The requirement that the "acquire" blocks > any > subsequent loads/stores in program order from occurring until after the > operation on the lock cookie succeeds and that "release" prevents any > loads/stores frmo moving past the operation on the lock cookie is not quite > the same as a traditional read or write barrier. acquire and release only > require a barrier in one direction and enforce ordering on both reads and > writes. Yeah, thinking more I feel sorry for those CISC architectures that need so many C primitives because it's less efficient to emit a memory fence then the load (or fence then store). 
It is less elegant, though, and the C standard had to add all the fenced variants of atomics to support it. Thanks, matthew From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 16:56:59 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 13400848; Tue, 4 Jun 2013 16:56:59 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id A81711E34; Tue, 4 Jun 2013 16:56:58 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r54Guqs4088526; Tue, 4 Jun 2013 19:56:52 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r54Guqs4088526 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r54GuqYt088525; Tue, 4 Jun 2013 19:56:52 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 4 Jun 2013 19:56:52 +0300 From: Konstantin Belousov To: John Baldwin Subject: Re: Kernelspace C11 atomics for MIPS Message-ID: <20130604165652.GT3047@kib.kiev.ua> References: <51ADA308.6040904@freebsd.org> <201306040952.51513.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="jS6B5y1SGyDdwoJD" Content-Disposition: inline In-Reply-To: <201306040952.51513.jhb@freebsd.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: Ed Schouten , Andre Oppermann , freebsd-mips@freebsd.org, freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list 
List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 16:56:59 -0000 --jS6B5y1SGyDdwoJD Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jun 04, 2013 at 09:52:51AM -0400, John Baldwin wrote: > On Tuesday, June 04, 2013 4:19:20 am Andre Oppermann wrote: > > On 03.06.2013 16:04, Ed Schouten wrote: > > > Hi, > > > > > > As of r251230, it should be possible to use C11 atomics in > > > kernelspace, by including ! Even when not using Clang > > > (but GCC 4.2), it is possible to use quite a large portion of the API. > > > > I'm a bit wary of *kernel* developers using C11-native atomics as opposed > > to our own atomic API. This could lead to a proliferation of home-grown, > > more or less correctly working, locks and variants thereof (mostly less > > correct). > > I think this is not a big deal to worry about as developers have already been > free to do this via and haven't gone super crazy. > Replacing with is probably fine and > should be a simple drop-in replacement for our lock implementations. I do not think so. The compilers are free to use whatever means to implement the stdatomic. In particular, they are allowed to use a simple global lock to protect the 'atomic' access; see the documented ATOMIC_type_LOCK_FREE macros. IMO using our machine/atomic.h gives us the desirable exact control over the semantics of the locks that both the kernel and the usermode C runtime implement. Practically speaking, I think most people there are capable of fixing bugs and extending the functionality of machine/atomic.h, but have no desire or time digging into to change something.
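For reference, C11 defines each ATOMIC_type_LOCK_FREE macro as 0 (never lock-free), 1 (sometimes lock-free) or 2 (always lock-free), and atomic_is_lock_free() answers the same question for one particular object at run time. Any value below 2 leaves the implementation free to fall back to a hidden lock. The sketch below uses hypothetical helper names and refuses to build when atomic_int is never lock-free. That policy is an assumption for illustration; the values reported depend on the compiler and target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Compile-time check for the type: refuse to build if int atomics
 * are never lock-free and so may be protected by a hidden lock.
 */
#if ATOMIC_INT_LOCK_FREE == 0
#error "atomic_int is never lock-free on this target"
#endif

static atomic_int counter;	/* static storage: zero-initialized */

/* Per-type answer: 0 never, 1 sometimes, 2 always lock-free. */
static int
int_lock_free_level(void)
{
	return (ATOMIC_INT_LOCK_FREE);
}

/* Per-object answer, which may depend on alignment at run time. */
static bool
counter_is_lock_free(void)
{
	return (atomic_is_lock_free(&counter));
}
```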
--jS6B5y1SGyDdwoJD Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJRrhxUAAoJEJDCuSvBvK1BVckP/1ZYaOksy4B8GrcRo2vxCUzp cKoWJJ1hqTmygBMfM0umWi0cyOIQvn15GGzKKwpuNWuUKpZSKGaVDrR9PzPeUwoP N249KNgun+jIEE145WkXaP5DtPg3SEsH1KBWDRjIGHXFDMtBUT6msBf1BToCzKTP 0mMyLA3gG53xuPU6sISQT7pNKEf9WXnua33kEM4j7bdlyKYatvL3EgrbOfS1Gwc6 UDJch3jedXDhPJbD26m5AYQm/oLBEbZfkVgquIa5pDslB4w+7bNK0FimQfd+RrpT BI99dZmhLjS5ra3Udwcn8q+05G02HBfHmNmfmizgVVUwQC/8b64KWmyO6orynE2V QsGsFZWzuE6fxfOkQMFCM0nQQW/jk7zvStEL12wOR/UYIEUdvMHDre02/JOkPNQV qEIupfnyDzjI2cFGVO2ZW/2zbrzaE++bhHhwhq3HDs6s4ZLtZV6joCKYZmDNsfQl vrODQLVpHmiMOJ5BRllk4JXKb3N6pv49HQ0J5oXKjJPDrolAOOrb+FMICEqd0Uuu aTYgaDA0zKbMp4VwBqz4r4QiUC+2tDftW/X+Rs06MWSJIfQF53jyhpyVMO8a2f7t MmhQf2iTMVZzgtEf3+oWNSKrDQkPwcoef09aXmBD5VowFX4MtZa5Dvc5D9OI+r7u e0krjCO8WjPF7INZL43S =BxK5 -----END PGP SIGNATURE----- --jS6B5y1SGyDdwoJD-- From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 17:23:35 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3AB6276A; Tue, 4 Jun 2013 17:23:35 +0000 (UTC) (envelope-from edschouten@gmail.com) Received: from mail-vc0-f180.google.com (mail-vc0-f180.google.com [209.85.220.180]) by mx1.freebsd.org (Postfix) with ESMTP id BB3A3103D; Tue, 4 Jun 2013 17:23:34 +0000 (UTC) Received: by mail-vc0-f180.google.com with SMTP id gd11so390049vcb.25 for ; Tue, 04 Jun 2013 10:23:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=IkW8ji4gMuhRowwT1AgRR6Wx2nP9CdEAmVAmPGh+zzI=; b=Yslk64yT7w6C3YN6aDro68PEvdUSdrVvg04zhrOTR/fDbO5bquJItSSpXsBJZlETJw C93ke1nlkjJ8wFZlZKo2YNgcf7M9M+Um6dxXrpEB8ndrWbn7sajKzcT/GLPLa+uCYotJ dp/RwBzrogAS7JM8WcO09U9+0faplHuCdfvlec3IJc+jHEp2cTWsudwRDMdpA1OgbuAU 
sIPCMlo61I3Q/OneI86ZKOTHHFHJzUIqBS1wdj5W4yYGhLzHnuldVn/WrdIF4BVcWpID o3p4pcWBdWdHK9RFG7837Hl6an1LsdFl0KVB0gY5UVll2BiijKP+4icjofPsLfhCVKrr L5vA== MIME-Version: 1.0 X-Received: by 10.221.4.131 with SMTP id oc3mr18813611vcb.49.1370366608682; Tue, 04 Jun 2013 10:23:28 -0700 (PDT) Sender: edschouten@gmail.com Received: by 10.220.107.139 with HTTP; Tue, 4 Jun 2013 10:23:28 -0700 (PDT) In-Reply-To: <20130604165652.GT3047@kib.kiev.ua> References: <51ADA308.6040904@freebsd.org> <201306040952.51513.jhb@freebsd.org> <20130604165652.GT3047@kib.kiev.ua> Date: Tue, 4 Jun 2013 19:23:28 +0200 X-Google-Sender-Auth: 3h2-NMjqZ9YUhjcB7SUcOmt4FZc Message-ID: Subject: Re: Kernelspace C11 atomics for MIPS From: Ed Schouten To: Konstantin Belousov Content-Type: text/plain; charset=UTF-8 Cc: Andre Oppermann , freebsd-mips@freebsd.org, freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 17:23:35 -0000 2013/6/4 Konstantin Belousov : > I do not think so. The compilers are free to use whatever means to > implement the stdatomic. In particular, they are allowed to use simple > global lock to protect the 'atomic' access, see ATOMIC_type_LOCK_FREE > documented macros. Well, yes, no, it's complicated. The fact is, we are still free to implement without using those compiler's features. For example, we could have also decided to implement using only code we provide ourselves, as follows: static inline int8_t our_8bit_atomic_store(...) { ... } #define atomic_store(...) _Generic( \ int8_t: our_8bit_atomic_store(....), \ ... \ ) Also, it is extremely unlikely that compilers implement handlers for non-lock-free atomics themselves. Both Clang and GCC 4.7+, for example, will call into __atomic_*_{1,2,4,8,16,c}() whenever it does not know built-in CPU instructions to perform the operation. 
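Ed's outline omits the controlling expression that _Generic requires before its type list. A compilable version of the same idea might look like the sketch below. our_atomic_store() and the per-width helpers are hypothetical names, and the GCC/Clang __atomic_store_n() builtin only stands in for whatever code FreeBSD would supply itself (for example, wrappers around machine/atomic.h). This is not the <stdatomic.h> committed in r251230.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-width implementations.  In the kernel these bodies
 * could wrap machine/atomic.h; here the __atomic builtins stand in so
 * the sketch builds with GCC or Clang.
 */
static inline void
our_atomic_store_8(volatile int8_t *p, int8_t v)
{
	__atomic_store_n(p, v, __ATOMIC_SEQ_CST);
}

static inline void
our_atomic_store_32(volatile int32_t *p, int32_t v)
{
	__atomic_store_n(p, v, __ATOMIC_SEQ_CST);
}

static inline void
our_atomic_store_64(volatile int64_t *p, int64_t v)
{
	__atomic_store_n(p, v, __ATOMIC_SEQ_CST);
}

/*
 * _Generic picks a function by the type of its (unevaluated)
 * controlling expression *(p); the chosen function is then called.
 */
#define our_atomic_store(p, v) _Generic(*(p),			\
	int8_t: our_atomic_store_8,				\
	int32_t: our_atomic_store_32,				\
	int64_t: our_atomic_store_64)((p), (v))
```

Dispatching on *(p) turns an unsupported width into a compile-time error rather than a silent fallback. Older compilers disagree on whether qualifiers on the controlling expression are dropped, so the sketch is meant to be called with unqualified pointers.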
More details: http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary So in my opinion there are tons of ways we can still influence how the atomic operations are performed. The patch I sent out already demonstrates this, as we are free to implement the GCC intrinsics the way we like. -- Ed Schouten From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 18:07:12 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F20E229D; Tue, 4 Jun 2013 18:07:11 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-qe0-f45.google.com (mail-qe0-f45.google.com [209.85.128.45]) by mx1.freebsd.org (Postfix) with ESMTP id 5A4681210; Tue, 4 Jun 2013 18:07:11 +0000 (UTC) Received: by mail-qe0-f45.google.com with SMTP id q19so409787qeb.4 for ; Tue, 04 Jun 2013 11:07:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=ye+/DK9B8RSq2OPcPcACrpKg94AVyIf7MLo8RohwV+U=; b=TuaeoYMffceTzPIq49hkvisUPY+x9S1kLU+YaiuOCEejtxHYz4bvlV01iecsIwfz+c Qa5wxXbAcyvpzt1i1Kf4UZtID5H7NYPM7Bl06C6IFYpnIgi5GWn3ivFYkq/0K3MPuyW/ a24xwmS4kqa2DBhSaNzKGxmN+3Zz65Oj5bxDxA7CWJBbqC8K1hOG3uRBBlh/z1Scf998 b4MXneK7Pq/9lQXyGJ/EJGlXvD5KJbn0MsF+C3Y4S2fO/iMgtusCBqgXk6kDiCCWAquY VEvFc8n8PI3bo8mZyW8X7yncW81MEAQRhQYyt5Xm1LHQRxcSrugsH15rzI6UStpNJ1H7 oUOA== MIME-Version: 1.0 X-Received: by 10.229.149.14 with SMTP id r14mr6819349qcv.59.1370369225122; Tue, 04 Jun 2013 11:07:05 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.224.71.12 with HTTP; Tue, 4 Jun 2013 11:07:05 -0700 (PDT) In-Reply-To: <232DBBD8-3F32-4D42-85AB-AC5647EEA768@bsdimp.com> References: <232DBBD8-3F32-4D42-85AB-AC5647EEA768@bsdimp.com> Date: Tue, 4 Jun 2013 11:07:05 -0700 X-Google-Sender-Auth: VI61RTyvzr8ndvlahzEoCAnACaw Message-ID: Subject: Re: Kernelspace C11 atomics for MIPS 
From: Adrian Chadd To: Warner Losh Content-Type: text/plain; charset=ISO-8859-1 Cc: Patrick Kelsey , Juli Mallett , FreeBSD-arch , "freebsd-mips@FreeBSD.org" , Ed Schouten X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 18:07:12 -0000 Hi, It survived an overnight thrashing on my AR7161 (mips24k.) Adrian On 3 June 2013 22:07, Warner Losh wrote: > Please find attached a simple patch that I'd like all MIPS users to try. > > Warner > > > > > On Jun 3, 2013, at 10:15 PM, Patrick Kelsey wrote: > >> On Tue, Jun 4, 2013 at 12:08 AM, Patrick Kelsey wrote: >>> On Mon, Jun 3, 2013 at 11:57 PM, Adrian Chadd wrote: >>>> On 3 June 2013 20:55, Juli Mallett wrote: >>>> >>>>> To drain the pipeline on certain deficient (and mostly older) CPUs by way of >>>>> guesswork and a little vague magic. Most CPUs we support, I would guess, do >>>>> not need this, and it continues to exist solely for hysterical reasons. >>>> >>>> How can I turn it off for my compiles? >>>> >>>>> I've certainly gotten rid of them and some other cargo cult synchronization >>>>> on Octeon for testing and had it survive under considerable load, and >>>>> occasionally with some slight speedups (for some more commonly-used or >>>>> slower things than Just a Bunch Of NOPs.) >>>> >>>> Right. Well, since it's happening on every inlined lock, it's a bit silly. >>>> >>>>> The trouble is that proving they aren't necessary requires being rigorous >>>>> and careful in understanding documentation and errata, and FUD about their >>>>> possible necessity is somewhat-intimidating. It's not an easy kind of >>>>> corruption/unreliability/etc., to prove the lack of empirically. >>>> >>>> I've checked the disassembly from gcc-4.mumble on linux; it doesn't >>>> include NOPs like this as far as I can tell.
>>>> >>> >>> The sync + 8 nops is coming from the definition of mips_sync() in >>> sys/mips/include/atomic.h. >>> >>> I agree with Juli that it appears to be a manual pipeline-flush >>> holdover from earlier days - I'm guessing there's 8 nops because the >>> R4000/4400 had both the sync instruction and an 8-stage pipeline. I'm >>> further guessing this was an attempt at providing stronger ordering >>> semantics than the sync instruction itself for the following >>> mb()/wmb()/rmb() definitions that use it, as the sync instruction >>> definition doesn't restrict execution of the before/after loads/stores >>> with respect to the sync instruction itself. >> >> Forgot to emphasize that this particular bit of old-school >> nop-counting is either pointless or a latent hazard - 8 does not cover >> the deepest MIPS pipeline around, then there's superscalar issue to >> consider - so I think it's either unnecessary or insufficient. So >> far, that's all criticism and no solution :/ >> _______________________________________________ >> freebsd-arch@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > > From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 19:11:56 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E9E7FB97; Tue, 4 Jun 2013 19:11:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 6189C1777; Tue, 4 Jun 2013 19:11:56 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r54JBqtd016593; Tue, 4 Jun 2013 22:11:52 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r54JBqtd016593 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id 
r54JBq8i016592; Tue, 4 Jun 2013 22:11:52 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 4 Jun 2013 22:11:52 +0300 From: Konstantin Belousov To: Gleb Smirnoff Subject: Re: aio_mlock(2) system call Message-ID: <20130604191152.GW3047@kib.kiev.ua> References: <20130603100618.GH67170@FreeBSD.org> <20130603161255.GM3047@kib.kiev.ua> <20130604113035.GV67170@glebius.int.ru> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Il/7I/LYKbAjm5im" Content-Disposition: inline In-Reply-To: <20130604113035.GV67170@glebius.int.ru> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 19:11:57 -0000 --Il/7I/LYKbAjm5im Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jun 04, 2013 at 03:30:35PM +0400, Gleb Smirnoff wrote: > Updated patch. >=20 I have no further comments. You might want to make the switch of double casts to DEVOLATILE() in the other parts of vfs_aio.c as separate commit. 
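For reference, the "double cast" and DEVOLATILE() spellings discussed here both launder a pointer through an integer type to strip a qualifier. The sketch below copies the idiom behind FreeBSD's __DEVOLATILE() in <sys/cdefs.h> under the local name DEVOLATILE_SKETCH, so that it builds anywhere, and places it next to a plain cast. struct req is a hypothetical stand-in for a request whose buffer pointer is volatile-qualified, as aio_buf is in struct aiocb. Only the plain cast is still reported by -Wcast-qual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local copy of the idiom used by __DEVOLATILE(): convert through an
 * integer so that no qualifier-discarding pointer cast is visible.
 */
#define DEVOLATILE_SKETCH(type, var) \
	((type)(uintptr_t)(volatile void *)(var))

/* Hypothetical request with a volatile buffer pointer. */
struct req {
	volatile void *buf;
	size_t nbytes;
};

/* Plain cast: -Wcast-qual warns that volatile is discarded. */
static void *
buf_plain_cast(const struct req *r)
{
	return ((void *)r->buf);
}

/* Laundered cast: silent under -Wcast-qual, which hides the issue. */
static void *
buf_devolatile(const struct req *r)
{
	return (DEVOLATILE_SKETCH(void *, r->buf));
}
```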
--Il/7I/LYKbAjm5im Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJRrjv4AAoJEJDCuSvBvK1BeNEP/jrjfDbhflfNBBuxLvBGFuqA ja6mKIdgjWaX45baDtXgzPVrUFIvLnClRxCia6VA6G1nKHYycU2w+09WscFGTaFO xLudmdXjSdOxVps07hsLIIH10wD73ivm5uKcLZhV32r+QVG2Cf5U6zfp65t7TA+k tVJM8Cf0Yi2GuX7vIUBa61dWWwHHmbjrVRvn+TPDVAEoJB8a+eK69n23OtOZfcp+ OcFk36tQkfqmwqMLWolA8Fb3ZK2eZ7VxmjieaswdY0WAUaeQFWQHmjrTwlkrgSLq L0A/re3g8cH3o1muvJqz98obln1fC2bzmSIg7mFFHo3JzvfNtbAywihZ5hmRchay QXZV/HAuE+35h6Ilggx8wK6HKHmjIPnsycwCiMpihIhII3io2T4ZbFGdWNPJHTBw V204aICel/HdmGQsbwSKgqi6sx5fTjEdYE8lWFMMCdGcR2ULDLkpyzCblZM0V++j 9XaDg2m44UQ977/7OZ3Mo5+15QYNlJTdo5Wu+tBcflrlebBJDDIx4QOKg313HuAs muiTbXFCh4nHSAiDa9yABZQwMQMNZk3Sy679plw0wjTrW8BAgH7wb9DTAlDsNP7p Hr4OIBOY2jmISdd+V+7RiI09KCVp5CNu1hE/BNVNccXDwiw1ANY8mkwfSYlv3r8z yg0XJpm4CdqvOvsmZIf9 =Vk/9 -----END PGP SIGNATURE----- --Il/7I/LYKbAjm5im-- From owner-freebsd-arch@FreeBSD.ORG Tue Jun 4 21:29:20 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BE54F904; Tue, 4 Jun 2013 21:29:20 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) by mx1.freebsd.org (Postfix) with ESMTP id 730971CFD; Tue, 4 Jun 2013 21:29:20 +0000 (UTC) Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id 026FF3592DD; Tue, 4 Jun 2013 23:29:17 +0200 (CEST) Received: by snail.stack.nl (Postfix, from userid 1677) id DE0BE28493; Tue, 4 Jun 2013 23:29:17 +0200 (CEST) Date: Tue, 4 Jun 2013 23:29:17 +0200 From: Jilles Tjoelker To: Gleb Smirnoff Subject: Re: aio_mlock(2) system call Message-ID: <20130604212917.GA72412@stack.nl> References: <20130603100618.GH67170@FreeBSD.org> <20130603161255.GM3047@kib.kiev.ua> <20130604113035.GV67170@glebius.int.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline In-Reply-To: <20130604113035.GV67170@glebius.int.ru> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Konstantin Belousov , arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Jun 2013 21:29:20 -0000 On Tue, Jun 04, 2013 at 03:30:35PM +0400, Gleb Smirnoff wrote: > Updated patch. > [snip] > Index: lib/libc/sys/Symbol.map > =================================================================== > --- lib/libc/sys/Symbol.map (revision 251369) > +++ lib/libc/sys/Symbol.map (working copy) > @@ -378,6 +378,7 @@ FBSD_1.2 { > }; > > FBSD_1.3 { > + aio_mlock; > accept4; > bindat; > cap_fcntls_get; This should probably be in alphabetical order. > Index: lib/libc/sys/aio_mlock.2 > =================================================================== > --- lib/libc/sys/aio_mlock.2 (revision 0) > +++ lib/libc/sys/aio_mlock.2 (working copy) > [snip] > +.Sh PORTABILITY > +The > +.Fn aio_mlock > +system call is a > +.Fx > +extension, and shouldn't be used in portable code. Man pages should not use contractions. > [snip] > Index: sys/sys/aio.h > =================================================================== > --- sys/sys/aio.h (revision 251369) > +++ sys/sys/aio.h (working copy) > @@ -38,6 +38,7 @@ > #ifdef _KERNEL > #define LIO_SYNC 0x3 > #endif > +#define LIO_MLOCK 0x4 Is it intended that the new constant is available to userland, such as for use in lio_listio(2)? 
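As context for the review comments, a program might use the proposed system call from userland roughly as follows. This is a sketch under assumptions: it takes the interface documented in the aio_mlock(2) manual page added by this patch (int aio_mlock(struct aiocb *), with aio_buf and aio_nbytes describing the range and completion reported through aio_error(2) and aio_return(2)), it polls rather than using kqueue or aio_suspend(2), and it falls back to a blocking mlock(2) on systems without aio_mlock(). lock_range() is a hypothetical name.

```c
#if !defined(__FreeBSD__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L		/* expose mlock() under strict C */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#ifdef __FreeBSD__
#include <aio.h>
#endif

/*
 * Wire [buf, buf + len) into memory.  Returns 0 on success, or -1
 * with errno set.  On FreeBSD the possibly slow page-in is queued via
 * aio_mlock() and polled; elsewhere a blocking mlock() is used.
 */
static int
lock_range(void *buf, size_t len)
{
#ifdef __FreeBSD__
	struct aiocb cb;
	struct timespec ts = { 0, 1000000 };	/* 1 ms between polls */
	int error;

	memset(&cb, 0, sizeof(cb));
	cb.aio_buf = buf;
	cb.aio_nbytes = len;
	if (aio_mlock(&cb) != 0)
		return (-1);
	while ((error = aio_error(&cb)) == EINPROGRESS)
		nanosleep(&ts, NULL);	/* a real program would do work here */
	if (error == -1)
		return (-1);		/* aio_error() itself failed */
	(void)aio_return(&cb);		/* reap the completed request */
	if (error != 0) {
		errno = error;
		return (-1);
	}
	return (0);
#else
	return (mlock(buf, len));
#endif
}
```

Wiring can legitimately fail under a small RLIMIT_MEMLOCK, so callers should treat EPERM, ENOMEM or EAGAIN as an expected outcome rather than a bug.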
> [snip] -- Jilles Tjoelker From owner-freebsd-arch@FreeBSD.ORG Wed Jun 5 00:12:42 2013 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 768E2FD7; Wed, 5 Jun 2013 00:12:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 3E6851844; Wed, 5 Jun 2013 00:12:41 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id BA577420B54; Wed, 5 Jun 2013 09:52:25 +1000 (EST) Date: Wed, 5 Jun 2013 09:52:24 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov Subject: Re: aio_mlock(2) system call In-Reply-To: <20130604191152.GW3047@kib.kiev.ua> Message-ID: <20130605093622.L11224@besplex.bde.org> References: <20130603100618.GH67170@FreeBSD.org> <20130603161255.GM3047@kib.kiev.ua> <20130604113035.GV67170@glebius.int.ru> <20130604191152.GW3047@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=e/de0tV/ c=1 sm=1 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=sLyT4IyXTxYA:10 a=a3KUWG958rLjXTh994YA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: arch@freebsd.org, Gleb Smirnoff X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Jun 2013 00:12:42 -0000 On Tue, 4 Jun 2013, Konstantin Belousov wrote: > On Tue, Jun 04, 2013 at 03:30:35PM +0400, Gleb Smirnoff wrote: >> Updated patch. >> > I have no further comments. 
> > You might want to make the switch of double casts to DEVOLATILE() > in the other parts of vfs_aio.c as separate commit. DEVOLATILE() should only be committed to /dev/null. It masks API bugs. An ordinary cast is sufficiently ugly and doesn't break detection of the bugs by -Wcast-qual. If a variable is actually volatile, then casting away its volatileness breaks it. The breakage is larger than with casting away const. But I think that with aio, the bug is using the application API in the kernel. The buffer is volatile in userland but isn't really volatile in the kernel (no more than any other buffer that may be written to by DMA, and such buffers are mostly not declared volatile). uio has sort of the opposite problem. It is older than const and void, so it cannot use them. More fundamentally, it only has a single i/o pointer, so the pointer cannot be const since it is used for input. But when writing, the source buffer may be const or even volatile. Its pointer cannot be assigned to the uio pointer without casting away qualifiers.
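A small sketch makes the qualifier argument concrete. POSIX declares aiocb.aio_buf as `volatile void *`, and struct iovec (the userland counterpart of the uio buffers mentioned here) has a single non-const `void *iov_base`, so both conversions below must drop a qualifier. MY_DEVOLATILE() is a hypothetical local macro modeled on the double cast through uintptr_t used by FreeBSD's __DEVOLATILE(); the real definition in <sys/cdefs.h> may differ in detail. The point is that a plain cast is still reported by -Wcast-qual, while the double cast silences the warning.

```c
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical stand-in for __DEVOLATILE(): the round trip through
 * uintptr_t hides the qualifier change from -Wcast-qual.
 */
#define	MY_DEVOLATILE(type, var)	((type)(uintptr_t)(volatile void *)(var))

/* Plain cast: still visible to -Wcast-qual. */
static void *
buf_plain_cast(volatile void *aio_buf)
{
	return ((void *)aio_buf);
}

/* Double cast: same pointer value, but the warning is suppressed. */
static void *
buf_devolatile(volatile void *aio_buf)
{
	return (MY_DEVOLATILE(void *, aio_buf));
}

/* struct iovec has one non-const pointer, so a const source needs a cast. */
static struct iovec
iov_from_const(const char *msg)
{
	struct iovec iov;

	iov.iov_base = (void *)msg;	/* casts away const; -Wcast-qual warns */
	iov.iov_len = strlen(msg);
	return (iov);
}
```

Both helpers return the same address. The only difference is whether the compiler is still able to flag the discarded qualifier.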
Bruce From owner-freebsd-arch@FreeBSD.ORG Wed Jun 5 06:52:59 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DDDB09B9 for ; Wed, 5 Jun 2013 06:52:59 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10]) by mx1.freebsd.org (Postfix) with ESMTP id EC56B1784 for ; Wed, 5 Jun 2013 06:52:58 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.6/8.14.6) with ESMTP id r556quPf098384; Wed, 5 Jun 2013 10:52:56 +0400 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.6/8.14.6/Submit) id r556quhS098383; Wed, 5 Jun 2013 10:52:56 +0400 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Wed, 5 Jun 2013 10:52:56 +0400 From: Gleb Smirnoff To: Jilles Tjoelker Subject: Re: aio_mlock(2) system call Message-ID: <20130605065256.GZ67170@glebius.int.ru> References: <20130603100618.GH67170@FreeBSD.org> <20130603161255.GM3047@kib.kiev.ua> <20130604113035.GV67170@glebius.int.ru> <20130604212917.GA72412@stack.nl> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2nTeH+t2PBomgucg" Content-Disposition: inline In-Reply-To: <20130604212917.GA72412@stack.nl> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Konstantin Belousov , arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Jun 2013 06:52:59 -0000 --2nTeH+t2PBomgucg Content-Type: text/plain; charset=koi8-r Content-Disposition: inline Jilles, On Tue, Jun 04, 2013 at 11:29:17PM +0200, Jilles Tjoelker wrote: ... J> This should probably be in alphabetical order. ... 
J> Man pages should not use contractions. Fixed. J> > [snip] J> > Index: sys/sys/aio.h J> > =================================================================== J> > --- sys/sys/aio.h (revision 251369) J> > +++ sys/sys/aio.h (working copy) J> > @@ -38,6 +38,7 @@ J> > #ifdef _KERNEL J> > #define LIO_SYNC 0x3 J> > #endif J> > +#define LIO_MLOCK 0x4 J> J> Is it intended that the new constant is available to userland, such as J> for use in lio_listio(2)? Hmm, I didn't intend such usage and didn't test it. You are right, I'd better hide the constant. Updated patch attached. -- Totus tuus, Glebius. --2nTeH+t2PBomgucg Content-Type: text/x-diff; charset=koi8-r Content-Disposition: attachment; filename="aio_mlock.diff" Index: lib/libc/sys/Makefile.inc =================================================================== --- lib/libc/sys/Makefile.inc (revision 251369) +++ lib/libc/sys/Makefile.inc (working copy) @@ -85,6 +85,7 @@ MAN+= abort2.2 \ adjtime.2 \ aio_cancel.2 \ aio_error.2 \ + aio_mlock.2 \ aio_read.2 \ aio_return.2 \ aio_suspend.2 \ Index: lib/libc/sys/Symbol.map =================================================================== --- lib/libc/sys/Symbol.map (revision 251369) +++ lib/libc/sys/Symbol.map (working copy) @@ -379,6 +379,7 @@ FBSD_1.2 { FBSD_1.3 { accept4; + aio_mlock; bindat; cap_fcntls_get; cap_fcntls_limit; Index: lib/libc/sys/aio_mlock.2 =================================================================== --- lib/libc/sys/aio_mlock.2 (revision 0) +++ lib/libc/sys/aio_mlock.2 (working copy) @@ -0,0 +1,133 @@ +.\" Copyright (c) 2013 Gleb Smirnoff +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2.
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd June 3, 2013 +.Dt AIO_MLOCK 2 +.Os +.Sh NAME +.Nm aio_mlock +.Nd asynchronous +.Xr mlock 2 +operation +.Sh LIBRARY +.Lb libc +.Sh SYNOPSIS +.In aio.h +.Ft int +.Fn aio_mlock "struct aiocb *iocb" +.Sh DESCRIPTION +The +.Fn aio_mlock +system call allows the calling process to lock into memory the +physical pages associated with the virtual address range starting at +.Fa iocb->aio_buf +for +.Fa iocb->aio_nbytes +bytes. +The call returns immediately after the locking request has +been enqueued; the operation may or may not have completed at the time +the call returns. +.Pp +The +.Fa iocb +pointer may be subsequently used as an argument to +.Fn aio_return +and +.Fn aio_error +in order to determine return or error status for the enqueued operation +while it is in progress. +.Pp +If the request could not be enqueued (generally due to +.Xr aio 4 +limits), +then the call returns without having enqueued the request. 
+.Sh RESTRICTIONS +The Asynchronous I/O Control Block structure pointed to by +.Fa iocb +and the buffer that the +.Fa iocb->aio_buf +member of that structure references must remain valid until the +operation has completed. +For this reason, use of auto (stack) variables +for these objects is discouraged. +.Pp +The asynchronous I/O control buffer +.Fa iocb +should be zeroed before the +.Fn aio_mlock +call to avoid passing bogus context information to the kernel. +.Pp +Modifications of the Asynchronous I/O Control Block structure or the +buffer contents after the request has been enqueued, but before the +request has completed, are not allowed. +.Sh RETURN VALUES +.Rv -std aio_mlock +.Sh ERRORS +The +.Fn aio_mlock +system call will fail if: +.Bl -tag -width Er +.It Bq Er EAGAIN +The request was not queued because of system resource limitations. +.It Bq Er ENOSYS +The +.Fn aio_mlock +system call is not supported. +.El +.Pp +If the request is successfully enqueued, but subsequently cancelled +or an error occurs, the value returned by the +.Fn aio_return +system call is per the +.Xr mlock 2 +system call, and the value returned by the +.Fn aio_error +system call is one of the error returns from the +.Xr mlock 2 +system call, or +.Er ECANCELED +if the request was explicitly cancelled via a call to +.Fn aio_cancel . +.Sh SEE ALSO +.Xr aio_cancel 2 , +.Xr aio_error 2 , +.Xr aio_return 2 , +.Xr mlock 2 , +.Xr aio 4 +.Sh PORTABILITY +The +.Fn aio_mlock +system call is a +.Fx +extension, and should not be used in portable code. +.Sh HISTORY +The +.Fn aio_mlock +system call first appeared in +.Fx 10.0 . +.Sh AUTHORS +The system call was introduced by +.An Gleb Smirnoff Aq glebius@FreeBSD.org .
Property changes on: lib/libc/sys/aio_mlock.2 ___________________________________________________________________ Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Index: sys/compat/freebsd32/syscalls.master =================================================================== --- sys/compat/freebsd32/syscalls.master (revision 251369) +++ sys/compat/freebsd32/syscalls.master (working copy) @@ -1044,3 +1044,5 @@ __socklen_t * __restrict anamelen, \ int flags); } 542 AUE_PIPE NOPROTO { int pipe2(int *fildes, int flags); } +543 AUE_NULL NOSTD { int freebsd32_aio_mlock( \ + struct aiocb32 *aiocbp); } Index: sys/kern/syscalls.master =================================================================== --- sys/kern/syscalls.master (revision 251369) +++ sys/kern/syscalls.master (working copy) @@ -977,5 +977,6 @@ __socklen_t * __restrict anamelen, \ int flags); } 542 AUE_PIPE STD { int pipe2(int *fildes, int flags); } +543 AUE_NULL NOSTD { int aio_mlock(struct aiocb *aiocbp); } ; Please copy any additions and changes to the following compatability tables: ; sys/compat/freebsd32/syscalls.master Index: sys/kern/vfs_aio.c =================================================================== --- sys/kern/vfs_aio.c (revision 251369) +++ sys/kern/vfs_aio.c (working copy) @@ -338,7 +338,9 @@ static struct unrhdr *aiod_unr; void aio_init_aioinfo(struct proc *p); static int aio_onceonly(void); static int aio_free_entry(struct aiocblist *aiocbe); -static void aio_process(struct aiocblist *aiocbe); +static void aio_process_rw(struct aiocblist *aiocbe); +static void aio_process_sync(struct aiocblist *aiocbe); +static void aio_process_mlock(struct aiocblist *aiocbe); static int aio_newproc(int *); int aio_aqueue(struct thread *td, struct aiocb *job, struct aioliojob *lio, int type, struct aiocb_ops *ops); @@ 
-425,6 +427,7 @@ static struct syscall_helper_data aio_syscalls[] = SYSCALL_INIT_HELPER(aio_cancel), SYSCALL_INIT_HELPER(aio_error), SYSCALL_INIT_HELPER(aio_fsync), + SYSCALL_INIT_HELPER(aio_mlock), SYSCALL_INIT_HELPER(aio_read), SYSCALL_INIT_HELPER(aio_return), SYSCALL_INIT_HELPER(aio_suspend), @@ -452,6 +455,7 @@ static struct syscall_helper_data aio32_syscalls[] SYSCALL32_INIT_HELPER(freebsd32_aio_cancel), SYSCALL32_INIT_HELPER(freebsd32_aio_error), SYSCALL32_INIT_HELPER(freebsd32_aio_fsync), + SYSCALL32_INIT_HELPER(freebsd32_aio_mlock), SYSCALL32_INIT_HELPER(freebsd32_aio_read), SYSCALL32_INIT_HELPER(freebsd32_aio_write), SYSCALL32_INIT_HELPER(freebsd32_aio_waitcomplete), @@ -701,7 +705,8 @@ aio_free_entry(struct aiocblist *aiocbe) * at open time, but this is already true of file descriptors in * a multithreaded process. */ - fdrop(aiocbe->fd_file, curthread); + if (aiocbe->fd_file) + fdrop(aiocbe->fd_file, curthread); crfree(aiocbe->cred); uma_zfree(aiocb_zone, aiocbe); AIO_LOCK(ki); @@ -855,15 +860,15 @@ drop: } /* - * The AIO processing activity. This is the code that does the I/O request for - * the non-physio version of the operations. The normal vn operations are used, - * and this code should work in all instances for every type of file, including - * pipes, sockets, fifos, and regular files. + * The AIO processing activity for LIO_READ/LIO_WRITE. This is the code that + * does the I/O request for the non-physio version of the operations. The + * normal vn operations are used, and this code should work in all instances + * for every type of file, including pipes, sockets, fifos, and regular files. * * XXX I don't think it works well for socket, pipe, and fifo. 
*/ static void -aio_process(struct aiocblist *aiocbe) +aio_process_rw(struct aiocblist *aiocbe) { struct ucred *td_savedcred; struct thread *td; @@ -877,23 +882,16 @@ static void int oublock_st, oublock_end; int inblock_st, inblock_end; + KASSERT(aiocbe->uaiocb.aio_lio_opcode == LIO_READ || + aiocbe->uaiocb.aio_lio_opcode == LIO_WRITE, + ("%s: opcode %d", __func__, aiocbe->uaiocb.aio_lio_opcode)); + td = curthread; td_savedcred = td->td_ucred; td->td_ucred = aiocbe->cred; cb = &aiocbe->uaiocb; fp = aiocbe->fd_file; - if (cb->aio_lio_opcode == LIO_SYNC) { - error = 0; - cnt = 0; - if (fp->f_vnode != NULL) - error = aio_fsync_vnode(td, fp->f_vnode); - cb->_aiocb_private.error = error; - cb->_aiocb_private.status = 0; - td->td_ucred = td_savedcred; - return; - } - aiov.iov_base = (void *)(uintptr_t)cb->aio_buf; aiov.iov_len = cb->aio_nbytes; @@ -954,6 +952,41 @@ static void } static void +aio_process_sync(struct aiocblist *aiocbe) +{ + struct thread *td = curthread; + struct ucred *td_savedcred = td->td_ucred; + struct aiocb *cb = &aiocbe->uaiocb; + struct file *fp = aiocbe->fd_file; + int error = 0; + + KASSERT(aiocbe->uaiocb.aio_lio_opcode == LIO_SYNC, + ("%s: opcode %d", __func__, aiocbe->uaiocb.aio_lio_opcode)); + + td->td_ucred = aiocbe->cred; + if (fp->f_vnode != NULL) + error = aio_fsync_vnode(td, fp->f_vnode); + cb->_aiocb_private.error = error; + cb->_aiocb_private.status = 0; + td->td_ucred = td_savedcred; +} + +static void +aio_process_mlock(struct aiocblist *aiocbe) +{ + struct aiocb *cb = &aiocbe->uaiocb; + int error; + + KASSERT(aiocbe->uaiocb.aio_lio_opcode == LIO_MLOCK, + ("%s: opcode %d", __func__, aiocbe->uaiocb.aio_lio_opcode)); + + error = vm_mlock(aiocbe->userproc, aiocbe->cred, + __DEVOLATILE(void *, cb->aio_buf), cb->aio_nbytes); + cb->_aiocb_private.error = error; + cb->_aiocb_private.status = 0; +} + +static void aio_bio_done_notify(struct proc *userp, struct aiocblist *aiocbe, int type) { struct aioliojob *lj; @@ -1024,7 +1057,7 @@ 
notification_done: } /* - * The AIO daemon, most of the actual work is done in aio_process, + * The AIO daemon, most of the actual work is done in aio_process_*, * but the setup (and address space mgmt) is done in this routine. */ static void @@ -1121,7 +1154,18 @@ aio_daemon(void *_id) ki = userp->p_aioinfo; /* Do the I/O function. */ - aio_process(aiocbe); + switch(aiocbe->uaiocb.aio_lio_opcode) { + case LIO_READ: + case LIO_WRITE: + aio_process_rw(aiocbe); + break; + case LIO_SYNC: + aio_process_sync(aiocbe); + break; + case LIO_MLOCK: + aio_process_mlock(aiocbe); + break; + } mtx_lock(&aio_job_mtx); /* Decrement the active job count. */ @@ -1261,7 +1305,7 @@ aio_qphysio(struct proc *p, struct aiocblist *aioc cb = &aiocbe->uaiocb; fp = aiocbe->fd_file; - if (fp->f_type != DTYPE_VNODE) + if (fp == NULL || fp->f_type != DTYPE_VNODE) return (-1); vp = fp->f_vnode; @@ -1613,6 +1657,9 @@ aio_aqueue(struct thread *td, struct aiocb *job, s case LIO_SYNC: error = fget(td, fd, CAP_FSYNC, &fp); break; + case LIO_MLOCK: + fp = NULL; + break; case LIO_NOP: error = fget(td, fd, CAP_NONE, &fp); break; @@ -1670,7 +1717,8 @@ aio_aqueue(struct thread *td, struct aiocb *job, s error = kqfd_register(kqfd, &kev, td, 1); aqueue_fail: if (error) { - fdrop(fp, td); + if (fp) + fdrop(fp, td); uma_zfree(aiocb_zone, aiocbe); ops->store_error(job, error); goto done; @@ -1687,7 +1735,7 @@ no_kqueue: if (opcode == LIO_SYNC) goto queueit; - if (fp->f_type == DTYPE_SOCKET) { + if (fp && fp->f_type == DTYPE_SOCKET) { /* * Alternate queueing for socket ops: Reach down into the * descriptor to get the socket data. 
Then check to see if the @@ -2165,6 +2213,13 @@ sys_aio_write(struct thread *td, struct aio_write_ return (aio_aqueue(td, uap->aiocbp, NULL, LIO_WRITE, &aiocb_ops)); } +int +sys_aio_mlock(struct thread *td, struct aio_mlock_args *uap) +{ + + return (aio_aqueue(td, uap->aiocbp, NULL, LIO_MLOCK, &aiocb_ops)); +} + static int kern_lio_listio(struct thread *td, int mode, struct aiocb * const *uacb_list, struct aiocb **acb_list, int nent, struct sigevent *sig, @@ -2907,6 +2962,14 @@ freebsd32_aio_write(struct thread *td, struct free } int +freebsd32_aio_mlock(struct thread *td, struct freebsd32_aio_mlock_args *uap) +{ + + return (aio_aqueue(td, (struct aiocb *)uap->aiocbp, NULL, LIO_MLOCK, + &aiocb32_ops)); +} + +int freebsd32_aio_waitcomplete(struct thread *td, struct freebsd32_aio_waitcomplete_args *uap) { Index: sys/sys/aio.h =================================================================== --- sys/sys/aio.h (revision 251369) +++ sys/sys/aio.h (working copy) @@ -37,6 +37,7 @@ #define LIO_READ 0x2 #ifdef _KERNEL #define LIO_SYNC 0x3 +#define LIO_MLOCK 0x4 #endif /* @@ -124,6 +125,11 @@ int aio_cancel(int, struct aiocb *); */ int aio_suspend(const struct aiocb * const[], int, const struct timespec *); +/* + * Asynchronous mlock + */ +int aio_mlock(struct aiocb *); + #ifdef __BSD_VISIBLE int aio_waitcomplete(struct aiocb **, struct timespec *); #endif Index: sys/vm/vm_extern.h =================================================================== --- sys/vm/vm_extern.h (revision 251369) +++ sys/vm/vm_extern.h (working copy) @@ -90,5 +90,6 @@ struct sf_buf *vm_imgact_map_page(vm_object_t obje void vm_imgact_unmap_page(struct sf_buf *sf); void vm_thread_dispose(struct thread *td); int vm_thread_new(struct thread *td, int pages); +int vm_mlock(struct proc *, struct ucred *, const void *, size_t); #endif /* _KERNEL */ #endif /* !_VM_EXTERN_H_ */ Index: sys/vm/vm_mmap.c =================================================================== --- sys/vm/vm_mmap.c (revision 251369) 
+++ sys/vm/vm_mmap.c (working copy) @@ -1036,18 +1036,24 @@ sys_mlock(td, uap) struct thread *td; struct mlock_args *uap; { - struct proc *proc; + + return (vm_mlock(td->td_proc, td->td_ucred, uap->addr, uap->len)); +} + +int +vm_mlock(struct proc *proc, struct ucred *cred, const void *addr0, size_t len) +{ vm_offset_t addr, end, last, start; vm_size_t npages, size; vm_map_t map; unsigned long nsize; int error; - error = priv_check(td, PRIV_VM_MLOCK); + error = priv_check_cred(cred, PRIV_VM_MLOCK, 0); if (error) return (error); - addr = (vm_offset_t)uap->addr; - size = uap->len; + addr = (vm_offset_t)addr0; + size = len; last = addr + size; start = trunc_page(addr); end = round_page(last); @@ -1056,7 +1062,6 @@ sys_mlock(td, uap) npages = atop(end - start); if (npages > vm_page_max_wired) return (ENOMEM); - proc = td->td_proc; map = &proc->p_vmspace->vm_map; PROC_LOCK(proc); nsize = ptoa(npages + pmap_wired_count(map->pmap)); --2nTeH+t2PBomgucg-- From owner-freebsd-arch@FreeBSD.ORG Wed Jun 5 09:52:01 2013 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id AC229C81; Wed, 5 Jun 2013 09:52:01 +0000 (UTC) (envelope-from davidxu@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 9F0701ECD; Wed, 5 Jun 2013 09:52:01 +0000 (UTC) Received: from xyf.my.dom (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r559q06g000845; Wed, 5 Jun 2013 09:52:00 GMT (envelope-from davidxu@freebsd.org) Message-ID: <51AF0A62.4040206@freebsd.org> Date: Wed, 05 Jun 2013 17:52:34 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:17.0) Gecko/20130416 Thunderbird/17.0.5 MIME-Version: 1.0 To: John Baldwin Subject: Re: [PATCH] Allow atomic sets of non-overlapping CPU sets for a global cpuset References: <201305311216.56558.jhb@freebsd.org> 
In-Reply-To: <201305311216.56558.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Jun 2013 09:52:01 -0000 On 2013/06/01 00:16, John Baldwin wrote: > So there's an oddity with cpuset I've run into recently at work. Suppose I > have created a new cpuset and want to change the set of CPUs for that set (say > from a mask of just CPU 1 to a mask of just CPU 2). I can't do that > atomically. I have to first set the mask to contain both the old set (CPU 1) > and the new set (CPU 2) and then change it a second time to only contain the > new set (CPU 2). The reason is that cpuset_modify() runs cpuset_testupdate() > on the set it is about to modify, so when I try to change it in a single > operation the new mask doesn't overlap with the old mask and it fails with > EDEADLK. > > % cpuset -c -l 1 /bin/sh > $ cpuset -gi > pid -1 cpuset id: 2 > $ cpuset -g > pid -1 mask: 1 > $ cpuset -l 2 -s 2 > cpuset: setaffinity: Resource deadlock avoided > > I think that the correct logic here is that we should only check descendants > of the set we are changing, but not the set we are about to change. The patch > does this and allows my test case above to work: The patch looks fine to me. 
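The difference between the two checks can be modeled in a few lines. This is a simplified, hypothetical model, not the kernel's cpuset_testupdate(). Masks are plain bitmasks (bit n means CPU n), a set has a flat array of child masks, and a change is rejected with EDEADLK when a mask that must remain usable would share no CPU with the new mask. With the self-check enabled, moving a set from CPU 1 to CPU 2 fails, as in the `cpuset -l 2 -s 2` example. Checking only the descendants lets the same change succeed.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cpuset: one mask plus descendant masks. */
struct model_set {
	uint64_t	 mask;		/* bit n set => CPU n allowed */
	const uint64_t	*children;	/* masks of descendant sets */
	size_t		 nchildren;
};

/*
 * Return 0 if 'newmask' may be applied to 'set', EDEADLK otherwise.
 * A non-zero check_self mimics the old behaviour of also requiring the
 * set's current mask to overlap the new one.
 */
static int
model_testupdate(const struct model_set *set, uint64_t newmask, int check_self)
{
	size_t i;

	if (check_self && (set->mask & newmask) == 0)
		return (EDEADLK);
	for (i = 0; i < set->nchildren; i++)
		if ((set->children[i] & newmask) == 0)
			return (EDEADLK);	/* a descendant would lose every CPU */
	return (0);
}
```

Under this model, checking the descendants still protects the child sets. The only thing dropped is the comparison of the set against its own old mask, which is the mask being replaced.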
From owner-freebsd-arch@FreeBSD.ORG Wed Jun 5 15:13:46 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 80171496 for ; Wed, 5 Jun 2013 15:13:46 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) by mx1.freebsd.org (Postfix) with ESMTP id 5EB161E90 for ; Wed, 5 Jun 2013 15:13:46 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id AA374B984 for ; Wed, 5 Jun 2013 11:13:45 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org Subject: Re: [PATCH] Allow atomic sets of non-overlapping CPU sets for a global cpuset Date: Wed, 5 Jun 2013 11:13:33 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: <201305311216.56558.jhb@freebsd.org> In-Reply-To: <201305311216.56558.jhb@freebsd.org> MIME-Version: 1.0 Message-Id: <201306051113.33907.jhb@freebsd.org> Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 05 Jun 2013 11:13:45 -0400 (EDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Jun 2013 15:13:46 -0000 On Friday, May 31, 2013 12:16:56 pm John Baldwin wrote: > So there's an oddity with cpuset I've run into recently at work. Suppose I > have created a new cpuset and want to change the set of CPUs for that set (say > from a mask of just CPU 1 to a mask of just CPU 2). I can't do that > atomically. I have to first set the mask to contain both the old set (CPU 1) > and the new set (CPU 2) and then change it a second time to only contain the > new set (CPU 2). 
The reason is that cpuset_modify() runs cpuset_testupdate() > on the set it is about to modify, so when I try to change it in a single > operation the new mask doesn't overlap with the old mask and it fails with > EDEADLK. > > % cpuset -c -l 1 /bin/sh > $ cpuset -gi > pid -1 cpuset id: 2 > $ cpuset -g > pid -1 mask: 1 > $ cpuset -l 2 -s 2 > cpuset: setaffinity: Resource deadlock avoided Also note that non-overlapping masks work fine if you change the "local" mask of a process: % cpuset -l 1 /bin/sh $ cpuset -g pid -1 mask: 1 $ cpuset -l 2 -p $$ $ cpuset -g pid -1 mask: 2 -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed Jun 5 21:50:47 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 74F8D423 for ; Wed, 5 Jun 2013 21:50:47 +0000 (UTC) (envelope-from nparhar@gmail.com) Received: from mail-pb0-x22c.google.com (mail-pb0-x22c.google.com [IPv6:2607:f8b0:400e:c01::22c]) by mx1.freebsd.org (Postfix) with ESMTP id 548EA1614 for ; Wed, 5 Jun 2013 21:50:47 +0000 (UTC) Received: by mail-pb0-f44.google.com with SMTP id wz12so2376835pbc.3 for ; Wed, 05 Jun 2013 14:50:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :content-type:content-transfer-encoding; bh=cvhLmgidvY839o00IWLXQY5lHX+/Kl6Y7Bz1ZtrKcTM=; b=x/rRrMWnavrQf7Z+o1tAB5Gy7LYDh43J2L3XWfacMPZ01/xLpXHF5Spms30MrPxotq zR1++f5Ps1BBbDdzm3EUZ+QYBwf2NPwAWb+Ko6A+INKUYkO/To9nt3Zh7/dlFnby3dSR zD9+ArNBDlYHI8KZvveVn6qmFun+gT2x4TxKMOZWqjkTS1vFCeURYVRQoPIrYlrHoh3d w7Fktf9KXgl0A0txlggvkVW8MF/uFGS/bJs994qbhKr3x549uU50zj/EMCrfwt4EZGQx 6qqt8jIZLDfdEm3gnyScM9Mvir4Wy7awPvew4DAghbX64rVOAp3VuhG9nLTvFyANq3x1 1x0g== X-Received: by 10.66.122.130 with SMTP id ls2mr35612522pab.128.1370469047155; Wed, 05 Jun 2013 14:50:47 -0700 (PDT) Received: from [10.192.166.0] (stargate.chelsio.com. 
[67.207.112.58]) by mx.google.com with ESMTPSA id qh4sm74367820pac.8.2013.06.05.14.50.45 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 05 Jun 2013 14:50:46 -0700 (PDT) Sender: Navdeep Parhar Message-ID: <51AFB2B3.5050105@FreeBSD.org> Date: Wed, 05 Jun 2013 14:50:43 -0700 From: Navdeep Parhar User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130522 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-arch@freebsd.org Subject: missing DTrace FBT return probes Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Jun 2013 21:50:47 -0000 A large number of kernel functions have an FBT entry probe but no return probe. I believe this is due to tail call optimization by the compiler. Should we disable this optimization for kernel configs that have DTrace support? The missing return probes make it very difficult to write DTrace scripts that want to set flags etc. at function entry and then clean them up on return. A quick sample from a recent HEAD shows ~4000 out of ~27000 functions are missing return probes. See the list of functions in these files (the ones listed in entry-only.txt do not have return probes). 
http://people.freebsd.org/~np/entry-only.txt http://people.freebsd.org/~np/entry.txt http://people.freebsd.org/~np/return.txt Regards, Navdeep dtrace -ln fbt:::entry | sed -e 's/.* *\(.*\) entry$/\1/g' | sort > entry.txt dtrace -ln fbt:::return | sed -e 's/.* *\(.*\) return$/\1/g' | sort > return.txt comm -23 entry.txt return.txt > entry-only.txt From owner-freebsd-arch@FreeBSD.ORG Wed Jun 5 22:30:08 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 95F19B43; Wed, 5 Jun 2013 22:30:08 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: from mail-ob0-x22c.google.com (mail-ob0-x22c.google.com [IPv6:2607:f8b0:4003:c01::22c]) by mx1.freebsd.org (Postfix) with ESMTP id 59BE317E8; Wed, 5 Jun 2013 22:30:08 +0000 (UTC) Received: by mail-ob0-f172.google.com with SMTP id wo10so3599411obc.17 for ; Wed, 05 Jun 2013 15:30:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=bHOLE2XP1WgUI1hUFjt2jOCU/viogJ1raRYhaOlV3BM=; b=Ex3Gr6gBxZi3gsx3VDRBzWXiNUTctB9gtR1XbibM7mmITXrUrcGq3ToGT/jSjmTZEM h3k0t+QIkzSk19RHCkJPIk2ODR9w/A+cqutdaj7dW4vnY/1CR0yBUKPXIbQ/W6WCNT6Y 7q1ddL+ad0JBnJ0rALmAY7jwuDxNklURgeaL2flcU+G1x4OvySS4+Afma/4kB99VKswk dPAcVxS1pTkvAGDg6CLT/dbOQtYrZIMo8gm6arnB0NsvNGt45xoRSQF0a9wYgjkPsLIF YqAeKUkwG1QIAqFoWoGqePSTxBYCfCHRVZ3htL8OBAVWfRW+3l1ejW5UudzNDBDO2Tvt lM9Q== MIME-Version: 1.0 X-Received: by 10.60.118.1 with SMTP id ki1mr7709713oeb.44.1370471407952; Wed, 05 Jun 2013 15:30:07 -0700 (PDT) Received: by 10.76.91.163 with HTTP; Wed, 5 Jun 2013 15:30:07 -0700 (PDT) In-Reply-To: <51AFB2B3.5050105@FreeBSD.org> References: <51AFB2B3.5050105@FreeBSD.org> Date: Wed, 5 Jun 2013 18:30:07 -0400 Message-ID: Subject: Re: missing DTrace FBT return probes From: Ryan Stone To: Navdeep Parhar Content-Type: text/plain; charset=ISO-8859-1 
X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Jun 2013 22:30:08 -0000 On Wed, Jun 5, 2013 at 5:50 PM, Navdeep Parhar wrote: > A large number of kernel functions have an FBT entry probe but no return > probe. I believe this is due to tail call optimization by the compiler. > Should we disable this optimization for kernel configs that have DTrace > support? The missing return probes make it very difficult to write > DTrace scripts that want to set flags etc. at function entry and then > clean them up on return. > > A quick sample from a recent HEAD shows ~4000 out of ~27000 functions > are missing return probes. See the list of functions in these files > (the ones listed in entry-only.txt do not have return probes). > > http://people.freebsd.org/~np/entry-only.txt > http://people.freebsd.org/~np/entry.txt > http://people.freebsd.org/~np/return.txt > > Regards, > Navdeep > > > dtrace -ln fbt:::entry | sed -e 's/.* *\(.*\) entry$/\1/g' | sort > > entry.txt > dtrace -ln fbt:::return | sed -e 's/.* *\(.*\) return$/\1/g' | sort > > return.txt > comm -23 entry.txt return.txt > entry-only.txt I would be in favour of turning this on unconditionally, along with -fno-inline-functions-called-once and -fno-omit-frame-pointer. All of the optimizations are of dubious value and significantly impact debugging tools like dtrace and pmc.
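The `comm -23` step of the script quoted above keeps the lines of the first sorted list that do not appear in the second. This amounts to a merge-style walk over two sorted lists. The sketch below is a hypothetical C equivalent that works on in-memory arrays of function names. Like comm, it assumes both inputs are sorted the same way; here that means byte-wise with strcmp(), as sort does in the C locale.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical equivalent of "comm -23 a b" for arrays sorted with
 * strcmp(): copies into 'out' every name in 'a' that is not in 'b'
 * and returns how many were copied.
 */
static size_t
only_in_first(const char *const *a, size_t na, const char *const *b,
    size_t nb, const char **out)
{
	size_t i = 0, j = 0, n = 0;
	int cmp;

	while (i < na) {
		cmp = (j < nb) ? strcmp(a[i], b[j]) : -1;
		if (cmp < 0)
			out[n++] = a[i++];	/* entry probe, no return probe */
		else if (cmp > 0)
			j++;			/* name only in the return list */
		else {
			i++;			/* present in both lists */
			j++;
		}
	}
	return (n);
}
```

Each list is scanned once, so the cost stays linear in the number of probes, which matters at the ~27000-function scale reported above.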
From owner-freebsd-arch@FreeBSD.ORG Wed Jun 5 22:52:58 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 9F32AF79; Wed, 5 Jun 2013 22:52:58 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) by mx1.freebsd.org (Postfix) with ESMTP id 69412188D; Wed, 5 Jun 2013 22:52:58 +0000 (UTC) Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id C7889359320; Thu, 6 Jun 2013 00:52:56 +0200 (CEST) Received: by snail.stack.nl (Postfix, from userid 1677) id A5EF628493; Thu, 6 Jun 2013 00:52:56 +0200 (CEST) Date: Thu, 6 Jun 2013 00:52:56 +0200 From: Jilles Tjoelker To: Navdeep Parhar Subject: Re: missing DTrace FBT return probes Message-ID: <20130605225256.GA88585@stack.nl> References: <51AFB2B3.5050105@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51AFB2B3.5050105@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Jun 2013 22:52:58 -0000

On Wed, Jun 05, 2013 at 02:50:43PM -0700, Navdeep Parhar wrote:
> A large number of kernel functions have an FBT entry probe but no return
> probe. I believe this is due to tail call optimization by the compiler.
> Should we disable this optimization for kernel configs that have DTrace
> support? The missing return probes make it very difficult to write
> DTrace scripts that want to set flags etc. at function entry and then
> clean them up on return.

> A quick sample from a recent HEAD shows ~4000 out of ~27000 functions
> are missing return probes. See the list of functions in these files
> (the ones listed in entry-only.txt do not have return probes).

> http://people.freebsd.org/~np/entry-only.txt
> http://people.freebsd.org/~np/entry.txt
> http://people.freebsd.org/~np/return.txt

This list is so long that the impact on kernel stack consumption may be
large. Disabling the optimization might cause stack overflows. So it
would be best to leave the optimization either on or off for a
particular platform.

Perhaps the return probe could be inserted before the tail call. This is
still wrong but at least entry and return are matched this way.

-- 
Jilles Tjoelker

From owner-freebsd-arch@FreeBSD.ORG Thu Jun 6 09:47:30 2013 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5541566D; Thu, 6 Jun 2013 09:47:30 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail17.syd.optusnet.com.au (mail17.syd.optusnet.com.au [211.29.132.198]) by mx1.freebsd.org (Postfix) with ESMTP id A41A01450; Thu, 6 Jun 2013 09:47:29 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail17.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r569lIrv014403 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 6 Jun 2013 19:47:19 +1000 Date: Thu, 6 Jun 2013 19:47:18 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Ryan Stone Subject: Re: missing DTrace FBT return probes In-Reply-To: Message-ID: <20130606191306.P2408@besplex.bde.org> References: <51AFB2B3.5050105@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=eqSHVfVX c=1 sm=1 a=X5W8Q6W-oCAA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=zqPWLHequagA:10 a=6I5d2MoRAAAA:8 a=jf9xk2ctTmAQp9jb1XEA:9 a=CjuIK1q_8ugA:10 a=SV7veod9ZcQA:10
a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: Navdeep Parhar , freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Jun 2013 09:47:30 -0000

On Wed, 5 Jun 2013, Ryan Stone wrote:

> On Wed, Jun 5, 2013 at 5:50 PM, Navdeep Parhar wrote:
>
>> A large number of kernel functions have an FBT entry probe but no return
>> probe. I believe this is due to tail call optimization by the compiler.
>> Should we disable this optimization for kernel configs that have DTrace
>> support? The missing return probes make it very difficult to write
>> DTrace scripts that want to set flags etc. at function entry and then
>> clean them up on return.
>> ...
>
> I would be in favour of turning this on unconditionally, along with
> -fno-inline-functions-called-once and -fno-omit-frame-pointer.

Also -O2. But -fno-inline-functions-called-once isn't even supported by
clang, and -O for clang is more like -O3 for gcc (it does excessive
inlining of even more than functions called once).
-fno-omit-frame-pointer is the default for gcc but apparently not for clang.

> All of the
> optimizations are of dubious value and significantly impact debugging tools
> like dtrace and pmc.

Also stack traces in panics and debuggers, debuggers generally (they can
rarely find variables in inline functions, or even step over an inline
function like a non-inline function), and profiling.

Bruce
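Since Bruce notes that clang lacks -fno-inline-functions-called-once, a per-function alternative is the `noinline` attribute, which both GCC and Clang accept. The sketch below is illustrative only and is not something proposed in the thread; all function names are hypothetical. Each static helper has a single caller, so an optimizing compiler will typically fold the first one into its caller, leaving no separate symbol for FBT, a debugger, or a profiler, while the `noinline` copy keeps its own entry and return points.

```c
#include <assert.h>

/*
 * Hypothetical static helper with one caller. At -O1 and above GCC
 * (via -finline-functions-called-once) and Clang will typically
 * inline it, so it may not exist as a function in the final binary.
 */
static int
count_bits_inlinable(unsigned int v)
{
	int n = 0;

	while (v != 0) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return (n);
}

/*
 * Same logic, but noinline forces an out-of-line copy, so the function
 * keeps its own symbol and its own entry and return instructions.
 */
static __attribute__((noinline)) int
count_bits_noinline(unsigned int v)
{
	int n = 0;

	while (v != 0) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return (n);
}

/* Each helper is called exactly once here. */
int
popcount_both(unsigned int v)
{
	return (count_bits_inlinable(v) + count_bits_noinline(v));
}
```

Both helpers compute the same bit count, so `popcount_both(0xFu)` returns 8; only the code layout differs, which is what tools like DTrace and pmc care about.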