Date: Mon, 3 Jun 2013 14:06:18 +0400
From: Gleb Smirnoff <glebius@FreeBSD.org>
To: arch@FreeBSD.org
Subject: aio_mlock(2) system call
Message-ID: <20130603100618.GH67170@FreeBSD.org>
Hello!
This patch adds a new system call, aio_mlock(2). The idea is
quite clear from its name: it performs mlock(2), which can take
a long time if pages aren't resident, under aio(4) control.
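For illustration, a caller might use the new call roughly as follows. This is a minimal sketch, not part of the patch: the helper name `lock_buffer()` is hypothetical, it assumes the `<aio.h>` declaration the patch adds, and on systems other than FreeBSD (where aio_mlock(2) does not exist) it falls back to a plain synchronous mlock(2).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>	/* mlock(2), used by the non-FreeBSD fallback */
#ifdef __FreeBSD__
#include <aio.h>	/* struct aiocb, aio_mlock(), aio_error(), ... */
#endif

/*
 * Hypothetical helper: wire [buf, buf + len) into memory.
 * Returns 0 on success or an errno value on failure.
 */
int
lock_buffer(void *buf, size_t len)
{
#ifdef __FreeBSD__
	struct aiocb iocb;
	const struct aiocb *list[1] = { &iocb };
	int error;

	/* Zero the control block so no stale fields reach the kernel. */
	memset(&iocb, 0, sizeof(iocb));
	iocb.aio_buf = buf;
	iocb.aio_nbytes = len;

	/* Enqueue only; failure here means the request was never queued. */
	if (aio_mlock(&iocb) == -1)
		return (errno);

	/*
	 * Wait for completion.  Because we wait before returning, the
	 * stack-allocated iocb stays valid for the life of the request.
	 */
	while ((error = aio_error(&iocb)) == EINPROGRESS)
		(void)aio_suspend(list, 1, NULL);
	if (error == -1)
		return (errno);
	(void)aio_return(&iocb);	/* reap the completed request */
	return (error);			/* 0 or an mlock(2) error */
#else
	/* No aio_mlock(2) here: lock synchronously instead. */
	return (mlock(buf, len) == 0 ? 0 : errno);
#endif
}
```

Waiting immediately, as the sketch does, buys nothing over calling mlock(2) directly; the point of the new call is that a caller can enqueue the request, carry on, and check aio_error(2) later. In that case the aiocb should not live on the stack, per the RESTRICTIONS section of the manual page.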
The patch is quite simple and non-destructive. Here it is
for your review.
If no one objects, I'd like to add it to FreeBSD 10.
--
Totus tuus, Glebius.
Attachment: aio_mlock.diff
Index: lib/libc/sys/Makefile.inc
===================================================================
--- lib/libc/sys/Makefile.inc (revision 251294)
+++ lib/libc/sys/Makefile.inc (working copy)
@@ -85,6 +85,7 @@ MAN+= abort2.2 \
adjtime.2 \
aio_cancel.2 \
aio_error.2 \
+ aio_mlock.2 \
aio_read.2 \
aio_return.2 \
aio_suspend.2 \
Index: lib/libc/sys/Symbol.map
===================================================================
--- lib/libc/sys/Symbol.map (revision 251294)
+++ lib/libc/sys/Symbol.map (working copy)
@@ -378,6 +378,7 @@ FBSD_1.2 {
};
FBSD_1.3 {
+ aio_mlock;
accept4;
bindat;
cap_fcntls_get;
Index: lib/libc/sys/aio_mlock.2
===================================================================
--- lib/libc/sys/aio_mlock.2 (revision 0)
+++ lib/libc/sys/aio_mlock.2 (working copy)
@@ -0,0 +1,133 @@
+.\" Copyright (c) 2013 Gleb Smirnoff <glebius@FreeBSD.org>
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd June 3, 2013
+.Dt AIO_MLOCK 2
+.Os
+.Sh NAME
+.Nm aio_mlock
+.Nd asynchronous
+.Xr mlock 2
+operation
+.Sh LIBRARY
+.Lb libc
+.Sh SYNOPSIS
+.In aio.h
+.Ft int
+.Fn aio_mlock "struct aiocb *iocb"
+.Sh DESCRIPTION
+The
+.Fn aio_mlock
+system call allows the calling process to lock into memory the
+physical pages associated with the virtual address range starting at
+.Fa iocb->aio_buf
+for
+.Fa iocb->aio_nbytes
+bytes.
+The call returns immediately after the locking request has
+been enqueued; the operation may or may not have completed at the time
+the call returns.
+.Pp
+The
+.Fa iocb
+pointer may be subsequently used as an argument to
+.Fn aio_return
+and
+.Fn aio_error
+in order to determine return or error status for the enqueued operation
+while it is in progress.
+.Pp
+If the request could not be enqueued (generally due to
+.Xr aio 4
+limits),
+then the call returns without having enqueued the request.
+.Sh RESTRICTIONS
+The Asynchronous I/O Control Block structure pointed to by
+.Fa iocb
+and the buffer that the
+.Fa iocb->aio_buf
+member of that structure references must remain valid until the
+operation has completed.
+For this reason, use of auto (stack) variables
+for these objects is discouraged.
+.Pp
+The asynchronous I/O control buffer
+.Fa iocb
+should be zeroed before the
+.Fn aio_mlock
+call to avoid passing bogus context information to the kernel.
+.Pp
+Modifications of the Asynchronous I/O Control Block structure or the
+buffer contents after the request has been enqueued, but before the
+request has completed, are not allowed.
+.Sh RETURN VALUES
+.Rv -std aio_mlock
+.Sh ERRORS
+The
+.Fn aio_mlock
+system call will fail if:
+.Bl -tag -width Er
+.It Bq Er EAGAIN
+The request was not queued because of system resource limitations.
+.It Bq Er ENOSYS
+The
+.Fn aio_mlock
+system call is not supported.
+.El
+.Pp
+If the request is successfully enqueued, but subsequently cancelled
+or an error occurs, the value returned by the
+.Fn aio_return
+system call is per the
+.Xr mlock 2
+system call, and the value returned by the
+.Fn aio_error
+system call is one of the error returns from the
+.Xr mlock 2
+system call, or
+.Er ECANCELED
+if the request was explicitly cancelled via a call to
+.Fn aio_cancel .
+.Sh SEE ALSO
+.Xr aio_cancel 2 ,
+.Xr aio_error 2 ,
+.Xr aio_return 2 ,
+.Xr mlock 2 ,
+.Xr aio 4
+.Sh PORTABILITY
+The
+.Fn aio_mlock
+system call is a
+.Fx
+extension, and should not be used in portable code.
+.Sh HISTORY
+The
+.Fn aio_mlock
+system call first appeared in
+.Fx 10.0 .
+.Sh AUTHORS
+The system call was introduced by
+.An Gleb Smirnoff Aq glebius@FreeBSD.org .
Property changes on: lib/libc/sys/aio_mlock.2
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Index: sys/compat/freebsd32/syscalls.master
===================================================================
--- sys/compat/freebsd32/syscalls.master (revision 251294)
+++ sys/compat/freebsd32/syscalls.master (working copy)
@@ -476,7 +476,8 @@
257 AUE_NULL NOSTD { int freebsd32_lio_listio(int mode, \
struct aiocb32 * const *acb_list, \
int nent, struct sigevent *sig); }
-258 AUE_NULL UNIMPL nosys
+258 AUE_NULL NOSTD { int freebsd32_aio_mlock( \
+ struct aiocb32 *aiocbp); }
259 AUE_NULL UNIMPL nosys
260 AUE_NULL UNIMPL nosys
261 AUE_NULL UNIMPL nosys
Index: sys/kern/syscalls.master
===================================================================
--- sys/kern/syscalls.master (revision 251294)
+++ sys/kern/syscalls.master (working copy)
@@ -480,7 +480,7 @@
257 AUE_NULL NOSTD { int lio_listio(int mode, \
struct aiocb * const *acb_list, \
int nent, struct sigevent *sig); }
-258 AUE_NULL UNIMPL nosys
+258 AUE_NULL NOSTD { int aio_mlock(struct aiocb *aiocbp); }
259 AUE_NULL UNIMPL nosys
260 AUE_NULL UNIMPL nosys
261 AUE_NULL UNIMPL nosys
Index: sys/kern/vfs_aio.c
===================================================================
--- sys/kern/vfs_aio.c (revision 251294)
+++ sys/kern/vfs_aio.c (working copy)
@@ -339,6 +339,8 @@ void aio_init_aioinfo(struct proc *p);
static int aio_onceonly(void);
static int aio_free_entry(struct aiocblist *aiocbe);
static void aio_process(struct aiocblist *aiocbe);
+static void aio_process_sync(struct aiocblist *aiocbe);
+static void aio_process_mlock(struct aiocblist *aiocbe);
static int aio_newproc(int *);
int aio_aqueue(struct thread *td, struct aiocb *job,
struct aioliojob *lio, int type, struct aiocb_ops *ops);
@@ -425,6 +427,7 @@ static struct syscall_helper_data aio_syscalls[] =
SYSCALL_INIT_HELPER(aio_cancel),
SYSCALL_INIT_HELPER(aio_error),
SYSCALL_INIT_HELPER(aio_fsync),
+ SYSCALL_INIT_HELPER(aio_mlock),
SYSCALL_INIT_HELPER(aio_read),
SYSCALL_INIT_HELPER(aio_return),
SYSCALL_INIT_HELPER(aio_suspend),
@@ -452,6 +455,7 @@ static struct syscall_helper_data aio32_syscalls[]
SYSCALL32_INIT_HELPER(freebsd32_aio_cancel),
SYSCALL32_INIT_HELPER(freebsd32_aio_error),
SYSCALL32_INIT_HELPER(freebsd32_aio_fsync),
+ SYSCALL32_INIT_HELPER(freebsd32_aio_mlock),
SYSCALL32_INIT_HELPER(freebsd32_aio_read),
SYSCALL32_INIT_HELPER(freebsd32_aio_write),
SYSCALL32_INIT_HELPER(freebsd32_aio_waitcomplete),
@@ -701,7 +705,8 @@ aio_free_entry(struct aiocblist *aiocbe)
* at open time, but this is already true of file descriptors in
* a multithreaded process.
*/
- fdrop(aiocbe->fd_file, curthread);
+ if (aiocbe->fd_file != NULL)
+ fdrop(aiocbe->fd_file, curthread);
crfree(aiocbe->cred);
uma_zfree(aiocb_zone, aiocbe);
AIO_LOCK(ki);
@@ -855,10 +860,10 @@ drop:
}
/*
- * The AIO processing activity. This is the code that does the I/O request for
- * the non-physio version of the operations. The normal vn operations are used,
- * and this code should work in all instances for every type of file, including
- * pipes, sockets, fifos, and regular files.
+ * The AIO processing activity for LIO_READ/LIO_WRITE. This is the code that
+ * does the I/O request for the non-physio version of the operations. The
+ * normal vn operations are used, and this code should work in all instances
+ * for every type of file, including pipes, sockets, fifos, and regular files.
*
* XXX I don't think it works well for socket, pipe, and fifo.
*/
@@ -883,17 +888,6 @@ aio_process(struct aiocblist *aiocbe)
cb = &aiocbe->uaiocb;
fp = aiocbe->fd_file;
- if (cb->aio_lio_opcode == LIO_SYNC) {
- error = 0;
- cnt = 0;
- if (fp->f_vnode != NULL)
- error = aio_fsync_vnode(td, fp->f_vnode);
- cb->_aiocb_private.error = error;
- cb->_aiocb_private.status = 0;
- td->td_ucred = td_savedcred;
- return;
- }
-
aiov.iov_base = (void *)(uintptr_t)cb->aio_buf;
aiov.iov_len = cb->aio_nbytes;
@@ -954,6 +948,35 @@ aio_process(struct aiocblist *aiocbe)
}
static void
+aio_process_sync(struct aiocblist *aiocbe)
+{
+ struct thread *td = curthread;
+ struct ucred *td_savedcred = td->td_ucred;
+ struct aiocb *cb = &aiocbe->uaiocb;
+ struct file *fp = aiocbe->fd_file;
+ int error = 0;
+
+ td->td_ucred = aiocbe->cred;
+ if (fp->f_vnode != NULL)
+ error = aio_fsync_vnode(td, fp->f_vnode);
+ cb->_aiocb_private.error = error;
+ cb->_aiocb_private.status = 0;
+ td->td_ucred = td_savedcred;
+}
+
+static void
+aio_process_mlock(struct aiocblist *aiocbe)
+{
+ struct aiocb *cb = &aiocbe->uaiocb;
+ int error;
+
+ error = vm_mlock(aiocbe->userproc, aiocbe->cred,
+ (void *)(uintptr_t)cb->aio_buf, cb->aio_nbytes);
+ cb->_aiocb_private.error = error;
+ cb->_aiocb_private.status = 0;
+}
+
+static void
aio_bio_done_notify(struct proc *userp, struct aiocblist *aiocbe, int type)
{
struct aioliojob *lj;
@@ -1121,7 +1144,18 @@ aio_daemon(void *_id)
ki = userp->p_aioinfo;
/* Do the I/O function. */
- aio_process(aiocbe);
+ switch (aiocbe->uaiocb.aio_lio_opcode) {
+ case LIO_READ:
+ case LIO_WRITE:
+ aio_process(aiocbe);
+ break;
+ case LIO_SYNC:
+ aio_process_sync(aiocbe);
+ break;
+ case LIO_MLOCK:
+ aio_process_mlock(aiocbe);
+ break;
+ }
mtx_lock(&aio_job_mtx);
/* Decrement the active job count. */
@@ -1261,7 +1295,7 @@ aio_qphysio(struct proc *p, struct aiocblist *aioc
cb = &aiocbe->uaiocb;
fp = aiocbe->fd_file;
- if (fp->f_type != DTYPE_VNODE)
+ if (fp == NULL || fp->f_type != DTYPE_VNODE)
return (-1);
vp = fp->f_vnode;
@@ -1613,6 +1647,9 @@ aio_aqueue(struct thread *td, struct aiocb *job, s
case LIO_SYNC:
error = fget(td, fd, CAP_FSYNC, &fp);
break;
+ case LIO_MLOCK:
+ fp = NULL;
+ break;
case LIO_NOP:
error = fget(td, fd, CAP_NONE, &fp);
break;
@@ -1670,7 +1707,8 @@ aio_aqueue(struct thread *td, struct aiocb *job, s
error = kqfd_register(kqfd, &kev, td, 1);
aqueue_fail:
if (error) {
- fdrop(fp, td);
+ if (fp != NULL)
+ fdrop(fp, td);
uma_zfree(aiocb_zone, aiocbe);
ops->store_error(job, error);
goto done;
@@ -1687,7 +1725,7 @@ no_kqueue:
if (opcode == LIO_SYNC)
goto queueit;
- if (fp->f_type == DTYPE_SOCKET) {
+ if (fp != NULL && fp->f_type == DTYPE_SOCKET) {
/*
* Alternate queueing for socket ops: Reach down into the
* descriptor to get the socket data. Then check to see if the
@@ -2165,6 +2203,13 @@ sys_aio_write(struct thread *td, struct aio_write_
return (aio_aqueue(td, uap->aiocbp, NULL, LIO_WRITE, &aiocb_ops));
}
+int
+sys_aio_mlock(struct thread *td, struct aio_mlock_args *uap)
+{
+
+ return (aio_aqueue(td, uap->aiocbp, NULL, LIO_MLOCK, &aiocb_ops));
+}
+
static int
kern_lio_listio(struct thread *td, int mode, struct aiocb * const *uacb_list,
struct aiocb **acb_list, int nent, struct sigevent *sig,
@@ -2907,6 +2952,14 @@ freebsd32_aio_write(struct thread *td, struct free
}
int
+freebsd32_aio_mlock(struct thread *td, struct freebsd32_aio_mlock_args *uap)
+{
+
+ return (aio_aqueue(td, (struct aiocb *)uap->aiocbp, NULL, LIO_MLOCK,
+ &aiocb32_ops));
+}
+
+int
freebsd32_aio_waitcomplete(struct thread *td,
struct freebsd32_aio_waitcomplete_args *uap)
{
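The core of the vfs_aio.c change is that LIO_MLOCK jobs carry no file (`fd_file` is NULL), so the daemon now dispatches on the opcode and every path that touches the file pointer has to check for NULL first. The user-space model below (not kernel code) illustrates that invariant; `struct job`, `dispatch()` and `release_job()` are hypothetical stand-ins for `struct aiocblist`, the switch in `aio_daemon()` and `aio_free_entry()`, and only the LIO_SYNC/LIO_MLOCK values are taken from the patch.

```c
#include <stddef.h>

/* SYNC/MLOCK reuse the sys/aio.h values; READ/WRITE are placeholders. */
enum { JOB_WRITE = 1, JOB_READ = 2, JOB_SYNC = 3, JOB_MLOCK = 4 };

struct job {
	int	 opcode;
	int	*file;		/* models fd_file: a refcount, NULL for mlock */
	int	 error;		/* models _aiocb_private.error */
	long	 status;	/* models _aiocb_private.status */
	char	 ran;		/* which handler ran, for illustration */
};

static void
process_rw(struct job *j)
{
	j->error = 0;
	j->status = 512;	/* pretend 512 bytes were transferred */
	j->ran = 'r';
}

static void
process_sync(struct job *j)
{
	j->error = 0;
	j->status = 0;
	j->ran = 's';
}

static void
process_mlock(struct job *j)
{
	/* Needs only the buffer address and length, never j->file. */
	j->error = 0;
	j->status = 0;
	j->ran = 'm';
}

/* Returns 0 if a handler ran, -1 for an opcode nothing claims. */
int
dispatch(struct job *j)
{
	switch (j->opcode) {
	case JOB_READ:
	case JOB_WRITE:
		process_rw(j);
		break;
	case JOB_SYNC:
		process_sync(j);
		break;
	case JOB_MLOCK:
		process_mlock(j);
		break;
	default:
		return (-1);
	}
	return (0);
}

/* Models aio_free_entry(): drop a file reference only if one is held. */
void
release_job(struct job *j)
{
	if (j->file != NULL)
		(*j->file)--;	/* stands in for fdrop() */
	j->file = NULL;
}
```

Unlike the switch in the patch, the sketch returns -1 for an opcode it does not recognise instead of leaving the job untouched; that is a choice of this model, not of the patch.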
Index: sys/sys/aio.h
===================================================================
--- sys/sys/aio.h (revision 251294)
+++ sys/sys/aio.h (working copy)
@@ -38,6 +38,7 @@
#ifdef _KERNEL
#define LIO_SYNC 0x3
#endif
+#define LIO_MLOCK 0x4
/*
* LIO modes
@@ -124,6 +125,11 @@ int aio_cancel(int, struct aiocb *);
*/
int aio_suspend(const struct aiocb * const[], int, const struct timespec *);
+/*
+ * Asynchronous mlock
+ */
+int aio_mlock(struct aiocb *);
+
#ifdef __BSD_VISIBLE
int aio_waitcomplete(struct aiocb **, struct timespec *);
#endif
Index: sys/vm/vm_extern.h
===================================================================
--- sys/vm/vm_extern.h (revision 251294)
+++ sys/vm/vm_extern.h (working copy)
@@ -90,5 +90,6 @@ struct sf_buf *vm_imgact_map_page(vm_object_t obje
void vm_imgact_unmap_page(struct sf_buf *sf);
void vm_thread_dispose(struct thread *td);
int vm_thread_new(struct thread *td, int pages);
+int vm_mlock(struct proc *, struct ucred *, const void *, size_t);
#endif /* _KERNEL */
#endif /* !_VM_EXTERN_H_ */
Index: sys/vm/vm_mmap.c
===================================================================
--- sys/vm/vm_mmap.c (revision 251294)
+++ sys/vm/vm_mmap.c (working copy)
@@ -1036,18 +1036,24 @@ sys_mlock(td, uap)
struct thread *td;
struct mlock_args *uap;
{
- struct proc *proc;
+
+ return (vm_mlock(td->td_proc, td->td_ucred, uap->addr, uap->len));
+}
+
+int
+vm_mlock(struct proc *proc, struct ucred *cred, const void *addr0, size_t len)
+{
vm_offset_t addr, end, last, start;
vm_size_t npages, size;
vm_map_t map;
unsigned long nsize;
int error;
- error = priv_check(td, PRIV_VM_MLOCK);
+ error = priv_check_cred(cred, PRIV_VM_MLOCK, 0);
if (error)
return (error);
- addr = (vm_offset_t)uap->addr;
- size = uap->len;
+ addr = (vm_offset_t)addr0;
+ size = len;
last = addr + size;
start = trunc_page(addr);
end = round_page(last);
@@ -1056,7 +1062,6 @@ sys_mlock(td, uap)
npages = atop(end - start);
if (npages > vm_page_max_wired)
return (ENOMEM);
- proc = td->td_proc;
map = &proc->p_vmspace->vm_map;
PROC_LOCK(proc);
nsize = ptoa(npages + pmap_wired_count(map->pmap));
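vm_mlock() works on whole pages: the range is widened to page boundaries with trunc_page()/round_page(), converted to a page count with atop(), and that count is checked against vm_page_max_wired before anything is wired. The arithmetic can be sketched in portable C as below, assuming a power-of-two page size; `mlock_span()`, its `max_wired` parameter and the `TRUNC_PG`/`ROUND_PG` macros are hypothetical, and the wrap-around check is this sketch's own guard rather than something shown in the hunk above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for trunc_page()/round_page(); ps is a power of 2. */
#define TRUNC_PG(x, ps)	((x) & ~((uintptr_t)(ps) - 1))
#define ROUND_PG(x, ps)	TRUNC_PG((x) + (ps) - 1, (ps))

/*
 * Count the pages an mlock of [addr, addr + len) would wire.
 * Returns 0 and stores the count in *npages, EINVAL if the range
 * wraps around, or ENOMEM if it exceeds max_wired pages.
 */
int
mlock_span(uintptr_t addr, size_t len, size_t page_size, size_t max_wired,
    size_t *npages)
{
	uintptr_t last, start, end;
	size_t n;

	last = addr + len;
	start = TRUNC_PG(addr, page_size);
	end = ROUND_PG(last, page_size);
	if (last < addr || end < addr)	/* range wraps around */
		return (EINVAL);
	n = (end - start) / page_size;	/* like atop(end - start) */
	if (n > max_wired)		/* like the vm_page_max_wired check */
		return (ENOMEM);
	*npages = n;
	return (0);
}
```

For example, with 4096-byte pages, locking 2 bytes starting at address 4095 straddles a page boundary, so 2 pages are wired.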
