Date: Thu, 7 Jan 2016 15:29:23 +0200 From: Boris Astardzhiev <boris.astardzhiev@gmail.com> To: Luigi Rizzo <rizzo@iet.unipi.it> Cc: Mark Delany <c2h@romeo.emu.st>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org> Subject: Re: Does FreeBSD have sendmmsg or recvmmsg system calls? Message-ID: <CAP=KkTzFUDsZwDDLD3n97xJW0qLVZMPduZGSX%2BeXC3UuLpVjMg@mail.gmail.com> In-Reply-To: <CA%2BhQ2%2Bh4NNz9tgSpjJdv7fXteq5tAR7o3LvjV=u08NHjRLPwmA@mail.gmail.com> References: <alpine.BSF.2.20.1601031833130.84701@localhost.my.domain> <1451841004.10139.34.camel@me.com> <alpine.BSF.2.20.1601031744040.20884@fledge.watson.org> <CAJ-Vmomxcn%2BiYJAzNViL8WnepsCihrkTuHd8=0O6vONKsTExCA@mail.gmail.com> <20160103214720.72014.qmail@f5-external.bushwire.net> <20160104085415.GS3625@kib.kiev.ua> <20160104091108.50654.qmail@f5-external.bushwire.net> <20160104093859.GV3625@kib.kiev.ua> <20160104101747.58347.qmail@f5-external.bushwire.net> <20160104194044.GD3625@kib.kiev.ua> <20160104210741.32812.qmail@f5-external.bushwire.net> <CAP=KkTwfpjec2Tgnm4PRR3u8t4GEqN9Febm5HRcqapifBG-B6g@mail.gmail.com> <CA%2BhQ2%2Bh4NNz9tgSpjJdv7fXteq5tAR7o3LvjV=u08NHjRLPwmA@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
First of all thanks for the insightful code review. >>On Thu, Jan 07, 2016 at 12:28:32PM +0200, Boris Astardzhiev wrote: >>See inline comments and final notes at the end. >> >> diff --git a/lib/libc/include/libc_private.h b/lib/libc/include/libc_private.h >> index 5caf9a3..7e2a902 100644 >> --- a/lib/libc/include/libc_private.h >> +++ b/lib/libc/include/libc_private.h >> @@ -200,8 +200,10 @@ enum { >> INTERPOS_pselect, >> INTERPOS_recvfrom, >> INTERPOS_recvmsg, >> + INTERPOS_recvmmsg, >> INTERPOS_select, >> INTERPOS_sendmsg, >> + INTERPOS_sendmmsg, >> INTERPOS_sendto, >> INTERPOS_setcontext, >> INTERPOS_sigaction, >The interposing table must be extended at the end, and not in the middle. >otherwise you introduce too large inconsistence with older libthr. > >That said, you changed libc table, but did not updated libthr. The result >is random segfaults in the multithreaded processes. Thanks for this catch. I'll fix it. >> diff --git a/lib/libc/sys/Symbol.map b/lib/libc/sys/Symbol.map >> index 7b3257c..6cc3c6e 100644 >> --- a/lib/libc/sys/Symbol.map >> +++ b/lib/libc/sys/Symbol.map >> @@ -220,6 +220,7 @@ FBSD_1.0 { >> reboot; >> recvfrom; >> recvmsg; >> + recvmmsg; >> rename; >> revoke; >> rfork; >> @@ -240,6 +241,7 @@ FBSD_1.0 { >> semsys; >> sendfile; >> sendmsg; >> + sendmmsg; >> sendto; >> setaudit; >> setaudit_addr; >The versioning is wrong, new non-private symbols for 11.0 go into _1.4. > >> @@ -851,6 +853,8 @@ FBSDprivate_1.0 { >> __sys_recvfrom; >> _recvmsg; >> __sys_recvmsg; >> + _recvmmsg; >> + __sys_recvmmsg; >> _rename; >> __sys_rename; >> _revoke; >> @@ -891,6 +895,8 @@ FBSDprivate_1.0 { >> __sys_sendfile; >> _sendmsg; >> __sys_sendmsg; >> + _sendmmsg; >> + __sys_sendmmsg; >> _sendto; >> __sys_sendto; >> _setaudit; I see. I'll put everything in FBSD_1.4. >> diff --git a/sys/kern/uipc_syscalls.c b/sys/kern/uipc_syscalls.c >> index c33a2cf..7354b4f 100644 >> --- a/sys/kern/uipc_syscalls.c >> +++ b/sys/kern/uipc_syscalls.c > >> +int >> +sys_recvmmsg(td, uap) >> + struct thread *td; >> + struct recvmmsg_args /* { >> + int s; >> + struct mmsghdr *msgvec; >> + unsigned int vlen; >> + int flags; >> + } */ *uap; >Use ANSI C definitions for new code, please. >I personally do not inline uap declaration, the time has gone. I'll fix it. >> +{ >> + struct mmsghdr *vec, *mmp; >> + struct msghdr *mp, msg; >> + struct iovec *uiov, *iov; >> + unsigned int vlen, len; >> + int i, rcvd = 0, error; >> + struct socket *so = NULL; >Style, no initialization in declaration. I'll fix it. >> + struct file *fp; >> + cap_rights_t rights; >> + >> + if (fget(td, uap->s, cap_rights_init(&rights, CAP_RECV), &fp) != 0) >> + return (EBADF); >> + >> + so = fp->f_data; >What if uap->s filedescriptor does not reference a socket ? kern_sendit/recvit() eventually call getsock_cap(). To me this means that send/recv(m)msg have to be limited only to sockets. I was in a need to grab access to so_error due to the error handling later. Probably I need to use something similar like kern_sendit/recvit's getsock_cap approach. Comments or a better approach to so_error? >>> + >>> + vlen = uap->vlen; >>> + if (vlen > UIO_MAXIOV) >>> + vlen = UIO_MAXIOV; >>> + >>> + len = vlen * sizeof(*uap->msgvec); >>> + vec = malloc(len, M_TEMP, M_WAITOK); >>This is a user-controlled malloc(9), i.e. an easy kernel panic due to >>impossible to satisfy request, or KVA and memory exhaustion. Having a glance in the manpage there it is stated that: M_WAITOK Indicates that it is OK to wait for resources. If the request cannot be immediately fulfilled, the current process is put to sleep to wait for resources to be released by other processes. The malloc(), realloc(), and reallocf() functions cannot return NULL if M_WAITOK is specified. I.e we'll be put to sleep and no NULL will be returned. I don't see how I'll get a panic here. Or you're stressing the malloc type M_TEMP here? More clarifications? >> + >> + error = copyin(uap->msgvec, vec, len); >> + if (error != 0) >> + goto out; >> + >> + for (i = 0; i < vlen; i++) { >> + mmp = &vec[i]; >> + mp = &mmp->msg_hdr; >> + msg = *mp; >> + >> + error = copyiniov(msg.msg_iov, msg.msg_iovlen, &iov, EMSGSIZE); >> + if (error != 0) >> + goto out; >> + >> + msg.msg_flags = uap->flags; >> +#ifdef COMPAT_OLDSOCK >> + msg.msg_flags &= ~MSG_COMPAT; >> +#endif >> + uiov = msg.msg_iov; >> + msg.msg_iov = iov; >> + >> + error = recvit(td, uap->s, &msg, NULL); >> + if (error == 0) { >> + msg.msg_iov = uiov; >> + error = copyout(&msg, &uap->msgvec[i].msg_hdr, sizeof(msg)); >> + if (error != 0) { >> + free(iov, M_IOV); >> + goto out; >> + } >> + error = copyout(&td->td_retval[0], &uap->msgvec[i].msg_len, >> + sizeof(td->td_retval[0])); >> + if (error != 0) { >> + free(iov, M_IOV); >> + goto out; >> + } >> + } >> + free(iov, M_IOV); >> + rcvd++; >> + } >> + >> + td->td_retval[0] = rcvd; >> + >> +out: >> + if (error != 0 && rcvd != 0 && rcvd != vlen) { >> + so->so_error = error; >> + error = 0; >> + td->td_retval[0] = rcvd; >> + } >> + >> + fdrop(fp, td); >> + >> + free(vec, M_TEMP); >> + return (error); >> +} >> + >> /* ARGSUSED */ >> int >> sys_shutdown(td, uap) >> diff --git a/sys/sys/socket.h b/sys/sys/socket.h >> index 18e2de1..d352cd2 100644 >> --- a/sys/sys/socket.h >> +++ b/sys/sys/socket.h >> @@ -414,6 +414,11 @@ struct msghdr { >> int msg_flags; /* flags on received message */ >> }; >> >> +struct mmsghdr { >> + struct msghdr msg_hdr; /* message header */ >> + unsigned int msg_len; /* message length */ >> +}; >> + >> #define MSG_OOB 0x1 /* process out-of-band data */ >> #define MSG_PEEK 0x2 /* peek at incoming message */ >> #define MSG_DONTROUTE 0x4 /* send without using routing tables */ >> @@ -615,10 +620,12 @@ int listen(int, int); >> ssize_t recv(int, void *, size_t, int); >> ssize_t recvfrom(int, void *, size_t, int, struct sockaddr * __restrict, socklen_t * __restrict); >> ssize_t recvmsg(int, struct msghdr *, int); >> +ssize_t recvmmsg(int, struct mmsghdr *, unsigned int, int); >> ssize_t send(int, const void *, size_t, int); >> ssize_t sendto(int, const void *, >> size_t, int, const struct sockaddr *, socklen_t); >> ssize_t sendmsg(int, const struct msghdr *, int); >> +ssize_t sendmmsg(int, const struct mmsghdr *, unsigned int, int); >Why did you put new functions into POSIX namespace ? >They are GNU (and now BSD) extensions, at least I do not see then in SUS. > >> #if __BSD_VISIBLE >> int sendfile(int, int, off_t, size_t, struct sf_hdtr *, off_t *, int); >> int setfib(int); I agree. Guess I'll have to put it ifdef'd under __BSD_VISIBLE since it's not POSIX. >The patch lacks man page update, and lacks the freebsd32 compat shims. I'll work on these later. >What are the reasons to go with the trivial kernel implementation >right now, instead of trivial and simpler libc wrapper ? I think the >speed would be same, but the code _much_ simpler. My initial idea was such that with the kernel approach we get closer to the driver itself and if it has mm optimizations we may have a benefit in future. And thus one day we won't need to redo the work scrubbing the libc-only implementation. Best regards, Boris Astardzhiev On Thu, Jan 7, 2016 at 12:56 PM, Luigi Rizzo <rizzo@iet.unipi.it> wrote: > Hi Boris, > thanks for working on this. > > A few comments: > > - do you have any performance data on calling *mmsg() versus > multiple invocations of the equivalent *msg() ? > > - in the following chunk in recvmmsg.c , there are slight > type differences between the function and the internal cast. > Same in sendmmsg.c > > +ssize_t > +recvmmsg(int s, struct mmsghdr *msgvec, unsigned int vlen, int flags) > +{ > + > + return (((int (*)(int, struct mmsghdr *, int, int)) > + __libc_interposing[INTERPOS_recvmmsg])(s, msgvec, vlen, > flags)); > +} > > > - why did you add a cap_rights_init() to the functions when > neither sendmsg or recvmsg have it (in other words, this is > a worthwhile addition but perhaps should be done later > > - the initial part of the two functions sys_*mmsg() is almost > exactly the same so can you define a function with the common code ? > > - you could probably avoid the repeated malloc/free of the iov > with a first loop that finds the max iov size and does the allocation > only once. copyiniov can then be replaced by a copyin > > - (minor) what is the point of copying the current entry into msg rather > than > just using mp-> ... > > - can you add a comment explaining why you do the following > > error = copyout(&td->td_retval[0], &uap->msgvec[i].msg_len, > sizeof(td->td_retval[0])); > > > cheers > luigi > > On Thu, Jan 7, 2016 at 1:51 AM, Boris Astardzhiev > <boris.astardzhiev@gmail.com> wrote: > > Hello, > > > > Here's my implementation of the two system calls. The patch is against > HEAD > > from a couple of days: > > commit ff9e83788d7ed52342dcba4dff1e62fdf3cc985c > > Author: ngie <ngie@FreeBSD.org> > > Date: Mon Jan 4 03:34:22 2016 +0000 > > > > Remove free'ing of an uninitialized variable > > > > Just remove it completely from the test as it's initialized but > unused > > apart > > from the free(3) call > > > > Differential Revision: https://reviews.freebsd.org/D4769 (part of > > larger diff) > > MFC after: 5 days > > Reported by: cppcheck > > Reviewed by: oshogbo > > Sponsored by: EMC / Isilon Storage Division > > > > I've made brief tests using the examples in the manpages in Linux > skipping > > the timeout stuff: > > http://man7.org/linux/man-pages/man2/sendmmsg.2.html > > http://man7.org/linux/man-pages/man2/recvmmsg.2.html > > > > I've tried to stick to the comments in the thread but any > > suggestions/testings > > are also welcomed so that I can fix the calls accordingly. > > > > Regards, > > Boris Astardzhiev > > > > > > On Mon, Jan 4, 2016 at 11:07 PM, Mark Delany <c2h@romeo.emu.st> wrote: > > > >> > You just repeat arguments for the text in my messages, which you > removed > >> > on reply. > >> > >> My goal is to get you to scruitinize. > >> > >> Thank you for helping. > >> > >> > >> Mark. > >> _______________________________________________ > >> freebsd-net@freebsd.org mailing list > >> https://lists.freebsd.org/mailman/listinfo/freebsd-net > >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > >> > > > > _______________________________________________ > > freebsd-net@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-net > > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > > > > -- > -----------------------------------------+------------------------------- > Prof. Luigi RIZZO, rizzo@iet.unipi.it . Dip. di Ing. dell'Informazione > http://www.iet.unipi.it/~luigi/ . Universita` di Pisa > TEL +39-050-2217533 . via Diotisalvi 2 > Mobile +39-338-6809875 . 56122 PISA (Italy) > -----------------------------------------+------------------------------- >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAP=KkTzFUDsZwDDLD3n97xJW0qLVZMPduZGSX%2BeXC3UuLpVjMg>