Date:      Mon, 28 Oct 2002 23:07:37 -0800 (PST)
From:      Kelly Yancey <kelly@nttmcl.com>
To:        freebsd-arch@freebsd.org
Subject:   RFC: Exporting number of bytes of protocol data to userland
Message-ID:  <20021028230434.U91753-200000@alicia.nttmcl.com>

[-- Attachment #1 --]

  The attached patch is rather short so the impatient can probably skip
right to the source.

  The background is that there are at least 3 interfaces which report the
"number of bytes in the socket buffer" to userland:
	ioctl(s, FIONREAD, &len)
	stat(2) via the st_size member of struct stat
	kqueue(2) via the data member of the struct kevent returned for
		EVFILT_READ filters.
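
  For reference, a minimal sketch of how a program might query the first
two interfaces on an open socket.  The helper names pending_fionread() and
pending_stat() are hypothetical and exist only for this illustration;
kqueue(2) is left out to keep the sketch short.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: byte count reported by ioctl(FIONREAD) for
 * socket s, or -1 on error.
 */
static long
pending_fionread(int s)
{
	int len = 0;

	if (ioctl(s, FIONREAD, &len) == -1)
		return (-1);
	return (len);
}

/*
 * Hypothetical helper: byte count reported in st_size by fstat(2) for
 * socket s, or -1 on error.
 */
static long
pending_stat(int s)
{
	struct stat sb;

	if (fstat(s, &sb) == -1)
		return (-1);
	return ((long)sb.st_size);
}
```

  What the two calls return for a given socket depends on the kernel and
the protocol, so a caller should treat the values as hints and still check
the return value of its read call.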

  The problem is that the number of bytes in a receive socket buffer is of
little use at best.  Since there is no way for an application to determine
how much of that data is non-protocol data (i.e. OOB, control, etc.), it
cannot use the number for anything: all of the kernel interfaces for
reading from a socket buffer take the number of bytes of *protocol data*
as their length parameter (actually, this is a slight exaggeration: TCP
does well since OOB data is handled differently than in OSI protocols).
  PR 30634 touches on this issue; UDP sockets are particularly visible
examples since they always include 16 bytes of address information in
addition to the datagram received.

  The attached patch adds an additional member to the sockbuf structure to
track the amount of non-protocol data in the buffer so that interfaces can
account for that before reporting to userland.  The patch also updates
stat(2) and kqueue(2) to report the number of bytes of protocol data
rather than the total number of bytes of all data in the socket buffer.
While ioctl(s, FIONREAD, &len) is also a candidate, I didn't dare touch it
because it is an old, well-documented interface.  But the point is that
there needs to be at least one interface which reports a value to userland
that it can actually use.
  As alluded to above, TCP sockets should be unaffected because they have
no non-protocol data in the socket buffer, so the majority of socket-using
applications won't notice the change at all.  However, UDP sockets and
non-IP sockets will benefit from knowing how much readable data is
available.
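
  To make the accounting concrete, here is a small userland model of the
two counters, assuming the behavior described above: every byte is added
to sb_cc, and bytes that are not plain data are also added to sb_ctl.  The
model_* names and the enum are hypothetical stand-ins for illustration and
are not part of the patch.

```c
/* Hypothetical stand-ins for mbuf types; only "data or not" matters here. */
enum model_type { MODEL_DATA, MODEL_SONAME, MODEL_CONTROL };

/* Userland model of the two sockbuf counters involved. */
struct model_sockbuf {
	unsigned int sb_cc;	/* all bytes in the buffer */
	unsigned int sb_ctl;	/* non-data bytes in the buffer */
};

/* Account for len bytes of the given type being appended. */
static void
model_sballoc(struct model_sockbuf *sb, enum model_type type,
    unsigned int len)
{
	sb->sb_cc += len;
	if (type != MODEL_DATA)
		sb->sb_ctl += len;
}

/* Account for len bytes of the given type being removed. */
static void
model_sbfree(struct model_sockbuf *sb, enum model_type type,
    unsigned int len)
{
	sb->sb_cc -= len;
	if (type != MODEL_DATA)
		sb->sb_ctl -= len;
}

/* Bytes of protocol data: the value reported to userland. */
static unsigned int
model_readable(const struct model_sockbuf *sb)
{
	return (sb->sb_cc - sb->sb_ctl);
}
```

  For a 100-byte UDP datagram queued with 16 bytes of address information,
the model ends up with sb_cc at 116 and sb_ctl at 16, so model_readable()
yields 100, the length of the datagram itself.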

  A version of this patch has been floating around -net for over a week
now with no comments, but since it slightly changes API semantics I
thought it best to give it another round of review on -arch before
committing.

  Thanks,

  Kelly

--
Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} -- kelly@nttmcl.com

[-- Attachment #2 --]
Index: lib/libc/sys/kqueue.2
===================================================================
RCS file: /home/ncvs/src/lib/libc/sys/kqueue.2,v
retrieving revision 1.28
diff -u -p -u -r1.28 kqueue.2
--- lib/libc/sys/kqueue.2	2 Jul 2002 21:04:00 -0000	1.28
+++ lib/libc/sys/kqueue.2	29 Oct 2002 03:42:03 -0000
@@ -226,7 +226,7 @@ and specifying the new low water mark in
 .Va data .
 On return,
 .Va data
-contains the number of bytes in the socket buffer.
+contains the number of bytes of protocol data available to read.
 .Pp
 If the read direction of the socket has shutdown, then the filter
 also sets EV_EOF in
Index: sys/kern/sys_socket.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/sys_socket.c,v
retrieving revision 1.46
diff -u -p -u -r1.46 sys_socket.c
--- sys/kern/sys_socket.c	6 Oct 2002 14:39:14 -0000	1.46
+++ sys/kern/sys_socket.c	29 Oct 2002 03:43:37 -0000
@@ -206,7 +206,7 @@ soo_stat(fp, ub, active_cred, td)
 		ub->st_mode |= S_IRUSR | S_IRGRP | S_IROTH;
 	if ((so->so_state & SS_CANTSENDMORE) == 0)
 		ub->st_mode |= S_IWUSR | S_IWGRP | S_IWOTH;
-	ub->st_size = so->so_rcv.sb_cc;
+	ub->st_size = so->so_rcv.sb_cc - so->so_rcv.sb_ctl;
 	ub->st_uid = so->so_cred->cr_uid;
 	ub->st_gid = so->so_cred->cr_gid;
 	return ((*so->so_proto->pr_usrreqs->pru_sense)(so, ub));
Index: sys/kern/uipc_socket.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.132
diff -u -p -u -r1.132 uipc_socket.c
--- sys/kern/uipc_socket.c	5 Oct 2002 21:23:46 -0000	1.132
+++ sys/kern/uipc_socket.c	16 Oct 2002 21:32:01 -0000
@@ -1785,6 +1785,7 @@ filt_soread(struct knote *kn, long hint)
 	struct socket *so = (struct socket *)kn->kn_fp->f_data;
 
 	kn->kn_data = so->so_rcv.sb_cc;
+	kn->kn_data -= so->so_rcv.sb_ctl;
 	if (so->so_state & SS_CANTRCVMORE) {
 		kn->kn_flags |= EV_EOF;
 		kn->kn_fflags = so->so_error;
Index: sys/sys/socketvar.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/socketvar.h,v
retrieving revision 1.94
diff -u -p -u -r1.94 socketvar.h
--- sys/sys/socketvar.h	17 Aug 2002 02:36:16 -0000	1.94
+++ sys/sys/socketvar.h	16 Oct 2002 21:34:13 -0000
@@ -105,6 +105,7 @@ struct socket {
 		u_int	sb_hiwat;	/* max actual char count */
 		u_int	sb_mbcnt;	/* chars of mbufs used */
 		u_int	sb_mbmax;	/* max chars of mbufs to use */
+		u_int	sb_ctl;		/* non-data chars in buffer */
 		int	sb_lowat;	/* low water mark */
 		int	sb_timeo;	/* timeout for read/write */
 		short	sb_flags;	/* flags, see below */
@@ -227,6 +228,8 @@ struct xsocket {
 /* adjust counters in sb reflecting allocation of m */
 #define	sballoc(sb, m) { \
 	(sb)->sb_cc += (m)->m_len; \
+	if ((m)->m_type != MT_DATA) \
+		(sb)->sb_ctl += (m)->m_len; \
 	(sb)->sb_mbcnt += MSIZE; \
 	if ((m)->m_flags & M_EXT) \
 		(sb)->sb_mbcnt += (m)->m_ext.ext_size; \
@@ -235,6 +238,8 @@ struct xsocket {
 /* adjust counters in sb reflecting freeing of m */
 #define	sbfree(sb, m) { \
 	(sb)->sb_cc -= (m)->m_len; \
+	if ((m)->m_type != MT_DATA) \
+		(sb)->sb_ctl -= (m)->m_len; \
 	(sb)->sb_mbcnt -= MSIZE; \
 	if ((m)->m_flags & M_EXT) \
 		(sb)->sb_mbcnt -= (m)->m_ext.ext_size; \
