Date:      Sun, 12 Sep 1999 23:19:13 -0400 (EDT)
From:      Bosko Milekic <bmilekic@dsuper.net>
To:        Stas Kisel <stas@sonet.crimea.ua>
Cc:        avalon@coombs.anu.edu.au, freebsd-hackers@FreeBSD.ORG, freebsd-security@FreeBSD.ORG
Subject:   Re: mbuf shortage situations (followup)
Message-ID:  <Pine.OSF.4.05.9909122304470.18795-300000@oracle.dsuper.net>
In-Reply-To: <199909091447.SAA24055@sonet.crimea.ua>


[-- Attachment #1 --]

Hello (again),

On Thu, 9 Sep 1999, Stas Kisel wrote:

!>> From avalon@cheops.anu.edu.au Thu Sep  9 16:17:27 1999
!>
!>> > Probably it is not self-evident why we HAVE to drop this connection.
!>>
!>> So what if someone manages to crash a program due to a DOS attack ?
!>> An easy one that comes to mind is syslogd.  It's often stuck in disk-wait
!>> and can easily be targetted with a large number of packets.
!>
!>1. If ever syslog used (or will use) TCP, it should drop the connection
!>which is logging data too quickly.
!>OS shouldn't kill process, only drop connection. So no crash.
!>More examples?
!>
!>2. udp_drain() may either drop all packets or intellectually select
!>"offending" socket and try to avoid deletion of "right" packets and
!>simplifying spoofing. RFC allows 1st way, but 2-nd can improve OS.
!>
!>3. Another idea, apart from the *_drain() method. Probably I will
!>try to implement it someday (quite low probability, though).
!>Set TCP window in a packets according to really available kernel
!>memory. Available memory should be distributed non-uniformly
!>between maximum number of sockets. So 1-st socket has window=
!>=64k-still_not_read_data, and 1024-th has window=MIN_WINDOW-
!>-still_not_read_data. 
!>MIN_WINDOW should be determined for max efficiency. About 2k.
!>Distribution cannot be linear - it is approximately like min(NORM*1/x,64k).
!>Exactly it can be determined via functional equation. Something like
!>\integral_0^maxsockets{dist(x)dx}=kernel_memory and several
!>conditions. (sorry for my poor TeX).
!>
!>In a case of attack new sockets will be created with a very small
!>window - about 2k.
!>
!>Please blame me as much as possible - probably I have missed some significant
!>detail.
!>Probably all this math suxx and the best is a "stair" function -
!>somebody is already working on lowering the TCP window, if I'm not mistaken.
!>
!>
!>--
!>Stas Kisel. UNIX, security, C, TCP/IP, Web. UNIX - the best adventure game
!>http://www.tekmetrics.com/transcript.shtml?pid=20053 http://www.crimea.edu
!>+380(652)510222,230238 ; stas@crimea.edu stas@sonet.crimea.ua ; 2:460/54.4
!>

	These are all interesting ideas.

	However, when I initially posted about this, I was disappointed
to see how we handle MGET()s, MGETHDR()s, and MCLALLOC()s on shortage:
with M_DONTWAIT we simply store a NULL, and with M_WAIT the fallback
path -- m_retry() or m_retryhdr() (called when no mbufs are
available) -- would simply panic(). The last time I checked, this is
still the case even in -CURRENT.
	I have produced patches (see attached -- they are separated into
two diffs, mbuf.patch, which patches kern/uipc_mbuf.c, and mbuf2.patch,
which patches sys/mbuf.h) that tsleep() when M_WAIT is specified and we
hit an mbuf or mbuf cluster shortage. I wanted something that would
guarantee a non-NULL result when we call with M_WAIT. Obviously, this
isn't the definitive solution to the DoS problem that seems to have
become the main topic of this thread.
	However, I've kept that in mind, and I am now starting work
(when time permits) on code that will let us warn the network protocol
modules when we run out of mbufs (or mbuf clusters). That way, if we
can't get anything back even with m_reclaim() (which m_retry() calls
under M_WAIT), the protocols themselves can decide what to drop.
	I'm also aware that some people may not like the fact that we
tsleep() forever (i.e., tsleep(x, x, x, 0)). I am therefore open to
modifying the diffs to add a counter and have the tsleep() expire
periodically, so that when the counter finally runs out we return a
definite NULL _even_ under M_WAIT -- but that can only be implemented
once we make sure that ALL callers of MGET() and company check for NULL
(which would be tedious).


Cheers,
Bosko Milekic.


[-- Attachment #2 --]
--- /usr/src/sys/kern.old/uipc_mbuf.c	Wed Sep  8 20:45:50 1999
+++ /usr/src/sys/kern/uipc_mbuf.c	Sun Sep 12 22:44:23 1999
@@ -60,6 +60,8 @@
 int	max_hdr;
 int	max_datalen;
 
+static int m_mballoc_wid = 0, m_clalloc_wid = 0;
+
 SYSCTL_INT(_kern_ipc, KIPC_MAX_LINKHDR, max_linkhdr, CTLFLAG_RW,
 	   &max_linkhdr, 0, "");
 SYSCTL_INT(_kern_ipc, KIPC_MAX_PROTOHDR, max_protohdr, CTLFLAG_RW,
@@ -153,6 +155,57 @@
 	return (1);
 }
 
+/*
+ * Function used for waiting on some mbuf to be freed and, upon wakeup,
+ * to go get that mbuf and use it.
+ */
+struct mbuf *
+m_mballoc_wait(caller, type)
+	int caller;
+	int type;
+{
+	struct mbuf *p;
+
+RetryFetch:
+	/* Sleep here until something's available. */
+	m_mballoc_wid++;
+	tsleep(&m_mballoc_wid, PVM, "mballc", 0);
+	
+	/*
+	 * Now that we (think) we've got something, redo the MGET, but
+	 * avoid recursing into another instance of m_mballoc_wait().
+	 * We do this by temporarily defining this function away.
+	 */
+#define m_mballoc_wait(caller,type) (struct mbuf *)0 
+	if (caller == MGET_C) {
+		MGET(p,M_WAIT,type);
+	} else {
+		MGETHDR(p,M_WAIT,type);
+	}
+#undef m_mballoc_wait
+ 
+	/*
+	 * If we fail yet again, go back to sleeping.
+	 * XXX Perhaps we should implement a limit somewhere here. 
+	 */
+	if (p == NULL)
+		goto RetryFetch;
+
+	return (p);
+}
+
+/*
+ * Function used to wakeup sleepers waiting for mbufs...
+ */
+void
+m_mballoc_wakeup(void)
+{
+	if (m_mballoc_wid) {
+		m_mballoc_wid = 0;
+		wakeup(&m_mballoc_wid);
+	}
+}
+
 #if MCLBYTES > PAGE_SIZE
 static int i_want_my_mcl;
 
@@ -242,6 +295,53 @@
 }
 
 /*
+ * This function will be used to sleep and wait until we have a free
+ * mbuf cluster. This is for callers with M_WAIT who'd rather wait
+ * than get NULL back (which is logically what should happen with
+ * M_WAIT anyway).
+ */
+caddr_t
+m_clalloc_wait(void)
+{
+	caddr_t p;
+
+RetryClust:
+	/* Sleep here until something's available. */
+	m_clalloc_wid++;
+	tsleep(&m_clalloc_wid, PVM, "mclalc", 0);
+
+	/*
+	 * Now that we (think) we've got something, redo the MCLALLOC, but
+	 * avoid recursing into another instance of m_clalloc_wait().
+	 * We do this by temporarily defining this function away.
+	 */
+#define m_clalloc_wait() (caddr_t)0
+	MCLALLOC(p,M_WAIT);
+#undef m_clalloc_wait
+
+	/*
+	 * If we fail yet again, go back to sleeping.
+	 * XXX Perhaps we should implement a limit somewhere here.
+	 */
+	if (p == NULL)
+		goto RetryClust;
+
+	return (p);
+}
+
+/*
+ * Function used to wakeup sleepers waiting for mbuf clusters...
+ */
+void
+m_clalloc_wakeup(void)
+{
+	if (m_clalloc_wid) {
+		m_clalloc_wid = 0;
+		wakeup(&m_clalloc_wid);
+	}
+}
+
+/*
  * When MGET fails, ask protocols to free space when short of memory,
  * then re-attempt to allocate an mbuf.
  */
@@ -261,12 +361,9 @@
 #undef m_retry
 	if (m != NULL) {
 		mbstat.m_wait++;
-	} else {
-		if (i == M_DONTWAIT)
-			mbstat.m_drops++;
-		else
-			panic("Out of mbuf clusters");
-	}
+	} else 
+		mbstat.m_drops++;
+
 	return (m);
 }
 
@@ -289,12 +386,9 @@
 #undef m_retryhdr
 	if (m != NULL) {
 		mbstat.m_wait++;
-	} else {
-		if (i == M_DONTWAIT)
-			mbstat.m_drops++;
-		else
-			panic("Out of mbuf clusters");
-	}
+	} else 
+		mbstat.m_drops++;
+
 	return (m);
 }
 

[-- Attachment #3 --]
--- /usr/src/sys/sys.old/mbuf.h		Sat Sep 11 19:10:44 1999
+++ /usr/src/sys/sys/mbuf.h		Sun Sep 12 22:44:44 1999
@@ -153,6 +153,13 @@
 #define	M_DONTWAIT	1
 #define	M_WAIT		0
 
+/* 
+ * Flags to pass to the *_wait functions (when we have to wait for an
+ * mbuf to be freed).
+ */
+#define MGET_C		1
+#define MGETHDR_C	2
+
 /* Freelists:
  *
  * Normal mbuf clusters are normally treated as character arrays
@@ -203,7 +210,8 @@
 		splx(_ms); \
 	} else { \
 		splx(_ms); \
-		(m) = m_retry((how), (type)); \
+		if (((m)=m_retry((how), (type)))==NULL && (how)==M_WAIT) \
+			(m) = m_mballoc_wait(MGET_C,(type)); \
 	} \
 }
 
@@ -223,7 +231,8 @@
 		splx(_ms); \
 	} else { \
 		splx(_ms); \
-		(m) = m_retryhdr((how), (type)); \
+		if (((m)=m_retryhdr((how),(type)))==NULL && (how)==M_WAIT) \
+			(m) = m_mballoc_wait(MGETHDR_C,(type)); \
 	} \
 }
 
@@ -235,16 +244,20 @@
  * MCLFREE releases a reference to a cluster allocated by MCLALLOC,
  * freeing the cluster if the reference count has reached 0.
  */
-#define	MCLALLOC(p, how) \
-	MBUFLOCK( \
-	  if (mclfree == 0) \
+#define	MCLALLOC(p, how) { \
+	int _ms = splimp(); \
+	if (mclfree == 0) \
 		(void)m_clalloc(1, (how)); \
-	  if (((p) = (caddr_t)mclfree) != 0) { \
+	if (((p) = (caddr_t)mclfree) != 0) { \
 		++mclrefcnt[mtocl(p)]; \
 		mbstat.m_clfree--; \
 		mclfree = ((union mcluster *)(p))->mcl_next; \
-	  } \
-	)
+		splx(_ms); \
+ 	} else if ((how) == M_WAIT) { \
+		splx(_ms); \
+		(p) = m_clalloc_wait(); \
+	} \
+}
 
 #define	MCLGET(m, how) \
 	{ MCLALLOC((m)->m_ext.ext_buf, (how)); \
@@ -263,6 +276,7 @@
 		((union mcluster *)(p))->mcl_next = mclfree; \
 		mclfree = (union mcluster *)(p); \
 		mbstat.m_clfree++; \
+		(void)m_clalloc_wakeup(); \
 	  } \
 	)
 
@@ -284,6 +298,7 @@
 				((union mcluster *)(p))->mcl_next = mclfree; \
 				mclfree = (union mcluster *)(p); \
 				mbstat.m_clfree++; \
+				(void)m_clalloc_wakeup(); \
 			} \
 		} \
 	  } \
@@ -292,6 +307,7 @@
 	  mbstat.m_mtypes[MT_FREE]++; \
 	  (m)->m_next = mmbfree; \
 	  mmbfree = (m); \
+	  (void)m_mballoc_wakeup(); \
 	)
 
 /*
@@ -408,16 +424,20 @@
 struct	mbuf *m_gethdr __P((int, int));
 struct	mbuf *m_prepend __P((struct mbuf *,int,int));
 struct	mbuf *m_pullup __P((struct mbuf *, int));
+struct	mbuf *m_mballoc_wait __P((int,int));
 struct	mbuf *m_retry __P((int, int));
 struct	mbuf *m_retryhdr __P((int, int));
 struct	mbuf *m_split __P((struct mbuf *,int,int));
 void	m_adj __P((struct mbuf *, int));
 void	m_cat __P((struct mbuf *,struct mbuf *));
+void	m_mballoc_wakeup __P((void));
+void	m_clalloc_wakeup __P((void));
 int	m_mballoc __P((int, int));
 int	m_clalloc __P((int, int));
 void	m_copyback __P((struct mbuf *, int, int, caddr_t));
 void	m_copydata __P((struct mbuf *,int,int,caddr_t));
 void	m_freem __P((struct mbuf *));
+caddr_t	m_clalloc_wait __P((void));
 #endif /* KERNEL */
 
 #endif /* !_SYS_MBUF_H_ */
