Date: Fri, 13 Mar 1998 12:31:13 +0200 (SAT)
From: Reinier Bezuidenhout <rbezuide@oskar.nanoteq.co.za>
To: freebsd-hackers@FreeBSD.ORG
Subject: 2.2.5 PANIC when out of mbufs
Message-ID: <199803131032.MAA04804@oskar.nanoteq.co.za>
Hi ...
It seems that there is something afoot in uipc_mbuf.c.
We have an application program that basically performs a relaying
function at user level. I have the following test setup:
PII-266/64MB <-100 Mbit-> P166/128MB <-100 Mbit-> PII-266/64MB
connected with crossover Cat 5 cables.
I am using ttcp on one PII to connect to the "relay" on
the P166, which reconnects me to a small get-and-dump server
on the other PII.
Directly between the two PIIs I can start 400 of these sessions.
I try to start 300 sessions through the relay, starting
them one at a time with a 1 s delay between them, each transferring
about 26 MB of data.
The mbuf usage on the P166 climbs to about
2942 mbufs in use
2703/2714 mbuf clusters in use
(the P166 kernel has maxusers = 128, which gives 512 + 128 * 16 = 2560
mbuf clusters according to param.c)
The kernel panics with the following on-screen message:
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x18
fault code = supervisor write, page not present
instruction pointer = 0x8:0xf0122d1d
stack pointer = 0x10:0xefbffe94
frame pointer = 0x10:0xefbffeb8
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 998 (tcpr)
interrupt mask =
panic: page fault
I have reproduced this about 8 times, with the IP always being 0xf0122d1d:
nm /kernel | sort
---- cut ------
f0122528 T _solisten
f01225d4 T _sofree
f01226a0 T _soclose
f01227c0 T _soabort
f01227e8 T _soaccept
f0122850 T _soconnect
f01228d8 T _soconnect2
f0122924 T _sodisconnect
f0122990 T _sosend <------ in here
f012304c T _soreceive
f01239c8 T _soshutdown
f0123a04 T _sorflush
f0123ad0 T _sosetopt
f0123d98 T _sogetopt
----- cut ------
I then had a look at the crash dump with kgdb and saw the following:
gdb -k kernel /a/tmp/vmcore.2
(kgdb) bt
#0 0xf010e7d3 in boot ()
#1 0xf010ea92 in panic ()
#2 0xf0192fb6 in trap_fatal ()
#3 0xf0192aa4 in trap_pfault ()
#4 0xf019277f in trap ()
#5 0xf0122d1d in sosend (so=0xf13d4500, addr=0x0, uio=0xefbffef4, top=0x0,
control=0x0, flags=0) at ../../kern/uipc_socket.c:427
#6 0xf0125a81 in sendit ()
#7 0xf0125b60 in sendto ()
#8 0xf019324f in syscall ()
#9 0x200c8bd1 in ?? ()
#10 0xc55e in ?? ()
#11 0xcbf0 in ?? ()
#12 0x2419 in ?? ()
#13 0x2374 in ?? ()
#14 0x1095 in ?? ()
(kgdb)
#5 0xf0122d1d in sosend (so=0xf13d4500, addr=0x0, uio=0xefbffef4, top=0x0,
control=0x0, flags=0) at ../../kern/uipc_socket.c:427
427 mlen = MHLEN;
(kgdb) li
422 if (flags & MSG_EOR)
423 top->m_flags |= M_EOR;
424 } else do {
425 if (top == 0) {
426 MGETHDR(m, M_WAIT, MT_DATA);
427 mlen = MHLEN;
428 m->m_pkthdr.len = 0;
429 m->m_pkthdr.rcvif = (struct ifnet *)0;
430 } else {
431 MGET(m, M_WAIT, MT_DATA);
(kgdb) p m
$1 = (struct mbuf *) 0x0
(kgdb)
I then had a look in sys/sys/mbuf.h and saw the following:
#define MGETHDR(m, how, type) { \
int _ms = splimp(); \
if (mmbfree == 0) \
(void)m_mballoc(1, (how)); \
if (((m) = mmbfree) != 0) { \
mmbfree = (m)->m_next; \
mbstat.m_mtypes[MT_FREE]--; \
(m)->m_type = (type); \
mbstat.m_mtypes[type]++; \
(m)->m_next = (struct mbuf *)NULL; \
(m)->m_nextpkt = (struct mbuf *)NULL; \
(m)->m_data = (m)->m_pktdat; \
(m)->m_flags = M_PKTHDR; \
splx(_ms); \
} else { \
splx(_ms); \
(m) = m_retryhdr((how), (type)); \
} \
}
Say it takes the else branch because no mbufs are available. It then
calls m_retryhdr() in uipc_mbuf.c:
struct mbuf *
m_retryhdr(i, t)
int i, t;
{
register struct mbuf *m;
m_reclaim();
#define m_retryhdr(i, t) (struct mbuf *)0
MGETHDR(m, i, t);
#undef m_retryhdr
if (m != NULL)
mbstat.m_wait++;
else
mbstat.m_drops++;
return (m);
}
Say m_reclaim() doesn't free anything because everything is in use.
m_retryhdr() then #defines m_retryhdr(i, t) as a null pointer and invokes
MGETHDR(m, i, t) again, which still can't allocate anything and so executes
(m) = m_retryhdr((how), (type)); which now expands to (struct mbuf *)0.
So MGETHDR, even with M_WAIT, happily returns m = 0x0, and no
one checks for that.
The first dereference of m in sosend() then page-faults at a small
address (0x18) and the kernel panics.
:) Wouldn't it have been easier to call panic() from within the else branch
in m_retryhdr, instead of waiting for the NULL mbuf to be dereferenced? :)
Am I missing anything obvious here ???
Thanx
Reinier
