Date: Fri, 13 Mar 1998 12:31:13 +0200 (SAT)
From: Reinier Bezuidenhout <rbezuide@oskar.nanoteq.co.za>
To: freebsd-hackers@FreeBSD.ORG
Subject: 2.2.5 PANIC when out of mbufs
Message-ID: <199803131032.MAA04804@oskar.nanoteq.co.za>
Hi ...

It seems that there is something afoot in uipc_mbuf.c

We have an application program that basically does a relaying
function at the user level. I have the following test setup

    PII-266/64MB <-100 Mbit-> P166/128MB <-100 Mbit-> PII-266/64MB

connected with X-over cat 5 cables.

I am using ttcp on the one PII to connect to the "relay" on the
P166, which reconnects me to a small get-and-dump server on the
other PII.

Between the PII's I can start 400 of these sessions.

I try to start 300 sessions through the relay by starting them
1 at a time with a 1 sec delay between them, each transferring
about 26M of data.

The MBUF clusters start to increase on the P166 to about

    2942 mbufs in use
    2703/2714 mbuf clusters in use

( the P166 kernel has 128 max users = (512 + 128 * 16) = 2560 mbuf
clusters according to param.c )

The kernel panics with the following on-screen message

    Fatal trap 12: page fault while in kernel mode
    fault virtual address   = 0x18
    fault code              = supervisor write, page not present
    instruction pointer     = 0x8:0xf0122d1d
    stack pointer           = 0x10:0xefbffe94
    frame pointer           = 0x10:0xefbffeb8
    code segment            = base 0x0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, def32 1, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 998 (tcpr)
    interrupt mask          =
    panic: page fault

I have recreated this about 8 times, with the ip always being
0xf0122d1d

nm /kernel | sort

---- cut ------
f0122528 T _solisten
f01225d4 T _sofree
f01226a0 T _soclose
f01227c0 T _soabort
f01227e8 T _soaccept
f0122850 T _soconnect
f01228d8 T _soconnect2
f0122924 T _sodisconnect
f0122990 T _sosend              <------ in here
f012304c T _soreceive
f01239c8 T _soshutdown
f0123a04 T _sorflush
f0123ad0 T _sosetopt
f0123d98 T _sogetopt
----- cut ------

I then had a look at the files being used, and saw the following

gdb -k kernel /a/tmp/vmcore.2

(kgdb) bt
#0  0xf010e7d3 in boot ()
#1  0xf010ea92 in panic ()
#2  0xf0192fb6 in trap_fatal ()
#3  0xf0192aa4 in trap_pfault ()
#4  0xf019277f in trap ()
#5  0xf0122d1d in sosend (so=0xf13d4500, addr=0x0, uio=0xefbffef4, top=0x0,
    control=0x0, flags=0) at ../../kern/uipc_socket.c:427
#6  0xf0125a81 in sendit ()
#7  0xf0125b60 in sendto ()
#8  0xf019324f in syscall ()
#9  0x200c8bd1 in ?? ()
#10 0xc55e in ?? ()
#11 0xcbf0 in ?? ()
#12 0x2419 in ?? ()
#13 0x2374 in ?? ()
#14 0x1095 in ?? ()
(kgdb)
#5  0xf0122d1d in sosend (so=0xf13d4500, addr=0x0, uio=0xefbffef4, top=0x0,
    control=0x0, flags=0) at ../../kern/uipc_socket.c:427
427                     mlen = MHLEN;
(kgdb) li
422             if (flags & MSG_EOR)
423                     top->m_flags |= M_EOR;
424         } else do {
425             if (top == 0) {
426                     MGETHDR(m, M_WAIT, MT_DATA);
427                     mlen = MHLEN;
428                     m->m_pkthdr.len = 0;
429                     m->m_pkthdr.rcvif = (struct ifnet *)0;
430             } else {
431                     MGET(m, M_WAIT, MT_DATA);
(kgdb) p m
$1 = (struct mbuf *) 0x0
(kgdb)

I then had a look in sys/sys/mbuf.h and saw the following

#define MGETHDR(m, how, type) { \
          int _ms = splimp(); \
          if (mmbfree == 0) \
                (void)m_mballoc(1, (how)); \
          if (((m) = mmbfree) != 0) { \
                mmbfree = (m)->m_next; \
                mbstat.m_mtypes[MT_FREE]--; \
                (m)->m_type = (type); \
                mbstat.m_mtypes[type]++; \
                (m)->m_next = (struct mbuf *)NULL; \
                (m)->m_nextpkt = (struct mbuf *)NULL; \
                (m)->m_data = (m)->m_pktdat; \
                (m)->m_flags = M_PKTHDR; \
                splx(_ms); \
        } else { \
                splx(_ms); \
                (m) = m_retryhdr((how), (type)); \
        } \
}

Say it goes to the else because no mbufs are available; then it
will call m_retryhdr in uipc_mbuf.c

struct mbuf *
m_retryhdr(i, t)
        int i, t;
{
        register struct mbuf *m;

        m_reclaim();
#define m_retryhdr(i, t) (struct mbuf *)0
        MGETHDR(m, i, t);
#undef m_retryhdr
        if (m != NULL)
                mbstat.m_wait++;
        else
                mbstat.m_drops++;
        return (m);
}

Say the m_reclaim doesn't free anything because everything is in
use .. Then it will redefine m_retryhdr(i, t) as a null pointer and
invoke MGETHDR(m, i, t) again, which still can't allocate anything
and then does

        (m) = m_retryhdr((how), (type));

which has now been defined as 0x0 ...

MGETHDR (with M_WAIT) now happily returns m = 0x0 and no one
checks for that.
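
The macro trick described above can be reproduced outside the kernel.
What follows is only a minimal userland sketch, not the 2.2.5 source:
the struct is cut down to its free-list link, splimp()/splx(),
m_mballoc(), m_reclaim() and the mbstat counters are left out, the
how/type arguments are dropped, and caller_gets_null() and
free_list_push() are made-up helper names for illustration. It is meant
to show only that, with an empty free list, the inner MGETHDR expands
m_retryhdr() to a null pointer and the caller ends up holding NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-in for struct mbuf; only the free-list link is modeled. */
struct mbuf {
    struct mbuf *m_next;
};

/* Model of the kernel free list; NULL means every mbuf is in use. */
static struct mbuf *mmbfree = NULL;

struct mbuf *m_retryhdr(void);

/* Simplified MGETHDR: pop from the free list, else fall back to m_retryhdr(). */
#define MGETHDR(m)                      \
    do {                                \
        if (((m) = mmbfree) != NULL) {  \
            mmbfree = (m)->m_next;      \
            (m)->m_next = NULL;         \
        } else {                        \
            (m) = m_retryhdr();         \
        }                               \
    } while (0)

/* Hypothetical helper: put one mbuf back on the free list. */
void free_list_push(struct mbuf *m)
{
    assert(m != NULL);
    m->m_next = mmbfree;
    mmbfree = m;
}

struct mbuf *m_retryhdr(void)
{
    struct mbuf *m;

    /* m_reclaim() would run here; assume it frees nothing. */
#define m_retryhdr() ((struct mbuf *)0)
    MGETHDR(m);     /* the else branch now expands to a plain null pointer */
#undef m_retryhdr
    return m;
}

/* Hypothetical caller in the role of sosend(): returns 1 if it was handed NULL. */
int caller_gets_null(void)
{
    struct mbuf *m;

    MGETHDR(m);
    return m == NULL;
}
```

Calling caller_gets_null() with the free list empty returns 1, which is
the situation sosend() is in at line 427; after free_list_push() of one
mbuf it returns 0. A caller that tested m right after MGETHDR could bail
out there instead of faulting on the first dereference.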
It then causes the kernel to panic. :) Would it not have been easier
to call panic from within the else in m_retryhdr :) instead of
waiting for the mbuf to be referenced :)

Am I missing anything obvious here ???

Thanx
Reinier