Date: Sun, 18 Feb 2007 16:17:54 +0200 (EET)
From: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: kern/109277: kernel ppp(4) botches clist reservation in RELENG_6
Message-ID: <200702181417.l1IEHsCJ001879@homelynx.homenet>
Resent-Message-ID: <200702181440.l1IEe59K038282@freefall.freebsd.org>

>Number:         109277
>Category:       kern
>Synopsis:       kernel ppp(4) botches clist reservation in RELENG_6
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Feb 18 14:40:05 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Dmitry Pryanishnikov
>Release:        FreeBSD 6.2-STABLE i386
>Organization:
Atlantis ISP
>Environment:
System: FreeBSD homelynx.homenet 6.2-STABLE FreeBSD 6.2-STABLE #0: Sun Feb 18 05:55:06 EET 2007 root@homelynx.homenet:/usr/obj/usr/RELENG_6/src/sys/lynx i386

Hardware: Intel D845EBG2 mainboard + Pentium(R) 4 CPU 2.80GHz + 512 MB RAM, ECC check+correction enabled. The system is rock-stable when NOT using ppp(4).

>Description:

Very rare spontaneous crashes (perhaps once a month) occur while kernel ppp and the system console are both in active use. When the console is in X.org mode, the system just silently reboots. OTOH, there is some chance of getting a valid crash dump when the system console is in text mode. The last such crash was "panic: clist reservation botch" (see the cblock_alloc() function in /sys/kern/tty_subr.c). This was RELENG_6 as of 1-Feb-2007, and the backtrace was:

panic(c05f55c8,0,c04cd3ee,20,38,...) at 0xc049a8a4 = panic+0xa8
b_to_q(c37fd6a8,24,c36d6838,c36d6838,0,...) at 0xc04cd60e = b_to_q+0xce
pppasyncstart(c62bfc00,c36cd50c,0,c05f9daf,3e3) at 0xc0508ff4 = pppasyncstart+0x108
pppoutput(c36cd400,c37fd600,c39b7a70,c39debdc,0,...) at 0xc0506a36 = pppoutput+0x326
ip_output(c37fd600,0,d9bc79b8,0,0,c3a7e654) at 0xc0526ab4 = ip_output+0xa64
tcp_output(c3a81cb0) at 0xc052eee5 = tcp_output+0xe05
tcp_input(c37fde00,14,d9bc7b80,0,0,...) at 0xc052d467 = tcp_input+0x28df
ip_input(c37fde00,c37fde74,0,8c,c37fde00,...) at 0xc05248ad = ip_input+0x75d
div_send(c3a826f4,0,c37fde00,c6a27120,0,...) at 0xc079bc1b = div_send+0x17b
sosend(c3a826f4,c6a27120,d9bc7c40,c37fde00,0,0,c382c000) at 0xc04d1fd3 = sosend+0x5eb
kern_sendit(c382c000,3,d9bc7cbc,0,0,0) at 0xc04d71a4 = kern_sendit+0x104
sendit(c382c000,3,d9bc7cbc,0,bfbdebfc,...) at 0xc04d7077 = sendit+0x147
sendto(c382c000,d9bc7d04) at 0xc04d72d5 = sendto+0x4d
syscall(3b,3b,bfbe003b,1,8c,...) at 0xc05c62c7 = syscall+0x22f
Xint0x80_syscall() at 0xc05b495f = Xint0x80_syscall+0x1f

I decided to look through closed PRs and found kern/25632, which describes a similar problem (that one was a RELENG_4 kernel vs. USB stack interaction, but the result, a botched clist reservation, was the same). So there is apparently a lack of proper locking around clist operations in kernel ppp in modern (at least RELENG_6) kernels.

>How-To-Repeat:

I've shamelessly stolen the idea of cblock_alloc() recursion detection from kern/25632:

--- tty_subr.c.orig     Fri Jan  7 01:35:40 2005
+++ tty_subr.c  Sun Feb 18 14:37:29 2007
@@ -94,17 +94,30 @@
  * Remove a cblock from the cfreelist queue and return a pointer
  * to it.
  */
+static int someone_here = 0;
+#define N1MAX 100000
 static __inline struct cblock *
 cblock_alloc()
 {
        struct cblock *cblockp;
+       int n1;
 
+       for (n1=0; n1<N1MAX; n1++)
+               if (someone_here != 0) panic("cblock_alloc recursion a");
+       someone_here++;
+       for (n1=0; n1<N1MAX; n1++)
+               if (someone_here != 1) panic("cblock_alloc recursion b");
        cblockp = cfreelist;
        if (cblockp == NULL)
                panic("clist reservation botch");
        cfreelist = cblockp->c_next;
        cblockp->c_next = NULL;
        cfreecount -= CBSIZE;
+       for (n1=0; n1<N1MAX; n1++)
+               if (someone_here != 1) panic("cblock_alloc recursion c");
+       someone_here--;
+       for (n1=0; n1<N1MAX; n1++)
+               if (someone_here != 0) panic("cblock_alloc recursion d");
        return (cblockp);
 }

With the kernel patched this way, I got the "cblock_alloc recursion a" panic almost immediately after setting up a ppp(4) connection, pinging the remote peer with 'ping -f', and simultaneously hammering on the keyboard:

#11 0xc049abd3 in panic (fmt=0xc05f6208 "cblock_alloc recursion a") at /usr/RELENG_6/src/sys/kern/kern_shutdown.c:549
#12 0xc04cd7b7 in putc (chr=39, clistp=0xc36ef000) at /usr/RELENG_6/src/sys/kern/tty_subr.c:106
#13 0xc04c6b6b in ttyinput (c=39, tp=0xc36ef000) at /usr/RELENG_6/src/sys/kern/tty.c:657
#14 0xc05a81e9 in sckbdevent (thiskbd=0xc064f440, event=0, arg=0xc0667000) at linedisc.h:122
#15 0xc05974ed in atkbd_intr (kbd=0xc064f440, arg=0x0) at /usr/RELENG_6/src/sys/dev/atkbdc/atkbd.c:503
#16 0xc059860a in atkbdintr (arg=0xc1015000) at /usr/RELENG_6/src/sys/dev/atkbdc/atkbd_atkbdc.c:174
#17 0xc0487712 in ithread_execute_handlers (p=0xc36aa000, ie=0xc35afc00) at /usr/RELENG_6/src/sys/kern/kern_intr.c:682
#18 0xc0487836 in ithread_loop (arg=0xc36e2660) at /usr/RELENG_6/src/sys/kern/kern_intr.c:765
#19 0xc0486980 in fork_exit (callout=0xc04877d0 <ithread_loop>, arg=0xc36e2660, frame=0xd5633d38) at /usr/RELENG_6/src/sys/kern/kern_fork.c:821
#20 0xc05b551c in fork_trampoline () at /usr/RELENG_6/src/sys/i386/i386/exception.s:208

It looks like ppp(4) enters cblock_alloc(), then gets preempted, and then ttyinput() re-enters cblock_alloc() from the keyboard interrupt thread.

>Fix:

I'm ready to provide further debugging information on this issue. Unfortunately, I'm not familiar enough with the locking concepts in modern FreeBSD kernels (and in the tty subsystem in particular) to make the fix myself.

>Release-Note:
>Audit-Trail:
>Unformatted:
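
For illustration only, the race described above (two contexts popping the same free list at once) can be modelled in user space. The following is a minimal sketch under stated assumptions, not a kernel patch and not the fix for this PR: it uses POSIX threads in place of the ppp(4) output path and the keyboard interrupt thread, and every name in it (pool, block_alloc, freelist_lock, run_demo, POOL_SIZE, PER_THREAD) is hypothetical. It only shows that once the free-list pop is serialised by a lock, two concurrent allocators can no longer corrupt the list or the free count. Which kernel primitive would be appropriate in tty_subr.c is left open, as in the report.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE  4096   /* hypothetical pool size for the demo */
#define PER_THREAD 2000   /* allocations performed by each thread */

/* Stand-in for struct cblock: only the free-list link matters here. */
struct block {
        struct block *next;
};

struct block pool[POOL_SIZE];
struct block *freelist;
int freecount;
pthread_mutex_t freelist_lock = PTHREAD_MUTEX_INITIALIZER;

/* Link every pool element onto the free list. */
void
pool_init(void)
{
        int i;

        freelist = NULL;
        freecount = 0;
        for (i = 0; i < POOL_SIZE; i++) {
                pool[i].next = freelist;
                freelist = &pool[i];
                freecount++;
        }
}

/*
 * Pop one block from the free list.  The mutex makes the
 * read-modify-write of freelist and freecount atomic with respect
 * to other callers, which is the property the unlocked kernel
 * cblock_alloc() in the report appears to lack.
 */
struct block *
block_alloc(void)
{
        struct block *bp;

        pthread_mutex_lock(&freelist_lock);
        bp = freelist;
        if (bp != NULL) {
                freelist = bp->next;
                bp->next = NULL;
                freecount--;
        }
        pthread_mutex_unlock(&freelist_lock);
        return (bp);
}

/* Each worker counts how many blocks it managed to allocate. */
void *
worker(void *arg)
{
        int *got = arg;
        int i;

        for (i = 0; i < PER_THREAD; i++)
                if (block_alloc() != NULL)
                        (*got)++;
        return (NULL);
}

/*
 * Run two concurrent allocators and return the number of blocks
 * left on the free list, or -1 if a thread could not be created.
 */
int
run_demo(void)
{
        pthread_t t1, t2;
        int got1 = 0, got2 = 0;

        pool_init();
        if (pthread_create(&t1, NULL, worker, &got1) != 0)
                return (-1);
        if (pthread_create(&t2, NULL, worker, &got2) != 0) {
                pthread_join(t1, NULL);
                return (-1);
        }
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        /* Every allocation must have succeeded exactly once. */
        assert(got1 + got2 == 2 * PER_THREAD);
        return (freecount);
}
```

Called from a main() and built with cc -pthread, run_demo() should return POOL_SIZE - 2 * PER_THREAD. Removing the lock/unlock pair from block_alloc() reintroduces the kind of race the someone_here instrumentation above was written to catch.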