From owner-freebsd-bugs@FreeBSD.ORG Wed Nov 9 13:20:26 2005
Message-Id: <200511091319.jA9DJnlB050266@www.freebsd.org>
Date: Wed, 9 Nov 2005 13:19:49 GMT
From: Victor Snezhko
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-2.3
Subject: kern/88725: netinet6 updates in -CURRENT cause panic when using user-level ppp

>Number:         88725
>Category:       kern
>Synopsis:       netinet6 updates in -CURRENT cause panic when using user-level ppp
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Nov 09 13:20:19 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Victor Snezhko
>Release:        7.0-CURRENT
>Organization:
IndorSoft Ltd.
>Environment:
FreeBSD freebsd.indorsoft.ru 7.0-CURRENT FreeBSD 7.0-CURRENT #12: Sat Nov 5 19:24:55 NOVT 2005 root@freebsd.indorsoft.ru:/home/vvs/obj/usr/src/sys/VVS i386

Sources were cvsupped on 2005.10.21.16.25.00; the problem is still present as of 2005.11.06.
I use a custom config, but the problem also occurs with GENERIC.
It is reproducible at least on i386 (including in a virtual machine) and on amd64.
>Description:
The changes to netinet6 committed on 2005.10.21.16.23.01 break user-level ppp.
After these changes, the kernel panics when I start /usr/sbin/ppp.
Here is the backtrace analysis:

/var/crash # kgdb /usr/obj/usr/src/sys/VVS/kernel /var/crash/vmcore.27
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xdeadc0e6
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc066c182
stack pointer           = 0x28:0xc6082cc0
frame pointer           = 0x28:0xc6082ce8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 36 (swi4: clock sio)
panic: from debugger
cpuid = 0
Uptime: 1m25s
Dumping 63 MB (3 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 62MB (15856 pages) 46 30 14 ... ok
  chunk 2: 1MB (256 pages)

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc0660824 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc0660b39 in panic (fmt=0xc0856f00 "from debugger")
    at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc046cee1 in db_panic (addr=-1067007614, have_addr=0, count=-1,
    modif=0xc6082abc "") at /usr/src/sys/ddb/db_command.c:434
#4  0xc046ce78 in db_command (last_cmdp=0xc0947984, cmd_table=0x0,
    aux_cmd_tablep=0xc08bd97c, aux_cmd_tablep_end=0xc08bd998)
    at /usr/src/sys/ddb/db_command.c:403
#5  0xc046cf40 in db_command_loop () at /usr/src/sys/ddb/db_command.c:454
#6  0xc046eb59 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:221
#7  0xc06793a4 in kdb_trap (type=12, code=0, tf=0xc6082c80)
    at /usr/src/sys/kern/subr_kdb.c:473
#8  0xc0821ac8 in trap_fatal (frame=0xc6082c80, eva=3735929062)
    at /usr/src/sys/i386/i386/trap.c:846
#9  0xc0821152 in trap (frame=
      {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = -1054618496,
       tf_esi = -1054756736, tf_ebp = -972542744, tf_isp = -972542804,
       tf_ebx = 1, tf_edx = -1030106232, tf_ecx = -559038242, tf_eax = 83559,
       tf_trapno = 12, tf_err = 0, tf_eip = -1067007614, tf_cs = 32,
       tf_eflags = 589826, tf_esp = -1054618496, tf_ss = 0}) at
/usr/src/sys/i386/i386/trap.c:269
---Type <return> to continue, or q <return> to quit---
#10 0xc080ec2a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#11 0xc066c182 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:220
#12 0xc064e260 in ithread_loop (arg=0xc121b080)
    at /usr/src/sys/kern/kern_intr.c:547
#13 0xc064d668 in fork_exit (callout=0xc064e118 , arg=0xc121b080,
    frame=0xc6082d38) at /usr/src/sys/kern/kern_fork.c:789
#14 0xc080ec8c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208
(kgdb) up 11
#11 0xc066c182 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:220
220                     if (c->c_time != curticks) {
(kgdb) list
215             curticks = softticks;
216             bucket = &callwheel[curticks & callwheelmask];
217             c = TAILQ_FIRST(bucket);
218             while (c) {
219                     depth++;
220                     if (c->c_time != curticks) {
221                             c = TAILQ_NEXT(c, c_links.tqe);
222                             ++steps;
223                             if (steps >= MAX_SOFTCLOCK_STEPS) {
224                                     nextsoftcheck = c;
(kgdb) print c
$1 = (struct callout *) 0xdeadc0de
(kgdb) print *bucket
$2 = {tqh_first = 0xc1644020, tqh_last = 0xc1644020}
(kgdb) print steps
$3 = 1
(kgdb) print *(bucket->tqh_first)
$4 = {c_links = {sle = {sle_next = 0xdeadc0de}, tqe = {tqe_next = 0xdeadc0de,
      tqe_prev = 0xdeadc0de}}, c_time = -559038242, c_arg = 0xdeadc0de,
  c_func = 0xdeadc0de, c_mtx = 0xdeadc0de, c_flags = -559038242}

The following patch from John Baldwin (intended for testing only) does not help; the symptoms remain the same:

Index: nd6.c
===================================================================
RCS file: /usr/cvs/src/sys/netinet6/nd6.c,v
retrieving revision 1.62
diff -u -r1.62 nd6.c
--- nd6.c	22 Oct 2005 05:07:16 -0000	1.62
+++ nd6.c	3 Nov 2005 19:56:42 -0000
@@ -398,7 +398,7 @@
 	if (tick < 0) {
 		ln->ln_expire = 0;
 		ln->ln_ntick = 0;
-		callout_stop(&ln->ln_timer_ch);
+		callout_drain(&ln->ln_timer_ch);
 	} else {
 		ln->ln_expire = time_second + tick / hz;
 		if (tick > INT_MAX) {
======================================================================

I have made two attempts to find the cause
of the callwheel corruption:

1) I wrote a checking function that searches the callwheel for corrupted entries and panics if it finds any. I called this function from every place in kern/kern_timeout.c that can modify the callwheel. No success: the callwheel is modified elsewhere.

2) I extended trash_dtor() in vm/uma_dbg.c as follows, to find which callwheel element is freed before being disarmed. (Warning: the pointer casts and comparisons in this patch may not be 64-bit ready.)

--- uma_dbg.c.orig	Mon Nov 7 23:05:09 2005
+++ uma_dbg.c	Tue Nov 8 17:37:24 2005
@@ -41,6 +41,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -86,8 +88,33 @@
 {
 	int cnt;
 	u_int32_t *p;
+	struct callout *c;
+	struct callout_tailq *bucket;
+	int i;

 	cnt = size / sizeof(uma_junk);
+
+	mtx_lock_spin(&callout_lock);
+
+	for (i = 0; i < callwheelsize; ++i) {
+		bucket = &callwheel[i];
+		for (c = TAILQ_FIRST(bucket); c != NULL;
+		    c = TAILQ_NEXT(c, c_links.tqe)) {
+			long c2 = (long)c;
+			long mem2 = (long)mem;
+			if ((u_int32_t)c == uma_junk) {
+				kdb_enter("trash_dtor: uma_junk found in a "\
+				    "callwheel element");
+				break;
+			}
+			if (c2 >= mem2 && c2 < mem2 + size) {
+				kdb_enter("trash_dtor: found invalid "\
+				    "callwhel element");
+			}
+		}
+	}
+
+	mtx_unlock_spin(&callout_lock);

 	for (p = mem; cnt > 0; cnt--, p++)
 		*p = uma_junk;
======================================================================

and kdb_enter is called here:

			if ((u_int32_t)c == uma_junk) {
==>				kdb_enter("trash_dtor: uma_junk found in a "\
				    "callwheel element");

That is, this check finds a callwheel element that has already been freed and filled with uma_junk.

There is a side effect: applying the last patch makes the panic much harder to reproduce. When the panic does not occur, ppp works.
>How-To-Repeat:
cvsup to -CURRENT as of 2005.10.21.16.25.00 or later, then rebuild and install the kernel using the GENERIC config. With the new kernel, start /usr/sbin/ppp.
Within a few seconds of starting it (up to 3 on my Celeron-600), once the callwheel in kern/kern_timeout.c has cycled around, the panic occurs.
>Fix:
There is only a workaround: disabling INET6 in the kernel config helps.
>Release-Note:
>Audit-Trail:
>Unformatted:
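The values in the kgdb session above are consistent with c holding the 0xdeadc0de fill pattern when softclock() read c->c_time. The sketch below is not part of the original report. It checks that arithmetic, and it shows one way the junk and range tests from the trash_dtor() patch could be written so that they do not depend on pointer width. The names struct callout32, junk_fault_address(), junk_as_int(), ptr_is_junk() and ptr_in_range() are hypothetical, and the struct is an assumed 32-bit mirror of the fields that kgdb printed, not the kernel's own definition.

```c
#include <assert.h>	/* static_assert */
#include <stddef.h>	/* offsetof, size_t */
#include <stdint.h>	/* uint32_t, int32_t, uintptr_t */

/* Fill pattern seen throughout the dump (the value of uma_junk). */
#define JUNK 0xdeadc0deU

/*
 * ASSUMPTION: a 32-bit mirror of struct callout, with the field order taken
 * from the kgdb output. Pointers are modelled as uint32_t so that the offsets
 * match i386 regardless of the machine this is compiled on.
 */
struct callout32 {
	struct {
		uint32_t tqe_next;
		uint32_t tqe_prev;
	} c_links;
	int32_t  c_time;
	uint32_t c_arg;
	uint32_t c_func;
	uint32_t c_mtx;
	int32_t  c_flags;
};

static_assert(offsetof(struct callout32, c_time) == 8,
    "c_time follows the two list pointers");

/* Address that c->c_time reads when c itself holds the junk pattern. */
uint32_t
junk_fault_address(void)
{
	return JUNK + (uint32_t)offsetof(struct callout32, c_time);
}

/* The junk pattern as a debugger prints it in a signed 32-bit field. */
int32_t
junk_as_int(void)
{
	/* Two's complement reinterpretation, as done by common compilers. */
	return (int32_t)JUNK;
}

/* True if every 32-bit word of the pointer value equals the junk pattern. */
int
ptr_is_junk(const void *p)
{
	uintptr_t v = (uintptr_t)p;
	size_t i;

	for (i = 0; i < sizeof(v) / sizeof(uint32_t); i++)
		if ((uint32_t)(v >> (32 * i)) != JUNK)
			return 0;
	return 1;
}

/* True if p lies in [mem, mem + size), without overflowing mem + size. */
int
ptr_in_range(const void *p, const void *mem, size_t size)
{
	uintptr_t a = (uintptr_t)p;
	uintptr_t base = (uintptr_t)mem;

	return a >= base && a - base < size;
}
```

Under these assumptions, junk_fault_address() yields 0xdeadc0e6, which is the reported fault virtual address, and junk_as_int() yields -559038242, the value kgdb shows for c_time and c_flags. Using uintptr_t instead of long and u_int32_t keeps the comparisons well defined on both i386 and amd64.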