From owner-freebsd-current@FreeBSD.ORG Wed Nov 10 00:30:36 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 12DC716A4D0 for ; Wed, 10 Nov 2004 00:30:36 +0000 (GMT) Received: from rproxy.gmail.com (rproxy.gmail.com [64.233.170.199]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1533843D49 for ; Wed, 10 Nov 2004 00:30:35 +0000 (GMT) (envelope-from vincepoy@gmail.com) Received: by rproxy.gmail.com with SMTP id a36so121630rnf for ; Tue, 09 Nov 2004 16:30:34 -0800 (PST) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:references; b=ATlQq0aOmbZKgxsByYm59HBfv5DukPPeJaB2gg2UjOAV7gdeBPs7knA4vaV/868cWH3f49E2dXdSzdsdBQNH2AFz4H7tE70h33a+qIvRy+Lz9miq/PyDjcphg3P5zbBo2E33hQpgF6thY/UCwo4m52hexas2xyg0ab6aeXvMVow= Received: by 10.38.73.9 with SMTP id v9mr405376rna; Tue, 09 Nov 2004 16:30:34 -0800 (PST) Received: by 10.38.14.49 with HTTP; Tue, 9 Nov 2004 16:30:34 -0800 (PST) Message-ID: <429af92e0411091630439e22e4@mail.gmail.com> Date: Tue, 9 Nov 2004 16:30:34 -0800 From: Vincent Poy To: Robert Watson In-Reply-To: Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit References: <429af92e0411082025680d70e3@mail.gmail.com> cc: current@freebsd.org Subject: Re: latest -CURRENT kernel panic X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Vincent Poy List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Nov 2004 00:30:36 -0000 On Tue, 9 Nov 2004 13:39:47 +0000 (GMT), Robert Watson wrote: > > On Mon, 8 Nov 2004, Vincent Poy wrote: > > > I updated from a March 6, 2004 -CURRENT which had been running without > > issues to October 19, 2004 -CURRENT. All -CURRENT since October 19, > > 2004 to the latest November 5, 2004 -CURRENT have been panicing with the > > following: > > Looks like it's likely a bug in the if_xl device driver. Looks like > Garance has already pointed you at the debugging guide in the handbook, in > particular using a serial console via a second box to avoid having to > type. In addition to that useful stuff, it would be useful to have you > try the following: A second box would work except this is running on a notebook sitting on the floor due to space issues. The kernel crash dump method seems to be what I'm looking for except my swap device is 256MB while I have 2048MB of RAM. What I did yesterday was unplugged the notebook and took it next to another computer so I can type the output and hopefully the notebook wouldn't reboot or shutoff before I finish typing. Guess the other alternative is to just take a picture of the screen's output with a digital camera and then type it out. =) Is it possible to set up the serial console but only plug the serial cable in after a panic? > - if debug.mpsafenet=1, please try temporarily setting debug.mpsafenet=0 > in loader.conf and rebooting to see if the problem goes away. This > might suggest a locking problem in the xl driver, for example. You're right that it's on debug.mpsafenet=1 since from reading the -current list, it seems that the dc or was it de driver that required the debug.mpsafenet=0 and I didn't hear anyone complain about panics with the xl driver. I've added debug.mpsafenet=0 to /boot/loader.conf and will try it on the next reboot. What I noticed was the problem is not exactly identical each time. The one before the one posted below IIRC said something about cannot lock free mbuf. > - Could you use gdb or addr2line to convert xl_intr+0x262 below into a > source line number? How exactly do I do the above since I noticed ddb doesn't have a addr2line command in it? > - Could you run "show pcpu", "show locks", and "ps" and provide that > output (serial console highly recommended). Will do that on the next panic if it occurs after the debug.mpsafenet=1 setting. > - Could you tell me what sort of load the system is experiencing -- in > particular, does it have multiple network interfaces, is it performing > web serving, packet forwarding, bridging, XWindows, etc? It has a xl0, fwe0, ath0, plip0, lo0, pflog0, pfsync0 interface according to ifconfig -a but I'm only using the xl0 interface as the ath0 is not configured. The xl0 is basically doing ipfw/dummynet for outgoing traffic shaping of 0-80 byte TCP Ack packets as well as acting as a outgoing router for the LAN of 2 windows machines. It is performing httpd except during the last panic, my apache wasn't working as I am in the process of going from 1.2 to 2.0. I do have x.org as well as KDE installed but I'm only running it as needed which I haven't ran for the last year until about 2 hours ago after converting from XFree86 to X.Org and then to kde3.3. Typical load looks like this: USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND root 27 0.5 0.0 0 12 ?? RL 2:22PM 1:13.28 [swi4: clock sio] root 0 0.0 0.0 0 4 ?? DLs 2:22PM 0:00.02 [swapper] root 1 0.0 0.0 720 344 ?? ILs 2:22PM 0:00.01 /sbin/init -- root 2 0.0 0.0 0 12 ?? DL 2:22PM 0:00.61 [g_event] root 3 0.0 0.0 0 12 ?? DL 2:22PM 0:04.22 [g_up] root 4 0.0 0.0 0 12 ?? DL 2:22PM 0:03.61 [g_down] root 5 0.0 0.0 0 12 ?? DL 2:22PM 0:00.00 [thread taskq] root 6 0.0 0.0 0 12 ?? DL 2:22PM 0:00.00 [kqueue taskq] root 7 0.0 0.0 0 12 ?? IL 2:22PM 0:00.00 [acpi_task0] root 8 0.0 0.0 0 12 ?? IL 2:22PM 0:00.00 [acpi_task1] root 9 0.0 0.0 0 12 ?? IL 2:22PM 0:00.00 [acpi_task2] root 10 0.0 0.0 0 12 ?? DL 2:22PM 0:00.00 [ktrace] root 11 0.0 0.0 0 12 ?? RL 2:22PM 4:41.92 [idle: cpu0] root 12 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [irq0: clk] root 13 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [irq1: atkbd0] root 14 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [irq3:] root 15 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [irq4: sio0] root 16 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [irq5: pcm0] root 17 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [irq6: fdc0] root 18 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [irq7: ppc0] root 19 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [irq8: rtc] root 20 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [irq9: acpi0] root 21 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [irq10:] root 22 0.0 0.0 0 12 ?? WL 2:22PM 0:01.57 [irq11: cbb0 cbb1 root 23 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [irq12: psm0] root 24 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [irq13:] root 25 0.0 0.0 0 12 ?? WL 2:22PM 0:01.17 [irq14: ata0] root 26 0.0 0.0 0 12 ?? WL 2:22PM 0:00.09 [irq15: ata1] root 28 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [swi3: vm] root 29 0.0 0.0 0 12 ?? WL 2:22PM 0:00.87 [swi1: net] root 30 0.0 0.0 0 12 ?? DL 2:22PM 0:00.54 [yarrow] root 31 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [swi5:+] root 32 0.0 0.0 0 12 ?? WL 2:22PM 0:00.21 [swi6:+] root 33 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [swi6: task queue root 34 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [swi6: acpitaskq] root 35 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [swi2: cambio] root 36 0.0 0.0 0 12 ?? DL 2:22PM 0:07.74 [acpi_thermal] root 37 0.0 0.0 0 12 ?? DL 2:22PM 0:00.00 [usb0] root 38 0.0 0.0 0 12 ?? DL 2:22PM 0:00.00 [usbtask] root 39 0.0 0.0 0 12 ?? DL 2:22PM 0:00.00 [usb1] root 40 0.0 0.0 0 12 ?? DL 2:22PM 0:00.00 [cbb0] root 41 0.0 0.0 0 12 ?? DL 2:22PM 0:00.00 [cbb1] root 42 0.0 0.0 0 12 ?? DL 2:22PM 0:00.03 [fdc0] root 43 0.0 0.0 0 12 ?? WL 2:22PM 0:00.00 [swi0: sio] root 44 0.0 0.0 0 12 ?? DL 2:22PM 0:00.01 [pagedaemon] root 45 0.0 0.0 0 12 ?? DL 2:22PM 0:00.00 [vmdaemon] root 46 0.0 0.0 0 12 ?? RL 2:22PM 0:06.67 [pagezero] root 47 0.0 0.0 0 12 ?? DL 2:22PM 0:17.49 [bufdaemon] root 48 0.0 0.0 0 12 ?? DL 2:22PM 0:04.29 [syncer] root 49 0.0 0.0 0 12 ?? DL 2:22PM 0:00.04 [vnlru] root 50 0.0 0.0 0 12 ?? DL 2:22PM 0:00.02 [hpt_wt] root 51 0.0 0.0 0 12 ?? IL 2:22PM 0:00.00 [nfsiod 0] root 52 0.0 0.0 0 12 ?? IL 2:22PM 0:00.00 [nfsiod 1] root 53 0.0 0.0 0 12 ?? IL 2:22PM 0:00.00 [nfsiod 2] root 54 0.0 0.0 0 12 ?? IL 2:22PM 0:00.00 [nfsiod 3] root 55 0.0 0.0 0 12 ?? DL 2:22PM 0:00.60 [schedcpu] root 145 0.0 0.0 1204 628 ?? Is 2:23PM 0:00.00 adjkerntz -i root 215 0.0 0.0 1440 900 ?? Ss 2:23PM 0:00.23 /sbin/natd -n xl0 root 295 0.0 0.0 488 364 ?? Is 2:23PM 0:00.00 /sbin/devd root 372 0.0 0.0 1328 876 ?? Ss 2:23PM 0:00.32 /usr/sbin/syslogd bind 391 0.0 0.2 4360 3544 ?? Ss 2:23PM 0:00.43 /usr/sbin/named - root 405 0.0 0.1 1408 1044 ?? Ss 2:23PM 0:00.01 /usr/sbin/rpcbind root 528 0.0 0.0 1304 880 ?? Ss 2:23PM 0:00.45 diskcheckd: ad2 8 root 539 0.0 0.0 1284 928 ?? Is 2:23PM 0:00.00 /usr/sbin/lpd root 572 0.0 0.0 1244 788 ?? Ss 2:23PM 0:00.01 /usr/sbin/usbd root 592 0.0 0.1 3348 2568 ?? Is 2:23PM 0:00.16 /usr/sbin/sshd root 597 0.0 0.1 3568 2824 ?? Ss 2:23PM 0:00.39 sendmail: accepti smmsp 603 0.0 0.1 3444 2700 ?? Ss 2:23PM 0:00.12 sendmail: Queue r root 624 0.0 0.1 1384 1088 ?? Ss 2:23PM 0:00.03 /usr/sbin/cron -s root 829 0.0 0.1 6080 3120 ?? Is 2:24PM 0:00.05 sshd: vince [priv vince 842 0.0 0.2 6080 3492 ?? S 2:24PM 0:00.63 sshd: vince@ttyp0 root 870 0.0 0.0 1304 880 ?? Ss 2:24PM 0:00.44 diskcheckd: ad2 8 uucp 884 0.0 0.1 2224 1732 ?? Is 2:24PM 0:00.00 /usr/local/sbin/f uucp 887 0.0 0.1 2232 1896 ?? Is 2:24PM 0:00.01 /usr/local/sbin/h root 955 0.0 0.3 6880 6188 ?? Ss 2:24PM 0:00.12 /usr/local/bin/pe root 967 0.0 0.3 7100 6444 ?? Ss 2:24PM 0:00.11 /usr/local/bin/pe root 986 0.0 0.0 1240 732 ?? Is 2:24PM 0:00.00 /usr/sbin/moused root 1015 0.0 0.1 1472 1156 ?? Is 2:24PM 0:00.02 /usr/sbin/inetd - root 1055 0.0 0.2 6080 3128 ?? Is 2:26PM 0:00.05 sshd: vince [priv vince 1057 0.0 0.2 6080 3480 ?? S 2:26PM 0:01.06 sshd: vince@ttyp1 nobody 7012 0.0 0.1 3104 1832 ?? Is 3:16PM 0:00.02 identd -w -t120 vince 843 0.0 0.1 2628 2212 p0 Ss 2:24PM 0:00.11 -tcsh (tcsh) vince 961 0.0 0.5 11484 9380 p0 T 2:24PM 0:01.83 pine root 1029 0.0 0.0 1296 944 v0 Is+ 2:24PM 0:00.00 /usr/libexec/gett root 1030 0.0 0.0 1296 944 v1 Is+ 2:24PM 0:00.00 /usr/libexec/gett root 1031 0.0 0.0 1296 944 v2 Is+ 2:24PM 0:00.00 /usr/libexec/gett root 1032 0.0 0.0 1296 944 v3 Is+ 2:24PM 0:00.00 /usr/libexec/gett root 1033 0.0 0.0 1296 944 v4 Is+ 2:24PM 0:00.00 /usr/libexec/gett root 1034 0.0 0.0 1296 944 v5 Is+ 2:24PM 0:00.00 /usr/libexec/gett root 1035 0.0 0.0 1296 944 v6 Is+ 2:24PM 0:00.00 /usr/libexec/gett root 1036 0.0 0.0 1296 944 v7 Is+ 2:24PM 0:00.00 /usr/libexec/gett root 1037 0.0 0.0 1296 944 v8 Is+ 2:24PM 0:00.00 /usr/libexec/gett root 1038 0.0 0.0 1296 944 v9 Is+ 2:24PM 0:00.00 /usr/libexec/gett root 1039 0.0 0.0 1296 944 va Is+ 2:24PM 0:00.00 /usr/libexec/gett root 1040 0.0 0.0 1296 944 vb Is+ 2:24PM 0:00.00 /usr/libexec/gett > - Are you running with any tweaked network settings, such as > net.isr.enable? My kernel config does have maxusers=512 and 65536 nmbclusters. IPFIREWALL, IPDIVERT, DUMMYNET, BRIDGE, HZ=1000 are also in the kernel config. There is also this as a temporary workaround for the kmem_map too small panic issue ever since it started about 9 or so months ago. Don't know if this issue was fixed. options VM_KMEM_SIZE_MAX=(384*1048576) options VM_KMEM_SIZE_SCALE=2 For sysctl, these are the changes: net.inet.icmp.bmcastecho 1 -> 0 net.inet.tcp.log_in_vain=0 -> 1 net.inet.udp.log_in_vain=0 -> 1 kern.maxfiles=16424 -> 65536 kern.maxfilesperproc=14781 -> 32768 net.inet.tcp.sendspace=32768 -> 65536 kern.ipc.somaxconn=128 -> 8192 net.inet.icmp.drop_redirect=0 -> 1 net.inet.icmp.log_redirect=0 -> 1 /sbin/sysctl -w kern.ipc.maxsockbuf=262144 -> 8388608 Thanks! Cheers, Vince > > Slab at 0xc3d75fa8, freei 14 = 0. > > panic: Duplicate free of item 0xc3d75e00 from zone 0xc143e9a0(Mbuf) > > cpuid = 0 > > KDB: enter: panic > > [thread 100019] > > Stopped at kdb_enter+0x2c: leave > > db> trace > > kdb_enter(c0852d05,100,c38b0980,e,c3d75000) at kdb_enter+0x2c > > panic(c086ce77,c3d75e00,c143e9a0,c08515e0,c086ce5b) at panic+0x17f > > uma_dbg_free(c143e9a0,0,c3d75e00) at uma_dbg_free+0x10e > > uma_zfree_arg(c143e9a0,c3d75e00,0) at uma_zfree_arg+0x1f6 > > m_freem(c3d75e00,c3a0b21c,c3a0b2f4,c3a0cd80,c38b7400) at m_freem+0x3e > > xl_intr(c3a09000,0,c38b3c00,0,c0612120) at xl_intr+0x262 > > ithread_loop(c38b7400,e9f95d48,c38b7400,c0612120,0) at ithread_loop+0x186 > > fork_exit(c0612120,c38b7400,e9f95d48) at fork_exit+0x7e > > fork_trampoline() at fork_trampoline+0x8 > > --- trap 0x1, eip = 0, esp = 0xe9f95d7c, ebp = 0 --- > > db> > > > > Any ideas what could be causing this? Is there anyway to write > > everything above to a file instead of manually typing everything? > > Thanks. > > > > Cheers, > > Vince > > _______________________________________________ > > freebsd-current@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-current > > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" > > > >