Date: Tue, 23 Dec 2008 01:18:21 -0500 From: "Dylan Cochran" <a134qaed@gmail.com> To: freebsd-stable <freebsd-stable@freebsd.org> Subject: Re: Hard lock on 7.1-RC1 Message-ID: <bdf82f800812222218j3c77f564gf9b8e48bbbda7ca2@mail.gmail.com> In-Reply-To: <bdf82f800812211205k1a90c926q473f468126d58d7f@mail.gmail.com> References: <bdf82f800812211205k1a90c926q473f468126d58d7f@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, Dec 21, 2008 at 3:05 PM, Dylan Cochran <a134qaed@gmail.com> wrote: > I'm hitting a strange lockup on 7.1-RC1, where some socket operations > seem to stall, as well as basic file operations. The only reproducable > way I have of triggering it is by doing multiple inserts into > phpmyadmin on lighttpd+fastcgi php5 + mysql51-server, though this > isn't the only thing which triggers it, just the only one which is > semi reliable. I've also reproduced this on another machine, set up > specifically to rule out any machine specific problems (as they have > different drive controllers, one uses gjournal, etc). > > I inititially built a kernel with SW_WATCHDOG, and attempted to use > watchdogd and DDB to get an output from show locks, but the watchdogd > hasn't panicked the machine, so at least devfs is still unlocked; I'm > not able to get physical access to the machine until monday. > > The bug was introduced as far as I can tell, between 7.1-BETA2 and 7.1-RC= 1. > > Any suggestions on what I can test for tommorow? > I updated the kernel source to RELENG_7_1 as of a few hours ago, and built with DEBUG_VFS_LOCKS as well. Luckily the backtrace included the operating it was at before the watchdog, which seems to be kern_sendfile(). I'm no expert at kernel debugging, so any assistance on tracking this down further would be greatly appreciated. And, as promised, here is the output of script after the watchdog induced p= anic: Script started on Tue Dec 23 01:05:56 2008 # cu -l cua01 interrupt total irq4: sio0 623 irq6: fdc0 1 irq17: fwohci0 3 irq18: rl0 uhci2++ 60718 irq23: rl1 ehci0 206 cpu0: timer 514596 Total 576147 KDB: stack backtrace: db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...) at db_trace_self_wrapper+0x26 kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29 hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9 lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c Xtimerint() at Xtimerint+0x1f --- interrupt, eip =3D 0xc07ff29d, esp =3D 0xe66e0b48, ebp =3D 0xe66e0c34 -= -- kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at do_sendfile+0xb1 sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13 syscall(e66e0d38) at syscall+0x335 Xint0x80_syscall() at Xint0x80_syscall+0x20 --- syscall (393, FreeBSD ELF32, sendfile), eip =3D 0x282cb0cb, esp =3D 0xbfbfc7cc, ebp =3D 0xbfbfe848 --- KDB: enter: watchdog timeout [thread pid 1288 tid 100060 ] Stopped at kdb_enter_why+0x3a: movl $0,kdb_why db> show lock db> p=08 =08show all proc pid ppid pgrp uid state wmesg wchan cmd 1600 902 902 0 R watchdogd 1470 1469 1470 0 S+ ttyin 0xc418fc10 csh 1469 1 1469 0 Ss+ wait 0xc46032b8 login 1468 1 1468 0 Ss+ ttyin 0xc41ac810 getty 1427 1 1427 0 Ss nanslp 0xc0c7dc44 cron 1420 1 1420 0 Ss select 0xc0c88eb8 sshd 1419 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1418 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1417 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1416 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1415 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1414 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1413 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1412 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1411 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1410 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1409 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1408 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1407 1289 1289 80 SJ accept 0xc445ab9a php-cgi --More-- 1406 1289 1289 80 SJ accept 0xc445ab9a php-cgi --More-- 1405 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1404 1289 1289 80 SJ accept 0xc445ab9a php-cgi 1403 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1402 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1401 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1400 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1399 1300 1300 80 RJ php-cgi 1398 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1397 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1396 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1395 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1394 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1393 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1392 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1391 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1390 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1389 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1388 1300 1300 80 SJ accept 0xc445a6ba php-cgi 1322 1 1322 25 SsJ pause 0xc4562b40 sendmail 1316 1 1316 0 RsJ sendmail --More-- 1306 1 1306 0 SsJ select 0xc0c88eb8 sshd 1300 1288 1300 80 SsJ wait 0xc43beae0 initial thread 1289 1288 1289 80 SsJ wait 0xc4564000 initial thread 1288 1 1287 80 RJ CPU 0 lighttpd 1286 1173 1170 88 R+J (threaded) mysqld 100114 S sigwait 0xe67a2be0 mysqld 100090 S ucond 0xc4359580 mysqld 100092 RunQ mysqld 100091 RunQ mysqld 100089 S ucond 0xc43598c0 mysqld 100088 S ucond 0xc4358c80 mysqld 100087 S ucond 0xc4094780 mysqld 100086 S ucond 0xc4359140 mysqld 100049 S select 0xc0c88eb8 initial thread 1173 1 1170 88 S+J wait 0xc43be2b8 sh 920 1 920 0 R+ (threaded) venti 100072 S ucond 0xc4358300 venti 100071 S accept 0xc445b03a venti 100070 S ucond 0xc4358200 venti 100069 S ucond 0xc4095980 venti --More-- 100068 S ucond 0xc41b1e00 venti 100067 S ucond 0xc4358100 venti 100066 S ucond 0xc4358280 venti 100065 S ucond 0xc41a45c0 venti 100064 S accept 0xc445a1da venti 100063 RunQ venti 100053 RunQ venti 902 1 902 0 Ss wait 0xc4343570 watchdogd 891 1 891 102 Rs gmond 770 1 770 0 Ss select 0xc0c88eb8 devd 338 1 338 65 Ss select 0xc0c88eb8 dhclient 322 1 49 0 S+ select 0xc0c88eb8 dhclient 48 0 0 0 SL sdflush 0xc0c95228 [softdepflush] 47 0 0 0 RL [syncer] 46 0 0 0 SL vlruwt 0xc4342000 [vnlru] 45 0 0 0 SL psleep 0xc0c89324 [bufdaemon] 44 0 0 0 SL pgzero 0xc0c95e00 [pagezero] 43 0 0 0 SL psleep 0xc0c95a18 [vmdaemon] 42 0 0 0 SL psleep 0xc0c959e0 [pagedaemon] 41 0 0 0 SL waiting_ 0xc0c8b0b4 [sctp_iterator] --More-- 40 0 0 0 WL [irq7: ppbus0 ppc0] 39 0 0 0 WL [irq1: atkbd0] 38 0 0 0 WL [irq15: ata1] 37 0 0 0 WL [irq14: ata0] 36 0 0 0 WL [swi0: sio] 35 0 0 0 SL - 0xc418d83c [fdc0] 34 0 0 0 SL - 0xc4097000 [fw0_probe] 33 0 0 0 SL - 0xc408c200 [fw0_taskq] 32 0 0 0 SL usbevt 0xc4013210 [usb4] 31 0 0 0 WL [irq23: rl1 ehci0] 30 0 0 0 SL usbevt 0xc407f210 [usb3] 29 0 0 0 SL usbevt 0xc406c210 [usb2] 28 0 0 0 WL [irq18: rl0 uhci2++] 27 0 0 0 SL usbevt 0xc405c210 [usb1] 26 0 0 0 WL [irq19: uhci1] 25 0 0 0 SL usbtsk 0xc0c7b234 [usbtask-dr] 24 0 0 0 SL usbtsk 0xc0c7b220 [usbtask-hc] 23 0 0 0 SL usbevt 0xc4024210 [usb0] 22 0 0 0 WL [irq16: uhci0 uhci3] 21 0 0 0 WL [irq9: acpi0] --More-- 20 0 0 0 WL [swi6: task queue] 19 0 0 0 WL [swi6: Giant taskq] 18 0 0 0 SL - 0xc3fe3700 [thread taskq] 17 0 0 0 WL [swi5: +] 9 0 0 0 SL - 0xc3fe3880 [acpi_task_2] 8 0 0 0 SL - 0xc3fe3880 [acpi_task_1] 7 0 0 0 SL - 0xc3fe3880 [acpi_task_0] 16 0 0 0 WL [swi2: cambio] 6 0 0 0 SL ccb_scan 0xc0c4b054 [xpt_thrd] 5 0 0 0 SL - 0xc3fe3a80 [kqueue taskq] 15 0 0 0 SL - 0xc0c7da74 [yarrow] 4 0 0 0 SL - 0xc0c7b62c [g_down] 3 0 0 0 SL - 0xc0c7b628 [g_up] 2 0 0 0 SL - 0xc0c7b620 [g_event] 14 0 0 0 WL [swi3: vm] 13 0 0 0 WL [swi4: clock sio] 12 0 0 0 WL [swi1: net] 11 0 0 0 RL [idle: cpu0] 1 0 1 0 SLs wait 0xc3f16ae0 [init] 10 0 0 0 SL audit_wo 0xc0c94c64 [audit] --More-- 0 0 0 0 SLs sched 0xc0c7b6e0 [swapper] db> Killed
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bdf82f800812222218j3c77f564gf9b8e48bbbda7ca2>