From: Elliot Moore
Date: Sat, 7 Feb 2004 03:25:30 +0000
To: freebsd-stable@freebsd.org
Subject: FreeBSD4.9 - panic: timeout table full
Message-Id: <481C8DB1-591D-11D8-8420-000A95765552@devnull.org.uk>

Hello all,

I am seeing a recurring kernel panic on FreeBSD-4.9 [fresh install from CD - no CVS upgrades]

=========================
panic: timeout table full

syncing disks... panic: timeout table full
Uptime: 1d0h18m7s
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
=========================

This has happened almost once a day, at random times, since the machine was put into service.

There are no nasty errors or warnings in the syslog all.log or dmesg
There are no obvious patterns or chains of events that build up beforehand
It happens when running the GENERIC kernel and on different custom kernels
It happens with the disks in PIO4 mode or in UDMA mode
It happens - randomly - when busy or idle

This is the first time it has auto-rebooted - previous times it just hung at "syncing disks..." and needed power-cycling. The disks were in PIO4 mode for this first auto-reboot.

So I will enable crash dumps - I guess I may be able to find the culprit with gdb.

SoftUpdates enabled, ata write cache off, ata tagged queueing off
maxusers=512
NMBCLUSTERS=32768

I have read about these timeout panics in 'man crash' and looked at the kernel source: timeout() can be called by many things, and when there are no free callout entries left, the kernel panics :(

I have worked out the value of ncallout on my system:

ncallout = 16 + maxproc + maxfiles;
         = 16 + 8212 + 65536
         = 73764

* [Q] Does anyone know the likely reasons that would cause BSD to run out of timeouts/ncallouts - 73764 worth?

man 9 timeout:
timeout -- execute a function after a specified length of time.

* [Q] ??: either the number of free callouts is depleting over time, or something has stopped responding, causing a rapid increase in the number of timeouts called, or something has stopped clearing its timeout handles - a bad driver?

* [Q] Does somebody know of a method to ask the kernel how many timeouts are assigned and what called them? Being able to find out how many are left/in use, and therefore work out the rate of depletion, would be helpful in debugging - AND would let me 'throw in the towel' and reboot safely before it dies! Can this be done? [some inquiry code or a kernel patch] Is there something already in FreeBSD that can do this?
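To put a number on "rate of depletion", here is a minimal sketch of the arithmetic in Python, purely for illustration. The function names are hypothetical, and the rate rests on a loud assumption: that the callout table was essentially empty at boot and filled at a constant rate until the panic. If the table filled in a sudden burst instead, this average says nothing useful.

```python
# Illustrative only: reproduces the ncallout arithmetic quoted above and
# derives a rough average depletion rate. All function names are hypothetical.

def ncallout(maxproc: int, maxfiles: int) -> int:
    """Callout table size, using the formula quoted from the kernel source."""
    return 16 + maxproc + maxfiles


def uptime_seconds(days: int, hours: int, minutes: int, seconds: int) -> int:
    """Convert an uptime such as 1d0h18m7s into seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def average_depletion_rate(table_size: int, uptime_s: int) -> float:
    """Net callouts consumed per second.

    ASSUMPTION: the table was empty at boot and filled at a constant
    rate until the panic. A sudden burst would look very different.
    """
    return table_size / uptime_s


if __name__ == "__main__":
    size = ncallout(maxproc=8212, maxfiles=65536)
    up = uptime_seconds(1, 0, 18, 7)  # "Uptime: 1d0h18m7s" from the panic
    rate = average_depletion_rate(size, up)
    print(f"ncallout = {size}")
    print(f"uptime   = {up} s")
    print(f"average net consumption = {rate:.2f} callouts/s")
```

Under that assumption a steady leak of a little under one callout per second would be enough to exhaust the table in a day, which is the sort of rate a single periodic timer that is never released could produce.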

The only quirk I see at boot is this in dmesg:

pci0: (vendor=0x8086, dev=0x24c3) at 31.3 irq 7

And sometimes (note: not all the time) this message after boot or midway through the day:

stray irq 7

* [Q] This unknown card at irq 7: from the vendor ID I imagine it is the onboard Intel SMBus/I2C bridge. Could it play a part in this timeout panic?

* [Q] Is my kernel config at fault? (though GENERIC still panicked)

* [Q] I have a 70 gig UFS+S filesystem (27067418 used inodes). Is it normal for it to take an hour to fsck after the panic?

I welcome suggestions to debug/fix this, and I can supply more info if it helps.

ells...

About the system
-----------------
FreeBSD book 4.9-RELEASE FreeBSD 4.9-RELEASE #0: Wed Feb 4 12:33:00 GMT 2004
Mem 512MB
Intel P4 on a Tyan Trinity i845E [S2099GNNR]
2 x 120GB IDE Seagate ST3120026A/3.06 on Promise TX2 ATA133 RAID1
Inbuilt ethernet devices: Intel fxp and em
Hardware is new
Runs apache/php/djbdns/qmail+spamc+clamav/Courier-IMAP/proftpd
It's very fast: the load average is around 0.02, rarely goes over 1.0, and it runs with approx 300MB free
It's not busy - I can compile kernels and packages with ease
networking: no errors in stats

BSD on the system - dmesg + kernel + sysctl output
---------------------------------------------------
FreeBSD 4.9-RELEASE #0: Wed Feb 4 12:33:00 GMT 2004
    root@book:/usr/src/sys/compile/BOOKV2
Timecounter "i8254" frequency 1193182 Hz
CPU: Intel(R) Pentium(R) 4 CPU 2.66GHz (2659.10-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
  Features=0xbfebfbff
real memory = 536805376 (524224K bytes)
avail memory = 518729728 (506572K bytes)
Preloaded elf kernel "kernel" at 0xc02ef000.
Warning: Pentium 4 CPU: PSE disabled
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 12 entries at 0xc00fde90
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
agp0: mem 0xe0000000-0xe3ffffff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pcib2: at device 30.0 on pci0
pci2: on pcib2
pci2: at 1.0 irq 12
atapci0: port 0xa400-0xa40f,0xa000-0xa003,0x9c00-0x9c07,0x9800-0x9803,0x9400-0x9407 mem 0xe6040000-0xe6043fff irq 15 at device 2.0 on pci2
ata2: at 0x9400 on atapci0
ata3: at 0x9c00 on atapci0
fxp0: port 0xa800-0xa83f mem 0xe6045000-0xe6045fff irq 11 at device 8.0 on pci2
fxp0: Ethernet address 00:e0:81:60:8e:49
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
em0: port 0xac00-0xac3f mem 0xe6020000-0xe603ffff,0xe6000000-0xe601ffff irq 10 at device 10.0 on pci2
em0: Speed:N/A Duplex:N/A
isab0: at device 31.0 on pci0
isa0: on isab0
atapci1: port 0xf000-0xf00f,0-0x3,0-0x7,0-0x3,0-0x7 irq 0 at device 31.1 on pci0
ata0: at 0x1f0 irq 14 on atapci1
ata1: at 0x170 irq 15 on atapci1
pci0: (vendor=0x8086, dev=0x24c3) at 31.3 irq 7
orm0: