Date: Thu, 10 Apr 2003 13:50:48 +0200 (CEST)
From: Oliver Fromme <olli@secnetix.de>
To: freebsd-stable@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG
Subject: panic: vinvalbuf: flush failed
Message-ID: <200304101150.h3ABom0O034933@lurza.secnetix.de>

Hi,

We have a pretty serious problem with our news server crashing
during the expire cronjob. This happened with 4.7-RELEASE, so
we upgraded to 4.8-RELEASE recently, hoping that the problem
might be fixed, but it isn't. The machine is a Compaq DL360-G2.

I've searched the PR database as well as the mailing list
archives for the panic string, but didn't find anything.

What makes the problem even worse is the fact that the machine
freezes after the "syncing disks" output. Normally it should
reboot, because we have DDB_UNATTENDED in the kernel, but it
doesn't.

When the crash happens, it is always shortly after INN's expire
cronjob starts, a little after 1:00 am (at that time, the CPU
load and NFS traffic increase noticeably). But it doesn't happen
every night, only every 3 to 4 days. On the other days, the
expire job finishes without problems.

The machine carries a fair amount of network traffic (about 40
to 50 Mbit/s constantly), half of which is NNTP, and the other
half is NFS. That's during normal operation -- during the expire
job, the NFS traffic is even higher.

The news spool and INN's overview database are on an NFS mount
(a NetApp filer), as well as binaries, logfiles and everything
else. The NFS mounts are v3+UDP, as far as I can tell (that
should be the default). The network interface is a Broadcom
BCM5701 gigabit one, connected to a Cisco switch with a VLAN
trunk (there are several virtual VLAN interfaces on this trunk).
If it matters, we're using IPFilter for packet filtering, plus
IPFW+Dummynet for traffic shaping.

The load on the machine is moderate (usually below 1.0).
As far as I can tell, there is no resource shortage. There's
plenty of RAM, free file descriptors, mbufs / mbuf clusters.
Well, at least during normal operation. Maybe it is a bit
different during the expire run at night.

This is the console output:

   panic: vinvalbuf: flush failed

   syncing disks...

At this point, the machine freezes completely; it does not
display any numbers nor "done".
It just sits there for hours. When I come in the morning,
I break into DDB (fortunately that still works):

Stopped at siointr1+0xf2: movl $0,brk_state2.757
db>
db> trace
siointr1(c884b000,e9bcaaa8,c033bb86,c884b000,e9bc0010) at siointr1+0xf2
siointr(c884b000) at siointr+0xb
Xfastintr4(e94cba7c,110,c0393145,0) at Xfastintr4+0x16
nfs_asyncio(d51b97d0,0,0) at nfs_asyncio+0xf4
nfs_strategy(e9bcab14) at nfs_strategy+0x59
nfs_writebp(d51b97d0,1,e95bc1a0,e9bcabfc,c02a7040) at nfs_writebp+0xdc
nfs_bwrite(e9bcaba0) at nfs_bwrite+0x16
nfs_flush(e9bb0600,c27f5900,2,c0434be0,1) at nfs_flush+0x68c
nfs_fsync(e9bcac34) at nfs_fsync+0x19
nfs_sync(c8c1f400,2,c27f5900,c0434be0,c8c1f400) at nfs_sync+0x99
sync(c0434be0,0,c03861ec,c038aabc,100) at sync+0x63
boot(100,e94ef2a0,68c0c0,e9bcad00,c021ba99) at boot+0x8a
panic(c038aabc,e95002c0,7d0,1,68c0c0) at panic+0x79
vinvalbuf(e95262c0,1,c8c3e280,e95bc1a0,100,0) at vinvalbuf+0x395
nfs_vinvalbuf(e95262c0,1,c8c3e280,e95bc1a0,1) at nfs_vinvalbuf+0x108
nfs_open(e9bcadfc,0,c911b440,e9bcaf80,e95262c0) at nfs_open+0xf5
vn_open(e9bcaec8,1,0,e95bc1a0,3) at vn_open+0x3d7
open(e95bc1a0,e9bcaf80,38229dac,0,0) at open+0xb8
syscall2(81c002f,2f,bfbf002f,0,0) at syscall2+0x1f5
Xint0x80_syscall() at Xint0x80_syscall+0x25
db> ps
  pid proc     addr      uid  ppid  pgrp    flag stat  wmesg  wchan    cmd
37692 e95bc1a0 e9bc8000    8   727   727 4004004    2                  nnrpd
37657 e9bd2e00 e9bd3000    8 37656 37407  004005    2                  expireover
37656 e95bf0c0 e9abf000    8 37410 37407  000084    3  wait   e95bf0c0 sh
37410 e95bd860 e9b40000    8 37407 37407  004084    3  wait   e95bd860 sh
37407 e95bfdc0 e991f000    0 37403 37407  004084    3  wait   e95bfdc0 sh
37403 e95bd520 e9b5e000    0   103   103  000084    3  piperd e9440540 cron
31158 e95bc680 e9ba8000    8   727   727  004085    2                  overchan
31157 e95bcea0 e9b68000    8   727   727  004084    3  sbwait e554e888 perl
31156 e95c0ac0 e95e2000    8   727   727  004484    2                  innfeed
30033 e95bc340 e9bc1000    8 30031 30031  004086    3  ttyin  c882b430 zsh
30031 e95c02a0 e9907000    0 30021 30031  004086    3  wait   e95c02a0 sh
30021 e33fe380 e95b6000    0 30019 30021 2004086    3  pause  e95b6260 zsh
30019 e95bc820 e9b99000    0   992 30019  000584    2                  sshd
 6279 e33fe040 e95c4000    0  1778  6279  004086    2                  zsh
 1778 e33ffbe0 e94dc000    0     1  1778  004186    3  wait   e33ffbe0 login
 1369 e33ff220 e950c000    0     1  1369  004086    3  ttyin  c8c3c710 getty
 1368 e34012a0 e943c000    0     1  1368  004086    3  ttyin  c8c3b110 getty
 1367 e33ffd80 e94c8000    0     1  1367  004086    3  ttyin  c8c41f10 getty
 1366 e3400400 e94ad000    0     1  1366  004086    3  ttyin  c8c41b10 getty
 1365 e33fff20 e94c4000    0     1  1365  004086    3  ttyin  c8c44310 getty
 1364 e33ff3c0 e94fd000    0     1  1364  004086    3  ttyin  c8c32410 getty
 1363 e33feee0 e9527000    0     1  1363  004086    3  ttyin  c8a42b10 getty
 1362 e3401440 e9433000    0     1  1362  004086    3  ttyin  c885cd10 getty
 1358 e33fe6c0 e9576000    0     1  1358  000084    2                  syslogd
  992 e33ff560 e9505000    0     1   992  000184    2                  sshd
  991 e33ff8a0 e94f5000    0     1   991  000084    2                  snmpd
  727 e33ffa40 e94e1000    8     1   727  000005    2                  innd
  109 e34000c0 e94b7000   25     1   109 2000184    2                  sendmail
  106 e3400260 e94b2000    0     1   106  000584    2                  sendmail
  103 e3400c20 e9499000    0     1   103  000484    2                  cron
   96 e34005a0 e94a9000    0     1    91  000084    2                  nfsiod
   95 e3400740 e94a5000    0     1    91  000084    2                  nfsiod
   94 e34008e0 e94a1000    0     1    91  000084    2                  nfsiod
   93 e3400a80 e949d000    0     1    91  000084    2                  nfsiod
   55 e3401100 e9444000    0     1    55  000484    2                  ipmon
   19 e3400f60 e9448000    0     1    19  000084    3  mfsidl e7ba6000 mount_mfs
    6 e34015e0 e7bb4000    0     0     0  000204    2                  syncer
    5 e3401780 e7bb1000    0     0     0  000604    2                  vnlru
    4 e3401920 e7bae000    0     0     0  000604    2                  bufdaemon
    3 e3401ac0 e7bab000    0     0     0  000204    3  psleep c042af20 vmdaemon
    2 e3401c60 e7ba8000    0     0     0  000604    2                  pagedaemon
    1 e3401e00 e3406000    0     0     1  004284    3  wait   e3401e00 init
    0 c0434be0 c04de000    0     0     0  000204    3  sched  c0434be0 swapper
db> panic
panic: from debugger
Uptime: 2d23h10m42s

dumping to dev #da/0x20001, offset 3670056
dump 1279 1278 1277 1276 1275 1274 1273 1272 1271 1270 1269 1268 [...]
12 11 10 9 8 7 6 5 4 3 2 1 0 succeeded
Automatic reboot in 15 seconds - press a key on the console to abort
BIOS drive A: is disk0
BIOS drive C: is disk1
BIOS 637kB/1309676kB available memory

FreeBSD/i386 bootstrap loader, Revision 0.8
(olli@monos.secnetix.net, Mon Mar 31 01:11:35 CEST 2003)
Loading /boot/defaults/loader.conf
/kernel text=0x2bb060 data=0x46b88+0x3842c syms=[0x4+0x3caa0+0x4+0x44979]

Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [kernel]...
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.8-RELEASE #0: Mon Mar 31 11:27:28 CEST 2003
    olli@monos.secnetix.net:/usr/src/sys/compile/FARM
Timecounter "i8254" frequency 1193182 Hz
CPU: Intel(R) Pentium(R) III CPU family 1400MHz (1396.45-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x6b1 Stepping = 1
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 1342156800 (1310700K bytes)
avail memory = 1300033536 (1269564K bytes)
Preloaded elf kernel "kernel" at 0xc04be000.
Pentium Pro MTRR support enabled
md0: Malloc disk
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib1: <ServerWorks host to PCI bridge> on motherboard
pci1: <PCI bus> on pcib1
ciss0: <Compaq Smart Array 5i> port 0x3000-0x30ff mem 0xf7ef0000-0xf7ef3fff,0xf7fc0000-0xf7ffffff irq 11 at device 4.0 on pci1
ciss0: using 256 of 1024 available commands
ciss0: 1 logical drive configured
ciss0: firmware 1.80
ciss0: 2 SCSI channels
ciss0: signature 'CISS'
ciss0: valence 1
ciss0: supported I/O methods 0xe<simple,performant,MEMQ>
ciss0: active I/O method 0x3<simple>
ciss0: 4G page base 0x00000000
ciss0: interrupt coalesce delay 1000us
ciss0: interrupt coalesce count 16
ciss0: max outstanding commands 1024
ciss0: bus types 0x2<ultra3>
ciss0: server name ''
ciss0: heartbeat 0x30000033
ciss0: 1 logical drive
ciss0: logical drive 1: RAID 0, 16896MB online
bge0: <Broadcom BCM5701 Gigabit Ethernet, ASIC rev. 0x105> mem 0xf7fb0000-0xf7fbffff irq 5 at device 5.0 on pci1
bge0: Ethernet address: 00:08:02:a0:c5:06
miibus0: <MII bus> on bge0
brgphy0: <BCM5701 10/100/1000baseTX PHY> on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: <Broadcom BCM5701 Gigabit Ethernet, ASIC rev. 0x105> mem 0xf7fa0000-0xf7faffff irq 10 at device 6.0 on pci1
bge1: Ethernet address: 00:08:02:a0:c5:07
miibus1: <MII bus> on bge1
brgphy1: <BCM5701 10/100/1000baseTX PHY> on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
pcib0: <ServerWorks host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
pci0: <ATI Mach64-GR graphics accelerator> at 3.0 irq 7
pci0: <unknown card> (vendor=0x0e11, dev=0xb203) at 5.0 irq 3
pci0: <unknown card> (vendor=0x0e11, dev=0xb204) at 5.2 irq 15
isab0: <PCI to ISA bridge (vendor=1166 device=0201)> at device 15.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <ServerWorks CSB5 ATA100 controller> port 0-0x3,0x2000-0x200f,0x374-0x377,0x170-0x177,0x3f4-0x3f7,0x1f0-0x1f7 at device 15.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ohci0: <OHCI (generic) USB controller> mem 0xf5ef0000-0xf5ef0fff irq 7 at device 15.2 on pci0
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: (0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
pcib2: <ServerWorks host to PCI bridge> on motherboard
pci2: <PCI bus> on pcib2
pcib7: <ServerWorks host to PCI bridge> on motherboard
pci7: <PCI bus> on pcib7
pcib3: <Host to PCI bridge> on motherboard
pci3: <PCI bus> on pcib3
eisa0: <EISA bus> on motherboard
mainboard0: <CPQ0724 (System Board)> on eisa0 slot 0
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcbfff,0xee000-0xeffff on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio1: configured irq 3 not in bitmap of probed irqs 0
ppc0: parallel port not found.
DUMMYNET initialized (011031)
ipfw2 initialized, divert disabled, rule-based forwarding enabled, default to accept, logging disabled
IP Filter: v3.4.31 initialized. Default = pass all, Logging = enabled
acd0: CDROM <CRN-8245B> at ata0-master PIO4
Mounting root from ufs:/dev/da0s1a
da0 at ciss0 bus 0 target 0 lun 0
da0: <COMPAQ RAID 0 VOLUME OK> Fixed Direct Access SCSI-0 device
da0: 135.168MB/s transfers
da0: 17359MB (35553120 512 byte sectors: 255H 32S/T 4357C)
WARNING: / was not properly dismounted
bge0: gigabit link up
bge0: gigabit link up

The kernel config is derived from GENERIC. I've removed
devices which are not needed, and added the following:

options         MAXDSIZ="(768*1024*1024)"
options         MAXSSIZ="(256*1024*1024)"
options         DFLDSIZ="(512*1024*1024)"
options         NMBCLUSTERS=32768
options         INCLUDE_CONFIG_FILE  # Include this file in kernel
options         CPU_ENABLE_SSE  # Enable SSE/MMX2 instructions support.
options         USER_LDT  # Allow user-level control of i386 LDT.
options         DDB  # Enable the kernel debugger.
options         DDB_UNATTENDED  # Don't drop into DDB for a panic.
options         KTRACE  # Enable system-call tracing facility.
pseudo-device   vlan 1  # VLAN support
pseudo-device   stf  # 6to4 IPv6 over IPv4 encapsulation
options         IPFIREWALL
options         IPFIREWALL_DEFAULT_TO_ACCEPT
options         IPFW2  # Use next-generation IPFW.
options         DUMMYNET
options         IPFILTER
options         IPFILTER_LOG
options         DEVICE_POLLING
options         HZ=1000
pseudo-device   vn  # Vnode driver, see vnconfig(8)
options         MSGBUF_SIZE=81920
options         AUTO_EOI_1
options         MAXCONS=16  # number of virtual consoles
options         SC_HISTORY_SIZE=400  # number of history buffer lines
options         ALT_BREAK_TO_DEBUGGER  # <ENTER> ~ Ctrl-B
device          smbus
device          intpm
device          alpm
device          ichsmb
device          viapm
device          smb
device          iicbus
device          iicbb
device          ic
device          iic
device          iicsmb

I can send the whole config if required, but there's really
nothing else in it which is special. Also, if there's any more
information that I should send, please let me know. I still
have the crash dump from the manual "panic", in case I can do
anything with it.

I don't think it's a hardware problem, because we moved the
news service to a different machine (an identically equipped
DL360-G2) and it made no difference.

Does anyone have an idea what might cause the problem, and
-- even more important -- how to fix it, or at least work
around it? We can't really afford to have a dead news server
for several hours every few days.

Thanks a bunch!

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.