From owner-freebsd-hackers@FreeBSD.ORG Thu Apr 10 04:50:52 2003
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP
    id 2B58B37B401; Thu, 10 Apr 2003 04:50:52 -0700 (PDT)
Received: from lurza.secnetix.de (lurza.secnetix.de [212.66.1.130])
    by mx1.FreeBSD.org (Postfix) with ESMTP
    id 0DA7443FA3; Thu, 10 Apr 2003 04:50:51 -0700 (PDT)
    (envelope-from olli@lurza.secnetix.de)
Received: from lurza.secnetix.de (localhost [IPv6:::1])
    by lurza.secnetix.de (8.12.6/8.12.5) with ESMTP
    id h3ABondK034934; Thu, 10 Apr 2003 13:50:49 +0200 (CEST)
    (envelope-from oliver.fromme@secnetix.de)
Received: (from olli@localhost)
    by lurza.secnetix.de (8.12.6/8.12.5/Submit)
    id h3ABom0O034933; Thu, 10 Apr 2003 13:50:48 +0200 (CEST)
Date: Thu, 10 Apr 2003 13:50:48 +0200 (CEST)
Message-Id: <200304101150.h3ABom0O034933@lurza.secnetix.de>
From: Oliver Fromme
To: freebsd-stable@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG
X-Newsgroups: list.freebsd-stable
User-Agent: tin/1.5.4-20000523 ("1959") (UNIX) (FreeBSD/4.7-RELEASE (i386))
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Subject: panic: vinvalbuf: flush failed
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
X-List-Received-Date: Thu, 10 Apr 2003 11:50:52 -0000

Hi,

We have a pretty serious problem with our news server crashing
during the expire cronjob. This happened with 4.7-RELEASE, so we
upgraded to 4.8-RELEASE recently, hoping that the problem might be
fixed, but it isn't. The machine is a Compaq DL360-G2.

I've searched the PR database as well as the mailing list archives
for the panic string, but didn't find anything.

What makes the problem even worse is the fact that the machine
freezes after the "syncing disks" output.
Normally it should reboot, because we have DDB_UNATTENDED in the
kernel, but it doesn't work.

When the crash happens, it is always shortly after INN's expire
cronjob starts, shortly after 1:00 am (at that time, the CPU load
and NFS traffic increase noticeably). But it doesn't happen every
night, only every 3 to 4 days. On the other days, the expire job
finishes without problems.

The machine has fairly heavy network traffic (about 40 - 50 Mbit/s
constantly), half of which is NNTP, and the other half is NFS.
That's during normal operation -- during the expire job, the NFS
traffic is even higher. The news spool and INN's overview database
are on an NFS mount (a NetApp filer), as well as binaries, logfiles
and everything else. The NFS mounts are v3+UDP, as far as I can
tell (that should be the default).

The network interface is a Broadcom BCM5701 gigabit NIC, connected
to a Cisco switch with a VLAN trunk (there are several virtual VLAN
interfaces on this trunk). If it matters, we're using IPFilter for
packet filtering, plus IPFW+Dummynet for traffic shaping.

The load on the machine is moderate (usually below 1.0). As far as
I can tell, there is no resource shortage. There is plenty of RAM,
and plenty of free file descriptors and mbufs / mbuf clusters.
Well, at least during normal operation. Maybe it is a bit different
during the expire run at night.

This is the console output:

panic: vinvalbuf: flush failed

syncing disks...

At this point, the machine freezes completely; it does not display
any numbers or "done". It just sits there for hours.
When I come in the morning, I break into DDB (fortunately that
still works):

Stopped at      siointr1+0xf2:  movl    $0,brk_state2.757
db>
db> trace
siointr1(c884b000,e9bcaaa8,c033bb86,c884b000,e9bc0010) at siointr1+0xf2
siointr(c884b000) at siointr+0xb
Xfastintr4(e94cba7c,110,c0393145,0) at Xfastintr4+0x16
nfs_asyncio(d51b97d0,0,0) at nfs_asyncio+0xf4
nfs_strategy(e9bcab14) at nfs_strategy+0x59
nfs_writebp(d51b97d0,1,e95bc1a0,e9bcabfc,c02a7040) at nfs_writebp+0xdc
nfs_bwrite(e9bcaba0) at nfs_bwrite+0x16
nfs_flush(e9bb0600,c27f5900,2,c0434be0,1) at nfs_flush+0x68c
nfs_fsync(e9bcac34) at nfs_fsync+0x19
nfs_sync(c8c1f400,2,c27f5900,c0434be0,c8c1f400) at nfs_sync+0x99
sync(c0434be0,0,c03861ec,c038aabc,100) at sync+0x63
boot(100,e94ef2a0,68c0c0,e9bcad00,c021ba99) at boot+0x8a
panic(c038aabc,e95002c0,7d0,1,68c0c0) at panic+0x79
vinvalbuf(e95262c0,1,c8c3e280,e95bc1a0,100,0) at vinvalbuf+0x395
nfs_vinvalbuf(e95262c0,1,c8c3e280,e95bc1a0,1) at nfs_vinvalbuf+0x108
nfs_open(e9bcadfc,0,c911b440,e9bcaf80,e95262c0) at nfs_open+0xf5
vn_open(e9bcaec8,1,0,e95bc1a0,3) at vn_open+0x3d7
open(e95bc1a0,e9bcaf80,38229dac,0,0) at open+0xb8
syscall2(81c002f,2f,bfbf002f,0,0) at syscall2+0x1f5
Xint0x80_syscall() at Xint0x80_syscall+0x25
db> ps
  pid proc     addr     uid  ppid  pgrp    flag stat wmesg  wchan    cmd
37692 e95bc1a0 e9bc8000   8   727   727 4004004    2                 nnrpd
37657 e9bd2e00 e9bd3000   8 37656 37407  004005    2                 expireover
37656 e95bf0c0 e9abf000   8 37410 37407  000084    3 wait   e95bf0c0 sh
37410 e95bd860 e9b40000   8 37407 37407  004084    3 wait   e95bd860 sh
37407 e95bfdc0 e991f000   0 37403 37407  004084    3 wait   e95bfdc0 sh
37403 e95bd520 e9b5e000   0   103   103  000084    3 piperd e9440540 cron
31158 e95bc680 e9ba8000   8   727   727  004085    2                 overchan
31157 e95bcea0 e9b68000   8   727   727  004084    3 sbwait e554e888 perl
31156 e95c0ac0 e95e2000   8   727   727  004484    2                 innfeed
30033 e95bc340 e9bc1000   8 30031 30031  004086    3 ttyin  c882b430 zsh
30031 e95c02a0 e9907000   0 30021 30031  004086    3 wait   e95c02a0 sh
30021 e33fe380 e95b6000   0 30019 30021 2004086    3 pause  e95b6260 zsh
30019 e95bc820 e9b99000   0   992 30019  000584    2                 sshd
 6279 e33fe040 e95c4000   0  1778  6279  004086    2                 zsh
 1778 e33ffbe0 e94dc000   0     1  1778  004186    3 wait   e33ffbe0 login
 1369 e33ff220 e950c000   0     1  1369  004086    3 ttyin  c8c3c710 getty
 1368 e34012a0 e943c000   0     1  1368  004086    3 ttyin  c8c3b110 getty
 1367 e33ffd80 e94c8000   0     1  1367  004086    3 ttyin  c8c41f10 getty
 1366 e3400400 e94ad000   0     1  1366  004086    3 ttyin  c8c41b10 getty
 1365 e33fff20 e94c4000   0     1  1365  004086    3 ttyin  c8c44310 getty
 1364 e33ff3c0 e94fd000   0     1  1364  004086    3 ttyin  c8c32410 getty
 1363 e33feee0 e9527000   0     1  1363  004086    3 ttyin  c8a42b10 getty
 1362 e3401440 e9433000   0     1  1362  004086    3 ttyin  c885cd10 getty
 1358 e33fe6c0 e9576000   0     1  1358  000084    2                 syslogd
  992 e33ff560 e9505000   0     1   992  000184    2                 sshd
  991 e33ff8a0 e94f5000   0     1   991  000084    2                 snmpd
  727 e33ffa40 e94e1000   8     1   727  000005    2                 innd
  109 e34000c0 e94b7000  25     1   109 2000184    2                 sendmail
  106 e3400260 e94b2000   0     1   106  000584    2                 sendmail
  103 e3400c20 e9499000   0     1   103  000484    2                 cron
   96 e34005a0 e94a9000   0     1    91  000084    2                 nfsiod
   95 e3400740 e94a5000   0     1    91  000084    2                 nfsiod
   94 e34008e0 e94a1000   0     1    91  000084    2                 nfsiod
   93 e3400a80 e949d000   0     1    91  000084    2                 nfsiod
   55 e3401100 e9444000   0     1    55  000484    2                 ipmon
   19 e3400f60 e9448000   0     1    19  000084    3 mfsidl e7ba6000 mount_mfs
    6 e34015e0 e7bb4000   0     0     0  000204    2                 syncer
    5 e3401780 e7bb1000   0     0     0  000604    2                 vnlru
    4 e3401920 e7bae000   0     0     0  000604    2                 bufdaemon
    3 e3401ac0 e7bab000   0     0     0  000204    3 psleep c042af20 vmdaemon
    2 e3401c60 e7ba8000   0     0     0  000604    2                 pagedaemon
    1 e3401e00 e3406000   0     0     1  004284    3 wait   e3401e00 init
    0 c0434be0 c04de000   0     0     0  000204    3 sched  c0434be0 swapper
db> panic
panic: from debugger
Uptime: 2d23h10m42s

dumping to dev #da/0x20001, offset 3670056
dump 1279 1278 1277 1276 1275 1274 1273 1272 1271 1270 1269 1268 [...]
12 11 10 9 8 7 6 5 4 3 2 1 0 succeeded
Automatic reboot in 15 seconds - press a key on the console to abort

BIOS drive A: is disk0
BIOS drive C: is disk1
BIOS 637kB/1309676kB available memory

FreeBSD/i386 bootstrap loader, Revision 0.8
(olli@monos.secnetix.net, Mon Mar 31 01:11:35 CEST 2003)
Loading /boot/defaults/loader.conf
/kernel text=0x2bb060 data=0x46b88+0x3842c syms=[0x4+0x3caa0+0x4+0x44979]

Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [kernel]...
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 4.8-RELEASE #0: Mon Mar 31 11:27:28 CEST 2003
    olli@monos.secnetix.net:/usr/src/sys/compile/FARM
Timecounter "i8254" frequency 1193182 Hz
CPU: Intel(R) Pentium(R) III CPU family 1400MHz (1396.45-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6b1  Stepping = 1
  Features=0x383fbff
real memory  = 1342156800 (1310700K bytes)
avail memory = 1300033536 (1269564K bytes)
Preloaded elf kernel "kernel" at 0xc04be000.
Pentium Pro MTRR support enabled
md0: Malloc disk
npx0: on motherboard
npx0: INT 16 interface
pcib1: on motherboard
pci1: on pcib1
ciss0: port 0x3000-0x30ff mem 0xf7ef0000-0xf7ef3fff,0xf7fc0000-0xf7ffffff irq 11 at device 4.0 on pci1
ciss0: using 256 of 1024 available commands
ciss0: 1 logical drive configured
ciss0: firmware 1.80
ciss0: 2 SCSI channels
ciss0: signature 'CISS'
ciss0: valence 1
ciss0: supported I/O methods 0xe
ciss0: active I/O method 0x3
ciss0: 4G page base 0x00000000
ciss0: interrupt coalesce delay 1000us
ciss0: interrupt coalesce count 16
ciss0: max outstanding commands 1024
ciss0: bus types 0x2
ciss0: server name ''
ciss0: heartbeat 0x30000033
ciss0: 1 logical drive
ciss0: logical drive 1: RAID 0, 16896MB online
bge0: mem 0xf7fb0000-0xf7fbffff irq 5 at device 5.0 on pci1
bge0: Ethernet address: 00:08:02:a0:c5:06
miibus0: on bge0
brgphy0: on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: mem 0xf7fa0000-0xf7faffff irq 10 at device 6.0 on pci1
bge1: Ethernet address: 00:08:02:a0:c5:07
miibus1: on bge1
brgphy1: on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
pcib0: on motherboard
pci0: on pcib0
pci0: at 3.0 irq 7
pci0: (vendor=0x0e11, dev=0xb203) at 5.0 irq 3
pci0: (vendor=0x0e11, dev=0xb204) at 5.2 irq 15
isab0: at device 15.0 on pci0
isa0: on isab0
atapci0: port 0-0x3,0x2000-0x200f,0x374-0x377,0x170-0x177,0x3f4-0x3f7,0x1f0-0x1f7 at device 15.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ohci0: mem 0xf5ef0000-0xf5ef0fff irq 7 at device 15.2 on pci0
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: on ohci0
usb0: USB revision 1.0
uhub0: (0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
pcib2: on motherboard
pci2: on pcib2
pcib7: on motherboard
pci7: on pcib7
pcib3: on motherboard
pci3: on pcib3
eisa0: on motherboard
mainboard0: on eisa0 slot 0
orm0: