Date: Thu, 16 Mar 2000 00:51:08 -0600
From: Douglas Swarin
To: freebsd-hackers@freebsd.org
Subject: NFS Panic Problem
Message-ID: <20000316005107.A2883@staff.texas.net>

Recently one of the FreeBSD machines where I work has been crashing on a
semi-regular basis, once or twice a day. The dmesg for the machine is at
the bottom of this post.

These crashes started very recently, less than a week ago. Before that,
the machine had been very reliable (several 100-day uptimes). The machine
used to be running FreeBSD 3.1-STABLE as of mid-April 1999. Since I know
many NFS bugs have been fixed since then, the box was upgraded on Tuesday
to 3.4-STABLE (a completely fresh installation). This, however, did not
fix the panics.

I believe the problem to be related to one of these two PRs:

[1998/06/23] kern/7028
    http://www.freebsd.org/cgi/query-pr.cgi?pr=7028
    panic in vinvalbuf when appending/looking at tail of NFS file
[2000/03/08] misc/17272
    http://www.freebsd.org/cgi/query-pr.cgi?pr=17272
    deleting a file that a program has open causes vinvalbuf: flush failed

Basically, it's:

    panic: vinvalbuf: flush failed

and it appears to be triggered by a 'tail -f' on a growing, very large log
file over NFS.
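For anyone trying to reproduce this without tail(1), the access pattern described above (one reader holding a file open and repeatedly reading whatever has been appended) can be approximated with a short script. What follows is only a sketch in Python under stated assumptions: the names read_new and follow are made up for illustration, the polling interval is arbitrary, and nothing here is known to trigger the panic.

```python
import os
import time


def read_new(f, offset):
    """Return (data, new_offset) for bytes added to open file f since offset.

    If the file is now shorter than offset (it was truncated in place),
    start again from the beginning. Rotation by rename is NOT detected,
    because f keeps referring to the old file.
    """
    size = os.fstat(f.fileno()).st_size
    if size < offset:
        offset = 0
    f.seek(offset)
    data = f.read(size - offset)
    return data, offset + len(data)


def follow(path, interval=1.0):
    """Poll path forever, yielding newly appended bytes (roughly 'tail -f').

    The file is opened once and kept open, unbuffered, so each poll is an
    fstat followed by a read on the same descriptor.
    """
    with open(path, "rb", buffering=0) as f:
        offset = os.fstat(f.fileno()).st_size  # start at the current end
        while True:
            data, offset = read_new(f, offset)
            if data:
                yield data
            else:
                time.sleep(interval)
```

Iterating over follow('/home/logs/largelogfile') and writing each chunk to standard output would then stand in for the 'tail -f' in this report. The file is deliberately kept open between polls, since reopening it each time would exercise a different NFS code path.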

The NFS host on the other end is running Solaris 2.6 on a SPARC. The
actual mount is kind of weird; it is indirected through a different NFS
mount off a NetApp through a symlink (the NetApp-mounted FS is basically a
symlink farm with a few real directories). Basically:

    netapp:/home on /home
    sun:/logs on /sun/logs
    /home/logs@ -> /sun/logs

and we are doing 'tail -f /home/logs/largelogfile'
(there are good historical reasons for this setup).

We have made no significant changes to the other machines in this setup,
although the logfile in question has been growing in size over time. We
also rotate the logfile on the Sun daily. No executable files for the BSD
machine are stored on the Sun.

I have compiled a debug kernel and will provide a traceback and/or dump to
anyone who is interested once it happens again. If I find a way to
reliably reproduce it, I will post that too. In the meantime, are there
any quick patches or other solutions I could use?

Thanks in advance for your time and advice,

Doug

Below is dmesg:

Copyright (c) 1992-1999 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California. All rights reserved.
FreeBSD 3.4-STABLE #2: Tue Mar 14 23:21:39 CST 2000
    doug@xxx:/usr/src/sys/compile/XXX
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 347664663 Hz
CPU: Pentium II/Xeon/Celeron (347.66-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x652  Stepping = 2
  Features=0x183fbff
real memory = 536870912 (524288K bytes)
avail memory = 519360512 (507188K bytes)
Preloaded elf kernel "kernel" at 0xc0309000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc030909c.
Pentium Pro MTRR support enabled
Probing for devices on PCI bus 0:
chip0: rev 0x03 on pci0.0.0
chip1: rev 0x03 on pci0.1.0
chip2: rev 0x03 on pci0.2.0
chip3: rev 0x02 on pci0.7.0
chip4: rev 0x02 on pci0.7.3
fxp0: rev 0x05 int a irq 14 on pci0.8.0
fxp0: Ethernet address 00:90:27:45:ee:ae
Probing for devices on PCI bus 1:
vga0: rev 0x5c on pci1.0.0
Probing for devices on PCI bus 2:
ahc0: rev 0x00 int a irq 11 on pci2.4.0
ahc0: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs
ahc1: rev 0x03 int a irq 11 on pci2.6.0
ahc1: aic7860 Single Channel A, SCSI Id=7, 3/255 SCBs
Probing for devices on the ISA bus:
sc0 on isa
sc0: VGA color <16 virtual consoles, flags=0x0>
atkbdc0 at 0x60-0x6f on motherboard
atkbd0 irq 1 on isa
psm0 not found
sio0 at 0x3f8-0x3ff irq 4 flags 0x30 on isa
sio0: type 16550A, console
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
ppc0 at 0x378 irq 7 on isa
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
lpt0: on ppbus 0
lpt0: Interrupt-driven port
ppi0: on ppbus 0
lppps0: on ppbus 0
plip0: on ppbus 0
vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
npx0 on motherboard
npx0: INT 16 interface
Waiting 8 seconds for SCSI devices to settle
chcd0 at ahc1 bus 0 target 5 lun 0
cd0: Removable CD-ROM SCSI-2 device
cd0: 20.000MB/s transfers (20.000MHz, offset 15)
cd0: Attempt to query device size failed: NOT READY, Medium not present
da1 at ahc0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-2 device
da1: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da1: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)
da0 at ahc0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-2 device
da0: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)
changing root device to da0s1a