Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 10 Apr 2000 18:09:03 -0500
From:      Douglas Swarin <doug@staff.texas.net>
To:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: NFS Panic Problem
Message-ID:  <20000410180903.A934@staff.texas.net>
In-Reply-To: <20000316005107.A2883@staff.texas.net>; from doug@staff.texas.net on Thu, Mar 16, 2000 at 12:51:08AM -0600
References:  <20000316005107.A2883@staff.texas.net>

next in thread | previous in thread | raw e-mail | index | archive | help
I finally managed to obtain a crash dump of the problem quoted at the end
of this message. Strangely enough, it was crashing daily until the debug
code went in, at which point it stopped crashing. Just today, though, we
had the same panic. The machine is running -STABLE as of Mar 16 2000.
Here's what I got from gdb -k on the dump:

(kgdb) set width 80
(kgdb) symbol-file kernel.debug
Reading symbols from kernel.debug...done.
(kgdb) exec-file /var/crash/kernel.0
(kgdb) core-file /var/crash/vmcore.0
IdlePTD 3239936
initial pcb at 280874
panicstr: vinvalbuf: flush failed
panic messages:
---
panic: vinvalbuf: flush failed

syncing disks... 8 8 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 done

[dump countdown snipped]
---
#0  boot (howto=256) at ../../kern/kern_shutdown.c:285
285                     dumppcb.pcb_cr3 = rcr3();
(kgdb) where
#0  boot (howto=256) at ../../kern/kern_shutdown.c:285
#1  0xc0156315 in panic (fmt=0xc024dcbf "vinvalbuf: flush failed")
    at ../../kern/kern_shutdown.c:446
#2  0xc017ac5e in vinvalbuf (vp=0xd05502c0, flags=1, cred=0xc222df80,
    p=0xcfe57780, slpflag=0, slptimeo=0) at ../../kern/vfs_subr.c:677
#3  0xc01b8410 in nfs_vinvalbuf (vp=0xd05502c0, flags=1, cred=0xc222df80,
    p=0xcfe57780, intrflg=1) at ../../nfs/nfs_bio.c:979
#4  0xc01b6d1b in nfs_bioread (vp=0xd05502c0, uio=0xcfe6ef00, ioflag=8323072,
    cred=0xc222df80, getpages=0) at ../../nfs/nfs_bio.c:345
#5  0xc01dc814 in nfs_read (ap=0xcfe6eeb8) at ../../nfs/nfs_vnops.c:963
#6  0xc018167f in vn_read (fp=0xc1e16800, uio=0xcfe6ef00, cred=0xc222df80,
    flags=0) at vnode_if.h:303
#7  0xc0160d41 in dofileread (p=0xcfe57780, fp=0xc1e16800, fd=3,
    buf=0x804d000, nbyte=512, offset=-1, flags=0)
    at ../../kern/sys_generic.c:179
#8  0xc0160c4b in read (p=0xcfe57780, uap=0xcfe6ef84)
    at ../../kern/sys_generic.c:111
#9  0xc022da2b in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 672022196,
      tf_esi = 672022196, tf_ebp = -1077945848, tf_isp = -806948908,
      tf_ebx = 671965656, tf_edx = 3, tf_ecx = -1, tf_eax = 3, tf_trapno = 7,
      tf_err = 2, tf_eip = 671715344, tf_cs = 31, tf_eflags = 534,
      tf_esp = -1077945880, tf_ss = 39}) at ../../i386/i386/trap.c:1100
#10 0xc022257c in Xint0x80_syscall ()
#11 0x280bbe69 in ?? ()
#12 0x280bbd32 in ?? ()
#13 0x8048cb9 in ?? ()
#14 0x8049bcf in ?? ()
#15 0x80489c1 in ?? ()

(kgdb) source /usr/src/sys/modules/vinum/.gdbinit.kernel
(kgdb) ps
  pid    proc    addr    uid   ppid  pgrp   flag stat  comm  wchan
41612 cffa32e0 cffee000 211914 41600 41611  004086  3  grep  piperd cfd88b20
41611 d05247e0 d0683000 211914 41600 41611  004006  2  tail
41600 d0524ec0 d0665000 211914 41599 41600  004086  3  bash  wait d0524ec0

I'm guessing the three processes listed above are the important ones. A
'tail' on a large NFS-mounted logfile piped to 'grep' caused the crash. I
can provide other information from the dump if anyone needs it.

Thanks in advance for any help,
Doug

On Thu, Mar 16, 2000 at 12:51:08AM -0600, Douglas Swarin wrote:
> Recently one of the FreeBSD machines where I work has been crashing on a
> semi-regular basis, once or twice a day. The dmesg for the machine is at
> the bottom of this post. These crashes started very recently, less than
> a week ago. Before that, the machine had been very reliable (several 100
> day uptimes).
> 
> The machine used to be running FreeBSD 3.1-STABLE as of mid-April 1999.
> Since I know many NFS bugs have been fixed since then, the box was on
> Tuesday upgraded to 3.4-STABLE (a completely fresh installation). This,
> however, did not fix the panics. I believe the problem to be related to
> one of these two PRs:
> 
>   [1998/06/23] kern/7028  http://www.freebsd.org/cgi/query-pr.cgi?pr=7028
>   panic in vinvalbuf when appending/looking at tail of NFS file
> 
>   [2000/03/08] misc/17272 http://www.freebsd.org/cgi/query-pr.cgi?pr=17272
>   deleting a file that a program has open causes vinvalbuf: flush failed
> 
> Basically, it's:
> 
>   panic: vinvalbuf: flush failed
> 
> And appears to be triggered by a 'tail -f' on a growing, very large log
> file over NFS. The NFS host on the other end is running Solaris 2.6 on a
> sparc. The actual mount is kind of weird; it is indirected through a
> different NFS mount off a NetApp through a symlink (the NetApp-mounted
> FS is basically a symlink farm with a few real directories). Basically:
> 
>   netapp:/home on /home
>   sun:/logs on /sun/logs
> 
>   /home/logs@ -> /sun/logs
>   and we are doing 'tail -f /home/logs/largelogfile'
>   (there are good historical reasons for this setup)
> 
> We have made no significant changes to the other machines in this setup,
> although the logfile in question has been growing in size over time. We
> rotate the logfile on the Sun daily as well. No executable files for the
> BSD machine are stored on the Sun.
> 
> I have compiled a debug kernel and will provide a traceback and/or dump
> to anyone who is interested once it happens again. If I find a way to
> reliably reproduce it, I will post that too. For the meantime, are there
> any quick patches or other solutions I could use?
> 
> Thanks in advance for your time and advice,
> Doug
> 
> Below is dmesg:
> 
> Copyright (c) 1992-1999 FreeBSD Inc.
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>         The Regents of the University of California. All rights reserved.
> FreeBSD 3.4-STABLE #2: Tue Mar 14 23:21:39 CST 2000
>     doug@xxx:/usr/src/sys/compile/XXX
> Timecounter "i8254"  frequency 1193182 Hz
> Timecounter "TSC"  frequency 347664663 Hz
> CPU: Pentium II/Xeon/Celeron (347.66-MHz 686-class CPU)
>   Origin = "GenuineIntel"  Id = 0x652  Stepping = 2
>   Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
> real memory  = 536870912 (524288K bytes)
> avail memory = 519360512 (507188K bytes)
> Preloaded elf kernel "kernel" at 0xc0309000.
> Preloaded userconfig_script "/boot/kernel.conf" at 0xc030909c.
> Pentium Pro MTRR support enabled
> Probing for devices on PCI bus 0:
> chip0: <Intel 82443BX host to PCI bridge> rev 0x03 on pci0.0.0
> chip1: <Intel 82443BX host to AGP bridge> rev 0x03 on pci0.1.0
> chip2: <PCI to PCI bridge (vendor=1011 device=0024)> rev 0x03 on pci0.2.0
> chip3: <Intel 82371AB PCI to ISA bridge> rev 0x02 on pci0.7.0
> chip4: <Intel 82371AB Power management controller> rev 0x02 on pci0.7.3
> fxp0: <Intel EtherExpress Pro 10/100B Ethernet> rev 0x05 int a irq 14 on pci0.8.0
> fxp0: Ethernet address 00:90:27:45:ee:ae
> Probing for devices on PCI bus 1:
> vga0: <ATI model 4744 graphics accelerator> rev 0x5c on pci1.0.0
> Probing for devices on PCI bus 2:
> ahc0: <Adaptec aic7890/91 Ultra2 SCSI adapter> rev 0x00 int a irq 11 on pci2.4.0
> ahc0: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs
> ahc1: <Adaptec aic7860 SCSI adapter> rev 0x03 int a irq 11 on pci2.6.0
> ahc1: aic7860 Single Channel A, SCSI Id=7, 3/255 SCBs
> Probing for devices on the ISA bus:
> sc0 on isa
> sc0: VGA color <16 virtual consoles, flags=0x0>
> atkbdc0 at 0x60-0x6f on motherboard
> atkbd0 irq 1 on isa
> psm0 not found
> sio0 at 0x3f8-0x3ff irq 4 flags 0x30 on isa
> sio0: type 16550A, console
> sio1 at 0x2f8-0x2ff irq 3 on isa
> sio1: type 16550A
> fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
> fdc0: FIFO enabled, 8 bytes threshold
> fd0: 1.44MB 3.5in
> ppc0 at 0x378 irq 7 on isa
> ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
> ppc0: FIFO with 16/16/8 bytes threshold
> lpt0: <generic printer> on ppbus 0
> lpt0: Interrupt-driven port
> ppi0: <generic parallel i/o> on ppbus 0
> lppps0: <Pulse per second Timing Interface> on ppbus 0
> plip0: <PLIP network interface> on ppbus 0
> vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
> npx0 on motherboard
> npx0: INT 16 interface
> Waiting 8 seconds for SCSI devices to settle
> chcd0 at ahc1 bus 0 target 5 lun 0
> cd0: <NEC CD-ROM DRIVE:465 1.03> Removable CD-ROM SCSI-2 device 
> cd0: 20.000MB/s transfers (20.000MHz, offset 15)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> da1 at ahc0 bus 0 target 1 lun 0
> da1: <IBM DDRS-39130D DC1B> Fixed Direct Access SCSI-2 device 
> da1: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
> da1: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)
> da0 at ahc0 bus 0 target 0 lun 0
> da0: <IBM DDRS-39130D DC1B> Fixed Direct Access SCSI-2 device 
> da0: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
> da0: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)
> changing root device to da0s1a
> 
> 


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20000410180903.A934>