Date: Fri, 30 Mar 2001 11:17:24 +0100 (BST)
From: Andrew Gordon <arg@arg1.demon.co.uk>
To: freebsd-stable@freebsd.org
Subject: NFS problems in 4.3-RC (maybe Vinum?)
Message-ID: <Pine.BSF.4.21.0103301006100.1651-100000@server.arg.sj.co.uk>
On Wednesday, I upgraded an NFS server to 4.3-RC. This machine has an IDE
drive with the system partitions and a Vinum RAID5 on 5 SCSI drives for
/home which is the main NFS export, plus a single SCSI drive (non-Vinum)
exported as /cd. Soft updates are enabled everywhere except the Vinum
volume.
The server had been running 4.2-STABLE without problems
since mid-January (at which time there were some Vinum-related panics, but
nothing like the current behaviour).
Since the upgrade, it has failed 4 times:
1) Apparently stopped serving NFS to one client - tcpdump showed
incoming UDP from that client but no replies. Server rebooted
cleanly and problem went away.
2) Stopped providing NFS service to any clients. On reboot,
"syncing disks... 5 1 1 1 1 1 1 1 1 1 1 1 1 1 giving up on 1 buffers"
The automatic fsck on all the filesystems threw up one error on
/home (INCORRECT BLOCK COUNT I=12634345 (2 should be 0)),
suggesting that the un-flushed block was in the Vinum volume.
3) Stopped serving NFS. This time I noticed in the ps output that the
nfsd processes were all stuck:
0 523 1 0 2 0 360 180 accept Is ?? 0:00.00 nfsd: master
0 525 523 0 -2 0 352 172 getblk D ?? 0:06.24 nfsd: server
0 526 523 0 -14 0 352 172 inode D ?? 0:00.07 nfsd: server
0 527 523 0 -14 0 352 172 inode D ?? 0:00.01 nfsd: server
0 528 523 0 -14 0 352 172 inode D ?? 0:00.01 nfsd: server
A reboot hung the machine: ctrl-T gave:
load: 0.00 cmd: reboot 62014 [inode] 0.00u 0.00s 0% 252k
After a hard reset, the fsck gave three "incorrect block count"
errors on /home (also one unref file in /var), but again came up
without needing manual fsck.
4) As for 2), except that this time the fsck found nothing wrong
on /home, but a load of unref files on /var. A 'ps' before
doing the reboot showed the nfsd processes stuck again:
0 264 1 0 2 0 360 132 accept Is ?? 0:00.00 nfsd: master
0 266 264 0 -14 0 352 124 inode D ?? 0:06.15 nfsd: server
0 267 264 0 -14 0 352 124 inode D ?? 0:00.26 nfsd: server
0 268 264 0 -14 0 352 124 inode D ?? 0:00.02 nfsd: server
0 269 264 0 -14 0 352 124 inode D ?? 0:00.04 nfsd: server
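For anyone wanting to check their own server for the same symptom, here is a minimal sketch that summarises which wait channel each blocked nfsd is sleeping on. It assumes the listings above are in the `ps -axl` column order (UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND); the header line was not included in the report, so that layout is an assumption. The function names are made up for illustration, and the sample data is copied from failure 3.

```python
from collections import Counter

# Sample lines copied from failure 3 in the report above.
SAMPLE = """\
0 523 1 0 2 0 360 180 accept Is ?? 0:00.00 nfsd: master
0 525 523 0 -2 0 352 172 getblk D ?? 0:06.24 nfsd: server
0 526 523 0 -14 0 352 172 inode D ?? 0:00.07 nfsd: server
0 527 523 0 -14 0 352 172 inode D ?? 0:00.01 nfsd: server
0 528 523 0 -14 0 352 172 inode D ?? 0:00.01 nfsd: server
"""


def parse_ps_line(line):
    """Split one listing line into the fields of interest.

    Assumed column order (not shown in the report):
    UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
    """
    fields = line.strip().split(None, 12)
    if len(fields) < 13:
        return None  # blank line, header, or unexpected layout
    try:
        pid, ppid = int(fields[1]), int(fields[2])
    except ValueError:
        return None  # e.g. a header row
    return {
        "pid": pid,
        "ppid": ppid,
        "wchan": fields[8],
        "stat": fields[9],
        "command": fields[12],
    }


def stuck_nfsd(text):
    """Return nfsd processes whose state starts with 'D' (disk wait)."""
    procs = [p for p in map(parse_ps_line, text.splitlines()) if p]
    return [
        p for p in procs
        if p["command"].startswith("nfsd") and p["stat"].startswith("D")
    ]


def wchan_summary(text):
    """Count blocked nfsd processes per wait channel."""
    return Counter(p["wchan"] for p in stuck_nfsd(text))


if __name__ == "__main__":
    for wchan, count in sorted(wchan_summary(SAMPLE).items()):
        print(f"{count} nfsd process(es) in disk wait on '{wchan}'")
```

The master process is skipped because its state is `Is` (idle sleep in accept), not `D`. A count per wait channel only shows where the servers are blocked; it says nothing about which of them holds the lock the others are waiting for.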
The load on the machine would have been much lower than usual, since most
of the users are on holiday (which is why I did the upgrade in the first
place). The only thing that has changed apart from the upgrade is that
the /cd filesystem, while present on the machine for some time and full of
data, would not have been used until this week as various clients were
re-configured to use it; however, it doesn't seem particularly involved
(and also one of the failures happened around 02:00 when all of the
machines mounting /cd were powered off: there would only have been me
(logged into another machine that mounts /home) and various cron jobs
active at the time).
I say "maybe Vinum?" in the subject since the main NFS export is on a
Vinum RAID5, but there isn't really any evidence to suggest Vinum is to
blame.
I re-cvsuped this morning in case a fix had appeared; I haven't rebuilt
yet, but none of the diffs look at all relevant:
U contrib/sendmail/FREEBSD-upgrade
U lib/libc/gen/glob.c
U release/sysinstall/main.c
U sys/dev/vinum/vinumconfig.c
U sys/net/if.c
U sys/net/if_vlan.c
U sys/netinet/if_ether.c
U sys/netinet/ip_icmp.c
U sys/netinet/tcp_subr.c
U usr.bin/fetch/fetch.c
U usr.bin/netstat/if.c
U usr.sbin/ppp/bundle.c
U usr.sbin/ppp/ether.c
U usr.sbin/ppp/iface.c
U usr.sbin/ppp/iface.h
U usr.sbin/ppp/ppp.8
