Date: Wed, 17 Jan 2001 09:20:53 +1030
From: Greg Lehey <grog@lemis.com>
To: Andrew Gordon <arg@arg1.demon.co.uk>
Cc: freebsd-stable@freebsd.org
Subject: Re: Vinum incidents.
Message-ID: <20010117092053.C16555@wantadilla.lemis.com>
In-Reply-To: <Pine.BSF.4.21.0101161413490.64349-100000@server.arg.sj.co.uk>; from arg@arg1.demon.co.uk on Tue, Jan 16, 2001 at 03:23:40PM +0000
References: <Pine.BSF.4.21.0101161413490.64349-100000@server.arg.sj.co.uk>

On Tuesday, 16 January 2001 at 15:23:40 +0000, Andrew Gordon wrote:
>
> I have a server with 5 identical SCSI drives, arranged as a single RAID-5
> volume using vinum (and softupdates). This is exported with
> NFS/Samba/Netatalk/Econet to clients of various types; the root,usr,var
> partitions are on a small IDE drive (there are no local users or
> application processes). The machine has a serial console.
>
> This has been working reliably for a couple of months, running -stable
> from around the time of 4.2-release. On 1st January, I took advantage of
> the low load to do an upgrade to the latest -stable.
>
> Since then, there have been two incidents (probably not in fact related to
> the upgrade) where vinum has not behaved as expected:
>
> 1) Phantom disc error
> ---------------------
>
> Vinum logged:
>
> Jan 2 01:59:26 serv20 /kernel: home.p0.s0: fatal write I/O error
> Jan 2 01:59:26 serv20 /kernel: vinum: home.p0.s0 is stale by force
> Jan 2 01:59:26 serv20 /kernel: vinum: home.p0 is degraded
>
> However, there was no evidence of any actual disc error - nothing was
> logged on the console, in dmesg or any other log files. The system would
> have been substantially idle at that time of night, except that the daily
> cron jobs would just have been starting at that time.

Hmm. In view of what followed, I'd still suspect that you've had some
kind of I/O error.

>
> A "vinum start home.p0.s0" some time later successfully revived the plex
> and the system then ran uninterrupted for two weeks.
>
> Does this suggest some sort of out-of-range block number bug somewhere?

Not a priori. I think I'll get Vinum to report the block number, and
that will give us more information.

> 2) Recovery problems
> --------------------
>
> <snip>

I've never seen anything like this before. Can you take a look at
http://www.vinumvm.org/vinum/how-to-debug.html and send me the
information I asked for there?
I know you've supplied some of it, but the uninterrupted logs over the
complete time would help a lot.

> On booting to multi-user mode, I noticed that all the drives were
> marked as 'down', even though the volume and most of the subdisks
> were 'up' (and a quick check in the console scroll-back showed that
> it was also in this state before the previous attempt to revive:

This in particular puzzles me. My best bet is that Vinum reacted in
an unfriendly manner to something you said to it. The Vinum history
will help there.

> This time, I used 'vinum start' on drive[0-4] before doing vinum start on
> home.p0.s3, and this time it successfully revived, taking 10 minutes or
> so. Some minutes later, the machine panicked (this time saving a dump):
>
> IdlePTD 3166208
> initial pcb at 282400
> panicstr: softdep_lock: locking against myself
> panic messages:
> ---
> panic: softdep_setup_inomapdep: found inode
> (kgdb) where
> #0 0xc014dd1a in dumpsys ()
> #1 0xc014db3b in boot ()
> #2 0xc014deb8 in poweroff_wait ()
> #3 0xc01e6b49 in acquire_lock ()
> #4 0xc01eae02 in softdep_fsync_mountdev ()
> #5 0xc01eef0e in ffs_fsync ()
> #6 0xc01edc16 in ffs_sync ()
> #7 0xc017b42b in sync ()
> #8 0xc014d916 in boot ()
> #9 0xc014deb8 in poweroff_wait ()
> #10 0xc01e792c in softdep_setup_inomapdep ()
> #11 0xc01e44a4 in ffs_nodealloccg ()
> #12 0xc01e352b in ffs_hashalloc ()
> #13 0xc01e3186 in ffs_valloc ()
> #14 0xc01f4e6f in ufs_makeinode ()
> #15 0xc01f2824 in ufs_create ()
> #16 0xc01f5029 in ufs_vnoperate ()
> #17 0xc01b1e43 in nfsrv_create ()
> #18 0xc01c6b2e in nfssvc_nfsd ()
> #19 0xc01c6483 in nfssvc ()
> #20 0xc022b949 in syscall2 ()
> #21 0xc02207b5 in Xint0x80_syscall ()
> #22 0x8048135 in ?? ()

This is a soft updates panic. It's not beyond the bounds of
possibility that it's caused by Vinum, but the fact that you've had
these problems since the upgrade suggests that it could also be a
result of some recent commits to the soft update code.
Please keep the dump, just in case. Have you compiled the kernel with
debug symbols?

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
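
When putting together the uninterrupted logs asked for above, it can help to pull the Vinum state changes out of a saved log in order. The following is only a rough sketch in Python: the pattern is modelled on the three kernel log lines quoted in this message and recognises only plex and subdisk names of the form `home.p0` or `home.p0.s0`. The function name `vinum_events` and the `SAMPLE` list are hypothetical, and other Vinum messages may well look different.

```python
import re

# Assumption: the line layout is inferred from the three log lines quoted
# in this thread; it is not taken from any Vinum documentation.
VINUM_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+/kernel:\s+"
    r"(?:vinum:\s+)?"                      # optional "vinum:" prefix
    r"(?P<obj>\w+\.p\d+(?:\.s\d+)?)"       # plex or subdisk name
    r"(?::\s+|\s+is\s+)"                   # "obj: text" or "obj is state"
    r"(?P<event>.+)$"
)


def vinum_events(lines):
    """Return (timestamp, object, event) for every line that matches."""
    events = []
    for line in lines:
        match = VINUM_LINE.match(line.strip())
        if match:
            events.append(
                (match.group("stamp"), match.group("obj"), match.group("event"))
            )
    return events


# The three lines quoted earlier in this message.
SAMPLE = [
    "Jan 2 01:59:26 serv20 /kernel: home.p0.s0: fatal write I/O error",
    "Jan 2 01:59:26 serv20 /kernel: vinum: home.p0.s0 is stale by force",
    "Jan 2 01:59:26 serv20 /kernel: vinum: home.p0 is degraded",
]

if __name__ == "__main__":
    for stamp, obj, event in vinum_events(SAMPLE):
        print(stamp, obj, event, sep=" | ")
```

Lines that do not name a plex or subdisk are skipped, so any SCSI driver messages around the same timestamps would have to be collected separately.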