Date: Mon, 2 Apr 2001 18:29:09 +0930
From: Greg Lehey <grog@lemis.com>
To: Andrew Gordon <arg@arg1.demon.co.uk>
Cc: freebsd-stable@freebsd.org
Subject: Re: 4.3-RC processes stuck sleeping on "inode" (?vinum) problem update
Message-ID: <20010402182909.A75576@wantadilla.lemis.com>
In-Reply-To: <20010402094208.D73090@wantadilla.lemis.com>; from grog@lemis.com on Mon, Apr 02, 2001 at 09:42:08AM +0930
References: <Pine.BSF.4.21.0104020008080.9790-100000@server.arg.sj.co.uk> <20010402094208.D73090@wantadilla.lemis.com>

On Monday, 2 April 2001 at 9:42:08 +0930, Greg Lehey wrote:
> On Monday, 2 April 2001 at 0:36:31 +0100, Andrew Gordon wrote:
>>
>> Further to my previous report:
>>
>> - This is definitely a problem in 4.3RC: I rolled back to 31st Jan
>> sources (world & kernel), and the system has now been up for 36 hours
>> (as opposed to at most 6 hours running 4.3RC).
>>
>> - New evidence makes me lean towards thinking that Vinum is responsible
>> (though this is by no means conclusive):
>>
>> 1. I had previously only had my nfsd processes getting stuck
>> (plus the 'reboot' process itself if I tried to reboot),
>> however, while doing a 'cvs checkout' onto the vinum filesystem
>> to build my jan31 world, the cvs process got stuck in "inode" too.
>>
>> 2. That same cvs checkout completed OK on a non-vinum filesystem.
>>
>> 3. I have just noticed in my console logs, that in the "ps"
>> output showing the nfsd processes stuck in "inode",
>> the "(syncer)" process is stuck in "vrlock" which is a
>> vinum wait channel.
>
> Hmm. This is pretty conclusive. It's a deadlock.
>
> Tor Egge reported a possible cause of this kind of deadlock. I've
> been testing a fix, but I'm not sure it doesn't have side effects.
> Try this (in /usr/src/sys/dev/vinum), then rebuild the kernel module
> (in /usr/src/sys/modules/vinum), stop and restart vinum, and see if it
> helps:
>
> RCS file: /home/ncvs/src/sys/dev/vinum/vinumlock.c,v
> retrieving revision 1.18.2.2
> diff -w -u -r1.18.2.2 vinumlock.c
> --- vinumlock.c	2001/03/13 02:59:43	1.18.2.2
> +++ vinumlock.c	2001/04/02 00:09:53
> @@ -169,7 +169,7 @@
>  #endif
>  		plex->lockwaits++;		/* waited one more time */
>  		tsleep(lock, PRIBIO, "vrlock", 0);
> -		lock = plex->lock;		/* start again */
> +		lock = &plex->lock[-1];		/* start again */
>  		foundlocks = 0;
>  		pos = NULL;
>  	    }

OK.  I've tried this change, and I still ended up with problems.  It
seems that from time to time a wakeup gets lost, causing things to
hang.  I've now made a workaround (the sleep times out after hz
ticks, i.e. one second, and the lock scan starts again), and things
seem to be working stably.  Try this fix instead (if you've already
applied the earlier change, just add the tsleep change as well).  I'm
relatively confident that this will fix the problem.  In view of the
code freeze, please let me know as soon as possible whether this
fixes your problem.

RCS file: /home/ncvs/src/sys/dev/vinum/vinumlock.c,v
retrieving revision 1.18.2.2
diff -w -u -r1.18.2.2 vinumlock.c
--- vinumlock.c	2001/03/13 02:59:43	1.18.2.2
+++ vinumlock.c	2001/04/02 08:56:26
@@ -168,8 +168,8 @@
 		}
 #endif
 		plex->lockwaits++;		/* waited one more time */
-		tsleep(lock, PRIBIO, "vrlock", 0);
-		lock = plex->lock;		/* start again */
+		tsleep(lock, PRIBIO, "vrlock", hz);
+		lock = &plex->lock [-1];	/* start again */
 		foundlocks = 0;
 		pos = NULL;
 	    }
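
For readers who want to try the idea outside the kernel, here is a
minimal userland sketch of the two changes in the patch above: a
bounded sleep, so that a lost wakeup costs at most one timeout, and a
restart of the scan from inside the loop body.  It is not the vinum
code.  The table, lock_range(), silent_release() and the sleepers
counter are hypothetical names invented for the illustration, and
pthread_cond_timedwait() with a one-second deadline stands in for
tsleep(..., hz).  The index is reset to -1 rather than 0 on the
assumption that the loop's own increment runs before the next
iteration, which appears to be the purpose of the [-1] in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define NLOCKS 4

/* Hypothetical stand-in for a per-plex range-lock table (not the vinum structs). */
struct range_lock {
    long start, end;            /* half-open range [start, end) */
    bool busy;
};

static struct range_lock table[NLOCKS];
static int sleepers;            /* threads currently waiting on a conflict */
static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t table_cv = PTHREAD_COND_INITIALIZER;

/* Lock [start, end).  Returns how often we slept, or -1 if the table is full. */
int lock_range(long start, long end)
{
    int waits = 0, free_slot = -1;

    assert(start < end);
    pthread_mutex_lock(&table_mtx);
    for (int i = 0; i < NLOCKS; i++) {
        if (!table[i].busy) {
            if (free_slot < 0)
                free_slot = i;  /* remember the first free slot */
            continue;
        }
        if (start < table[i].end && table[i].start < end) {
            /* Conflict: sleep, but for at most one second. */
            struct timespec deadline;
            clock_gettime(CLOCK_REALTIME, &deadline);
            deadline.tv_sec += 1;
            sleepers++;
            pthread_cond_timedwait(&table_cv, &table_mtx, &deadline);
            sleepers--;
            waits++;
            free_slot = -1;     /* forget what we found so far */
            i = -1;             /* the loop's i++ makes this 0: start again */
        }
    }
    if (free_slot >= 0)
        table[free_slot] = (struct range_lock){ start, end, true };
    pthread_mutex_unlock(&table_mtx);
    return free_slot >= 0 ? waits : -1;
}

/* Helper: release a range WITHOUT signalling, to simulate a lost wakeup. */
void *silent_release(void *arg)
{
    const struct range_lock *r = arg;
    const struct timespec pause = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    bool done = false;

    while (!done) {
        pthread_mutex_lock(&table_mtx);
        if (sleepers > 0) {     /* only release once somebody is asleep */
            for (int i = 0; i < NLOCKS; i++)
                if (table[i].busy && table[i].start == r->start &&
                    table[i].end == r->end)
                    table[i].busy = false;
            done = true;
        }
        pthread_mutex_unlock(&table_mtx);
        if (!done)
            nanosleep(&pause, NULL);
    }
    return NULL;
}
```

If one thread holds [0, 10) and silent_release() drops it while a
second thread is asleep in lock_range(5, 15), the second thread never
receives a signal, yet it still gets the range once its timed wait
expires and it rescans the table.  With an unbounded wait it would
sleep forever, which is the hang described above.
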
Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
