Date: Mon, 2 Apr 2001 09:42:08 +0930
From: Greg Lehey
To: Andrew Gordon
Cc: freebsd-stable@freebsd.org
Subject: Re: 4.3-RC processes stuck sleeping on "inode" (?vinum) problem update
Message-ID: <20010402094208.D73090@wantadilla.lemis.com>
In-Reply-To: ; from arg@arg1.demon.co.uk on Mon, Apr 02, 2001 at 12:36:31AM +0100
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF

On Monday, 2 April 2001 at 0:36:31 +0100, Andrew Gordon wrote:
>
> Further to my previous report:
>
>  - This is definitely a problem in 4.3RC: I rolled back to 31st Jan
>    sources (world & kernel), and the system has now been up for 36 hours
>    (as opposed to at most 6 hours running 4.3RC).
>
>  - New evidence makes me lean towards thinking that Vinum is responsible
>    (though this is by no means conclusive):
>
>     1. I had previously only had my nfsd processes getting stuck
>        (plus the 'reboot' process itself if I tried to reboot),
>        however, while doing a 'cvs checkout' onto the vinum filesystem
>        to build my jan31 world, the cvs process got stuck in "inode" too.
>
>     2. That same cvs checkout completed OK on a non-vinum filesystem.
>
>     3. I have just noticed in my console logs, that in the "ps"
>        output showing the nfsd processes stuck in "inode",
>        the "(syncer)" process is stuck in "vrlock" which is a
>        vinum wait channel.

Hmm.  This is pretty conclusive.  It's a deadlock.  Tor Egge reported
a possible cause of this kind of deadlock.  I've been testing a fix,
but I'm not sure it doesn't have side effects.  Try this (in
/usr/src/sys/dev/vinum), then rebuild the kernel module (in
/usr/src/sys/modules/vinum), stop and restart vinum, and see if it
helps:

RCS file: /home/ncvs/src/sys/dev/vinum/vinumlock.c,v
retrieving revision 1.18.2.2
diff -w -u -r1.18.2.2 vinumlock.c
--- vinumlock.c	2001/03/13 02:59:43	1.18.2.2
+++ vinumlock.c	2001/04/02 00:09:53
@@ -169,7 +169,7 @@
 #endif
 		plex->lockwaits++;		    /* waited one more time */
 		tsleep(lock, PRIBIO, "vrlock", 0);
-		lock = plex->lock;		    /* start again */
+		lock = &plex->lock[-1];		    /* start again */
 		foundlocks = 0;
 		pos = NULL;
 	    }

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers
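
[Note on the patch] One possible reading of the one-line change, assuming (the message does not show this) that the modified line sits inside a for loop that advances "lock" with an increment step after every iteration: resetting the cursor to the first entry would then make the rescan begin at the second entry, whereas resetting it to one position before the first lets the increment land on the first. The sketch below illustrates only that general effect, using an integer index instead of a pointer; NLOCKS, RESTART_POINT and first_index_after_restart() are hypothetical names, and this is not the vinum code.

```c
#include <assert.h>

#define NLOCKS 4            /* hypothetical table size */
#define RESTART_POINT 2     /* hypothetical index at which a rescan is requested */

/*
 * Walk indices 0..NLOCKS-1 with a for loop.  At RESTART_POINT, pretend
 * we slept and must rescan the table, so assign restart_value to the
 * loop index.  The loop's own i++ still runs after that assignment.
 * Returns the index of the first entry examined on the new pass.
 */
int first_index_after_restart(int restart_value)
{
    int restarted = 0;

    /* Only 0 ("first entry") and -1 ("one before first") are meaningful here. */
    assert(restart_value == 0 || restart_value == -1);

    for (int i = 0; i < NLOCKS; i++) {
        if (restarted)
            return i;               /* first entry seen after the restart */
        if (i == RESTART_POINT) {
            restarted = 1;
            i = restart_value;      /* "start again"; i++ follows */
        }
    }
    return -1;                      /* not reached for the values allowed above */
}
```

Under these assumptions, first_index_after_restart(0) returns 1 (the first entry is skipped on the rescan), while first_index_after_restart(-1) returns 0 (the rescan covers the whole table).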