From owner-freebsd-stable Sun Jan 27 15:46:14 2002 Delivered-To: freebsd-stable@freebsd.org Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80]) by hub.freebsd.org (Postfix) with ESMTP id 241C937B422 for ; Sun, 27 Jan 2002 15:43:01 -0800 (PST) Received: by wantadilla.lemis.com (Postfix, from userid 1004) id 4C948782D0; Sun, 27 Jan 2002 14:07:03 +1030 (CST) Date: Sun, 27 Jan 2002 14:07:03 +1030 From: Greg Lehey To: Thomas Moestl Cc: Martin Blapp , freebsd-stable@FreeBSD.org Subject: Re: double fault with vinum and 4.5 RC3 Message-ID: <20020127140703.A41130@wantadilla.lemis.com> References: <20020125131225.E4778@wantadilla.lemis.com> <20020125111624.S47234-100000@levais.imp.ch> <20020126233709.GA459@crow.dom2ip.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20020126233709.GA459@crow.dom2ip.de> User-Agent: Mutt/1.3.23i Organization: The FreeBSD Project Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.FreeBSD.org/ X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF Sender: owner-freebsd-stable@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sunday, 27 January 2002 at 0:37:09 +0100, Thomas Moestl wrote: > On Fri, 2002/01/25 at 11:38:31 +0100, Martin Blapp wrote: >> vinum create /root/configfile twice ! makes STABLE 4.5RC3 crashing with a double >> fault. I guess we should fix that. I'm makeing now a debug kernel. > > I've had a quick look at Martin's crash dump; Well, for a quick look you've done some really good work. This bug has probably been chasing me for years. > the double fault is caused by a longjmp() that restores an invalid > %esp. By accessing the stack that is to be restored by longjmp > before actually setting %esp, I was able to get a usable trace > (showing the vinum-relevant parts only): > > longjmp(c034a120,0,0,c1c31af4,3) at longjmp+0x7 > throw_rude_remark(0,c02dbfc0,c1c41406,c1c31b84,c1c31af4) at > > > Apparently, this process slept in disk I/O initiated by > vinum_read_label(); meanwhile, the vinum process forked in > start_daemon() was scheduled and performed an ioctl() to find out > whether the daemon was running. In this case, it was, so it exited. > After some time, the other process is woken up because the I/O > finished. vinum_read_label() returned DL_WRONG_DRIVE, so eventually > throw_rude_remark() was called, which tried to longjmp() back to > vinumioctl(). However, because vinumioctl() was called by another > process in between, command_fail had been overwritten (it is a global > variable). The stack referenced in that jmpbuf was unmapped when > that process exited, so the result was a double fault. Exactly. The culprit is an extraneous call to setjmp in vinumioctl. The setjmp logic assumes that the process is holding the config lock, so it can use command_fail. Unfortunately, this call was outside the protected code, and that's what caused the problem. > Greg, Martin should the crash dump I'm talking about available if > you would like to take a look (the kernel in question has longjmp > debugging code and a few printf()s added). I tried looking, but the net connection was terrible. Anyway, I've been able to reproduce it here, and I have a patch which should make it go away: RCS file: /home/ncvs/src/sys/dev/vinum/vinumconfig.c,v retrieving revision 1.32.2.5 diff -w -u -r1.32.2.5 vinumconfig.c --- vinumconfig.c 28 May 2001 05:56:27 -0000 1.32.2.5 +++ vinumconfig.c 27 Jan 2002 03:31:33 -0000 @@ -99,6 +99,8 @@ static int finishing; /* don't recurse */ int was_finishing; + if ((vinum_conf.flags & VF_LOCKED) == 0) /* bug catcher */ + panic ("throw_rude_remark: called without config lock"); va_start(ap, msg); if ((ioctl_reply != NULL) /* we're called from the user */ &&(!(vinum_conf.flags & VF_READING_CONFIG))) { /* and not reading from disk: return msg */ Index: vinumioctl.c =================================================================== RCS file: /home/ncvs/src/sys/dev/vinum/vinumioctl.c,v retrieving revision 1.25.2.3 diff -w -u -r1.25.2.3 vinumioctl.c --- vinumioctl.c 13 Mar 2001 02:59:43 -0000 1.25.2.3 +++ vinumioctl.c 27 Jan 2002 03:31:06 -0000 @@ -82,9 +82,6 @@ switch (DEVTYPE(dev)) { case VINUM_SUPERDEV_TYPE: /* ordinary super device */ ioctl_reply = (struct _ioctl_reply *) data; /* save the address to reply to */ - error = setjmp(command_fail); /* come back here on error */ - if (error) /* bombed out */ - return 0; /* the reply will contain meaningful info */ switch (cmd) { #ifdef VINUMDEBUG case VINUM_DEBUG: The panic in throw_rude_remark is "just in case", since it's a lot easier to debug like that than after a double fault. Please try that and let me know if you still have problems. In view of the impending release of 4.5, it would be nice to get this fix in. This doesn't change the fact that Martin's approach to recovery is incorrect. I'll address that in a separate message. Greg -- See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-stable" in the body of the message