Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 27 Jan 2002 14:07:03 +1030
From:      Greg Lehey <grog@FreeBSD.org>
To:        Thomas Moestl <tmoestl@gmx.net>
Cc:        Martin Blapp <mb@imp.ch>, freebsd-stable@FreeBSD.org
Subject:   Re: double fault with vinum and 4.5 RC3
Message-ID:  <20020127140703.A41130@wantadilla.lemis.com>
In-Reply-To: <20020126233709.GA459@crow.dom2ip.de>
References:  <20020125131225.E4778@wantadilla.lemis.com> <20020125111624.S47234-100000@levais.imp.ch> <20020126233709.GA459@crow.dom2ip.de>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sunday, 27 January 2002 at  0:37:09 +0100, Thomas Moestl wrote:
> On Fri, 2002/01/25 at 11:38:31 +0100, Martin Blapp wrote:
>> vinum create /root/configfile twice ! makes STABLE 4.5RC3 crashing with a double
>> fault. I guess we should fix that. I'm makeing now a debug kernel.
>
> I've had a quick look at Martin's crash dump;

Well, for a quick look you've done some really good work.  This bug
has probably been chasing me for years.

> the double fault is caused by a longjmp() that restores an invalid
> %esp. By accessing the stack that is to be restored by longjmp
> before actually setting %esp, I was able to get a usable trace
> (showing the vinum-relevant parts only):
>
>   longjmp(c034a120,0,0,c1c31af4,3) at longjmp+0x7
>   throw_rude_remark(0,c02dbfc0,c1c41406,c1c31b84,c1c31af4) at
>   <snip>
>
> Apparently, this process slept in disk I/O initiated by
> vinum_read_label(); meanwhile, the vinum process forked in
> start_daemon() was scheduled and performed an ioctl() to find out
> whether the daemon was running. In this case, it was, so it exited.
> After some time, the other process is woken up because the I/O
> finished. vinum_read_label() returned DL_WRONG_DRIVE, so eventually
> throw_rude_remark() was called, which tried to longjmp() back to
> vinumioctl(). However, because vinumioctl() was called by another
> process in between, command_fail had been overwritten (it is a global
> variable). The stack referenced in that jmpbuf was unmapped when
> that process exited, so the result was a double fault.

Exactly.  The culprit is an extraneous call to setjmp in vinumioctl.
The setjmp logic assumes that the process is holding the config lock,
so it can use command_fail.  Unfortunately, this call was outside the
protected code, and that's what caused the problem.

> Greg, Martin should the crash dump I'm talking about available if
> you would like to take a look (the kernel in question has longjmp
> debugging code and a few printf()s added).

I tried looking, but the net connection was terrible.  Anyway, I've
been able to reproduce it here, and I have a patch which should make
it go away:

RCS file: /home/ncvs/src/sys/dev/vinum/vinumconfig.c,v
retrieving revision 1.32.2.5
diff -w -u -r1.32.2.5 vinumconfig.c
--- vinumconfig.c       28 May 2001 05:56:27 -0000      1.32.2.5
+++ vinumconfig.c       27 Jan 2002 03:31:33 -0000
@@ -99,6 +99,8 @@
     static int finishing;                                  /* don't recurse */
     int was_finishing;
 
+    if ((vinum_conf.flags & VF_LOCKED) == 0)               /* bug catcher */
+       panic ("throw_rude_remark: called without config lock");
     va_start(ap, msg);
     if ((ioctl_reply != NULL)                              /* we're called from the user */
     &&(!(vinum_conf.flags & VF_READING_CONFIG))) {         /* and not reading from disk: return msg */
Index: vinumioctl.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/vinum/vinumioctl.c,v
retrieving revision 1.25.2.3
diff -w -u -r1.25.2.3 vinumioctl.c
--- vinumioctl.c        13 Mar 2001 02:59:43 -0000      1.25.2.3
+++ vinumioctl.c        27 Jan 2002 03:31:06 -0000
@@ -82,9 +82,6 @@
     switch (DEVTYPE(dev)) {
     case VINUM_SUPERDEV_TYPE:                              /* ordinary super device */
        ioctl_reply = (struct _ioctl_reply *) data;         /* save the address to reply to */
-       error = setjmp(command_fail);                       /* come back here on error */
-       if (error)                                          /* bombed out */
-           return 0;                                       /* the reply will contain meaningful info */
        switch (cmd) {
 #ifdef VINUMDEBUG
        case VINUM_DEBUG:

The panic in throw_rude_remark is "just in case", since it's a lot
easier to debug like that than after a double fault.  Please try that
and let me know if you still have problems.  In view of the impending
release of 4.5, it would be nice to get this fix in.

This doesn't change the fact that Martin's approach to recovery is
incorrect.  I'll address that in a separate message.  

Greg
-- 
See complete headers for address and phone numbers

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020127140703.A41130>