From owner-freebsd-hackers  Wed Sep 26 18:43:54 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from mw1.texas.net (mw1.texas.net [206.127.30.11])
	by hub.freebsd.org (Postfix) with ESMTP
	id 87E9737B415; Wed, 26 Sep 2001 18:43:39 -0700 (PDT)
Received: from staff3.texas.net (staff3.texas.net [207.207.0.40])
	by mw1.texas.net (8.11.6/8.11.6) with ESMTP id f8R1hcQ02840;
	Wed, 26 Sep 2001 20:43:38 -0500 (CDT)
Received: (from doug@localhost)
	by staff3.texas.net (8.11.6/8.11.6) id f8R1hYw15926;
	Wed, 26 Sep 2001 20:43:34 -0500 (CDT)
	(envelope-from doug)
Date: Wed, 26 Sep 2001 20:43:33 -0500
From: Douglas Swarin <doug@staff.texas.net>
To: hackers@freebsd.org
Cc: dillon@earth.backplane.com, tmoestl@gmx.net, bp@freebsd.org
Subject: Re: more on Re: Please review: bugfix for vinvalbuf()
Message-ID: <20010926204333.A15865@staff.texas.net>
References: <20010711003926.B8799@crow.dom2ip.de> <200107110643.f6B6hTB24707@earth.backplane.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200107110643.f6B6hTB24707@earth.backplane.com>; from dillon@earth.backplane.com on Tue, Jul 10, 2001 at 11:43:29PM -0700
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

On Tue, Jul 10, 2001 at 11:43:29PM -0700, Matt Dillon wrote:
> 
> :Hi,
> :
> :I've just tripped over an obviously long-standing (since about
> :Jan. 1998) bug in vinvalbuf while looking into PR kern/26224. The
> :problematic code looks like (on -CURRENT):
> :
> :	/*
> :	 * Destroy the copy in the VM cache, too.
> :	 */
> :	mtx_lock(&vp->v_interlock);
> :	if (VOP_GETVOBJECT(vp, &object) == 0) {
> :		vm_object_page_remove(object, 0, 0,
> :			(flags & V_SAVE) ? TRUE : FALSE);
> :	}
> :	mtx_unlock(&vp->v_interlock);
> :
> :The locks seem to be needed for file systems that don't perform real
> :locking (on -STABLE, they are simplelocks).
> :This, however, is incorrect because vm_object_page_remove may sleep.
> :I've attached a patch that passes the interlock to
> :vm_object_page_remove, which in turn passes it to a modified version
> :of vm_page_sleep, which unlocks it around the sleep.
> :I think that this is correct, because the object should be in a valid
> :state when we sleep (and all checks are reexecuted in that case).
> :
> :Since I'm not very experienced with vfs and vm stuff, I'd be glad if
> :this patch could get some review. In particular, is the lock/unlock
> :pair really still needed, and are there notable differeces in -STABLE
> :(because the fix would need the be MFC'ed)?
> :
> :Thanks,
> :	- thomas
> 
>     Ok, I've looked at this more.  What is supposed to happen is that
>     the 'while (vp->v_numoutput) ...' code just above the section you
>     cited is supposed to prevent the deadlock.  The while code looks like
>     this:
> 
>         while (vp->v_numoutput > 0) {
>                 vp->v_flag |= VBWAIT;
>                 tsleep(&vp->v_numoutput, PVM, "vnvlbv", 0);
>         }
> 
>     However, as Ian points out in his followup, it doesn't work:
> 
> :...
> :I've seen a related deadlock here in 4.x with NFS; vm_fault locks
> :a page and then calls vput which aquires the v_interlock. This code
> :in vinvalbuf locks the v_interlock first, and then vm_object_page_remove()
> :locks the page. That's a simple lock-order reversal deadlock which I
> :guess would still exist even with this patch.
> 
>     It doesn't work for the simple reason that vm_fault isn't busying the
>     page for writing, it's busying it for reading, so v_numoutput will be
>     0.
> 
>     I think the best solution is to change vinvalbuf's wait loop to
>     wait on vm_object->paging_in_progress instead of vp->v_numoutput,
>     or perhaps to wait for both conditions to be satisfied.
> 
>     vm_object->paging_in_progress applies to reads and writes, while
>     vp->v_numoutput only applies to writes.
> 
>     I know this isn't the most correct solution... there is still the
>     lock reversal if vm_object_remove_pages() were ever to sleep.  The
>     idea is that it won't sleep if there is no paging in progress because
>     nobody will have any of the object's pages locked.  I think it is
>     the best we can do for the moment.  There are several places in
>     vm/vm_object.c where the same assumption is made (testing against
>     vm_object->paging_in_progress, though, which works, not vp->v_numoutput).
> 
>     -
> 
>     Thomas, if you are interested this could be a little project for you.
>     See if you can code-up another while() loop to also wait for
>     the object's paging_in_progress count to become 0 in the vinvalbuf()
>     code.  Look in vm/vm_object.c for examples of how to construct such
>     a loop.  I will be happy to review and test whatever you come up with
>     and I'm sure Ian will too!
> 
> 						-Matt
> 
> 

I recently mentioned on freebsd-stable in message

  <20010923124824.A16392@staff.texas.net>

a recurring rslock panic which I believe has been caused by the below
mentioned problem in vinvalbuf(). I have worked up a patch for -STABLE
(which should also apply to -CURRENT if there have not been major changes
to this code). The patch is appended to this message for review.

Data from the crash dump revealed the following:

SMP 2 cpus
IdlePTD 3555328
initial pcb at 2cf280
panicstr: rslock: cpu: 0, addr: 0xd7be66ec, lock: 0x00000001
panic messages:
---
panic: rslock: cpu: 0, addr: 0xd7be66ec, lock: 0x00000001
mp_lock = 00000001; cpuid = 0; lapic.id = 01000000
boot() called on cpu#0

---

#0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:473
#1  0xc016cf8f in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:313
#2  0xc016d3a9 in panic (fmt=0xc025bcce "rslock: cpu: %d, addr: 0x%08x, lock: 0x%08x")
    at /usr/src/sys/kern/kern_shutdown.c:581
#3  0xc025bcce in bsl1 ()
#4  0xc021eeac in _unlock_things (fs=0xd7a6dec4, dealloc=0)
    at /usr/src/sys/vm/vm_fault.c:147
#5  0xc021f8c7 in vm_fault (map=0xd7a6bf40, vaddr=673968128, fault_type=1 '\001',
      fault_flags=0) at /usr/src/sys/vm/vm_fault.c:826
#6  0xc025d016 in trap_pfault (frame=0xd7a6dfa8, usermode=1, eva=673972223)
    at /usr/src/sys/i386/i386/trap.c:829
#7  0xc025ca8b in trap (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 99049,
      tf_esi = 0, tf_ebp = -1077937884, tf_isp = -676929580, tf_ebx = 48110729,
      tf_edx = 0, tf_ecx = 1835007, tf_eax = 672137216, tf_trapno = 12, tf_err = 4,
      tf_eip = 134517190, tf_cs = 31, tf_eflags = 66070, tf_esp = -1077937940,
      tf_ss = 47})
    at /usr/src/sys/i386/i386/trap.c:359
#8  0x80491c6 in ?? ()
#9  0x8048d9e in ?? ()
#10 0x804a078 in ?? ()
#11 0x8048b45 in ?? ()

---

A quick review of processes revealed a process stuck in vmopar:

(kgdb) ps
...
46479 d7ffc560 d806e000 235886     1 46394  004006  3  tail         vmopar c09120c8
...

which was sleeping in vm_object_page_remove() in vinvalbuf():
      
(kgdb) btp 46479
 frame 0 at 0xd806fc8c: ebp d806fcb8, eip 0xc0170051 <tsleep+429>:      mov    0x141(%ebx),%al
 frame 1 at 0xd806fcb8: ebp d806fce0, eip 0xc02262e8 <vm_object_page_remove+268>:       add    $0x10,%esp
 frame 2 at 0xd806fce0: ebp d806fd2c, eip 0xc019a667 <vinvalbuf+803>:   add    $0x10,%esp
 frame 3 at 0xd806fd2c: ebp d806fd60, eip 0xc01d0aa0 <nfs_vinvalbuf+264>:       add    $0x18,%esp
 frame 4 at 0xd806fd60: ebp d806fe2c, eip 0xc01cf5d8 <nfs_bioread+540>: mov    %eax,0xffffff74(%ebp)
 frame 5 at 0xd806fe2c: ebp d806fe44, eip 0xc01f6842 <nfs_read+30>:     jmp    0xc01f6849 <nfs_read+37>
 frame 6 at 0xd806fe44: ebp d806fe78, eip 0xc01a22cc <vn_read+280>:     mov    %eax,0xffffffe8(%ebp)
 frame 7 at 0xd806fe78: ebp d806fef4, eip 0xc017b690 <dofileread+176>:  mov    %eax,%esi
 frame 8 at 0xd806fef4: ebp d806ff28, eip 0xc017b556 <read+54>: mov    %eax,%esi
 frame 9 at 0xd806ff28: ebp d806ffa0, eip 0xc025d745 <syscall2+537>:    mov    %eax,0xffffffb8(%ebp)


The patch is below, against vfs_subr.c 1.249.2.11 2001/09/11

--- vfs_subr.c  Tue Sep 11 04:49:53 2001
+++ vfs_subr.c.new      Wed Sep 26 15:23:09 2001
@@ -721,9 +721,9 @@
		}
	}

-	while (vp->v_numoutput > 0) {
-		vp->v_flag |= VBWAIT;
-		tsleep(&vp->v_numoutput, PVM, "vnvlbv", 0);
+	if (VOP_GETVOBJECT(vp, &object) == 0) {
+		while (object->paging_in_progress)
+			vm_object_pip_sleep(object, "vnvlbv");
	}

	splx(s);

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message