From owner-freebsd-current  Thu Sep 24 06:33:09 1998
Date: Thu, 24 Sep 1998 09:32:27 -0400 (EDT)
From: Luoqi Chen <luoqi@watermarkgroup.com>
Message-Id: <199809241332.JAA23662@lor.watermarkgroup.com>
To: archer@lucky.net, luoqi@watermarkgroup.com
Subject: Re: deadlock in vm_fault()
Cc: current@FreeBSD.ORG

> In article <199809232224.SAA17260@lor.watermarkgroup.com> you wrote:
> LC> I ran into a deadlock in vm_fault code today while making -j12 world.
> LC> It's caused by a reversed lock acquisition order. The normal order of
> LC> acquisition is vm map lock first and vnode lock next (if the fault is in
> LC> a vnode backed object). During the course of the fault handling, lock on
> LC> the vm map is released prior to paging io and has to be reacquired if it's
> LC> modified by another process during the io. Before reacquiring the lock of
> LC> vm map, we have to release the vnode lock we still hold, otherwise another
> LC> page fault in the same map/vnode would send us into a deadlock.
> 
> LC> Attached is a fix for this problem. Would any of the vm/lock experts out
> LC> there review this? Thanks.
> 
> I've seen something strange during the same -j12 buildworld: ld simply
> hung, apparently not doing anything, while the rest of the system seemed
> to be alive. After some 3 or 4 hours, though, the machine rebooted (it
> had a kernel with broken crash dump generation, so I do not know what
> actually happened).
> 
> Could it be related?
> 
> LC> -lq
> 
> ---
> Reality is an obstacle to hallucination.
> 
It could very well be. What I saw on my machine was a deadlock between the
exec_map and the sh inode: all existing processes kept running fine, but
any attempt to fork() and then exec() a new image hung waiting for the
exec_map. Children of cron piled up as time went by, and eventually that
could kill the system (it's not clear how; I didn't wait for it to happen.
I went into the debugger, took a dump, and rebooted).
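
To illustrate the ordering rule from the quoted message (map lock first,
vnode lock next, and drop the vnode lock before taking the map lock again),
here is a rough user-space sketch. It is not the kernel code or the attached
patch: the pthread mutexes map_lock and vnode_lock, and the functions
relock_map_in_order(), simulated_fault() and run_faults(), are hypothetical
stand-ins made up for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical user-space stand-ins for the kernel locks discussed above. */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;   /* "vm map" */
static pthread_mutex_t vnode_lock = PTHREAD_MUTEX_INITIALIZER; /* "vnode"  */
static long faults_done; /* updated only with both locks held */

/*
 * Called with only the vnode lock held. Blocking on map_lock here while
 * still holding vnode_lock would invert the map -> vnode order, so drop
 * the vnode lock first and take both locks again in the normal order.
 */
static void relock_map_in_order(void)
{
    pthread_mutex_unlock(&vnode_lock);
    pthread_mutex_lock(&map_lock);
    pthread_mutex_lock(&vnode_lock);
}

/* One simulated fault on a vnode-backed object. */
static void simulated_fault(void)
{
    pthread_mutex_lock(&map_lock);    /* normal order: map first ...      */
    pthread_mutex_lock(&vnode_lock);  /* ... then vnode                   */
    pthread_mutex_unlock(&map_lock);  /* map released before "paging io"  */

    /* "paging io" would happen here, with only the vnode lock held */

    relock_map_in_order();            /* pretend the map changed meanwhile */
    faults_done++;
    pthread_mutex_unlock(&vnode_lock);
    pthread_mutex_unlock(&map_lock);
}

static void *worker(void *arg)
{
    int iterations = *(const int *)arg;
    for (int i = 0; i < iterations; i++)
        simulated_fault();
    return NULL;
}

/* Run nthreads concurrent "faulting processes"; return faults completed. */
long run_faults(int nthreads, int iterations)
{
    pthread_t tids[MAX_THREADS];

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    faults_done = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &iterations);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return faults_done;
}
```

Calling run_faults(4, 1000) from a main() (build with -pthread) should
finish with every fault counted, since no thread ever waits for map_lock
while holding vnode_lock. If relock_map_in_order() took map_lock without
first releasing vnode_lock, two threads could each hold the lock the other
is waiting for, which is the kind of hang described in this thread.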

-lq
