From owner-freebsd-hackers  Tue Apr 16 22:18: 3 2002
Delivered-To: freebsd-hackers@freebsd.org
Received: from imr1.aus.deuba.com (bagheera.aus.deuba.com [203.0.62.7])
	by hub.freebsd.org (Postfix) with ESMTP id 0181F37B416
	for <hackers@FreeBSD.ORG>; Tue, 16 Apr 2002 22:17:57 -0700 (PDT)
Received: from imr1.aus.deuba.com by imr1.aus.deuba.com 
         id g3H5HpYw026192; Wed, 17 Apr 2002 15:17:51 +1000 (EST)
Received: from merton.aus.deuba.com by imr1.aus.deuba.com 
         id g3H5HoYr026186; Wed, 17 Apr 2002 15:17:50 +1000 (EST)
Received: (qmail 14715 invoked by uid 107); 17 Apr 2002 05:17:50 -0000
Message-ID: <20020417051750.14714.qmail@merton.aus.deuba.com>
From: callum.gibson@db.com
Subject: Re: ipcrm/shmctl failure (fix NOT found)
To: tlambert2@mindspring.com
Date: Wed, 17 Apr 2002 15:17:50 +1000 (EST)
Cc: hackers@FreeBSD.ORG
In-Reply-To: <3CBCFF0E.56972E35@mindspring.com> from "tlambert2@mindspring.com" at Apr 16, 2002 09:50:22 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

tlambert2@mindspring.com writes:
}> I didn't know if you were talking about "not incrementing" when the
}> process exits or when it rforked. If you rfork(RFMEM), you'd want to
}> increment the vm_refcnt I'm pretty sure (and it does).
}
}No, you really don't.

Do you mean that I don't know, or that we don't want to increment the
vm_refcnt when rforking?

}You have a number of references on the vm (one per RFMEM) process.
}The correct translation of these references is to have a *single*
}reference count instance to the shared memory segment itself,
}rather than incrementing the segment references, shmseg->shm_nattch.

Ok - so shmfork should not increment shm_nattch. But you still want
to increment vm_refcnt when you rfork, or your second sentence is a
contradiction (one ref per RFMEM process). So you are saying there is a
single vm (albeit with multiple references to it), and because it's only
one vm there is in effect a _single_ reference to the shmseg from it.
Do I understand you correctly?
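
To check that we mean the same thing, here is a minimal userland sketch
of the model as I read it. The model_* structs and functions are made up
for illustration; they are not the kernel's structures or code paths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland model; these are NOT the kernel structures. */
struct model_shmseg {
    int shm_nattch;                 /* attach count on the segment */
};

struct model_vmspace {
    int vm_refcnt;                  /* one reference per process sharing the vm */
    struct model_shmseg *attached;  /* segment mapped into this vm, or NULL */
};

/* A new address space starts with one reference (its creating process). */
static void model_vmspace_init(struct model_vmspace *vm)
{
    vm->vm_refcnt = 1;
    vm->attached = NULL;
}

/* shmat(): the vm as a whole takes a single reference on the segment. */
static void model_shmat(struct model_vmspace *vm, struct model_shmseg *seg)
{
    vm->attached = seg;
    seg->shm_nattch++;
}

/* rfork(RFMEM): the child shares the vm, so only vm_refcnt changes. */
static void model_rfork_mem(struct model_vmspace *vm)
{
    vm->vm_refcnt++;
    /* shm_nattch is deliberately untouched: still one vm, one attach. */
}

/* One attach, then `nthreads` rfork(RFMEM) children sharing the vm. */
static int model_nattch_after_rforks(int nthreads)
{
    struct model_shmseg seg = { 0 };
    struct model_vmspace vm;
    int i;

    model_vmspace_init(&vm);
    model_shmat(&vm, &seg);
    for (i = 0; i < nthreads; i++)
        model_rfork_mem(&vm);

    assert(vm.vm_refcnt == nthreads + 1);
    return seg.shm_nattch;
}
```

In this model the attach count stays at 1 however many RFMEM children
exist, which is what I take "a single reference count instance" to mean.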

}If the VM reference counting on normal segments weren't working,
}then there'd be a huge-and-obvious-to-everyone problem.  I think
}that incrementing the shmseg->shm_nattch on the vfork is definitely
}the wrong thing to do.

It's surprising what people don't notice.

}Since your problem is a symptom of increment of shmseg->shm_nattch
}without a corresponding decrement, then the *only* code that can be
}involved is shmat() and shmfork() for the increment, and for the
}delete, shm_delete_mapping(), which is called from shmexit() and
}shmdt().

No, I don't think I said that - all I know is that shmexit never gets
called and that seems to be because vm_refcnt is incremented.

}That basically implies that RFMEM is not set when vm_fork() is called
}from the Linux ABI code, since that's the only place that calls the
}shmfork() code.

Nah, I checked that. The LinuxThreads library does a clone(CLONE_VM),
which translates to an rfork(RFMEM) in i386/linux/linux_machdep.c .
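
Roughly, the translation I am describing has this shape. This is a
sketch from memory, not a copy of linux_machdep.c: the function name is
hypothetical and the MODEL_* constants are local placeholders so the
fragment stands alone.

```c
/* Placeholder flag values, defined locally for the sketch only. */
#define MODEL_CLONE_VM 0x100     /* stands in for Linux CLONE_VM */
#define MODEL_RFPROC   (1 << 4)  /* stands in for RFPROC */
#define MODEL_RFMEM    (1 << 5)  /* stands in for RFMEM */

/*
 * Map clone()-style flags to rfork()-style flags: always create a
 * process, and share the address space only if CLONE_VM was requested.
 */
static int model_clone_to_rfork(int clone_flags)
{
    int ff = MODEL_RFPROC;

    if (clone_flags & MODEL_CLONE_VM)
        ff |= MODEL_RFMEM;
    return ff;
}
```

So as long as the threads library passes CLONE_VM, RFMEM should reach
vm_fork().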

}> The whole bug is
}> the point that vm_refcnt is never decremented and the shm_nattch is
}> therefore only decremented if you explicitly detach from memory (which
}> will call shm_delete_mapping). So if an rfork'd program uses shared mem
}> and crashes, the vm_refcnt stays > 1, the shared mem is never freed
}> because shmexit -> shm_delete_mapping is never called.  Hopefully this
}> only affects shared mem, as there is more stuff inside the if statement
}> you include below other than the shmexit.
}It should not be incremented in the first place.  It is erroneously
}incremented, IMO.

I think you mean shm_nattch is erroneously incremented, not vm_refcnt?

}> }...in other words, the resource track exit does not occur until
}> }the reference count is about to go from 1->0.  Note that there
}> }is an implicit race here, actually, between the reference and
}> }the detach, in which another instance could conceivably be
}> }created.  8-(.
}>
}> Don't know about the race, although one is mentioned in the cvs logs on
}> the current branch. I presume you're talking SMP only though?
}> As a side note, in current this reads:
}>         if (--vmspace->vm_refcnt == 0) {
}
}
}Yes.  This doesn't have the race, because there isn't a window between
}the time of the compare and the decrement.

Perhaps what I'm really seeing is the race then? I do have a single vm
with a single ref to a shmseg, but when the process crashes all the
rforked processes exit and clobber the vm_refcnt so that shmexit never
gets called to decrement shm_nattch to zero? A new theory...
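
To make the theory concrete, here is a small userland model of the two
exit styles. The interleaving (every sharer checks the count before any
of them drops its reference) is an assumption I am making for the sake
of argument, not something I have traced in the 4.5 source, and the
model_* names are invented.

```c
/* Hypothetical model: one shared vm holding one attach on a segment. */
struct model_vm {
    int vm_refcnt;   /* processes sharing the vm */
    int shm_nattch;  /* attach count of the one segment it maps */
};

/* Split style: test the count here, drop the reference somewhere else. */
static void model_exit_check(struct model_vm *vm)
{
    if (vm->vm_refcnt == 1)
        vm->shm_nattch--;        /* stands in for shmexit() */
}

/* The deferred drop, standing in for a later vmspace_free(). */
static void model_release(struct model_vm *vm)
{
    vm->vm_refcnt--;
}

/* Combined style: decrement and test in a single step. */
static void model_exit_combined(struct model_vm *vm)
{
    if (--vm->vm_refcnt == 0)
        vm->shm_nattch--;        /* stands in for shmexit() */
}

/* All sharers run the check first, then all release. Returns nattch. */
static int model_crash_split(int nprocs)
{
    struct model_vm vm = { nprocs, 1 };
    int i;

    for (i = 0; i < nprocs; i++)
        model_exit_check(&vm);
    for (i = 0; i < nprocs; i++)
        model_release(&vm);
    return vm.shm_nattch;
}

/* All sharers exit through the combined path. Returns nattch. */
static int model_crash_combined(int nprocs)
{
    struct model_vm vm = { nprocs, 1 };
    int i;

    for (i = 0; i < nprocs; i++)
        model_exit_combined(&vm);
    return vm.shm_nattch;
}
```

With three sharers the split version leaves shm_nattch at 1 (nobody ever
saw a count of 1), while the combined version brings it to 0. A lone
process is fine either way, which would explain why nobody notices
without rfork.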

}> without doing the final decrement to zero. There is a comment just above
}> cpu_exit which says:
}>
}>          * The address space is released by "vmspace_free(p->p_vmspace)";
}>
}> but I don't know who calls that unless it somehow happens from cpu_exit.
}The reference is initialized to 1 when it is created.  See vmspace_alloc()
}in vm_map.c.

But where does vm_refcnt go to zero (in 4.5)?

}> This is not limited to linux threads, it should affect anything which
}> increments vm_refcnt and allocates shared mem. It's obvious what should
}> happen, just not obvious how to implement it without causing a side
}> effect.
}> Not sure that seeing how linux does it would help in this regard.
}I think it is Linux specific.  I think it is related to RFMEM not
}being set in flags when the vm_fork() is called.

As best I could tell, RFMEM is, in fact, set by the library and by the
kernel.

Callum Gibson                               callum.gibson@db.com
Global Markets IT, Deutsche Bank, Australia       61 2 9258 1620
### The opinions in this message are mine and not Deutsche's ###
