From: callum.gibson@db.com
To: tlambert2@mindspring.com
Cc: hackers@FreeBSD.ORG
Date: Wed, 17 Apr 2002 15:17:50 +1000 (EST)
Subject: Re: ipcrm/shmctl failure (fix NOT found)
Message-ID: <20020417051750.14714.qmail@merton.aus.deuba.com>
In-Reply-To: <3CBCFF0E.56972E35@mindspring.com> from "tlambert2@mindspring.com" at Apr 16, 2002 09:50:22 PM

tlambert2@mindspring.com writes:
}> I didn't know if you were talking about "not incrementing" when the
}> process exits or when it rforked. If you rfork(RFMEM), you'd want to
}> increment the vm_refcnt I'm pretty sure (and it does).
}
}No, you really don't.

Do you mean I don't know, or that we don't want to increment the vm_refcnt when rforking?

}You have a number of references on the vm (one per RFMEM) process.
}The correct translation of these references is to have a *single*
}reference count instance to the shared memory segment itself,
}rather than incrementing the segment references, shmseg->shm_nattch.

OK, so shmfork cannot increment shm_nattch. But you still want to increment vm_refcnt when you rfork, or your second sentence is a contradiction (one ref per RFMEM).
But you are saying there is a single vm (albeit with multiple references to it), and, because it's only one vm, there is in effect a _single_ reference to the shmseg from it. Do I understand you correctly?

}If the VM reference counting on normal segments weren't working,
}then there'd be a huge-and-obvious-to-everyone problem. I think
}that incrementing the shmseg->shm_nattch on the vfork is definitely
}the wrong thing to do.

It's surprising what people don't notice.

}Since your problem is a symptom of increment of shmseg->shm_nattch
}without a corresponding decrement, then the *only* code that can be
}involved is shmat() and shmfork() for the increment, and for the
}delete, shm_delete_mapping(), which is called from shmexit() and
}shmdt().

No, I don't think I said that. All I know is that shmexit never gets called, and that seems to be because vm_refcnt is incremented.

}That basically implies that RFMEM is not set when vm_fork() is called
}from the Linux ABI code, since that's the only place that calls the
}shmfork() code.

Nah, I checked that. The linux threads library does a clone(CLONE_VM), which translates to an rfork(RFMEM) in i386/linux/linux_machdep.c.

}> The whole bug is
}> the point that vm_refcnt is never decremented and the shm_nattch is
}> therefore only decremented if you explicitly detach from memory (which
}> will call shm_delete_mapping). So if an rfork'd program uses shared mem
}> and crashes, the vm_refcnt stays > 1, the shared mem is never freed
}> because shmexit -> shm_delete_mapping is never called. Hopefully this
}> only affects shared mem, as there is more stuff inside the if statement
}> you include below other than the shmexit.
}
}It should not be incremented in the first place. It is erroneously
}incremented, IMO.

You mean shm_nattch is erroneously incremented, not vm_refcnt, I think?

}> }...in other words, the resource track exit does not occur until
}> }the reference count is about to go from 1->0.
}> }Note that there is an implicit race here, actually, between the
}> }reference and the detach, in which another instance could
}> }conceivably be created. 8-(.
}>
}> Don't know about the race, although one is mentioned in the cvs logs on
}> the current branch. I presume you're talking SMP only though?
}> As a side note, in current this reads:
}>         if (--vmspace->vm_refcnt == 0) {
}
}Yes. This doesn't have the race, because there isn't a window between
}the time of the compare and the decrement.

Perhaps what I'm really seeing is the race, then? I do have a single vm with a single ref to a shmseg, but when the process crashes, all the rforked processes exit and clobber the vm_refcnt, so that shmexit never gets called to decrement shm_nattch to zero? A new theory...

}> without doing the final decrement to zero. There is a comment just above
}> cpu_exit which says:
}>
}> * The address space is released by "vmspace_free(p->p_vmspace)";
}>
}> but I don't know who calls that unless it somehow happens from cpu_exit.
}
}The reference is initialized to 1 when it is created. See vmspace_alloc()
}in vm_map.c.

But where does vm_refcnt go to zero (in 4.5)?

}> This is not limited to linux threads, it should affect anything which
}> increments vm_refcnt and allocates shared mem. It's obvious what should
}> happen, just not obvious how to implement it without causing a side
}> effect.
}> Not sure that seeing how linux does it would help in this regard.
}
}I think it is Linux specific. I think it is related to RFMEM not
}being set in flags when the vm_fork() is called.

As best I could tell, RFMEM is, in fact, set by both the library and the kernel.

Callum Gibson callum.gibson@db.com
Global Markets IT, Deutsche Bank, Australia
61 2 9258 1620
### The opinions in this message are mine and not Deutsche's ###