Date: Tue, 16 Apr 2002 23:43:09 -0700 From: Terry Lambert <tlambert2@mindspring.com> To: callum.gibson@db.com Cc: hackers@FreeBSD.ORG Subject: Re: ipcrm/shmctl failure (fix NOT found) Message-ID: <3CBD197D.438AE406@mindspring.com> References: <20020417051750.14714.qmail@merton.aus.deuba.com>
next in thread | previous in thread | raw e-mail | index | archive | help
callum.gibson@db.com wrote: > tlambert2@mindspring.com writes: > }> I didn't know if you were talking about "not incrementing" when the > }> process exits or when it rforked. If you rfork(RFMEM), you'd want to > }> increment the vm_refcnt I'm pretty sure (and it does). > } > }No, you really don't. > > I don't know or we don't want to increment the vm_refcnt when rforking? Yopu don't want to increment the refcnt on the shared memory segment twice. > }You have a number of references on the vm (one per RFMEM) process. > }The correct translation of these references is to have a *single* > }reference count instance to the shared memory segment itself, > }rather than incrementing the segment references, shmseg->shm_nattch. > > Ok - so shmfork can not increment shm_nattch. But you still want > to increment vm_refcnt when you rfork or your second sentence is a > contradiction (one ref per RFMEM). But you are saying there is a single > vm (albeit with multiple references to it) but because it's only one vm > there is in effect a _single_ reference to the shmseg from that. > Do I understand you correctly? Yes. Otherwise, for N processes, you end up with N^2 references, which is clearly incorrect. > }If the VM reference counting on normal segments weren't working, > }then there'd be a huge-and-obvious-to-everyone problem. I think > }that incrementing the shmseg->shm_nattch on the vfork is definitely > }the wrong thing to do. > > It's surprising what people don't notice. Use of rfork() in combination with shmat() is uncommenm at best, and never done in any code that ships with FreeBSD, including not being done by any ports, at worst. However, I can say from personal experience with a commercial product that used both of these at the same time in the code, that there is no bad interaction in the FreeBSD-only case, so long as there is not an RFMEM-without-RFPROC or RFPROC-without-RFMEM, which is the Linux threads case. > }Since your problem is a symptom of increment of shmseg->shm_nattch > }without a corresponding decrement, then the *only* code that can be > }involved is shmat() and shmfork() for the increment, and for the > }delete, shm_delete_mapping(), which is called from shmexit() and > }shmdt(). > > No, I don't think I said that - all I know is that shmexit never gets > called and that seems to be because vm_refcnt is incremented. So you need to instrument the code, and print out the pric ID, which will be different, at each increment and decrement. When you see an increment without a corresponding decrement, you will have found your problem. > }That basically impies that RFMEM is not set when vm_fork() is called > }from the Linux ABI code, since that's the only place that calls the > }shmfork() code. > > Nah, I checked that. It does a clone(CLONEVM) in the linux threads lib > which translates to a rfork(RFMEM) in i386/linux/linux_machdep.c . Again: you need to instrument the code. What the code says it does, and what the code actually does may be two different things; for all I know, you are loading the Linux ABI and the Linux threads as a kernel module, and the header file manifest constants between the module(s) and the kernel code are different. > }> The whole bug is > }> the point that vm_refcnt is never decremented and the shm_nattch is > }> therefore only decremented if you explicitly detach from memory (which > }> will call shm_delete_mapping). So if an rfork'd program uses shared mem > }> and crashes, the vm_refcnt stays > 1, the shared mem is never freed > }> because shmexit -> shm_delete_mapping is never called. Hopefully this > }> only affects shared mem, as there is more stuff inside the if statement > }> you include below other than the shmexit. > }It should not be incremented in the first place. It is erroneously > }incremented, IMO. > > You mean shm_nattch is erroneously incremented, not vm_refcnt I think? Yes. > }> Don't know about the race, although one is mentioned in the cvs logs on > }> the current branch. I presume you're talking SMP only though? > }> As a side note, in current this reads: > }> if (--vmspace->vm_refcnt == 0) { > } > } > }Yes. This doesn't have the race, because there isn't a window between > }the time of the compare and the decrement. > > Perhaps what I'm really seeing is the race then? I do have a single vm > with a single ref to a shmseg, but when the process crashes all the > rforked processes exit and clobber the vm_refcnt so that shmexit never > gets called to decrement shm_nattch to zero? A new theory... It's possible. You need to instrument the code, and count the kernel printfs for the increment and decrement. If they are equal, then it there should be zero references; if there are not, then you know that you lost a race. I expect that they will not be equal. A big-time crash like this is really an extreme condition, but you pointed out that the -current code doesn't have the race. I had assumed you were running -current because of this. > }> but I don't know who calls that unless it somehow happens from cpu_exit. > }> > }The reference is initialized to 1 when it is created. See vmspace_alloc() > }in vm_map.c. > > But where does vm_refcnt go to zero (in 4.5)? It doesn't. It gets compared to 1, and destroyed before it gets decremented. > }> Not sure that seeing how linux does it would help in this regard. > }I think it is Linux specific. I think it is related to RFMEM not > }being set in flags when the vm_fork() is called. > > As best I could tell, RFMEM is, in fact, set by the library and by the > kernel. Changing: if (flags & RFMEM) { p2->p_vmspace = p1->p_vmspace; p1->p_vmspace->vm_refcnt++; } To: printf( "vm_fork(%d): RFMEM %sset\r\n", p2->p_pid, (flags & RFMEM)?"":"not "); if (flags & RFMEM) { p2->p_vmspace = p1->p_vmspace; p1->p_vmspace->vm_refcnt++; } Would let you know for sure... 8-). -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3CBD197D.438AE406>