From owner-freebsd-arch Wed Apr 18 16:23:30 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id CF9E337B423 for ; Wed, 18 Apr 2001 16:23:26 -0700 (PDT) (envelope-from tlambert@usr07.primenet.com) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id QAA08396; Wed, 18 Apr 2001 16:23:22 -0700 (MST) Received: from usr07.primenet.com(206.165.6.207) via SMTP by smtp04.primenet.com, id smtpdAAAvaaqxq; Wed Apr 18 16:23:19 2001 Received: (from tlambert@localhost) by usr07.primenet.com (8.8.5/8.8.5) id QAA22635; Wed, 18 Apr 2001 16:23:59 -0700 (MST) From: Terry Lambert Message-Id: <200104182323.QAA22635@usr07.primenet.com> Subject: Re: Found BAD BUG: squashed To: dillon@earth.backplane.com (Matt Dillon) Date: Wed, 18 Apr 2001 23:23:54 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), arch@FreeBSD.ORG In-Reply-To: <200104182212.f3IMCPD45159@earth.backplane.com> from "Matt Dillon" at Apr 18, 2001 03:12:25 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > :I have identified the bug. > : > :It turns out that the bug causes memory corruption by freeing > :but continuing to use a credential, and it only occurs in big > :resource usage cases, and then seemingly at random. > : > :It is the fact that there are two credentials per socket, one > :for the socket, and one for the descriptor itself. > : > :The cr_ref is an unsigned short, and ... > > Ahhh.. Excellent find Terry! > > Why do you want to cycle a new credential when the unsigned short > ref count reaches 65535 (what you call the right fix) verses simply > changing the ref count to an unsigned long (what you call the quick > and dirty fix)? That seems kinda reversed to me. Because my use is not the common use, and it bloats the cred structure to a non-4-byte boundary, which I thought might end up being problematic for some people. If "the right fix" is implemented, it will end up giving out a duplicate cred every 32767 sockets (2 refs per socket: sock, fdp). The use of the & (address of) lets it replace the nearly exhausted cred where it's being cloned from; going the other direction would mean one duplicate cred per, on overflow. There would still be a new cred per dup() of an FD with an "exhausted" credential, but that's not really a problem. The stylistic change would allow this, but I admit it's not necessary, if everyone is willing to live with a bloated cred (I dent a "bloat the cred struct" patch to -current just now). The major thing the stylistic change would have done is allow me to change where crhold() got its credential. I could then #define it, and use __FILE__ and __LINE__, and store that away in the cred data as a "who allocated me?" cookie. Then when I did the duplicate free later, it would tell me the exact line of code where the allocation ("hold") came from. Really, this is a more general case of fixing up the way reference counts are done, to turn the new reference into an rvalue, so it can be replaced without telling the code about it (other than a recompile). In practice, you'd use the fast code, but you would always have the option of substituting the code (everywhere at once) to fix the problem, or, more importantly, future problems. Actually, this particular bug was a case of "When you hear hoof-beats DON'T look for horses, look for Emu!"... all the style change would have done is prove that it _wasn't_ a reference hold/release imbalance. That would have been a lot, but it would still have required me staring at the code (which it did). So the second part would be a generic reference count macro system that could be made to also do bounds checking on the count. The best way to do that is (again), a macro. Really, this wants: #define CRHOLD(x) GENERICHOLD(struct ucred,(x), poolname, MAX_USHORT) #define CRFREE(x) GENERICFREE(struct ucred,(x), poolanme) ...etc.. Then you vary the GENERIC macros based on debugging level. I'm tempted to suggest that the INVARIANTS code wants to seperately reference count objects, after allocating them larger than they are, and passing a pointer into itself that can be backed out (like the old malloc hack to do the same thing), to keep it from spamming reference counts. But that's a lot more work. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message