Date: Sun, 25 Feb 1996 11:28:15 -0800 From: Paul Traina <pst@shockwave.com> To: Bruce Evans <bde@zeta.org.au> Cc: freebsd-current@freebsd.org, jhay@mikom.csir.co.za Subject: Re: Bug in libc/db/hash/hash.c??? Message-ID: <199602251928.LAA00548@precipice.shockwave.com> In-Reply-To: Your message of "Mon, 26 Feb 1996 06:11:11 %2B1100." <199602251911.GAA31346@godzilla.zeta.org.au>
next in thread | previous in thread | raw e-mail | index | archive | help
From: Bruce Evans <bde@zeta.org.au> Subject: Re: Bug in libc/db/hash/hash.c??? > I'm not sure how postponing the stat helps. The problem seems to be > with concurrent accesses to the database. First cap_mkdb opens it and > it gets initialized. This hasn't changed. Then the getcap library > opens it and it gets initialized again because the file length is 0. > Oops. >I'm not sure what code you're looking at, but that doesn't match my version >of cap_mkdb. There is no getcap library code linked with this file, it's >merely opened once, with flags O_CREAT | O_TRUNC, no more. I'm looking at the standard version of cap_mkdb.c, which hasn't changed since 4.4lite. It calls cgetnext(). Yep, sorry, missed the implicit open there. > I noticed a(nother) Heisenbug in the old code. statbuf.st_size isn't > initialized if stat() fails. This only matters if stat() fails with > an error other than ENOENT. (There is a similar bug involving errno.) >Yes, which is why I changed it to a fstat and I only check statbuf.st_size >if the fstat succeeded. Again, I did not save/restore errno because a >perusal of the surrounding code shows that it's in an indeterminate state >at that point (that is, there are calls immediately following it that would >change the state). I think the fix works because the O_CREAT flag is now honoured (perhaps it should check O_TRUNC too?). I think the database was messed up when cgetnext() opened it without (O_CREAT | O_TRUNC). Do you mean it should require both O_CREAT and O_TRUNC or either? If either, that case is covered earlier. I'm not certain both should be required if the file is already 0 sized. However, I'm easy going, whichever you want is fine by me, just as long as you clarify what you want. :-) >Now, the big question: "Is there still a bug with this?" Even if cap_mkdb >doesn't do what you suggest, what happens if someone /does/ do concurrent >opens of a file? You're correct, there -is- a race condition for the window >between the open and the first hash_sync. We could either reduce that windo >>w >by doing an initial hash_sync immediately after the table is initialized >(yuck for two reasons), or toss this entire idea as being bad and revert >back to pre-pst code. I doubt that the old way survived concurrent opens. The second opener got an empty database if the first opener hasn't synced anything. How could that work? I think it usually gets read error early, so it usually fails safely. Worse can probably happen if the first opener the database is half written. You've certainly introduced a new race, but I wouldn't worry about it especially. There must be more opportunities to read inconsistent data while the database is being updated. Yeah, it looks like paging to disk is rather asynchronous in nature. In my opinion, the whole damn thing should internally be protected with advisory locks, but that completely breaks ndbm semantics where the callers expect to have to lock the database themselves. Sigh.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199602251928.LAA00548>