From owner-freebsd-current Sun Feb 25 11:16:28 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id LAA12858 for current-outgoing; Sun, 25 Feb 1996 11:16:28 -0800 (PST)
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.19]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id LAA12845 for ; Sun, 25 Feb 1996 11:16:19 -0800 (PST)
Received: (from bde@localhost) by godzilla.zeta.org.au (8.6.12/8.6.9) id GAA31346; Mon, 26 Feb 1996 06:11:11 +1100
Date: Mon, 26 Feb 1996 06:11:11 +1100
From: Bruce Evans
Message-Id: <199602251911.GAA31346@godzilla.zeta.org.au>
To: bde@zeta.org.au, pst@shockwave.com
Subject: Re: Bug in libc/db/hash/hash.c???
Cc: freebsd-current@freebsd.org, jhay@mikom.csir.co.za
Sender: owner-current@freebsd.org
Precedence: bulk

> I'm not sure how postponing the stat helps. The problem seems to be
> with concurrent accesses to the database. First cap_mkdb opens it and
> it gets initialized. This hasn't changed. Then the getcap library
> opens it and it gets initialized again because the file length is 0.
> Oops.

>I'm not sure what code you're looking at, but that doesn't match my version
>of cap_mkdb. There is no getcap library code linked with this file, it's
>merely opened once, with flags O_CREAT | O_TRUNC, no more.

I'm looking at the standard version of cap_mkdb.c, which hasn't changed
since 4.4lite. It calls cgetnext().

> I noticed a(nother) Heisenbug in the old code. statbuf.st_size isn't
> initialized if stat() fails. This only matters if stat() fails with
> an error other than ENOENT. (There is a similar bug involving errno.)

>Yes, which is why I changed it to a fstat and I only check statbuf.st_size
>if the fstat succeeded. Again, I did not save/restore errno because a
>perusal of the surrounding code shows that it's in an indeterminate state
>at that point (that is, there are calls immediately following it that would
>change the state).
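A minimal sketch of the ordering pst describes, assuming nothing about the real hash.c beyond what the thread says: open the file first, then fstat() the descriptor, and read statbuf.st_size only when fstat() succeeded. The helpers needs_init() and open_checked() are hypothetical names, and treating an empty file as "needs initializing" follows the thread's description of the file-length check.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the actual hash.c code): decide whether an
 * already-open database file still needs to be initialized.
 * Returns 1 if the file is empty, 0 if it has contents, and -1 if
 * fstat() failed. statbuf is only examined after fstat() succeeds,
 * so a failure never leads to reading an uninitialized st_size.
 */
static int
needs_init(int fd)
{
	struct stat statbuf;

	if (fstat(fd, &statbuf) == -1)
		return (-1);	/* errno is whatever fstat() set */
	return (statbuf.st_size == 0);
}

/*
 * Hypothetical: open first, then fstat() the descriptor, so the size
 * check refers to the file actually opened (with O_CREAT it now
 * exists). Stores needs_init()'s answer in *initp.
 */
static int
open_checked(const char *path, int flags, int *initp)
{
	int fd;

	if ((fd = open(path, flags | O_RDWR, 0644)) == -1)
		return (-1);
	if ((*initp = needs_init(fd)) == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}
```

Checking the descriptor rather than the path also avoids a second window between a stat() on the name and the open() of that name.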
I think the fix works because the O_CREAT flag is now honoured (perhaps
it should check O_TRUNC too?). I think the database was messed up when
cgetnext() opened it without (O_CREAT | O_TRUNC).

>Now, the big question: "Is there still a bug with this?" Even if cap_mkdb
>doesn't do what you suggest, what happens if someone /does/ do concurrent
>opens of a file? You're correct, there -is- a race condition for the window
>between the open and the first hash_sync. We could either reduce that window
>by doing an initial hash_sync immediately after the table is initialized
>(yuck for two reasons), or toss this entire idea as being bad and revert
>back to pre-pst code.

I doubt that the old way survived concurrent opens. The second opener
gets an empty database if the first opener hasn't synced anything yet.
How could that work? I think the second opener usually gets a read error
early, so it usually fails safely. Worse can probably happen if the
first opener has left the database half written.

You've certainly introduced a new race, but I wouldn't worry about it
especially. There must be more opportunities to read inconsistent data
while the database is being updated.

Bruce
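The window between the open and the first hash_sync can be replayed deterministically in one process. This is a hypothetical sketch, not dbopen() or hash_open(): open_db() decides "new table" from O_TRUNC or a zero file length (the O_TRUNC test follows the parenthetical suggestion above), and sync_header() stands in for the first hash_sync() by writing a few bytes. The second opener runs before the first has synced, so both decide to initialize.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the open-time check (not the real
 * hash_open() code): open the file, then decide from O_TRUNC or the
 * file length whether this opener must initialize a new table.
 */
static int
open_db(const char *path, int flags, int *is_new)
{
	struct stat sb;
	int fd;

	if ((fd = open(path, flags | O_RDWR, 0644)) == -1)
		return (-1);
	if (fstat(fd, &sb) == -1) {
		close(fd);
		return (-1);
	}
	/* Treat O_TRUNC as "new" without relying on st_size. */
	*is_new = (flags & O_TRUNC) != 0 || sb.st_size == 0;
	return (fd);
}

/* Hypothetical stand-in for the first hash_sync(): make the file non-empty. */
static int
sync_header(int fd)
{
	static const char hdr[] = "HDR";

	return (write(fd, hdr, sizeof(hdr)) == (ssize_t)sizeof(hdr) ? 0 : -1);
}

/*
 * Replays the bad interleaving: opener 1 creates and truncates the
 * file, opener 2 opens it before opener 1 has synced anything.
 * Returns how many openers decided to initialize the table
 * (2 means both did), or -1 on error.
 */
static int
race_demo(void)
{
	char path[] = "/tmp/racedemoXXXXXX";
	int tmp, fd1, fd2, new1 = 0, new2 = 0, result = -1;

	if ((tmp = mkstemp(path)) == -1)
		return (-1);
	close(tmp);

	fd1 = open_db(path, O_CREAT | O_TRUNC, &new1);	/* first opener */
	fd2 = open_db(path, 0, &new2);	/* second opener, before any sync */
	if (fd1 != -1 && fd2 != -1 && sync_header(fd1) == 0)
		result = new1 + new2;	/* both decisions predate the sync */

	if (fd1 != -1)
		close(fd1);
	if (fd2 != -1)
		close(fd2);
	unlink(path);
	return (result);
}
```

race_demo() should return 2. Calling sync_header() right after initializing narrows the window, which is the first option mentioned above, but cannot close it: another opener can still run between open() and the first write.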