From owner-freebsd-current  Sun Feb 25 11:16:28 1996
Return-Path: owner-current
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id LAA12858
          for current-outgoing; Sun, 25 Feb 1996 11:16:28 -0800 (PST)
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.19])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id LAA12845
          for <freebsd-current@freebsd.org>; Sun, 25 Feb 1996 11:16:19 -0800 (PST)
Received: (from bde@localhost) by godzilla.zeta.org.au (8.6.12/8.6.9) id GAA31346; Mon, 26 Feb 1996 06:11:11 +1100
Date: Mon, 26 Feb 1996 06:11:11 +1100
From: Bruce Evans <bde@zeta.org.au>
Message-Id: <199602251911.GAA31346@godzilla.zeta.org.au>
To: bde@zeta.org.au, pst@shockwave.com
Subject: Re: Bug in libc/db/hash/hash.c???
Cc: freebsd-current@freebsd.org, jhay@mikom.csir.co.za
Sender: owner-current@freebsd.org
Precedence: bulk

>  I'm not sure how postponing the stat helps.  The problem seems to be
>  with concurrent accesses to the database.  First cap_mkdb opens it and
>  it gets initialized.  This hasn't changed.  Then the getcap library
>  opens it and it gets initialized again because the file length is 0.
>  Oops.

>I'm not sure what code you're looking at, but that doesn't match my version
>of cap_mkdb.  There is no getcap library code linked with this file, it's
>merely opened once, with flags O_CREAT | O_TRUNC, no more.

I'm looking at the standard version of cap_mkdb.c, which hasn't changed
since 4.4lite.  It calls cgetnext().

>  I noticed a(nother) Heisenbug in the old code.  statbuf.st_size isn't
>  initialized if stat() fails.  This only matters if stat() fails with
>  an error other than ENOENT.  (There is a similar bug involving errno.)

>Yes, which is why I changed it to a fstat and I only check statbuf.st_size
>if the fstat succeeded.  Again, I did not save/restore errno because a
>perusal of the surrounding code shows that it's in an indeterminate state
>at that point (that is, there are calls immediately following it that would
>change the state).

I think the fix works because the O_CREAT flag is now honoured (perhaps
it should check O_TRUNC too?).  I think the database was messed up when
cgetnext() opened it without (O_CREAT | O_TRUNC).
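
To make the idea concrete, here is a minimal sketch of the kind of check
I mean.  It is not the actual hash.c code; should_init_table() is a
hypothetical helper, and treating O_TRUNC the same as O_CREAT is only the
suggestion above, not something the fix is known to do.  The point is
that st_size is read only after fstat() succeeds, and that an opener
which asked for neither flag never initializes the file.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: decide whether an already-opened database file
 * should be initialized as a fresh table.  Returns 1 for "initialize",
 * 0 for "leave alone".
 */
int
should_init_table(int fd, int flags)
{
	struct stat sb;

	/* If fstat() fails, sb is indeterminate, so never look at st_size. */
	if (fstat(fd, &sb) != 0)
		return (0);

	/* A file that already has contents is not a new table. */
	if (sb.st_size != 0)
		return (0);

	/*
	 * The file is empty.  Initialize it only if the caller asked to
	 * create (or, as an assumption here, truncate) it.  A plain
	 * opener, such as a reader, must not initialize it.
	 */
	return ((flags & (O_CREAT | O_TRUNC)) != 0);
}
```

With this shape, an open of an empty file without (O_CREAT | O_TRUNC)
leaves the file untouched instead of writing a fresh header into it.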

>Now, the big question: "Is there still a bug with this?"  Even if cap_mkdb
>doesn't do what you suggest,  what happens if someone /does/ do concurrent
>opens of a file?  You're correct, there -is- a race condition for the window
>between the open and the first hash_sync.  We could either reduce that window
>by doing an initial hash_sync immediately after the table is initialized
>(yuck for two reasons), or toss this entire idea as being bad and revert
>back to pre-pst code.

I doubt that the old way survived concurrent opens.  The second opener
got an empty database if the first opener hadn't synced anything.  How
could that work?  I think it usually got a read error early, so it
usually failed safely.  Worse could probably happen if the second opener
arrived while the first opener had the database half written.
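
The empty-file window is easy to see with a small sketch.  This is only
an illustration with plain open()/write()/fstat(), not the db code:
demo_empty_window() and observed_size() are hypothetical names, and the
3-byte write merely stands in for whatever the first sync would put in
the file.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size that an independent second opener would see, or -1 on error. */
static off_t
observed_size(const char *path)
{
	struct stat sb;
	int fd, rv;

	if ((fd = open(path, O_RDONLY)) < 0)
		return (-1);
	rv = fstat(fd, &sb);
	close(fd);
	return (rv == 0 ? sb.st_size : -1);
}

/*
 * First opener creates/truncates the file, then writes a stand-in
 * header.  *before is what a second opener sees inside the window,
 * *after is what it sees once something has reached the file.
 */
int
demo_empty_window(const char *path, off_t *before, off_t *after)
{
	static const char header[] = "hdr";	/* stand-in, not a real header */
	ssize_t len = (ssize_t)(sizeof(header) - 1);
	int fd;

	if ((fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600)) < 0)
		return (-1);
	*before = observed_size(path);		/* nothing written yet */
	if (write(fd, header, (size_t)len) != len) {
		close(fd);
		return (-1);
	}
	*after = observed_size(path);		/* header now visible */
	close(fd);
	return ((*before < 0 || *after < 0) ? -1 : 0);
}
```

Between the open and the write the second opener sees a length of 0,
which is exactly the state in which it would decide the file is new.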

You've certainly introduced a new race, but I wouldn't worry about it
especially.  There must be more opportunities to read inconsistent
data while the database is being updated.

Bruce