Date:      Mon, 26 Feb 1996 06:11:11 +1100
From:      Bruce Evans <bde@zeta.org.au>
To:        bde@zeta.org.au, pst@shockwave.com
Cc:        freebsd-current@freebsd.org, jhay@mikom.csir.co.za
Subject:   Re: Bug in libc/db/hash/hash.c???
Message-ID:  <199602251911.GAA31346@godzilla.zeta.org.au>

>  I'm not sure how postponing the stat helps.  The problem seems to be
>  with concurrent accesses to the database.  First cap_mkdb opens it and
>  it gets initialized.  This hasn't changed.  Then the getcap library
>  opens it and it gets initialized again because the file length is 0.
>  Oops.

>I'm not sure what code you're looking at, but that doesn't match my version
>of cap_mkdb.  There is no getcap library code linked with this file, it's
>merely opened once, with flags O_CREAT | O_TRUNC, no more.

I'm looking at the standard version of cap_mkdb.c, which hasn't changed
since 4.4lite.  It calls cgetnext().

>  I noticed a(nother) Heisenbug in the old code.  statbuf.st_size isn't
>  initialized if stat() fails.  This only matters if stat() fails with
>  an error other than ENOENT.  (There is a similar bug involving errno.)

>Yes, which is why I changed it to a fstat and I only check statbuf.st_size
>if the fstat succeeded.  Again, I did not save/restore errno because a
>perusal of the surrounding code shows that it's in an indeterminate state
>at that point (that is, there are calls immediately following it that would
>change the state).

I think the fix works because the O_CREAT flag is now honoured (perhaps
it should check O_TRUNC too?).  I think the database was messed up when
cgetnext() opened it without (O_CREAT | O_TRUNC).
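
To make the flag and fstat points concrete, here is a minimal sketch of the
kind of check being discussed.  It is not the code in hash.c; the helper
name is_new_table() and its return convention are made up for illustration.
It treats O_TRUNC as always meaning "new table", looks at st_size only after
fstat() has succeeded, and treats an empty file as new only when O_CREAT
was given.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the hash.c code: return 1 if the file behind
 * fd should be initialized as a new table, 0 if it should be read as an
 * existing one, and -1 if fstat() failed.
 */
static int
is_new_table(int fd, int flags)
{
	struct stat sb;

	/* The caller asked for truncation, so any old contents are gone. */
	if (flags & O_TRUNC)
		return (1);

	/* sb is indeterminate after a failure, so don't look at st_size. */
	if (fstat(fd, &sb) == -1)
		return (-1);

	/* An empty file is a new table only if creation was requested. */
	return (sb.st_size == 0 && (flags & O_CREAT) != 0);
}
```

With a check like this, an opener that passes neither O_CREAT nor O_TRUNC
never reinitializes an empty file; it just goes on to fail when it tries to
read the header.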

>Now, the big question: "Is there still a bug with this?"  Even if cap_mkdb
>doesn't do what you suggest,  what happens if someone /does/ do concurrent
>opens of a file?  You're correct, there -is- a race condition for the window
>between the open and the first hash_sync.  We could either reduce that window
>by doing an initial hash_sync immediately after the table is initialized
>(yuck for two reasons), or toss this entire idea as being bad and revert
>back to pre-pst code.

I doubt that the old way survived concurrent opens.  The second opener
got an empty database if the first opener hadn't synced anything.  How
could that work?  I think it usually gets a read error early, so it usually
fails safely.  Worse can probably happen if the first opener has left the
database half written.
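
The window is easy to show outside the db code.  The sketch below is only
an illustration with made-up names (size_seen(), second_opener_sees_empty());
a plain write() stands in for the first sync.  Between the first open and
that write, a second opener sees a zero-length file.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Size of the file as seen through fd, or -1 if fstat() fails. */
static off_t
size_seen(int fd)
{
	struct stat sb;

	return (fstat(fd, &sb) == -1 ? (off_t)-1 : sb.st_size);
}

/*
 * Hypothetical demonstration: return 1 if a second opener sees an empty
 * file before the first opener has written anything and a non-empty one
 * afterwards, 0 if not, and -1 if either open() fails.
 */
static int
second_opener_sees_empty(const char *path)
{
	int first, second, ok;

	first = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (first == -1)
		return (-1);
	second = open(path, O_RDONLY);
	if (second == -1) {
		close(first);
		return (-1);
	}

	ok = size_seen(second) == 0;		/* inside the window */
	if (write(first, "header", 6) != 6)	/* stands in for the sync */
		ok = 0;
	ok = ok && size_seen(second) == 6;	/* window closed */

	close(second);
	close(first);
	return (ok);
}
```

Nothing here serializes the two openers; it only shows what the second one
can observe at each point.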

You've certainly introduced a new race, but I wouldn't worry about it
especially.  There must be more opportunities to read inconsistent
data while the database is being updated.

Bruce
