Date:      Sat, 15 Dec 2012 11:53:24 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        David Noel <david.i.noel@gmail.com>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: postgres, initdb, FreeBSD bug?
Message-ID:  <20121215095324.GU71906@kib.kiev.ua>
In-Reply-To: <CAHAXwYA6o_hBpAvm=H4bpXNOKr7ec10zH30G3KWQXg7JEj0mDQ@mail.gmail.com>
References:  <CAHAXwYA6o_hBpAvm=H4bpXNOKr7ec10zH30G3KWQXg7JEj0mDQ@mail.gmail.com>

On Fri, Dec 14, 2012 at 10:43:00PM -0600, David Noel wrote:
> I've been fighting with a bug I can't quite seem to figure out and was
> told that this might be the place to come. I'm running
> postgresql-9.2.2 on FreeBSD 8.3-RELEASE-p5 and am having things break
> down when I try to run initdb. I got in contact with the pgsql-general
> mailing list and we debugged the issue to the point where it seemed
> that this might be a FreeBSD-related error. Relevant excerpts from
> several emails are below that piece together the error:
>
> I'm running into the following error message when running initdb (FreeBSD host):
>
>    ygg# /usr/local/etc/rc.d/postgresql initdb -D /zdb/pgsql/data --debug
>    The files belonging to this database system will be owned by user "pgsql".
>    This user must also own the server process.
>
>    The database cluster will be initialized with locales
>      COLLATE:  C
>      CTYPE:    en_US.UTF-8
>      MESSAGES: en_US.UTF-8
>      MONETARY: en_US.UTF-8
>      NUMERIC:  en_US.UTF-8
>      TIME:     en_US.UTF-8
>    The default text search configuration will be set to "english".
>
>    creating directory /zdb/pgsql/data ... ok
>    creating subdirectories ... ok
>    selecting default max_connections ... 100
>    selecting default shared_buffers ... 32MB
>    creating configuration files ... ok
>    creating template1 database in /zdb/pgsql/data/base/1 ... FATAL:
>    could not open file "pg_xlog/000000010000000000000001" (log file 0,
>    segment 1): No such file or directory
>    child process exited with exit code 1
>    initdb: removing data directory "/zdb/pgsql/data"
>
>  ...
>
>  Interestingly, I have a second--virtually identical--server that I
>   just tried initdb on. FreeBSD 8.3-RELEASE-p5, postgresql-server-9.2.2.
>   Exact same "FATAL: could not open file pg_xlog" error. So it is
>   reproducible.
>
>  ...
>
>  The relevant part of the ktrace output is
>
>    71502 postgres CALL  unlink(0x7fffffffc130)
>    71502 postgres NAMI  "pg_xlog/xlogtemp.71502"
>    71502 postgres RET   unlink -1 errno 2 No such file or directory
>    71502 postgres CALL  open(0x7fffffffc130,O_RDWR|O_CREAT|O_EXCL,S_IRUSR|S_IWUSR)
>    71502 postgres NAMI  "pg_xlog/xlogtemp.71502"
>    71502 postgres RET   open 3
>    71502 postgres CALL  write(0x3,0x801a56030,0x2000)
>    71502 postgres GIO   fd 3 wrote 4096 bytes
>    .... a lot of uninteresting write() calls snipped ...
>    71502 postgres RET   write 8192/0x2000
>    71502 postgres CALL  close(0x3)
>    71502 postgres RET   close 0
>    71502 postgres CALL  unlink(0x7fffffffbc60)
>    71502 postgres NAMI  "pg_xlog/000000010000000000000001"
>    71502 postgres RET   unlink -1 errno 2 No such file or directory
>    71502 postgres CALL  link(0x7fffffffc130,0x7fffffffbc60)
>    71502 postgres NAMI  "pg_xlog/xlogtemp.71502"
>    71502 postgres NAMI  "pg_xlog/000000010000000000000001"
>    71502 postgres RET   link -1 errno 1 Operation not permitted
>    71502 postgres CALL  unlink(0x7fffffffc130)
>    71502 postgres NAMI  "pg_xlog/xlogtemp.71502"
>    71502 postgres RET   unlink 0
>    71502 postgres CALL  open(0x7fffffffc530,O_RDWR,<unused>0x180)
>    71502 postgres NAMI  "pg_xlog/000000010000000000000001"
>    71502 postgres RET   open -1 errno 2 No such file or directory
>
>   This corresponds to the execution of XLogFileInit(), and what's
>   evidently happening is that we successfully create and zero-fill
>   the first xlog segment file under a temporary name, but then
>   the attempt to rename it into place with link() fails with EPERM.
>
>   This is really a WTF kind of failure, I think.  The directory is
>   certainly writable --- it was made under our own UID, and what's
>   more we just managed to create the file there under its temp name.
>   So how can we get an EPERM failure from link()?
>
>   I think this is a kernel bug.
>
>                           regards, tom lane
>
>   PS: one odd thing here is that the ereport(LOG) in
>   InstallXLogFileSegment isn't doing anything; otherwise we'd have gotten
>   a much more helpful error report about "could not link file".  I don't
>   think we run the bootstrap mode with log_min_messages set high enough to
>   disable LOG messages, so why isn't it printing?  Nonetheless, this error
>   shouldn't have occurred.
>
>  ...
>
>  Where to from here? The freebsd-database@freebsd.org mailing list? The
>  postgresql port maintainer? Who should I be in touch with?
>
>  ...
>
>  You need to talk to some FreeBSD kernel hackers about why link()
>   might be failing here.  Since you see it on UFS too, we can probably
>   exonerate the ZFS filesystem-specific code.
>=20
>   I did some googling and found that EPERM can be issued if the filesystem
>   doesn't support hard links (which shouldn't apply to ZFS I trust).
>   Also, Linux has a "protected_hardlinks" option that causes certain
>   attempts at creating hard links to fail --- but our use-case here
>   doesn't fall foul of any of those restrictions AFAICS, and of course
>   FreeBSD isn't Linux.  Still, I wonder if you're running into some
>   misdesigned or misimplemented security restriction.  You might want
>   to look at your kernel parameters and see if any of them look like
>   they might have to do with restricting hard-link operations.
>
>   Also, since Amitabh failed to duplicate the failure on both earlier
>   and later FreeBSD kernels, and we've not heard reports of this from
>   anybody else either, it seems more than possible that it's a plain
>   old bug in the specific kernel version you're using.
>
>   As a short-term workaround, I'd suggest rebuilding with
>   HAVE_WORKING_LINK disabled.  (Just remove that #define from
>   src/include/pg_config_manual.h and rebuild.)
>=20
>                           regards, tom lane
>
>  ...
>
> Does this make any sense to anyone?

Show the ktrace from the same error on UFS.
Show the security.bsd sysctl settings, in particular, hardlink_check_{u,g}id.
Show the ls -la output for the pg_xlog directory.
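
To narrow this down outside of initdb, the create-then-link sequence from the
quoted ktrace can be replayed in isolation. The following is only a rough
sketch, not anything from PostgreSQL itself: the function name
try_link_rename and the target file name linktest.final are made up for
illustration, and Python is used purely for brevity (os.link is a thin
wrapper around link(2)). It creates a file with the same open flags and mode
as in the trace, tries to hard-link it to a second name in the same
directory, and returns the errno from link(), or 0 on success.

```python
import errno
import os
import tempfile


def try_link_rename(directory):
    """Create a temp file in `directory`, then hard-link it to a final name.

    Returns 0 on success, or the errno value reported by link(2).
    """
    tmp_path = os.path.join(directory, "xlogtemp.%d" % os.getpid())
    final_path = os.path.join(directory, "linktest.final")  # hypothetical name

    # Mirror the traced sequence: open(O_RDWR|O_CREAT|O_EXCL, 0600), write, close.
    fd = os.open(tmp_path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, b"\0" * 8192)
    finally:
        os.close(fd)

    result = 0
    try:
        os.link(tmp_path, final_path)  # the call that returned EPERM in the trace
    except OSError as exc:
        result = exc.errno
    finally:
        # Clean up both names; the final name is absent if link() failed.
        for path in (tmp_path, final_path):
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
    return result


if __name__ == "__main__":
    # Demo against a scratch directory; for a real check, pass a directory
    # on the affected filesystem and run as the pgsql user.
    with tempfile.TemporaryDirectory() as scratch:
        err = try_link_rename(scratch)
        if err == 0:
            print("link() ok")
        else:
            print("link() failed: " + errno.errorcode.get(err, str(err)))
```

Calling try_link_rename() as the pgsql user on a directory created the same
way as pg_xlog should show whether link() fails with EPERM independently of
postgres. If it does, comparing the owner and group of the new file (from
ls -la) against the hardlink_check sysctls is the next thing to look at.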



