Date:      Fri, 14 Dec 2012 22:43:00 -0600
From:      David Noel <david.i.noel@gmail.com>
To:        freebsd-hackers@freebsd.org
Subject:   postgres, initdb, FreeBSD bug?
Message-ID:  <CAHAXwYA6o_hBpAvm=H4bpXNOKr7ec10zH30G3KWQXg7JEj0mDQ@mail.gmail.com>

I've been fighting a bug I can't seem to figure out, and was told that
this might be the place to ask. I'm running postgresql-9.2.2 on
FreeBSD 8.3-RELEASE-p5, and initdb fails when I try to run it. I
contacted the pgsql-general mailing list, and we debugged the issue to
the point where it appears to be FreeBSD-related. Relevant excerpts
from several of those emails, which together describe the error, are
below:

I'm running into the following error message when running initdb (FreeBSD host):

   ygg# /usr/local/etc/rc.d/postgresql initdb -D /zdb/pgsql/data --debug
   The files belonging to this database system will be owned by user "pgsql".
   This user must also own the server process.

   The database cluster will be initialized with locales
     COLLATE:  C
     CTYPE:    en_US.UTF-8
     MESSAGES: en_US.UTF-8
     MONETARY: en_US.UTF-8
     NUMERIC:  en_US.UTF-8
     TIME:     en_US.UTF-8
   The default text search configuration will be set to "english".

   creating directory /zdb/pgsql/data ... ok
   creating subdirectories ... ok
   selecting default max_connections ... 100
   selecting default shared_buffers ... 32MB
   creating configuration files ... ok
   creating template1 database in /zdb/pgsql/data/base/1 ... FATAL:
   could not open file "pg_xlog/000000010000000000000001" (log file 0,
   segment 1): No such file or directory
   child process exited with exit code 1
   initdb: removing data directory "/zdb/pgsql/data"

 ...

 Interestingly, I have a second, virtually identical server on which I
  just tried initdb: FreeBSD 8.3-RELEASE-p5, postgresql-server-9.2.2.
  Exact same "FATAL: could not open file pg_xlog" error, so it is
  reproducible.

 ...

 The relevant part of the ktrace output is

   71502 postgres CALL  unlink(0x7fffffffc130)
   71502 postgres NAMI  "pg_xlog/xlogtemp.71502"
   71502 postgres RET   unlink -1 errno 2 No such file or directory
   71502 postgres CALL  open(0x7fffffffc130,O_RDWR|O_CREAT|O_EXCL,S_IRUSR|S_IWUSR)
   71502 postgres NAMI  "pg_xlog/xlogtemp.71502"
   71502 postgres RET   open 3
   71502 postgres CALL  write(0x3,0x801a56030,0x2000)
   71502 postgres GIO   fd 3 wrote 4096 bytes
   .... a lot of uninteresting write() calls snipped ...
   71502 postgres RET   write 8192/0x2000
   71502 postgres CALL  close(0x3)
   71502 postgres RET   close 0
   71502 postgres CALL  unlink(0x7fffffffbc60)
   71502 postgres NAMI  "pg_xlog/000000010000000000000001"
   71502 postgres RET   unlink -1 errno 2 No such file or directory
   71502 postgres CALL  link(0x7fffffffc130,0x7fffffffbc60)
   71502 postgres NAMI  "pg_xlog/xlogtemp.71502"
   71502 postgres NAMI  "pg_xlog/000000010000000000000001"
   71502 postgres RET   link -1 errno 1 Operation not permitted
   71502 postgres CALL  unlink(0x7fffffffc130)
   71502 postgres NAMI  "pg_xlog/xlogtemp.71502"
   71502 postgres RET   unlink 0
   71502 postgres CALL  open(0x7fffffffc530,O_RDWR,<unused>0x180)
   71502 postgres NAMI  "pg_xlog/000000010000000000000001"
   71502 postgres RET   open -1 errno 2 No such file or directory

  This corresponds to the execution of XLogFileInit(), and what's
  evidently happening is that we successfully create and zero-fill
  the first xlog segment file under a temporary name, but then
  the attempt to rename it into place with link() fails with EPERM.
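To give the kernel side a test case that does not need PostgreSQL at all, the syscall sequence in the trace above can be replayed directly. This is a minimal sketch, assuming Python 3, whose os.open/os.write/os.link/os.unlink wrap the same syscalls one-to-one; the helper name replay_xlog_install and the two-block file size are hypothetical choices for illustration, not anything taken from PostgreSQL's XLogFileInit():

```python
import errno
import os
import sys
import tempfile

# Hypothetical reproducer: replays the ktrace sequence
# (unlink, open O_EXCL, write zeros, close, unlink, link, unlink).
# Not PostgreSQL code; names and sizes are illustrative assumptions.
BLOCK = 8192  # each write() in the trace is 0x2000 bytes


def _unlink_if_present(path):
    """unlink() that ignores ENOENT, like the first unlink in the trace."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass


def replay_xlog_install(xlog_dir, segment_name, nblocks=2):
    """Return None if link() succeeds, else the errno it failed with."""
    tmp = os.path.join(xlog_dir, "xlogtemp.%d" % os.getpid())
    final = os.path.join(xlog_dir, segment_name)

    _unlink_if_present(tmp)
    fd = os.open(tmp, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        zeros = bytes(BLOCK)  # zero-filled block, as in the trace
        for _ in range(nblocks):
            os.write(fd, zeros)
    finally:
        os.close(fd)

    _unlink_if_present(final)
    try:
        os.link(tmp, final)  # the call that returns EPERM in the trace
    except OSError as e:
        return e.errno
    finally:
        os.unlink(tmp)  # the temp name is removed either way
    return None


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp()
    err = replay_xlog_install(target, "000000010000000000000001")
    print("link ok" if err is None else "link failed: " + errno.errorcode[err])
```

Run as the pgsql user with a directory on the affected filesystem (for example an empty directory next to /zdb/pgsql/data) as the argument. On a healthy system it should print "link ok"; if the kernel behaves as in the trace, one would expect it to report EPERM instead.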

  This is really a WTF kind of failure, I think.  The directory is
  certainly writable --- it was made under our own UID, and what's
  more we just managed to create the file there under its temp name.
  So how can we get an EPERM failure from link()?

  I think this is a kernel bug.

                          regards, tom lane

  PS: one odd thing here is that the ereport(LOG) in
  InstallXLogFileSegment isn't doing anything; otherwise we'd have gotten
  a much more helpful error report about "could not link file".  I don't
  think we run the bootstrap mode with log_min_messages set high enough to
  disable LOG messages, so why isn't it printing?  Nonetheless, this error
  shouldn't have occurred.
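The question in the PS reduces to a threshold check. In the ordering documented for log_min_messages, LOG ranks above ERROR, so an ereport(LOG) should be dropped only if the setting were FATAL or PANIC. This is a sketch of that comparison, assuming the level list from the PostgreSQL documentation; reaches_server_log is a hypothetical name, this is not PostgreSQL's elog.c logic, and it ignores whatever bootstrap mode does with its output:

```python
# Severity order for log_min_messages as listed in the PostgreSQL docs.
# For this setting LOG ranks above ERROR (client_min_messages orders it
# differently, which this sketch ignores).
SERVER_LOG_ORDER = ["DEBUG5", "DEBUG4", "DEBUG3", "DEBUG2", "DEBUG1",
                    "INFO", "NOTICE", "WARNING", "ERROR", "LOG",
                    "FATAL", "PANIC"]


def reaches_server_log(level, log_min_messages="WARNING"):
    """Return True if a message at `level` passes the log_min_messages filter.

    Hypothetical helper for illustration only.
    """
    rank = SERVER_LOG_ORDER.index
    return rank(level.upper()) >= rank(log_min_messages.upper())


if __name__ == "__main__":
    for setting in ("WARNING", "ERROR", "LOG", "FATAL"):
        print(setting, reaches_server_log("LOG", setting))
```

If bootstrap mode runs with any ordinary setting up to and including LOG, the "could not link file" message should have passed this check, so its absence would point at something other than the threshold.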

 ...

 Where to from here? The freebsd-database@freebsd.org mailing list? The
 postgresql port maintainer? Who should I be in touch with?

 ...

 You need to talk to some FreeBSD kernel hackers about why link()
  might be failing here.  Since you see it on UFS too, we can probably
  exonerate the ZFS filesystem-specific code.

  I did some googling and found that EPERM can be issued if the filesystem
  doesn't support hard links (which shouldn't apply to ZFS I trust).
  Also, Linux has a "protected_hardlinks" option that causes certain
  attempts at creating hard links to fail --- but our use-case here
  doesn't fall foul of any of those restrictions AFAICS, and of course
  FreeBSD isn't Linux.  Still, I wonder if you're running into some
  misdesigned or misimplemented security restriction.  You might want
  to look at your kernel parameters and see if any of them look like
  they might have to do with restricting hard-link operations.
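Following the suggestion to look for kernel parameters that restrict hard links, the search can be narrowed by filtering sysctl output. As far as I know, FreeBSD has two such knobs, security.bsd.hardlink_check_uid and security.bsd.hardlink_check_gid, which make link() fail with EPERM for an unprivileged caller who does not own the file, or is not a member of the file's group, respectively. Because new files on FreeBSD take the group of their parent directory, the gid check could in principle trip even on a file the caller just created. This is a minimal sketch, assuming Python 3.7+ and the usual `name: value` (FreeBSD) or `name = value` (Linux procps) output formats; the helper names are hypothetical:

```python
import subprocess


def parse_sysctl(text):
    """Parse `sysctl -a` output into a {name: value} dict.

    Splits each line at whichever separator comes first: FreeBSD prints
    "name: value", Linux procps prints "name = value".
    """
    knobs = {}
    for line in text.splitlines():
        hits = [(line.find(sep), sep) for sep in (": ", " = ") if sep in line]
        if not hits:
            continue  # blank lines and separator-less continuations are skipped
        idx, sep = min(hits)
        knobs[line[:idx].strip()] = line[idx + len(sep):].strip()
    return knobs


def hardlink_knobs(text):
    """Keep only parameters whose name mentions hard links."""
    return {k: v for k, v in parse_sysctl(text).items() if "hardlink" in k.lower()}


def read_sysctl_all():
    """Return `sysctl -a` output, or "" if sysctl cannot be run."""
    try:
        result = subprocess.run(["sysctl", "-a"], capture_output=True,
                                text=True, timeout=60)
    except (OSError, subprocess.SubprocessError):
        return ""
    return result.stdout


if __name__ == "__main__":
    for name, value in sorted(hardlink_knobs(read_sysctl_all()).items()):
        print(name, "=", value)
```

On the affected hosts, a nonzero value for either FreeBSD knob would be worth comparing against the group ownership of /zdb/pgsql/data and its pg_xlog subdirectory.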

  Also, since Amitabh failed to duplicate the failure on both earlier
  and later FreeBSD kernels, and we've not heard reports of this from
  anybody else either, it seems more than possible that it's a plain
  old bug in the specific kernel version you're using.

  As a short-term workaround, I'd suggest rebuilding with
  HAVE_WORKING_LINK disabled.  (Just remove that #define from
  src/include/pg_config_manual.h and rebuild.)

                          regards, tom lane
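The HAVE_WORKING_LINK workaround swaps link()-then-unlink() for a plain rename() when moving the finished temp file into place. This is an illustrative sketch of that choice, assuming Python 3; install_segment and its have_working_link flag are hypothetical stand-ins for the #define, not PostgreSQL's actual InstallXLogFileSegment():

```python
import os
import sys
import tempfile


def install_segment(tmppath, path, have_working_link=True):
    """Move a fully written temp segment to its final name.

    Illustrative sketch only: have_working_link stands in for the
    HAVE_WORKING_LINK #define. Returns True on success, False (after
    reporting to stderr) on failure.
    """
    try:
        if have_working_link:
            # link() fails with EEXIST rather than clobber an existing file;
            # the temp name is dropped only after the link succeeds.
            os.link(tmppath, path)
            os.unlink(tmppath)
        else:
            # rename() atomically replaces any existing file at `path`.
            os.rename(tmppath, path)
    except OSError as e:
        print("could not install %s: %s" % (path, e.strerror), file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    for use_link in (True, False):
        tmp = os.path.join(d, "xlogtemp")
        seg = os.path.join(d, "segment-link" if use_link else "segment-rename")
        open(tmp, "wb").close()
        print("link" if use_link else "rename", install_segment(tmp, seg, use_link))
```

The trade-off is the no-clobber guarantee: with link(), installing over an existing segment fails loudly, while rename() silently replaces it. That seems acceptable as a short-term workaround, but it does not explain why link() is refusing in the first place.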

 ...

Does this make sense to anyone? In particular, why would link() fail
with EPERM here?

-David


