Date: Fri, 14 Dec 2012 22:27:13 -0600
Subject: postgres, initdb, FreeBSD bug?
From: David Noel
To: freebsd-hackers@freebsd.org
Reply-To: David.I.Noel@gmail.com

I've been fighting with a bug I can't quite figure out and was told
that this might be the place to come. I'm running postgresql-9.2.2 on
FreeBSD 8.3-RELEASE-p5, and things break down when I try to run initdb.
I got in contact with the pgsql-general mailing list and we debugged the
issue to the point where it seemed that this might be a FreeBSD-related
error. Relevant excerpts from several emails are below; together they
piece together the error:

I'm running into the following error message when running initdb
(FreeBSD host):

ygg# /usr/local/etc/rc.d/postgresql initdb -D /zdb/pgsql/data --debug
The files belonging to this database system will be owned by user "pgsql".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  C
  CTYPE:    en_US.UTF-8
  MESSAGES: en_US.UTF-8
  MONETARY: en_US.UTF-8
  NUMERIC:  en_US.UTF-8
  TIME:     en_US.UTF-8
The default text search configuration will be set to "english".

creating directory /zdb/pgsql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /zdb/pgsql/data/base/1 ... FATAL: could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): No such file or directory
child process exited with exit code 1
initdb: removing data directory "/zdb/pgsql/data"

...

Interestingly, I have a second--virtually identical--server that I just
tried initdb on. FreeBSD 8.3-RELEASE-p5, postgresql-server-9.2.2.
Exact same "FATAL: could not open file pg_xlog" error. So it is
reproducible.

...

The relevant part of the ktrace output is

71502 postgres CALL unlink(0x7fffffffc130)
71502 postgres NAMI "pg_xlog/xlogtemp.71502"
71502 postgres RET unlink -1 errno 2 No such file or directory
71502 postgres CALL open(0x7fffffffc130,O_RDWR|O_CREAT|O_EXCL,S_IRUSR|S_IWUSR)
71502 postgres NAMI "pg_xlog/xlogtemp.71502"
71502 postgres RET open 3
71502 postgres CALL write(0x3,0x801a56030,0x2000)
71502 postgres GIO fd 3 wrote 4096 bytes
.... a lot of uninteresting write() calls snipped ...
71502 postgres RET write 8192/0x2000
71502 postgres CALL close(0x3)
71502 postgres RET close 0
71502 postgres CALL unlink(0x7fffffffbc60)
71502 postgres NAMI "pg_xlog/000000010000000000000001"
71502 postgres RET unlink -1 errno 2 No such file or directory
71502 postgres CALL link(0x7fffffffc130,0x7fffffffbc60)
71502 postgres NAMI "pg_xlog/xlogtemp.71502"
71502 postgres NAMI "pg_xlog/000000010000000000000001"
71502 postgres RET link -1 errno 1 Operation not permitted
71502 postgres CALL unlink(0x7fffffffc130)
71502 postgres NAMI "pg_xlog/xlogtemp.71502"
71502 postgres RET unlink 0
71502 postgres CALL open(0x7fffffffc530,O_RDWR,0x180)
71502 postgres NAMI "pg_xlog/000000010000000000000001"
71502 postgres RET open -1 errno 2 No such file or directory

This corresponds to the execution of XLogFileInit(), and what's
evidently happening is that we successfully create and zero-fill the
first xlog segment file under a temporary name, but then the attempt to
rename it into place with link() fails with EPERM.

This is really a WTF kind of failure, I think. The directory is
certainly writable --- it was made under our own UID, and what's more we
just managed to create the file there under its temp name. So how can we
get an EPERM failure from link()? I think this is a kernel bug.
regards, tom lane

PS: one odd thing here is that the ereport(LOG) in
InstallXLogFileSegment isn't doing anything; otherwise we'd have gotten
a much more helpful error report about "could not link file". I don't
think we run the bootstrap mode with log_min_messages set high enough to
disable LOG messages, so why isn't it printing? Nonetheless, this error
shouldn't have occurred.

...

Where to from here? The freebsd-database@freebsd.org mailing list? The
postgresql port maintainer? Who should I be in touch with?

...

You need to talk to some FreeBSD kernel hackers about why link() might
be failing here. Since you see it on UFS too, we can probably exonerate
the ZFS filesystem-specific code.

I did some googling and found that EPERM can be issued if the filesystem
doesn't support hard links (which shouldn't apply to ZFS I trust). Also,
Linux has a "protected_hardlinks" option that causes certain attempts at
creating hard links to fail --- but our use-case here doesn't fall foul
of any of those restrictions AFAICS, and of course FreeBSD isn't Linux.
Still, I wonder if you're running into some misdesigned or
misimplemented security restriction. You might want to look at your
kernel parameters and see if any of them look like they might have to do
with restricting hard-link operations.

Also, since Amitabh failed to duplicate the failure on both earlier and
later FreeBSD kernels, and we've not heard reports of this from anybody
else either, it seems more than possible that it's a plain old bug in
the specific kernel version you're using.

As a short-term workaround, I'd suggest rebuilding with
HAVE_WORKING_LINK disabled. (Just remove that #define from
src/include/pg_config_manual.h and rebuild.)

regards, tom lane

...

Does this make any sense to anyone?

-David
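
For anyone who wants to check link() outside of PostgreSQL, here is a minimal sketch that mimics the sequence seen in the ktrace output: create a file under a temporary name, try to link() it to a second name in the same directory, then clean up. It is not PostgreSQL code. The function name probe_link() and the linkprobe.* file names are made up for illustration, and it only approximates what XLogFileInit() does.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): create a temp file in
 * "dir", then try to hard-link it to a second name in the same directory.
 * Returns 0 if link() succeeded, otherwise the errno of the failing call.
 */
int probe_link(const char *dir)
{
    char tmp[1024], dst[1024];
    int fd, err = 0;

    assert(dir != NULL);

    /* Illustrative file names, made unique per process. */
    snprintf(tmp, sizeof tmp, "%s/linkprobe.tmp.%ld", dir, (long) getpid());
    snprintf(dst, sizeof dst, "%s/linkprobe.dst.%ld", dir, (long) getpid());

    /* Same flags and mode as the open() call in the trace. */
    unlink(tmp);
    fd = open(tmp, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd < 0) {
        err = errno;
        fprintf(stderr, "open(%s): %s\n", tmp, strerror(err));
        return err;
    }
    if (write(fd, "x", 1) != 1) {
        err = errno;
        fprintf(stderr, "write(%s): %s\n", tmp, strerror(err));
        close(fd);
        unlink(tmp);
        return err;
    }
    close(fd);

    /* The step that returned EPERM in the trace. */
    unlink(dst);
    if (link(tmp, dst) != 0) {
        err = errno;
        fprintf(stderr, "link(%s, %s): %s\n", tmp, dst, strerror(err));
    } else {
        unlink(dst);
    }

    unlink(tmp);
    return err;
}
```

To use it, call probe_link() from a small main() with a directory that has the same ownership as the data directory, and run it as the pgsql user. A return value of EPERM from the link() step would reproduce the failure without initdb in the picture. On the kernel-parameter suggestion: FreeBSD has the sysctls security.bsd.hardlink_check_uid and security.bsd.hardlink_check_gid, which restrict hard-link creation. Whether either is involved here is not established in this thread, but they may be worth a look.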