From owner-freebsd-stable@FreeBSD.ORG Fri Jul 21 13:00:31 2006
Date: Fri, 21 Jul 2006 14:00:05 +0100
From: Feargal Reilly
To: freebsd-stable@freebsd.org
Message-ID: <20060721140005.5365e4b7@mablung.edhellond.fbi.ie>
Organization: FBI
X-Mailer: Sylpheed-Claws 2.1.1 (GTK+ 2.8.7; i386-portbld-freebsd5.4)
Mime-Version: 1.0
Content-Type: multipart/signed; boundary=Sig_hglNb74cQOwLb8iTICNIkiC;
 protocol="application/pgp-signature"; micalg=PGP-SHA1
Subject: filesystem full error with inumber
List-Id: Production branch of FreeBSD source code

--Sig_hglNb74cQOwLb8iTICNIkiC
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

The following error is being logged in /var/log/messages on FreeBSD 5.4:

Jul 21 09:58:44 arwen kernel: pid 615 (postgres),
uid 1001 inumber 6166128 on /data0: filesystem full

However, this does not appear to be a case of being out of disk
space, or running out of inodes:

ttyp2$ df -hi
Filesystem     Size   Used  Avail Capacity   iused   ifree %iused  Mounted on
/dev/amrd0s1f   54G    44G   5.4G    89% 4104458 3257972   56%   /data0

Nor does it appear to be a file limit:

ttyp2$ sysctl kern.maxfiles kern.openfiles
kern.maxfiles: 20000
kern.openfiles: 3582

These readings were not taken at exactly the same time as the error
occurred, but close to it.

Here's the head of dumpfs:

magic   19540119 (UFS2) time    Fri Jul 21 09:38:40 2006
superblock location     65536   id      [ 42446884 99703062 ]
ncg     693     size    29360128        blocks  28434238
bsize   8192    shift   13      mask    0xffffe000
fsize   2048    shift   11      mask    0xfffff800
frag    4       shift   2       fsbtodb 2
minfree 8%      optim   time    symlinklen 120
maxbsize 8192   maxbpg  1024    maxcontig 16    contigsumsize 16
nbfree  563891  ndir    495168  nifree  3245588 nffree  19898
bpg     10597   fpg     42388   ipg     10624
nindir  1024    inopb   32      maxfilesize     8804691443711
sbsize  2048    cgsize  8192    csaddr  1372    cssize  12288
sblkno  36      cblkno  40      iblkno  44      dblkno  1372
cgrotor 322     fmod    0       ronly   0       clean   0
avgfpdir 64     avgfilesize 16384
flags   soft-updates
fsmnt   /data0
volname         swuid   0

Now the server's main function in life is running postgres. I first
noticed this error during a maintenance run which sequentially dumps
and vacuums each individual database. There are currently 117
databases, most of which are no more than 20M in size, but there are a
few outliers, the largest of which is 792M in size. The bulk of that
one is stored in a single 500+M file, so I can't see this consuming
all my inodes, even if soft-updates weren't cleaning up, though
perhaps I'm wrong. It has since been happening outside of those runs
as well.

I have searched through various forums and list archives, and while I
have found a few references to this error, I have not been able to
find a cause and subsequent solution posted.
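
As a cross-check on the dumpfs figures above, the free-space margin can
be worked out by hand. Below is a minimal sketch in C, assuming the
reserve works as "free fragments minus minfree percent of the data
area", and that the dumpfs "blocks" field is the data area counted in
fragments. The struct and function names are my own for illustration
and are not taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Superblock counters as printed by dumpfs; the field names are illustrative. */
struct fs_counters {
    int64_t nbfree;  /* free full blocks             (dumpfs "nbfree") */
    int64_t nffree;  /* free loose fragments         (dumpfs "nffree") */
    int64_t frag;    /* fragments per block          (dumpfs "frag")   */
    int64_t dsize;   /* data area, in fragments      (dumpfs "blocks", assumed) */
};

/* Total free space in fragments: whole free blocks plus loose fragments. */
static int64_t free_frags(const struct fs_counters *c)
{
    return c->nbfree * c->frag + c->nffree;
}

/* Fragments held back as the reserve: minfree percent of the data area. */
static int64_t reserved_frags(const struct fs_counters *c, int minfree_pct)
{
    assert(minfree_pct >= 0 && minfree_pct <= 100);
    return c->dsize * minfree_pct / 100;  /* integer division truncates */
}

/*
 * Fragments left above the reserve. If this is smaller than the number of
 * fragments being requested, an unprivileged allocation would be refused
 * under the assumption stated above.
 */
static int64_t margin_frags(const struct fs_counters *c, int minfree_pct)
{
    return free_frags(c) - reserved_frags(c, minfree_pct);
}
```

Filling the struct with nbfree 563891, nffree 19898, frag 4 and blocks
28434238, and calling margin_frags() with minfree 8, gives by my
arithmetic a margin of 723 fragments of 2K each. That superblock
snapshot is stamped 09:38:40, while the df reading was taken at a
different moment, so the two need not agree.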

Looking through the source, the error is being logged by ffs_fserr in
sys/ufs/ffs/ffs_alloc.c. It is being called either by ffs_alloc or by
ffs_realloccg after either of the following conditions:

ffs_alloc {
        ...
retry:
        if (size == fs->fs_bsize && fs->fs_cstotal.cs_nbfree == 0)
                goto nospace;
        if (... &&
            freespace(fs, fs->fs_minfree) - numfrags(fs, size) < 0)
                goto nospace;
        ...
nospace:
        if (fs->fs_pendingblocks > 0 && reclaimed == 0) {
                reclaimed = 1;
                softdep_request_cleanup(fs, ITOV(ip));
                goto retry;
        }
        ffs_fserr(fs, ip->i_number, "filesystem full");
}

My uninformed and uneducated reading of this is that it does not think
there are enough blocks free, yet that does not tally with what df is
telling me.

Looking again at dumpfs, it appears to say that this is formatted with
a block size of 8K, and a fragment size of 2K, but tuning(7) says:

     FreeBSD performs best when using 8K or 16K file system block
     sizes.  The default file system block size is 16K, which provides
     best performance for most applications, with the exception of
     those that perform random access on large files (such as database
     server software).  Such applications tend to perform better with
     a smaller block size, although modern disk characteristics are
     such that the performance gain from using a smaller block size
     may not be worth consideration.  Using a block size larger than
     16K can cause fragmentation of the buffer cache and lead to lower
     performance.

     The defaults may be unsuitable for a file system that requires a
     very large number of i-nodes or is intended to hold a large
     number of very small files.  Such a file system should be created
     with an 8K or 4K block size.  This also requires you to specify a
     smaller fragment size.  We recommend always using a fragment size
     that is 1/8 the block size (less testing has been done on other
     fragment size factors).

Reading this makes me think that when this server was installed, the
block size was dropped from the 16K default to 8K for performance
reasons, but the fragment size was not modified accordingly.

Would this be the root of my problem? If so, is my only option to back
everything up and newfs the disk, or is there something else I can do
that will minimise my downtime?

Any help and advice would be greatly appreciated.

-Feargal.

-- 
Feargal Reilly, Chief Techie, FBI.
PGP Key: 0x105D7168 (expires: 2006-11-30)
Web: http://www.fbi.ie/ | Tel: +353.14988588 | Fax: +353.14988489
Communications House, 11 Sallymount Avenue, Ranelagh, Dublin 6.

--Sig_hglNb74cQOwLb8iTICNIkiC
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (FreeBSD)

iD8DBQFEwM/VrrAVkRBdcWgRAvZaAJ4p/vatKGlHei+NMAp3kxHMixiGoACdHUNR
oAC1HR5jhXUjJN2r0/phGys=
=Qm1f
-----END PGP SIGNATURE-----

--Sig_hglNb74cQOwLb8iTICNIkiC--